Methods
Data collection and filtering
Introduction to data
Query key words
We used the Wikidata database to obtain scientific names, English and Indonesian common names, and alternative spellings of common names for the four species on which we collected data. We then conducted a pilot data collection phase, after which we semi-automatically identified additional key words in the text content of small ads, video metadata and comments. We used NLTK (Bird et al. 2012) to extract n-grams (bigrams and trigrams) and count their occurrences, then carried out manual internet searches to evaluate the usefulness of the 20 most common n-grams. The resulting list of key words (Table 1) selects for download a set of ads, videos and comments that pertains to the species.
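The n-gram counting step can be illustrated with a minimal sketch. It is not the pipeline used in the study: it relies only on the Python standard library, with a small zip-based helper standing in for NLTK's n-gram utility, and the tokenizer, the function names (`tokenize`, `ngrams`, `top_ngrams`) and the sample texts are all hypothetical and chosen for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: a simple regex tokenizer; the study's tokenization
    is not described in the text.
    """
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(documents, sizes=(2, 3), k=20):
    """Count bigrams and trigrams over all documents; return the k most common."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical placeholder texts, not data from the study.
sample_docs = [
    "rare songbird for sale",
    "songbird for sale cheap",
    "young songbird for sale today",
]

ranked = top_ngrams(sample_docs)
for gram, count in ranked:
    print(" ".join(gram), count)
```

The ranked list produced this way would then be screened by hand, as described above, since frequency alone does not indicate whether an n-gram is a useful search term.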