Small ads (OLX.com.id)
We used olxsearch (Fink, 2020) to download data and metadata on small ads published on OLX.com.id, an Indonesian online marketplace. The software package scrapes the marketplace's web pages and downloads the text and image content of each ad, as well as its date, location, duration of publication, and advertised retail price.
The olxsearch package was developed specifically for this study but has been published separately so that others can use it.
Owing to the nature of web scraping, we could only retrieve advertisements that were online during the data collection period, which lasted from 1 February to 15 May 2020.
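To make the sampling constraint concrete, the following sketch models a scraped ad as a record holding the fields listed above and tests whether its online interval overlaps the collection window. The `Ad` class, its field names, the assumption that prices are in rupiah, and the assumption that an ad stays online for exactly `duration_days` after publication are all hypothetical; they do not describe olxsearch's actual output format.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Collection window stated in the text: 1 February to 15 May 2020 (inclusive).
COLLECTION_START = date(2020, 2, 1)
COLLECTION_END = date(2020, 5, 15)


@dataclass
class Ad:
    """Hypothetical record for one scraped ad (illustrative field names)."""
    text: str
    published: date
    duration_days: int  # assumed: ad is online for this many days after publication
    location: str
    price: int  # advertised retail price (currency assumed to be IDR)


def was_online_during_collection(ad: Ad) -> bool:
    """True if the ad's online interval overlaps the collection window,
    i.e. if a scraper running during that window could have seen it."""
    taken_down = ad.published + timedelta(days=ad.duration_days)
    return ad.published <= COLLECTION_END and taken_down >= COLLECTION_START


if __name__ == "__main__":
    old_ad = Ad("example", date(2019, 11, 1), 30, "Jakarta", 500_000)
    new_ad = Ad("example", date(2020, 3, 10), 30, "Bandung", 750_000)
    print(was_online_during_collection(old_ad))  # False
    print(was_online_during_collection(new_ad))  # True
```

An ad published before 1 February can still be captured if it remained online into the window, which is why the check compares the whole interval rather than the publication date alone.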
Video metadata and comments (YouTube)
We used metatube (Fink, 2020) to download video metadata and comments from YouTube. The software package saves the downloaded data in a relational database in which comments can be related to their videos and vice versa. In a separate step, we downloaded all video thumbnails.
The metatube package was developed specifically for this study but has been published separately so that others can use it.
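The following sketch uses Python's built-in `sqlite3` module to show how a foreign key from comments to videos lets one navigate in both directions. The two-table schema, its column names, and the helper functions are assumptions for illustration and are not taken from metatube.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: each comment row points to its video via a foreign key.
SCHEMA = """
CREATE TABLE videos (
    video_id   TEXT PRIMARY KEY,
    title      TEXT,
    published  TEXT
);
CREATE TABLE comments (
    comment_id TEXT PRIMARY KEY,
    video_id   TEXT NOT NULL REFERENCES videos(video_id),
    body       TEXT,
    published  TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one video and two comments."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO videos VALUES (?, ?, ?)",
                 ("vid1", "Example video", "2020-03-01"))
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", [
        ("c1", "vid1", "First comment", "2020-03-02"),
        ("c2", "vid1", "Second comment", "2020-03-03"),
    ])
    conn.commit()
    return conn


def comments_for_video(conn: sqlite3.Connection, video_id: str) -> List[str]:
    """Video -> comments: IDs of all comments on one video."""
    rows = conn.execute(
        "SELECT comment_id FROM comments WHERE video_id = ? ORDER BY comment_id",
        (video_id,),
    )
    return [row[0] for row in rows]


def video_for_comment(conn: sqlite3.Connection, comment_id: str) -> Optional[str]:
    """Comment -> video: title of the video a comment belongs to."""
    row = conn.execute(
        "SELECT v.title FROM comments AS c "
        "JOIN videos AS v ON v.video_id = c.video_id "
        "WHERE c.comment_id = ?",
        (comment_id,),
    ).fetchone()
    return row[0] if row else None
```

Because the foreign key is enforced, a comment cannot be stored without an existing parent video, so every comment remains traceable to the video it was posted under.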
Birdwatchers’ observations (eBird)
Species ranges (IUCN Red List)
Extracting geographic locations from text content
We observed that many of the video titles, descriptions, and comments contained place names, which allowed indirect georeferencing using techniques from Geographic Information Retrieval \citep{Jones_2008}. Our method consisted of two steps. First, we identified and extracted place names using Named Entity Recognition (NER) tools. NER is a natural language processing method that extracts names, including names of geographic places, from text \citep{Leidner_2011,2007}. NER tools are readily available for all major languages, e.g., as part of the spaCy library (QUOTE), but mainstream tools do not cover the Indonesian language. Instead, we used NERGRIT, an Indonesian NER dataset developed by Fahmi et al. (2019), with a model trained on Wikipedia articles using anago (Lample et al. 2016; Peters et al. 2018), for which the authors report an F1 score of 0.8. Second, we georeferenced the identified place names, i.e., assigned a geographic coordinate pair to each of them. For this, we used Geocoder (QUOTE) with a custom Nominatim (QUOTE) instance based on OpenStreetMap data (snapshot from 5 March 2020).
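The data flow of the two steps can be sketched in plain Python. Because NERGRIT and a Nominatim server cannot be embedded in a short, self-contained example, step 1 is replaced here by a hypothetical gazetteer lookup (a much cruder technique than NER), and step 2 only builds a request URL for a Nominatim instance and parses the JSON it returns. The gazetteer entries, the base URL, the `/search` endpoint path (which may differ on older Nominatim installations), and all function names are assumptions; this is not the Geocoder-based code used in the study.

```python
import json
import re
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlencode

Coords = Tuple[float, float]

# Step 1 stand-in: a hypothetical gazetteer instead of the NERGRIT model.
GAZETTEER = ("Jakarta", "Bandung", "Surabaya", "Jawa Barat", "Jawa")

# Longest names first, so that "Jawa Barat" wins over "Jawa" at the same position.
_PLACE_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(GAZETTEER, key=len, reverse=True))
    + r")\b"
)


def extract_place_names(text: str) -> List[str]:
    """Return gazetteer names found in `text`, in order of appearance."""
    return _PLACE_PATTERN.findall(text)


def nominatim_search_url(base_url: str, place: str) -> str:
    """Build a free-text query for the (assumed) /search endpoint of a Nominatim instance."""
    params = urlencode({"q": place, "format": "json", "limit": 1})
    return f"{base_url.rstrip('/')}/search?{params}"


def parse_first_result(body: str) -> Optional[Coords]:
    """Return (lat, lon) of the top hit in a Nominatim JSON response, or None.
    Nominatim returns lat/lon as strings, hence the float() conversion."""
    results = json.loads(body)
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def georeference(text: str, lookup: Callable[[str], Optional[Coords]]) -> Dict[str, Coords]:
    """Steps 1 and 2 combined: map each extracted place name to coordinates.
    `lookup` is any callable returning (lat, lon) or None, e.g. a function that
    fetches nominatim_search_url(...) and passes the body to parse_first_result."""
    coords_by_name: Dict[str, Coords] = {}
    for name in extract_place_names(text):
        coords = lookup(name)
        if coords is not None:
            coords_by_name[name] = coords
    return coords_by_name


if __name__ == "__main__":
    print(extract_place_names("Burung dari Bandung, dikirim ke Jawa Barat"))
    # ['Bandung', 'Jawa Barat']
    print(nominatim_search_url("http://localhost:8080", "Jawa Barat"))
    # http://localhost:8080/search?q=Jawa+Barat&format=json&limit=1
```

Passing the geocoder in as a `lookup` callable keeps the extraction and georeferencing steps separable, so either one could be swapped out independently. Unlike a trained NER model, the gazetteer only finds names it already lists and ignores context, so it serves here purely to make the data flow explicit.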
Masking precise geographic locations
Kounadi et al.
Results
Overall, we collected 502 ads on OLX.com.id for laughingthrushes, 268 for Javan pied starlings, 242 for white-rumped shamas, and 74 for straw-headed bulbuls.
From YouTube, we downloaded 920 videos with 3,385 comments on laughingthrushes, 2,906 videos with 9,891 comments on Asian pied starlings, 430 videos with 2,017 comments on white-rumped shamas, and 808 videos with 2,392 comments on straw-headed bulbuls.