3.2. Vocabulary of the news
Of the terms referring to either the virus or the disease, 18 belonged to the category “pneumonia”, 8 to the category “mystery”, 3 to the category “coronavirus” (one of them, “coronovirus”, being a misspelling of “coronavirus”), and 7 to the category “technical” (Table 2).
Before identification of the virus (Dec. 31, 2019 - Jan. 8, 2020), 58.1% (n = 317/546) of the COVID-19 terms were from category “pneumonia”, 29.1% (n = 159/546) were from category “mystery” and 12.8% (n = 70/546) were from category “coronavirus”. From the official identification to the first report of a case outside China (Jan. 9, 2020 - Jan. 12, 2020), 48.5% (n = 127/262) of the terms were from category “coronavirus”, 34.7% (n = 91/262) were from category “pneumonia”, and 16.8% (n = 44/262) were from category “mystery”. From this first report to the confirmation of human-to-human transmission (Jan. 13, 2020 - Jan. 19, 2020), 58.3% (n = 196/336) of the terms were from category “coronavirus”, 27.4% (n = 92/336) were from category “pneumonia”, 11.3% (n = 38/336) were from category “mystery”, and 3.0% (n = 10/336) were from category “technical”. From the confirmation of human-to-human transmission to the end of the studied period, 62.9% (n = 906/1440) of the terms were from category “coronavirus”, 17.4% (n = 250/1440) were from category “technical”, 14.1% (n = 203/1440) were from category “pneumonia”, and 5.6% (n = 81/1440) were from category “mystery” (Figure 1).
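As a consistency check, the proportions above can be recomputed from the per-category counts. The following is a minimal sketch: the counts are copied from the text, while the period labels and the function name `category_shares` are illustrative choices rather than part of the study's pipeline. Each period total is taken as the sum of its category counts.

```python
# Term counts per category, copied from the figures reported in the text.
# The period labels are illustrative names, not the authors' terminology.
counts = {
    "before identification": {"pneumonia": 317, "mystery": 159, "coronavirus": 70},
    "identification to first case outside China": {
        "coronavirus": 127, "pneumonia": 91, "mystery": 44,
    },
    "first case outside China to human-to-human transmission": {
        "coronavirus": 196, "pneumonia": 92, "mystery": 38, "technical": 10,
    },
    "after human-to-human transmission": {
        "coronavirus": 906, "technical": 250, "pneumonia": 203, "mystery": 81,
    },
}


def category_shares(period_counts):
    """Return the percentage of terms per category, rounded to one decimal."""
    total = sum(period_counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in period_counts.items()}


for period, period_counts in counts.items():
    total = sum(period_counts.values())
    print(f"{period}: total = {total}, shares = {category_shares(period_counts)}")
```

Deriving each denominator from the counts themselves makes any mismatch between a reported total and its numerators immediately visible.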
Among the three compared EBS systems, only ProMED relies on local expert information to alert on health threats. This result suggests that the network of local experts rooted in the field is crucial for the detection of EID events and their reporting. On the other hand, HealthMap and PADI-web detected the news on the same day as the official reporting. Thus, it is important to understand their current limitations and stress the role of experts in EBS systems. Further studies should also assess whether the timeliness of automated systems depends on the communication strategies of online media, what their threshold for reporting health events is, and how these features influence the sensitivity of web-based EBS systems.
The three EBS systems included in this study monitor media in multiple languages, thus making it possible to detect news from local media. A further increase in the number of available languages should improve the sensitivity of the EBS systems (Barboza et al., 2014). Our study also showed that the three EBS systems are complementary regarding scope (animal health, animal and public health), moderation (manual, semi-automated, automated), and the number of covered languages.
PADI-web could retrieve news through animal-health related RSS feeds, thus proving useful to detect relevant information for public health risk assessors. For example, many of the news items detected by PADI-web compared the magnitude and the economic impact of COVID-19 with avian influenza and African swine fever outbreaks in China. Indeed, before the identification of COVID-19, the pneumonia-like illness was compared to avian influenza zoonotic infections. Some news items also presented a summary of several recent disease outbreaks in China, including African swine fever (which is not a zoonotic disease), thus explaining why they were captured.
The ability of EBS tools to embrace a broad scope of health-related topics through a limited number of queries (RSS feeds) is a significant strength compared with formal sources. This capacity largely depends on the intrinsic features of online news, in which outbreak-related content is often enriched with additional information, such as comparisons with previous disease outbreaks, thus increasing the probability of being detected by EBS tools. However, the probability of detecting an EID event might be higher for (actually or assumed) zoonotic diseases and for countries with ongoing animal disease outbreaks. The practical impact of this bias remains to be assessed.
In addition to syndromic feeds, disease-specific feeds equally contributed to the first detection of COVID-19 by PADI-web, thus highlighting the importance of combining both disease-specific and syndromic feeds. Furthermore, the vocabulary used to describe COVID-19 emergence before the virus identification included terms semantically related to “unknown” and “mysterious” events. Integrating such terms into existing RSS feeds may therefore increase the detection and retrieval of relevant news. We suggest enriching the identification of classic epidemiological entities (e.g. disease, hosts, locations, dates) in the news content with this category of terms.
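The suggestion to combine “mystery” vocabulary with syndromic terms can be illustrated with a minimal keyword-matching sketch. It is not the implementation used by PADI-web or any other EBS system. Only “unknown” and “mysterious” are taken from the text; the remaining terms, the function names and the co-occurrence rule are assumptions made for illustration.

```python
import re

# Illustrative term lists: only "unknown" and "mysterious" come from the text;
# the other entries are assumptions added for the example.
MYSTERY_TERMS = ["unknown", "mysterious", "mystery", "unidentified", "unexplained"]
SYNDROMIC_TERMS = ["pneumonia", "respiratory illness", "fever"]


def build_pattern(terms):
    """Compile a case-insensitive whole-word pattern matching any of the terms."""
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


MYSTERY_RE = build_pattern(MYSTERY_TERMS)
SYNDROMIC_RE = build_pattern(SYNDROMIC_TERMS)


def flag_headline(headline):
    """Return the matched terms if a headline contains both a mystery term
    and a syndromic term; return None otherwise."""
    mystery = sorted({m.lower() for m in MYSTERY_RE.findall(headline)})
    syndromic = sorted({m.lower() for m in SYNDROMIC_RE.findall(headline)})
    if mystery and syndromic:
        return {"mystery": mystery, "syndromic": syndromic}
    return None


# Hypothetical headlines, written for this example only.
for text in ["Mysterious pneumonia cluster reported", "Seasonal fever cases rise"]:
    print(text, "->", flag_headline(text))
```

Requiring both kinds of term is one possible design choice to limit noise from news using “mystery” vocabulary outside a health context; a real system would need to evaluate this rule against annotated news.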
Our results indicated that the vocabulary changes as the disease spreads. Thus, EBS methods used to retrieve and analyse news from the web should be adapted to the different stages of disease epidemiology. With SARS in 2003 and MERS-CoV in 2012, COVID-19 is the third coronavirus emergence in the past two decades, highlighting the need to closely monitor the emergence of pneumonia-like illnesses using existing EBS systems. Our results highlight the complementarity of existing systems and underline the need for collaborative development. Pooling resources from veterinary and public health seems crucial to improve the early detection of unknown diseases in a One Health context. Future work will focus on identifying the most relevant keywords for the rapid detection of unknown threats, in collaboration with other EBS systems. Also, EBS tools may be used in a broader perspective, such as monitoring the implementation of protective and control measures. Efforts invested in improving the timeliness and sensitivity of EBS systems only make sense if their output (alerts of EID events) is designed, supervised, and interpreted by epidemiologists in collaboration with disease experts and reference laboratories. Most importantly, this output should feed the information flow used by health managers and decision-makers, for an early reaction to control EID events before they can spread.
Acknowledgements
The authors acknowledge ProMED for data sharing. This work was in part funded by the H2020 “Monitoring outbreak events for disease surveillance in a data science context” (MOOD) project under grant agreement No 874850 (https://mood-h2020.eu/), the French General Directorate for Food (DGAL), the French Agricultural Research Centre for International Development (CIRAD) and the SONGES Project FEDER and Occitanie.
This work was supported by the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004.
Data Sharing and Accessibility
The data that support the findings of this study are openly available in CIRAD Dataverse at http://doi.org/doi:10.18167/DVN1/MSLEFC.
Conflict of interests
The authors declare no conflict of interests.
Ethics Statement
The authors confirm that the ethical policies of the journal, as noted on the journal’s author guidelines page, have been adhered to. No ethical approval was required because this study did not involve any experimental protocol on humans or animals, and only uses publicly available online data.