3.2. Vocabulary of the news
Of the terms referring to either the virus or the disease, 18 belonged
to the category “pneumonia”, 8 to the category “mystery”, 3 to the
category “coronavirus” (one of them, “coronovirus”, being a
misspelling of “coronavirus”), and 7 to the category “technical”
(Table 2).
Before identification of the virus (Dec. 31, 2019 - Jan. 8, 2020),
58.1% (n = 317/546) of the COVID-19 terms were from category
“pneumonia”, 29.1% (n = 159/546) were from category “mystery”
and 12.8% (n = 70/546) were from category “coronavirus”. From
the official identification to the first report of a case outside China
(Jan. 09, 2020 - Jan. 12, 2020), 48.5% (n = 127/262) of the terms
were from category “coronavirus”, 34.7% (n = 91/262) were from
category “pneumonia”, and 16.8% (n = 44/262) were from category
“mystery”. From this first report to the confirmation of
human-to-human transmission (Jan. 13, 2020 - Jan. 19, 2020), 58.3%
(n = 196/336) of the terms were from category “coronavirus”,
27.4% (n = 92/336) were from category “pneumonia”, 11.3%
(n = 38/336) were from category “mystery”, and 3.0%
(n = 10/336) were from category “technical”. From the
confirmation of human-to-human transmission to the end of the study
period, 62.9% (n = 906/1440) of the terms were from category
“coronavirus”, 17.4% (n = 250/1440) were from category
“technical”, 14.1% (n = 203/1440) were from category
“pneumonia”, and 5.6% (n = 81/1440) were from category
“mystery” (Figure 1).
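The shares reported above can be recomputed from the category counts alone. The following is a minimal sketch, not the authors' analysis code: the period labels and the function name `category_shares` are illustrative assumptions, and each denominator is taken as the sum of the category counts reported for that period.

```python
# Category counts per period, copied from the paragraph above.
# Period labels are illustrative shorthand, not the authors' wording.
PERIOD_COUNTS = {
    "before identification": {
        "pneumonia": 317, "mystery": 159, "coronavirus": 70,
    },
    "identification to first case outside China": {
        "coronavirus": 127, "pneumonia": 91, "mystery": 44,
    },
    "first case outside China to human-to-human transmission": {
        "coronavirus": 196, "pneumonia": 92, "mystery": 38, "technical": 10,
    },
    "after confirmation of human-to-human transmission": {
        "coronavirus": 906, "technical": 250, "pneumonia": 203, "mystery": 81,
    },
}


def category_shares(counts):
    """Return {category: percentage}, using the period total as denominator."""
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


for period, counts in PERIOD_COUNTS.items():
    # Print the period, its total number of terms, and the share per category.
    print(period, sum(counts.values()), category_shares(counts))
```

Deriving the denominator from the counts, rather than typing it in separately, keeps each period's percentages consistent with its reported numerators.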
Among the three compared EBS systems, only ProMED relies on local expert
information to alert on health threats. This result suggests that a
network of local experts rooted in the field is crucial for the detection
of EID events and their reporting. On the other hand, HealthMap and
PADI-web detected the news on the same day as the official reporting. Thus, it
is important to understand their current limitations and stress the role
of experts in EBS systems. Further studies should also assess whether
the timeliness of automated systems depends on the communication
strategies of online media, what their threshold for reporting health
events is, and how these features influence the sensitivity of web-based
EBS systems.
The three EBS systems included in this study monitor media in multiple
languages, thus making it possible to detect news from local media. A
further increase in the number of available languages should improve the
sensitivity of the EBS systems (Barboza et al., 2014). Our study also
showed that the three EBS systems are complementary regarding scope
(animal health, animal and public health), moderation (manual,
semi-automated, automated), and the number of covered languages.
PADI-web could retrieve news through animal-health related RSS feeds,
thus proving useful to detect relevant information for public health
risk assessors. For example, many of the news items detected by PADI-web
compared the magnitude and the economic impact of COVID-19 with avian
influenza and African swine fever outbreaks in China. Indeed, before the
identification of COVID-19, the pneumonia-like illness was compared to
avian influenza zoonotic infections. Some news items also presented a
summary of several recent disease outbreaks in China, including African
swine fever (which is not a zoonotic disease), thus explaining why they
were captured.
The ability of EBS tools to embrace a broad scope of health-related
topics through a limited number of queries (RSS feeds) is a significant
strength compared to formal sources. This capacity largely depends on
the intrinsic features of online news in which outbreak-related content
is often enriched with additional information, such as comparisons with
previous disease outbreaks, thus increasing the probability of being
detected by EBS tools. However, the probability of detection of an EID
event might be higher for (actually or assumed) zoonotic diseases and
countries with ongoing animal disease outbreaks. This is not a strong
limitation in practice.
In addition to syndromic feeds, disease-specific feeds equally
contributed to the first detection of COVID-19 by PADI-web, thus
highlighting the importance of combining both disease-specific and
syndromic feeds. Furthermore, the vocabulary used to describe COVID-19
emergence before the virus identification included terms semantically
related to “unknown” and “mysterious” events. Therefore, their
integration into existing RSS feeds may increase the detection and
retrieval of relevant news. We suggest enriching the identification of
classic epidemiological entities (e.g. disease, hosts, locations, dates)
in the news content with this category of terms.
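As a rough illustration of this suggestion, the sketch below flags a text when a syndromic term co-occurs with a term from the “unknown”/“mysterious” family. It is an assumption-laden example, not the PADI-web implementation: the term lists, the function names and the sample headlines are all hypothetical.

```python
import re

# Hypothetical term lists for illustration only; these are NOT the
# keywords or RSS feed queries actually used by PADI-web.
SYNDROMIC_TERMS = ["pneumonia", "respiratory illness"]
UNKNOWN_TERMS = ["unknown", "mysterious", "mystery", "unexplained"]


def build_pattern(terms):
    """Compile a case-insensitive, whole-word pattern matching any term."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


SYNDROMIC = build_pattern(SYNDROMIC_TERMS)
UNKNOWN = build_pattern(UNKNOWN_TERMS)


def flag_unknown_event(text):
    """Return True when a syndromic term and an 'unknown'-type term co-occur."""
    return bool(SYNDROMIC.search(text)) and bool(UNKNOWN.search(text))


# Made-up headlines, used only to exercise the function.
print(flag_unknown_event("Mystery pneumonia cluster reported in a city"))
print(flag_unknown_event("Seasonal pneumonia cases rise"))
```

Requiring co-occurrence with a syndromic term is one possible design choice: terms such as “unknown” are common in unrelated news, so matching them on their own would likely retrieve many irrelevant items.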
Our results indicated that the vocabulary changes as the disease
spreads. Thus, EBS methods used to retrieve and analyse news from the
web should be adapted to the different stages of disease epidemiology.
After SARS in 2003 and MERS-CoV in 2012, COVID-19 is the third
coronavirus emergence in the past two decades, highlighting the need to
monitor closely the emergence of pneumonia-like illnesses using existing
EBS systems. Our results highlight the complementarity of existing
systems and underline the need for collaborative development.
Pooling resources from veterinary and public health seems crucial to
improve the early detection of unknown diseases in a One Health context.
Future work will focus on identifying the most relevant keywords for the
rapid detection of unknown threats, in collaboration with other EBS
systems. Also, EBS tools may be used in a broader perspective, such as
monitoring the implementation of protective and control measures.
These efforts invested in improving the timeliness and sensitivity of
EBS systems only make sense if their output (alerts of EID events) is
designed, supervised, and interpreted by epidemiologists in
collaboration with disease experts and reference laboratories. Most
importantly, this output should feed the information flow used by the
health managers and decision-makers, for an early reaction to control
EID events before they can spread.
Acknowledgements
The authors acknowledge ProMED for data sharing. This work was in part
funded by the H2020 “Monitoring outbreak events for disease
surveillance in a data science context” (MOOD) project under grant
agreement No 874850 (https://mood-h2020.eu/), the
French General Directorate for Food (DGAL), the French Agricultural
Research Centre for International Development (CIRAD) and the SONGES
Project FEDER and Occitanie. This work was supported by the French
National Research Agency under the Investments for the Future Program,
referred to as ANR-16-CONV-0004.
Data Sharing and Accessibility
The data that support the findings of this study are openly available in
CIRAD Dataverse at
http://doi.org/doi:10.18167/DVN1/MSLEFC.
Conflict of interests
The authors declare no conflict of interests.
The authors confirm that the ethical policies of the journal, as noted
on the journal’s author guidelines page, have been adhered to. No
ethical approval was required because this study did not involve any
experimental protocol on humans or animals, and only uses publicly
available online data.