Title:
Running head: Detection of COVID-19 emergence in online news.
Sarah Valentin1,2, Alizé Mercier2, Renaud Lancelot2, Mathieu Roche1 and Elena Arsevska2 *†
1UMR TETIS, CIRAD, F-34398 Montpellier, France. TETIS, Univ Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France.
2UMR ASTRE, CIRAD, F-34398 Montpellier, France. ASTRE, Univ Montpellier, CIRAD, INRAE, Montpellier, France.
*Corresponding author. †E-mail: elena.arsevska@cirad.fr
Article type: Rapid communication
Word count: 2,115 words
Abstract
Event-based surveillance (EBS) systems monitor a broad range of
information sources to detect early signals of disease emergence,
including new and unknown diseases.
Following the emergence in humans of the disease caused by a newly
identified coronavirus, named COVID-19, in December 2019 in Wuhan, China,
we conducted a retrospective analysis of the capacity of three EBS
systems (ProMED, HealthMap and PADI-web) to detect early signals of this
emergence. We evaluated how the online news vocabulary changed before
and after the identification of COVID-19, and with the assessment of its
contagiousness and pandemic potential. ProMED was the timeliest EBS
system, detecting signals one day before the official notification. At
this early stage, the specific vocabulary was related to “pneumonia
symptoms” and “mystery illness”. Once COVID-19 was identified, the
vocabulary shifted to the virus family and COVID-19-specific acronyms.
Our results suggest the three EBS systems are complementary regarding
data sources, and all need improvements regarding timeliness. EBS
methods should be adapted to the different stages of disease emergence
to improve the early detection of future emergence of unknown pathogens.
Keywords: epidemic intelligence, online news, emerging disease,
PADI-web, COVID-19, One Health
Introduction
Epidemic intelligence (EI) aims to detect, monitor and assess potential
health threats for early warning and rapid response (Paquet et al.,
2006). In addition to the indicator-based surveillance of official
sources, public and animal health agencies increasingly integrate an
event-based surveillance (EBS) component into their EI systems. EBS uses
unstructured data from unofficial sources, such as online news, to
improve the early detection of emerging infectious diseases (EIDs).
Since the late 1990s, several free-access EBS systems have supported the
EI process, such as the Program for Monitoring Emerging Diseases, or
ProMED (Woodall, 2001), HealthMap (Brownstein et al., 2008) and, more
recently, PADI-web (Valentin, Arsevska, et al., 2020).
ProMED is a human-curated system launched in 1994 by the International
Society for Infectious Diseases (ISID). The system relies on a large
network of experts worldwide who produce and share verified reports on
disease outbreaks in a common platform (Carrion & Madoff, 2017).
HealthMap is a semi-automated system founded by the Boston Children’s
Hospital in 2006. The tool monitors both official and non-official news
sources on the web (Freifeld et al., 2008). Both HealthMap and ProMED
monitor a broad range of known and unknown threats to human, animal and
environmental health. The Platform for Automated extraction of animal
Disease Information from the web (PADI-web) was created in 2016 to
monitor online animal health-related news for the French Epidemic
Intelligence System (FEIS) (Arsevska et al., 2018; Valentin et al.,
2020). Both HealthMap and PADI-web automatically retrieve health-related
news from Google News using customized Really Simple Syndication (RSS)
feeds. To detect news, the two systems use terms for known diseases as
well as terms for clinical signs and syndromes (Arsevska et al., 2016).
All three EBS systems monitor news in multiple languages, including
Chinese.
On Dec. 31, 2019, local health officials of the Chinese city of Wuhan
reported a cluster of 27 cases of “pneumonia of unknown cause”. These
cases were linked to a wholesale seafood market in the city. In January
2020, the first death was reported, the aetiology was identified as a
new coronavirus, SARS-CoV-2, and the disease was named COVID-19. The
first epidemiological study on patients with laboratory-confirmed
COVID-19 infection reported an onset of illness as early as Dec. 1, 2019
(Huang et al., 2020).
This retrospective study aimed first to evaluate three EBS systems
(ProMED, HealthMap and PADI-web) and their capacity to detect the
COVID-19 emergence in China in a timely manner. Secondly, we focused on PADI-web to
understand how an animal health EBS system contributed to the detection
of a human EID. We analysed the RSS feeds from PADI-web that detected
the COVID-19-related news articles (further referred to as “news”).
Thirdly, we assessed the vocabulary in the news detected by PADI-web and
its change in relation to the identification of the pathogen and the
spread of the EID.
Material and methods
2.1. Detection of COVID-19-related news
To assess the timeliness of the three EBS systems, we searched each of
them for news published from Dec. 1 to Dec. 31, 2019. We compared the
first news detected by each system in terms of publication date,
language, and source.
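The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: the record structure, the field names and the example records are hypothetical placeholders, and the only value taken from the text is the date of the official notification (Dec. 31, 2019).

```python
from datetime import date

# Date of the official notification by Wuhan health officials (from the text).
OFFICIAL_NOTIFICATION = date(2019, 12, 31)


def first_detection(records):
    """Return, for each system, its earliest record in the list."""
    earliest = {}
    for rec in records:
        current = earliest.get(rec["system"])
        if current is None or rec["published"] < current["published"]:
            earliest[rec["system"]] = rec
    return earliest


def lag_in_days(record, reference=OFFICIAL_NOTIFICATION):
    """Days between publication and the reference date.

    Negative values mean the news preceded the official notification.
    """
    return (record["published"] - reference).days


# Hypothetical placeholder records, NOT the study data.
example_records = [
    {"system": "system_A", "published": date(2019, 12, 31),
     "language": "en", "source": "example-site-1"},
    {"system": "system_A", "published": date(2019, 12, 30),
     "language": "zh", "source": "example-site-2"},
    {"system": "system_B", "published": date(2019, 12, 31),
     "language": "en", "source": "example-site-3"},
]

for system, rec in sorted(first_detection(example_records).items()):
    print(system, rec["published"], rec["language"], rec["source"],
          lag_in_days(rec))
```

Keeping the language and source alongside the publication date makes it possible to compare the first detections on all three criteria at once.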
To understand how PADI-web detected the COVID-19 emergence, we further
built a second corpus by filtering the news published from Dec. 31,
2019, to Jan. 26, 2020, that contained at least one of the following
words in the title and body of the news: “pneumonia”, “respiratory
illness”, “coronavirus”, “nCoV” (an early name for COVID-19), or
“Wuhan”. After manual verification of their relevance, we retained for
analysis 275 out of 333 news (Valentin, Mercier, et al., 2020).
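A minimal sketch of such a keyword and date filter is given below. The keyword list and the date window come from the text; everything else is an assumption made for illustration, in particular the case-insensitive substring matching, the `title`, `body` and `published` field names, and the example items. It does not reproduce the manual relevance check that followed.

```python
import re
from datetime import date

# Keywords and date window as stated in the text.
KEYWORDS = ["pneumonia", "respiratory illness", "coronavirus", "nCoV", "Wuhan"]
WINDOW_START = date(2019, 12, 31)
WINDOW_END = date(2020, 1, 26)

# Assumption: case-insensitive substring matching, so "2019-nCoV" matches "nCoV".
KEYWORD_PATTERN = re.compile(
    "|".join(re.escape(keyword) for keyword in KEYWORDS), re.IGNORECASE
)


def matches_keywords(title, body):
    """True if at least one keyword occurs in the title or the body."""
    return bool(KEYWORD_PATTERN.search(f"{title}\n{body}"))


def filter_corpus(news_items):
    """Keep items published inside the window that mention a keyword."""
    return [
        item
        for item in news_items
        if WINDOW_START <= item["published"] <= WINDOW_END
        and matches_keywords(item["title"], item["body"])
    ]


# Hypothetical placeholder items, NOT the study corpus.
example_items = [
    {"title": "Mystery pneumonia cluster reported", "body": "",
     "published": date(2020, 1, 5)},
    {"title": "Poultry market report", "body": "Prices were stable.",
     "published": date(2020, 1, 5)},
    {"title": "Update on 2019-nCoV", "body": "",
     "published": date(2020, 2, 1)},  # outside the date window
]

candidates = filter_corpus(example_items)
print(len(candidates))
```

A filter of this kind only proposes candidates; items it retains would still need to be checked by hand for relevance, as was done in the study.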
To assess the link between the animal health RSS feeds from PADI-web and
the detected news, we analysed the news according to the RSS feed used
for their retrieval. To this end, we read each news item and categorized
it according to whether it had been retrieved by i) a disease-specific
RSS feed (containing specific disease names), or ii) a syndromic RSS
feed (containing combinations of symptoms and animal hosts).
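The two feed categories can be illustrated with a simple rule-based sketch. In the study the categorization was done by reading each news item; the code below is only a hypothetical automated approximation. The term lists, the `feed_query` field and the example items are invented for illustration and are not the actual PADI-web feed definitions.

```python
from collections import Counter

# Hypothetical, illustrative term lists (NOT the actual PADI-web feed terms).
DISEASE_TERMS = {"avian influenza", "african swine fever", "foot-and-mouth disease"}
SYMPTOM_TERMS = {"fever", "pneumonia", "mortality"}
HOST_TERMS = {"pig", "poultry", "cattle"}


def categorize_feed(query):
    """Classify an RSS feed query as disease-specific, syndromic or other."""
    text = query.lower()
    if any(term in text for term in DISEASE_TERMS):
        return "disease-specific"
    has_symptom = any(term in text for term in SYMPTOM_TERMS)
    has_host = any(term in text for term in HOST_TERMS)
    if has_symptom and has_host:
        return "syndromic"
    return "other"


def count_news_by_feed_type(news_items):
    """Count news items per category of the feed that retrieved them."""
    return Counter(categorize_feed(item["feed_query"]) for item in news_items)


# Hypothetical placeholder items, NOT the study data.
example_items = [
    {"title": "item 1", "feed_query": "avian influenza"},
    {"title": "item 2", "feed_query": "pneumonia AND poultry"},
    {"title": "item 3", "feed_query": "fever AND pig"},
]

print(count_news_by_feed_type(example_items))
```

Counting news per feed category in this way gives a quick view of how much of the corpus came from syndromic rather than disease-specific feeds.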