Title:
Running head: Detection of COVID-19 emergence in online news.
Sarah Valentin1,2, Alizé Mercier2, Renaud Lancelot2, Mathieu Roche1 and Elena Arsevska2 *†
1UMR TETIS, CIRAD, F-34398 Montpellier, France. TETIS, Univ Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France.
2UMR ASTRE, CIRAD, F-34398 Montpellier, France. ASTRE, Univ Montpellier, CIRAD, INRAE, Montpellier, France.
*Corresponding author. E-mail: elena.arsevska@cirad.fr
Article type: Rapid communication
Word count: 2,115 words
Abstract
Event-based surveillance (EBS) systems monitor a broad range of information sources to detect early signals of disease emergence, including new and unknown diseases. Following the emergence of a newly identified coronavirus disease, COVID-19, in humans in December 2019 in Wuhan, China, we conducted a retrospective analysis of the capacity of three EBS systems (ProMED, HealthMap and PADI-web) to detect early signals of this emergence. We evaluated the changes in the online news vocabulary coinciding with the periods before and after the identification of COVID-19 and the assessment of its contagiousness and pandemic potential. ProMED was the timeliest EBS system, detecting signals one day before the official notification. At this early stage, the specific vocabulary was related to “pneumonia symptoms” and “mystery illness”. Once COVID-19 was identified, the vocabulary changed to the virus family and specific COVID-19 acronyms. Our results suggest that the three EBS systems are complementary regarding data sources, and that all need improvements regarding timeliness. EBS methods should be adapted to the different stages of disease emergence to improve the early detection of future emergences of unknown pathogens.
Keywords: epidemic intelligence, online news, emerging disease, PADI-web, COVID-19, One Health
Introduction
Epidemic intelligence (EI) aims to detect, monitor and assess potential health threats for early warning and rapid response (Paquet et al., 2006). In addition to the indicator-based surveillance of official sources, public and animal health agencies increasingly integrate an event-based surveillance (EBS) component into their EI systems. EBS uses unstructured data from unofficial sources such as online news to improve the early detection of emerging infectious diseases (EIDs). Since the 1990s, several free-access EBS systems have supported the EI process, such as the Program for Monitoring Emerging Diseases or ProMED (Woodall, 2001), HealthMap (Brownstein et al., 2008) and, more recently, PADI-web (Valentin, Arsevska, et al., 2020). ProMED is a human-curated system launched in 1994 by the International Society for Infectious Diseases (ISID). The system relies on a large network of experts worldwide who produce and share verified reports on disease outbreaks in a common platform (Carrion & Madoff, 2017). HealthMap is a semi-automated system founded by the Boston Children’s Hospital in 2006. The tool monitors both official and non-official news sources on the web (Freifeld et al., 2008). Both HealthMap and ProMED monitor a broad range of known and unknown human, animal and environmental health threats. The Platform for Automated extraction of animal Disease Information from the web (PADI-web) was created in 2016 to monitor online animal health-related news for the French Epidemic Intelligence System (FEIS) (Arsevska et al., 2018; Valentin et al., 2020). Both HealthMap and PADI-web automatically retrieve health-related news from Google News using customized Really Simple Syndication (RSS) feeds. To detect news, both systems use terms for known diseases as well as terms for clinical signs and syndromes (Arsevska et al., 2016). All three EBS systems monitor news in multiple languages, including Chinese.
On Dec. 31, 2019, local health officials of the Chinese city of Wuhan reported a cluster of 27 cases of “pneumonia of unknown cause”. These cases were linked to a wholesale seafood market in the city. In January 2020, the first death was reported, the aetiology was identified as a new coronavirus, SARS-CoV-2, and the disease was named COVID-19. The first epidemiological study on patients with laboratory-confirmed COVID-19 infection reported an onset of illness as early as Dec. 1, 2019 (Huang et al., 2020).
This retrospective study aimed first to evaluate the capacity of three EBS systems (ProMED, HealthMap and PADI-web) to detect the COVID-19 emergence in China in a timely manner. Secondly, we focused on PADI-web to understand how an animal health EBS system contributed to the detection of a human EID. We analysed the RSS feeds from PADI-web that detected the COVID-19-related news articles (further referred to as “news”). Thirdly, we assessed the vocabulary in the news detected by PADI-web and how it changed in relation to the identification of the pathogen and the spread of the EID.
Material and methods
2.1. Detection of COVID-19-related news
To assess the timeliness of the three EBS systems, we searched each of them for news published from Dec. 1 to Dec. 31, 2019. We compared the first news detected by each system in terms of publication date, language, and source.
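As an illustration of this comparison, the following minimal Python sketch computes, for each system, the number of days between its first detected news and the official notification of Dec. 31, 2019. The system names and detection dates in the example are placeholders chosen for illustration, not results from this study, and the function names are hypothetical.

```python
from datetime import date

# Date of the official notification by Wuhan health officials (stated in the text).
OFFICIAL_NOTIFICATION = date(2019, 12, 31)


def detection_lag_days(first_detection: date,
                       reference: date = OFFICIAL_NOTIFICATION) -> int:
    """Days between first detection and the reference date.

    A negative value means the system detected a signal before the
    official notification; zero means the same day.
    """
    return (first_detection - reference).days


def rank_by_timeliness(first_detections: dict[str, date]) -> list[tuple[str, int]]:
    """Return (system, lag in days) pairs, earliest detection first."""
    lags = {name: detection_lag_days(d) for name, d in first_detections.items()}
    return sorted(lags.items(), key=lambda item: item[1])


# Placeholder dates for illustration only (hypothetical systems).
example = {
    "system_b": date(2019, 12, 31),
    "system_a": date(2019, 12, 30),
}
ranking = rank_by_timeliness(example)
```

Expressing timeliness as a signed lag relative to a single reference date makes the systems directly comparable, whatever the language or source of the first news.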
To understand how PADI-web detected the COVID-19 emergence, we built a second corpus by filtering news published from Dec. 31, 2019, to Jan. 26, 2020, that contained at least one of the following terms in the title and body of the news: “pneumonia”, “respiratory illness”, “coronavirus”, “nCoV” (an early name for the new coronavirus), and “Wuhan”. After manual verification of their relevance, we retained 275 of the 333 filtered news for analysis (Valentin, Mercier, et al., 2020).
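The filtering step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PADI-web implementation: the `News` record is hypothetical, matching is assumed to be case-insensitive substring matching, and the title and body are searched together as one text. Manual relevance verification would still follow this step.

```python
import re
from dataclasses import dataclass
from datetime import date

# Terms and date window as given in the text.
KEYWORDS = ["pneumonia", "respiratory illness", "coronavirus", "nCoV", "Wuhan"]
START, END = date(2019, 12, 31), date(2020, 1, 26)

# Assumption: case-insensitive substring matching (so "2019-nCoV" matches "nCoV").
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


@dataclass
class News:
    """Hypothetical minimal news record used for this illustration."""
    title: str
    body: str
    published: date


def is_candidate(item: News) -> bool:
    """True if the news falls in the window and mentions at least one term."""
    in_window = START <= item.published <= END
    text = f"{item.title}\n{item.body}"
    return in_window and PATTERN.search(text) is not None


def filter_corpus(items: list[News]) -> list[News]:
    """Keep only candidate news; relevance is then checked manually."""
    return [item for item in items if is_candidate(item)]
```

For example, `filter_corpus` would keep a news item dated Jan. 5, 2020 whose title mentions "Wuhan", and drop an item with the same title dated Dec. 1, 2019, because it falls outside the window.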
To assess the link between the animal health RSS feeds from PADI-web and the detected news, we analysed the news according to the RSS feed used for their retrieval. To this end, we read each news and assigned it to one of two categories of feed: i) disease-specific RSS feeds (containing specific disease names), and ii) syndromic RSS feeds (containing combinations of symptoms and animal hosts).
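The two feed categories can be illustrated with a short Python sketch. In this study the categorization was done by reading the news; the code below only shows how a first automated pass might apply the two definitions to the search terms of a feed. The term lists are small illustrative assumptions, not the actual PADI-web feed vocabulary, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative term lists (assumptions, not the actual PADI-web vocabulary).
DISEASE_TERMS = {"avian influenza", "african swine fever"}
SYMPTOM_TERMS = {"fever", "mortality", "respiratory"}
HOST_TERMS = {"pig", "poultry", "cattle"}


def categorize_feed(query: str) -> str:
    """Classify the search terms of an RSS feed.

    disease-specific: the query contains a specific disease name.
    syndromic: the query combines a symptom term with an animal host term.
    Matching is plain lowercase substring matching.
    """
    q = query.lower()
    # Disease names are checked first, since a name such as
    # "african swine fever" also contains the symptom term "fever".
    if any(term in q for term in DISEASE_TERMS):
        return "disease-specific"
    if any(s in q for s in SYMPTOM_TERMS) and any(h in q for h in HOST_TERMS):
        return "syndromic"
    return "unclassified"


def count_news_by_category(feed_queries: list[str]) -> Counter:
    """Count news per feed category, given the feed query of each news."""
    return Counter(categorize_feed(q) for q in feed_queries)
```

Feeds that match neither definition are returned as "unclassified" so that they can be reviewed by hand.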