To analyse how the vocabulary changed over the period from the
discovery of COVID-19 to its spread outside China, we extracted terms from
the whole corpus. For this purpose, we used a ranking function based on
term frequency and importance (the F-TFIDF-C measure;
Lossio-Ventura et al., 2014), with the support of BioTex, a
text-mining tool adapted to the biomedical domain (Lossio-Ventura et al.,
2016). BioTex is based on the use of (i) a relevant combination
of information retrieval techniques and statistical methods, and (ii) a
list of syntactic structures of the terms that have been learnt from
relevant sources (e.g. MeSH). The terms extracted with BioTex can
be simple (e.g. influenza), or compound (e.g. avian
influenza), and are lowercased.
We further identified the terms referring to COVID-19, such as “new
virus” and “mystery pneumonia”. We manually categorized the terms as
“mystery” (terms referring to the unknown threat), “pneumonia”
(terms referring to the clinical signs), “coronavirus” (terms
referring to the virus taxonomy) and “technical” (technical acronyms
for the virus itself). A single news item can contain terms from different
categories. We calculated the daily proportion of each category as the
number of occurrences of terms in that category divided by the total
number of term occurrences on that day.
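The daily proportion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the input layout (one `(day, term, count)` tuple per extracted term) and the term-to-category lookup are hypothetical, and we assume the denominator is the total number of occurrences of all categorized terms on the same day.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical lookup from extracted (lowercased) terms to the four manual
# categories; the actual term lists used in the study are not reproduced here.
TERM_TO_CATEGORY = {
    "unknown virus": "mystery",
    "viral pneumonia": "pneumonia",
    "novel coronavirus": "coronavirus",
    "2019-ncov": "technical",
}


def daily_category_proportions(occurrences, term_to_category):
    """Return {day: {category: proportion}} from (day, term, count) tuples.

    Assumption: the proportion of a category on a given day is its number of
    term occurrences divided by the occurrences of all categorized terms
    found on that day.
    """
    per_day = defaultdict(Counter)
    for day, term, count in occurrences:
        category = term_to_category.get(term)
        if category is not None:  # ignore terms outside the four categories
            per_day[day][category] += count

    proportions = {}
    for day, counts in per_day.items():
        total = sum(counts.values())
        if total == 0:  # nothing to normalise on this day
            continue
        proportions[day] = {cat: n / total for cat, n in counts.items()}
    return proportions


# Made-up counts, for illustration only.
example_occurrences = [
    (date(2020, 1, 1), "unknown virus", 3),
    (date(2020, 1, 1), "viral pneumonia", 1),
    (date(2020, 1, 20), "novel coronavirus", 4),
    (date(2020, 1, 20), "2019-ncov", 4),
]
example_proportions = daily_category_proportions(
    example_occurrences, TERM_TO_CATEGORY
)
```

Because a single news item can contribute terms to several categories, the sketch counts term occurrences rather than news items, so the proportions for each day sum to one.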
3.1. Detection of news
ProMED was the first to detect and report a news item from a Chinese online
source (https://promedmail.org/promed-post/?id=6864153). The
ProMED report was dated Dec. 30, 2019, one day before the first
official notification of pneumonia-like cases in Wuhan (Wuhan Municipal
Health Commission, 2020). PADI-web and HealthMap detected three and one
COVID-19-related news items, respectively, on Dec. 31, 2019, the same day as
the first official notification of pneumonia-like cases in Wuhan (one
HealthMap item from an English source; three PADI-web items from two
English sources and one Chinese source). The news items detected by the
three EBS originated from five different media sources.
Of the 275 COVID-19-related news items retrieved by PADI-web, 45.5%
(n=125) were retrieved by disease-specific RSS feeds, and the remaining
54.5% (n=150) by syndromic RSS feeds
(Table 1).
Content-wise, 31.7% (n=87) of the news items compared COVID-19 to five
animal diseases (avian influenza, African swine fever, classical swine
fever, West Nile virus, and Rift Valley fever); 24.4% (n=67)
described the broad range of animal species susceptible to
coronaviruses; 18.2% (n=50) described ruling out avian influenza
during the diagnosis of COVID-19; 7.7% (n=21) described other ongoing
outbreaks in addition to COVID-19 (avian influenza, African swine fever,
classical swine fever, and foot-and-mouth disease); 2.5% (n=7)
referred to animal species present in Chinese markets as the
potential source of COVID-19; and 0.7% (n=2) advised avoiding
contact with animals. Irrelevant keyword matches were found in 12 news
items (e.g. a host keyword appearing in the name of a source), and no link
could be established between the RSS feed and the article for the
remaining 29 news items (10.5%).