The coronavirus disease 2019 (COVID-19) spread rapidly in Italy from March 2020, by which time the epidemic had been brought under control in China. The reasons for the rapid outbreak and the overall case-fatality rate in Italy have been studied and reported in the literature [1, 2, 3]. Clear differences in epidemic spread and fatality rates exist among regions, but the factors related to these spatial differences are unclear. It is therefore of interest to study this regional heterogeneity and the factors related to it.
Global COVID-19 data have been integrated by researchers and are publicly available from the R package nCov2019 [4]. We downloaded and extracted the data for Italy by region for our study. As of May 15, 2020, Lombardy ranks first among the 20 regions with 83820 cumulative confirmed cases, while Basilicata has the fewest cumulative confirmed cases (389). The number of deaths ranges from 22 in Molise to 15296 in Lombardy. Demographic data for the regions of Italy in 2019, including population, area, population density and human development index (HDI), were downloaded from https://en.wikipedia.org/wiki/Regions_of_Italy. The case rate (the proportion of confirmed cases in the regional population) ranges from 0.0006 to 0.009 with a median of 0.0025, while the death rate (the proportion of deaths in the regional population) ranges from 0.00005 to 0.00152 with a median of 0.00026. HDI [5] is a composite index of a long and healthy life, education and standard of living, measured by life expectancy, expected and mean years of schooling, and gross national income per capita, respectively. The median HDI is 0.891, with a range from 0.845 to 0.919.
It is reasonable to assume that people in the same region are independent and identically distributed, with the same probability of being infected and confirmed. Under this assumption, the number of cumulative confirmed cases in a region is binomial given the regional population, and we performed a univariate logistic regression of the cumulative confirmed cases on HDI. We found that HDI is statistically significant (estimated coefficient, that is, the log odds ratio per unit increase in HDI = 28.6476, p-value < 2 × 10^-16). If HDI increases by 0.1, the odds of being a confirmed case (that is, the probability that a person is a confirmed case divided by the probability that the person is not) are multiplied by exp(2.8648) ≈ 17.5448.
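The grouped logistic model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the study: the function names `fit_grouped_logistic` and `odds_multiplier` are hypothetical, the fit uses a plain Newton iteration, and the demonstration data are synthetic (HDI values chosen inside the reported range, a made-up common population, and an arbitrary intercept).

```python
import math

def fit_grouped_logistic(x, successes, trials, iters=50, tol=1e-10):
    """Fit logit(p_i) = b0 + b1 * x_i to grouped binomial counts by Newton's method."""
    # Centre the covariate so the 2x2 Newton system is well conditioned.
    x_bar = sum(x) / len(x)
    xc = [xi - x_bar for xi in x]
    # Start from the intercept-only fit (pooled log odds) with zero slope.
    p0 = sum(successes) / sum(trials)
    b0, b1 = math.log(p0 / (1.0 - p0)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(xc, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - ni * p           # observed minus expected count
            w = ni * p * (1.0 - p)    # binomial variance weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (X'WX) d = X'(y - mu) for the Newton step d.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Undo the centring so the intercept refers to x = 0.
    return b0 - b1 * x_bar, b1

def odds_multiplier(coefficient, delta):
    """Factor by which the odds change when the covariate increases by delta."""
    return math.exp(coefficient * delta)

# Synthetic check (hypothetical HDI values and populations, NOT the study data):
# expected counts are generated from a known model, so the fit should recover it.
hdi = [0.845, 0.870, 0.891, 0.905, 0.919]
population = [1_000_000] * 5
true_b0, true_b1 = -31.5, 28.6476   # intercept is an arbitrary assumption
expected_cases = [n / (1.0 + math.exp(-(true_b0 + true_b1 * h)))
                  for n, h in zip(population, hdi)]
b0_hat, b1_hat = fit_grouped_logistic(hdi, expected_cases, population)
print(b0_hat, b1_hat)

# Odds multiplier for a 0.1 increase in HDI, using the coefficient reported in the text.
print(odds_multiplier(28.6476, 0.1))   # about 17.54
```

The last line reproduces the arithmetic in the text: a coefficient of 28.6476 per unit of HDI corresponds to the odds being multiplied by roughly 17.5 for a 0.1 increase, which is a multiplicative rather than an additive change.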
Many studies have examined the case-fatality rate, defined as the proportion of deaths among confirmed cases. However, not all infected people are diagnosed and counted as confirmed cases, so we modelled deaths relative to the regional population instead. It is natural to assume that people in the same region have the same probability of being infected and dying from COVID-19, while this death probability differs among regions. We therefore also performed a univariate logistic regression of the cumulative deaths on HDI. HDI is again significant (estimated coefficient = 36.7946, p-value < 2 × 10^-16). An increase of 0.1 in HDI is associated with the odds of death being multiplied by 39.6230.
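To make the distinction between the two measures concrete, the following minimal sketch computes the case-fatality rate for Lombardy from the counts quoted earlier in the text, alongside a helper for the population death rate. The function names are hypothetical, and no regional population is hard-coded because the text does not quote one; it would have to be supplied from the demographic table.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Proportion of deaths among confirmed cases."""
    return deaths / confirmed_cases

def population_death_rate(deaths, population):
    """Proportion of deaths among the regional population."""
    return deaths / population

# Lombardy counts as of May 15, 2020, as reported in the text.
lombardy_deaths = 15296
lombardy_cases = 83820

cfr = case_fatality_rate(lombardy_deaths, lombardy_cases)
print(f"Lombardy case-fatality rate: {cfr:.4f}")   # about 0.18

# population_death_rate(lombardy_deaths, lombardy_population) would give the
# quantity modelled here; lombardy_population must come from the demographic data.
```

The case-fatality rate depends on how many infections are actually confirmed, whereas the population death rate uses the regional population as its denominator and is not affected by under-diagnosis of cases.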
In summary, it is interesting to note that higher HDI is associated with both a higher case rate and a higher fatality rate. This may be because more elderly people and more professionals live in regions with higher HDI, and because more business activity, including international business travel, takes place in those regions.