4. Discussion
The COVID-19 pandemic has posed a major threat to global health and
placed a massive burden on healthcare systems. It has affected the lives
of people worldwide and requires a very large number of screening tests
to detect the coronavirus. At the same time, advances in deep learning
(DL) have helped to build COVID-19 diagnosis models that achieve a
maximum detection rate with minimum computation time (22). A precise
forecast of the severity of COVID-19 could provide realistic insights
for guiding critical hospitalization and treatment decisions and thereby
relieve the burden on the healthcare system. Clinical and demographic
knowledge of COVID-19 disease progression and prognosis will help
identify critically ill patients, provide adequate treatment, and
prevent mortality (23). On this basis, the purpose of this research is
to build an intelligent model that predicts the severity of the disease
by modeling the associations between the severity of COVID-19 infection
and the demographic/clinical characteristics of individuals.
Predictive models acquire knowledge about a software project by studying
historical software information and predict whether instances added in
the future will be faulty. However, in most programs there are many more
non-defective cases (i.e., the majority class) than faulty cases (i.e.,
the minority class), which is referred to as the class imbalance
problem. Traditional machine learning algorithms, by contrast, presume
that the minority and majority classes are roughly equal in size.
Predictive models built from such highly imbalanced datasets tend to
disregard faulty instances and to predict the non-defective class. As a
consequence, the models yield highly skewed results and are of little
practical use. Data resampling techniques are commonly used to resolve
the class imbalance issue. The two general resampling strategies are
oversampling and undersampling: the former creates new cases and adds
them to the minority class, and the latter removes existing cases from
the majority class. Both techniques strive to balance the class
distribution of the data set to enhance the performance of prediction
models (24). In the current
study, the class imbalance problem arose in the data set while the
individual models were being built. To solve this problem, the disease
severity categories were balanced in the preprocessing stage of the data
set. Individual NN, SVM, and QUEST models were then constructed to
predict the COVID-19 severity categories from the balanced medical
records of the patients. Based on the experimental results of the
individual models, the NN model produced better predictions than the SVM
and QUEST models. Additionally, the most important factors estimated
from the NN model for the classification of COVID-19 severity were, in
order, age, Favipiravir use, the presence of dyspnea, cough, and smell
loss, Lopinavir/Ritonavir use, the presence of fatigue, fever, and
frontal type headache, and gender.
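The oversampling strategy described above can be illustrated with a
minimal sketch. The balancing method actually used in the study is not
stated, so random oversampling is assumed here purely for illustration;
the function name random_oversample, the toy records, and the severity
labels are hypothetical and are not taken from the study data.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen records of the smaller classes until
    every class has as many records as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        # All original records that belong to this class.
        pool = [r for r, y in zip(records, labels) if y == cls]
        # Add (target - n) random copies; zero copies for the largest class.
        for _ in range(target - n):
            out_records.append(rng.choice(pool))
            out_labels.append(cls)
    return out_records, out_labels


# Hypothetical toy data: (age, dyspnea flag) with an imbalanced label.
records = [(34, 0), (41, 0), (29, 0), (52, 0), (67, 1), (73, 1)]
labels = ["mild", "mild", "mild", "mild", "severe", "severe"]

balanced_records, balanced_labels = random_oversample(records, labels)
print(Counter(balanced_labels))
```

Undersampling would instead discard records from the larger class until
the class sizes match, at the cost of losing some of the original data.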
Ensemble learning approaches use several machine learning algorithms to
generate weak predictive results based on features derived from a
diverse range of projections of the data and merge these results,
through voting or other mechanisms, to achieve higher performance than
any single constituent algorithm. Ensemble learning is therefore highly
extensible and can be combined with various machine learning models for
different tasks, such as classification and clustering. In general,
current ensemble learning methods can be divided into four categories:
supervised classification ensembles, semi-supervised classification
ensembles, clustering ensembles, and semi-supervised clustering
ensembles (25). In the current study, different ensemble learning
algorithms were implemented to combine the individual predictions and
classify the severity of COVID-19 disease. The voting strategy, one of
the ensemble learning methods, gave slightly better predictive results
than the other ensemble techniques (i.e., HCWS and CWVS) in predicting
the severity of COVID-19 disease.
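As a rough illustration of the voting strategy, the sketch below
combines the class labels predicted by three individual models through
simple majority (plurality) voting. It is not the implementation used in
the study: the function name majority_vote, the tie-breaking rule, the
severity labels, and the example predictions are all assumptions made
for this example.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per patient.

    predictions_per_model holds one list of predicted labels per model,
    all in the same patient order.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        # Assumed tie-break: prefer the label of the earliest model in
        # the list among the labels that share the highest vote count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical predictions for four patients from three models.
nn_pred = ["mild", "severe", "moderate", "severe"]
svm_pred = ["mild", "moderate", "moderate", "severe"]
quest_pred = ["severe", "severe", "mild", "severe"]

ensemble_pred = majority_vote([nn_pred, svm_pred, quest_pred])
print(ensemble_pred)
```

Confidence-based ensemble schemes differ from this sketch in that they
weight or select each model's vote by its predicted confidence rather
than counting every vote equally.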
Recent studies on COVID-19 severity prediction have reported
applications of machine learning algorithms and artificial intelligence
models. One recent study aimed to develop a COVID-19 severity prediction
model and to explain dynamic changes in key clinical characteristics
over seven weeks. To this end, a support vector machine model was
constructed with a genetic algorithm for feature selection and achieved
an accuracy of over 94% for COVID-19 severity prediction. The authors
report that the proposed model includes 11 routine clinical features
commonly available during COVID-19 management, which may predict the
severity and guide the treatment of COVID-19 patients (26).
In another recently published study, RNA-Seq and high-resolution mass
spectrometry were performed on 128 blood samples from COVID-19 positive
and negative patients with diverse disease severities, and 219 molecular
features with high significance to COVID-19 status and severity were
identified. The researchers present an interactive web-based tool
(covid-omics.app) and illustrate its utility through a comparison with
published data and a machine learning approach to COVID-19 severity
prediction (27). Another study
assesses the predictive accuracy of the WHO COVID-19 severity
classification and compares its predictive power, based on Bayesian
network analysis, with that of a new prediction model, COVID-19
EPI-SCORE. The variables selected by the machine learning model were the
WHO severity classification, acute kidney injury, age, lactate
dehydrogenase (LDH) levels, lymphocytes, and activated partial
thromboplastin time (aPTT). The findings of the study demonstrate that
the WHO severity classification is accurate for predicting severe
outcomes in patients with COVID-19 (28).
Another newly published work performs a comparative analysis of machine
learning algorithms [i.e., support vector machine (SVM), decision tree
(DT), k-nearest neighbor (kNN), and convolutional neural network (CNN)]
to classify the pneumonia level (mild, progressive, or severe stage) of
patients with confirmed COVID-19. Extensive experiments were performed,
and the findings show accuracy values of 91.304%, 91.4%, 87.5%, and
95.622% for kNN, SVM, DT, and CNN, respectively (29).
Some of the factors identified in this study are consistent with those
reported in other studies. In addition, the performance metrics
calculated in the current study are higher than those of similar works
(29). The results of the current and the above-mentioned studies
demonstrate that machine learning and statistical learning models can
predict the severity of COVID-19.
In conclusion, the proposed voting ensemble model outperforms other
ensemble and individual machine learning approaches for the severity
prediction of COVID-19 disease. The proposed ensemble learning model can
be integrated into web or mobile applications for classifying the
severity of COVID-19 for clinical decision support.