4. Discussion
The COVID-19 pandemic has posed a major threat to global health and placed a massive burden on healthcare systems. It has affected the lives of people worldwide and requires large numbers of screening tests to identify the presence of the coronavirus. At the same time, advances in deep learning (DL) have made it possible to build COVID-19 diagnosis models that aim for a high detection rate with low computation time (22). An accurate prediction of the severity of COVID-19 could provide realistic insights for guiding critical hospitalization and treatment decisions and thereby relieve the burden on the healthcare system. Clinical and demographic knowledge of COVID-19 disease progression and prognosis can help identify critically ill patients, provide adequate treatment, and prevent mortality (23). Based on these considerations, the purpose of this research was to build an intelligent model that predicts the severity of the disease by modeling the associations between the severity of COVID-19 infection and the demographic/clinical characteristics of individuals.
The class imbalance problem is well documented in fields such as software defect prediction, where predictive models learn from historical project data and predict whether instances added in the future will be faulty. In most software projects, there are many more non-defective cases (i.e., the majority class) than faulty cases (i.e., the minority class), which is referred to as the class imbalance problem. However, traditional machine learning algorithms assume that the minority and majority classes are roughly equal in size. Predictive models built from such highly imbalanced datasets tend to disregard faulty instances and predict the non-defective class. As a consequence, the models yield highly skewed results and are of limited practical use. Data resampling techniques are commonly used to resolve the class imbalance issue. The two general resampling strategies are oversampling and undersampling: the former creates new cases and adds them to the minority class, and the latter removes existing cases from the majority class. Both techniques aim to balance the class distribution of the dataset to improve the performance of prediction models (24). In the current study, the class imbalance problem arose in the dataset while the individual models were being built. To solve this problem, the imbalance among the disease severity categories was resolved by balancing the classes in the preprocessing stage. Individual NN, SVM, and QUEST models were then constructed to predict the COVID-19 severity categories from the balanced medical records of the patients. Based on the experimental results of the individual models, the NN model produced better predictions than the SVM and QUEST models.
Additionally, the most important factors identified by the NN algorithm in the classification of COVID-19 severity were, in order, age, Favipiravir use, the presence of dyspnea, cough, and smell loss, Lopinavir/Ritonavir use, the presence of fatigue, fever, and frontal-type headache, and gender.
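The class balancing step described above can be illustrated with a minimal sketch of random oversampling. The study does not state which balancing method was applied, so the choice of oversampling here is an assumption, and the function name `random_oversample` and the toy records are hypothetical.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen minority-class records until every class
    is as large as the largest class. Illustrative only; not the study's
    documented preprocessing."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        # All original records belonging to this class.
        pool = [r for r, y in zip(records, labels) if y == cls]
        # Add copies until the class reaches the target size.
        for _ in range(target - n):
            out_records.append(rng.choice(pool))
            out_labels.append(cls)
    return out_records, out_labels


# Hypothetical toy data: (age, dyspnea flag) with a severity label.
records = [(34, 0), (41, 0), (29, 0), (55, 1), (62, 1), (78, 1)]
labels = ["mild", "mild", "mild", "moderate", "moderate", "severe"]

balanced_records, balanced_labels = random_oversample(records, labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

Undersampling would instead drop records from the larger classes until all classes match the smallest one, at the cost of discarding data.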
Ensemble learning approaches use several machine learning algorithms to generate weak predictive results based on features derived from a wide range of projections of the data, and merge these results through voting or other mechanisms to achieve higher performance than any single constituent algorithm. Ensemble learning is therefore highly extensible and can be combined with various machine learning models for different tasks, such as classification and clustering. In general, current ensemble learning methods can be divided into four categories: supervised ensemble classification, semi-supervised ensemble classification, clustering ensembles, and semi-supervised clustering ensembles (25). In the current study, different ensemble learning algorithms were implemented to combine the individual predictions and classify the severity of COVID-19 disease. The voting strategy gave slightly better predictive results than the other ensemble techniques (i.e., HCWS and CWVS) in predicting the severity of COVID-19 disease.
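To make the idea of combining individual predictions concrete, the following minimal sketch contrasts plain majority voting with a confidence-weighted variant. It is illustrative only: the function names, the model outputs, and the confidence values are hypothetical, and the weighted variant is not claimed to reproduce the HCWS or CWVS schemes used in the study.

```python
from collections import Counter, defaultdict


def majority_vote(predicted_labels):
    """Return the label predicted by the most models.
    Labels with equal counts are resolved in first-seen order."""
    return Counter(predicted_labels).most_common(1)[0][0]


def confidence_weighted_vote(predictions):
    """predictions: list of (label, confidence) pairs, one per model.
    Sum the confidences per label and return the label with the largest total."""
    totals = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    return max(totals, key=totals.get)


# Hypothetical outputs of three individual models for one patient.
model_outputs = [("severe", 0.9), ("mild", 0.5), ("mild", 0.3)]

vote_result = majority_vote([label for label, _ in model_outputs])
weighted_result = confidence_weighted_vote(model_outputs)
print(vote_result, weighted_result)
```

In this toy case the two rules disagree: two of three models predict "mild", but the single "severe" prediction carries more total confidence. Which rule performs better in practice is an empirical question of the kind the comparison above addresses.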
Recent studies have reported applications of machine learning algorithms and artificial intelligence models to COVID-19 severity prediction. One study aimed to develop a COVID-19 severity prediction model and to explain dynamic changes in key clinical characteristics over seven weeks. For this purpose, a support vector machine model was constructed with a genetic algorithm for feature selection and achieved an accuracy of over 94% for COVID-19 severity prediction. The authors report that the proposed model includes 11 routine clinical features commonly available during COVID-19 management, which may predict the severity and guide the treatment of COVID-19 patients (26). In another recently published study, RNA-Seq and high-resolution mass spectrometry were performed on 128 blood samples from COVID-19 positive and negative patients with diverse disease severities, and 219 molecular features with high significance to COVID-19 status and severity were identified. The researchers present an interactive web-based tool (covid-omics.app) and illustrate its utility by comparing the data with published data and by applying a machine learning approach to COVID-19 severity prediction (27). Another study assesses the predictive accuracy of the WHO COVID-19 severity classification and, using Bayesian network analysis, compares its predictive power with that of a new prediction model, COVID-19 EPI-SCORE. The variables selected by the machine learning model were the WHO severity classification, acute kidney injury, age, lactate dehydrogenase (LDH) levels, lymphocytes, and activated partial thromboplastin time (aPTT). The findings of that study demonstrate that the WHO severity classification is accurate for predicting serious outcomes in patients with COVID-19 (28).
Another recently published work performs a comparative analysis of machine learning algorithms [i.e., the support vector machine (SVM), decision tree (DT), k-nearest neighbor (kNN), and convolutional neural network (CNN)] for classifying the pneumonia level (mild, progressive, and severe stage) of patients with confirmed COVID-19. Extensive experiments were performed, and the findings show accuracy values of 91.304%, 91.4%, 87.5%, and 95.622% for kNN, SVM, DT, and CNN, respectively (29). Some of the factors identified in the current study are consistent with those reported in other studies. In addition, the performance metrics calculated in the current study are higher than those reported in similar work (29). The results of the current and above-mentioned studies demonstrate that machine learning and statistical learning models can predict the severity of COVID-19 disease.
In conclusion, the proposed voting ensemble model outperformed the other ensemble and individual machine learning approaches in predicting the severity of COVID-19 disease. The proposed ensemble learning model can be integrated into web or mobile applications that classify COVID-19 severity as part of clinical decision support.