2.2. Data Preprocessing
There were no missing values in the available dataset regarding COVID-19 severity and other attributes of the patients. However, there was a problem of class imbalance between categories of COVID-19 severity. The class imbalance problem affects data when class distributions are strongly imbalanced. Many classification learning algorithms in this context have poor predictive accuracy for the rare class. The proportion of cases in a data set that belongs to each class plays an important role in machine learning. The real-world data, however, often suffer from class imbalances. It is harder to deal with multi-class tasks that require different class misclassification costs than with two-class tasks (8). The Sample (Balance) operator in the RapidMiner software was used to solve the class imbalance problem that emerged in this study. An automated balancing of an example set with labels is achieved by the Sample (Balance) operator, which allows up and downsampling (9). For feature/attribute selection, the most important attributes of the given data set were selected by the Optimize Selection (Evolutionary operator in the RapidMiner software, which uses a Genetic Algorithm. A genetic algorithm (GA) is a search heuristic that imitates the natural evolution process. This heuristic is regularly used to construct helpful solutions to problems of optimization. Genetic algorithms belong to the broader class of evolutionary algorithms (EA), which use techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover, to produce solutions to optimization problems (10).