Data Preprocessing
The data preparation phase produces the final data sets used for modeling. Its main goal is to improve data quality and, in turn, the performance of the analysis. Preprocessing is iterative: the steps are repeated until a conclusive outcome is reached. In this study, data preprocessing includes the following steps:
> Data aggregation - Used dcast from the reshape2 package to combine the train_services opted for data with the remaining data.
> Imputation of missing values - Imputed the missing values using the Amelia package.
> Type conversions - Converted each attribute to its appropriate data type.
> Feature selection - Selected the most informative variables using variable importance from a Random Forest model.
> New variable derivation - Derived No_of_days (the difference between DOE and DOC).
> Handling class imbalance - Used SMOTE and ROSE to balance the class distribution.
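The aggregation and variable-derivation steps can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the code used in the study. The sample rows, the `id` and `service` column names, the `service_` prefix, and the helper functions are all hypothetical. The sketch also assumes that DOE and DOC are ISO-formatted dates and that No_of_days is computed as DOE minus DOC; the study states only that it is the difference between the two.

```python
from datetime import date

# Hypothetical long-format service data: one row per (id, service) pair.
services_long = [
    {"id": 1, "service": "A"},
    {"id": 1, "service": "B"},
    {"id": 2, "service": "A"},
]

# Hypothetical remaining data: one row per id, with DOE and DOC as ISO dates.
customers = [
    {"id": 1, "DOE": "2015-03-10", "DOC": "2015-01-01"},
    {"id": 2, "DOE": "2015-02-01", "DOC": "2015-01-15"},
    {"id": 3, "DOE": "2015-05-01", "DOC": "2015-05-01"},
]


def cast_wide(rows, key, column, prefix):
    """Pivot long rows into {key: {prefix+level: 0/1}}, similar in spirit to dcast."""
    levels = sorted({row[column] for row in rows})
    names = [prefix + str(level) for level in levels]
    wide = {}
    for row in rows:
        flags = wide.setdefault(row[key], {name: 0 for name in names})
        flags[prefix + str(row[column])] = 1
    return wide, names


def aggregate(base_rows, wide, names, key):
    """Left-join the wide flags onto the base rows; ids without services get zeros."""
    merged = []
    for row in base_rows:
        flags = wide.get(row[key], {name: 0 for name in names})
        merged.append({**row, **flags})
    return merged


def add_no_of_days(rows):
    """Add No_of_days as DOE minus DOC in whole days (direction is an assumption)."""
    for row in rows:
        doe = date.fromisoformat(row["DOE"])
        doc = date.fromisoformat(row["DOC"])
        row["No_of_days"] = (doe - doc).days
    return rows


wide, names = cast_wide(services_long, key="id", column="service", prefix="service_")
prepared = add_no_of_days(aggregate(customers, wide, names, key="id"))

for row in prepared:
    print(row)
```

Each output row holds the original columns, one 0/1 flag per service, and the derived No_of_days value. Keeping ids that have no service rows (with all flags set to zero) mirrors a left join; whether the study kept or dropped such records is not stated.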
Exploratory Analysis