Rui Zhang

and 9 more

Abstract

Background: No robust prognostic model currently exists for sarcomatoid renal cell carcinoma (sRCC); such a model could help physicians make better decisions.

Objectives: To build an accurate predictive model for patients with sRCC by investigating the characteristics that most influence overall survival.

Design and Methods: Data on patients with sRCC were obtained from the Surveillance, Epidemiology, and End Results (SEER) database of the U.S. National Cancer Institute. After preprocessing, the data were split into a training set and a test set in an 8:2 ratio. The Mann-Whitney U test and the Chi-square test were used to check whether the two sets were balanced. A univariate Cox proportional hazards model, Kaplan-Meier analysis, and machine learning (ML) algorithms were employed to identify risk factors for overall survival (OS). Ten reliable features were selected to construct six ML models. Model performance, predictive accuracy, and clinical benefit were evaluated with receiver operating characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA), respectively.

Results: After data preprocessing, 692 patients with sRCC from 1975 to 2019 were included in this study. Ten variables (stage group, T stage, M stage, age, surgery, N stage, tumor size, chemotherapy, histological grade, and radiotherapy) were selected as reliable features for training the ML models. All models showed good predictive performance, with XGBoost showing the best predictive accuracy and stability. DCA showed that all models except AdaBoost could be used to support clinical decision-making for 90-day, 1-, 2-, 3- and 5-year OS.

Conclusions: Six machine learning models were developed to predict 90-day, 1-, 2-, 3- and 5-year overall survival in patients with sRCC. Model evaluation showed that the XGBoost model had the best predictive accuracy and clinical net benefit. These models can help guide treatment decisions for patients with sRCC.
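The net benefit that decision curve analysis reports can be illustrated with a short sketch. The code below is not the authors' implementation. It applies the standard net-benefit formula, NB = TP/n - (FP/n) * pt/(1 - pt), to made-up toy data, where pt is the threshold probability. The function names `net_benefit` and `treat_all_net_benefit`, the toy labels, and the toy predicted risks are all assumptions introduced here for illustration.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of intervening on every patient whose predicted risk >= threshold.

    y_true: 1 if the event (e.g. death within the time horizon) occurred, else 0.
    y_prob: predicted event probability from a model.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged patients who had the event.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged patients who did not have the event.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: intervene on everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical, NOT taken from the SEER cohort).
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.55, 0.1, 0.05]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = treat_all_net_benefit(y_true, pt)
    print(f"threshold={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}  treat-none=0.000")
```

A model is usually considered clinically useful at a given threshold when its net benefit exceeds both the treat-all and the treat-none (zero) strategies. A full decision curve repeats this calculation over a range of thresholds for each model and time horizon.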