Split-sample Validation.
Split-sample validation consists of dividing the sample into two parts: the model is developed in one part and its performance is assessed in the other. The split is made at random, and typical splits are 1/2:1/2 or 2/3:1/3. For example, with a 1/2:1/2 split, the model is developed in 50% of the data and evaluated in the other 50%.
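As an illustration, a random 2/3:1/3 split could be sketched as follows. This is a minimal sketch, not a prescribed implementation: the function name `split_sample`, the `dev_fraction` and `seed` parameters, and the toy list of 300 subject identifiers are all assumptions made for the example.

```python
import random

def split_sample(records, dev_fraction=2/3, seed=0):
    """Randomly divide records into a development part and a validation part.

    `dev_fraction` is the share of records used for model development;
    the remaining records form the validation part. (Hypothetical helper.)
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)          # the split is made fully at random
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]

# Toy data: identifiers standing in for 300 subjects (illustrative only).
records = list(range(300))
development, validation = split_sample(records, dev_fraction=2/3)

# The model would be fitted on `development` only and its performance
# assessed on `validation` only; the two parts share no subjects.
print(len(development), len(validation))
```

Setting `dev_fraction=1/2` gives the 1/2:1/2 split mentioned above.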
Split-sample validation is an old, classical approach to model validation with several limitations [2]. Because the split is made fully at random, the two parts can be imbalanced in the distribution of predictors and outcome [2]. Random splitting also does not guarantee that the resulting parts are representative of the target population. This problem is serious with small samples and with predictors that have rare events [2]. One way to overcome it is to stratify the sampling by the outcome and relevant predictors [2]. Another issue is that the split-sample method gives less stable results, because only part of the data is used for model development. In addition, a small validation set gives an unreliable assessment of model performance, and the assessment can even be biased, because we want to know the model's performance in the full data set but it was assessed in only a part of it [2].
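Stratifying the split by the outcome could be sketched as follows. This is a minimal sketch under stated assumptions: the function `stratified_split`, its `stratum_of` argument, and the toy data (300 subjects, 30 of them with the event) are hypothetical and chosen only to show that each part keeps the same event proportion.

```python
import random
from collections import defaultdict

def stratified_split(records, stratum_of, dev_fraction=2/3, seed=0):
    """Split at random within each stratum, so that both parts keep
    roughly the same stratum proportions. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)   # group records by stratum

    development, validation = [], []
    for members in strata.values():
        rng.shuffle(members)                  # random split inside the stratum
        n_dev = round(len(members) * dev_fraction)
        development.extend(members[:n_dev])
        validation.extend(members[n_dev:])
    return development, validation

# Toy data: 300 subjects, 30 of whom have the event (outcome = 1).
records = [{"id": i, "outcome": 1 if i < 30 else 0} for i in range(300)]

# Stratify by the outcome only; a tuple such as (outcome, predictor level)
# could be returned by `stratum_of` to stratify by a predictor as well.
dev, val = stratified_split(records, stratum_of=lambda r: r["outcome"])

events_dev = sum(r["outcome"] for r in dev)
events_val = sum(r["outcome"] for r in val)
print(events_dev, len(dev), events_val, len(val))
```

With these toy numbers both parts have a 10% event rate, which a fully random split would not guarantee.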
Because of these drawbacks, split-sample validation is often regarded as an inefficient approach to model validation. According to some simulation studies, its performance is reasonable when the sample size is large [2]. Even so, more efficient model validation procedures are recommended to obtain reliable results.