Split-sample Validation.
Split-sample validation divides the sample into two parts: the model is
developed in one part and its performance is assessed in the other. The
split is made at random, and typical splits are 1/2:1/2 or 2/3:1/3. For
example, with a 1/2:1/2 split the model is developed in 50% of the data
and evaluated in the other 50%.
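
The random 1/2:1/2 and 2/3:1/3 splits described above can be sketched in
a few lines of Python. This is a minimal illustration, not a reference
implementation: the function name `split_sample`, the `dev_fraction` and
`seed` parameters, and the toy list of 300 record identifiers are all
assumptions made for the example.

```python
import random


def split_sample(records, dev_fraction=0.5, seed=0):
    """Randomly split records into a development and a validation part.

    Hypothetical helper for illustration; dev_fraction is the share of
    records assigned to model development.
    """
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


# Toy data (assumed): 300 record identifiers.
records = list(range(300))

# 1/2:1/2 split: develop on 50% of the data, evaluate on the other 50%.
dev_half, val_half = split_sample(records, dev_fraction=1 / 2)

# 2/3:1/3 split: develop on two thirds, evaluate on the remaining third.
dev_two_thirds, val_one_third = split_sample(records, dev_fraction=2 / 3)
```

Every record ends up in exactly one of the two parts, so the development
and validation parts never overlap.
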
Split-sample validation is a classical approach to model validation
with several limitations [2]. Because the split is made fully at random,
the distribution of predictors and outcome may be imbalanced between the
two parts [2]. Random splitting does not guarantee that each part is
representative of the target population. This problem is most serious
with small samples and with rare events or rare predictor values [2].
One way to reduce it is to stratify the random split by the outcome and
relevant predictors [2]. Another issue is that split-sample validation
gives less stable results, because only part of the data is used for
model development. In addition, a small validation part gives an
unreliable assessment of model performance. The assessment can even be
biased, because we want to know the model's performance in the full data
set but performance was assessed in only part of it [2].
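
Stratifying the split by the outcome, as mentioned above, can be
sketched as follows. This is an illustrative sketch under stated
assumptions: the helper `stratified_split`, its `stratum_of` argument,
and the toy data set of 200 subjects with 20 events are hypothetical and
not taken from the source. The split is random within each stratum, so
both parts receive the same share of events.

```python
import random
from collections import defaultdict


def stratified_split(records, stratum_of, dev_fraction=0.5, seed=0):
    """Split records at random within each stratum.

    Hypothetical helper: stratum_of maps a record to its stratum key,
    for example the outcome or an (outcome, predictor) tuple. Keys must
    be sortable so that the result is reproducible.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    dev, val = [], []
    for key in sorted(strata):     # fixed order keeps the split reproducible
        members = strata[key]
        rng.shuffle(members)
        n_dev = round(len(members) * dev_fraction)
        dev.extend(members[:n_dev])
        val.extend(members[n_dev:])
    return dev, val


# Toy data (assumed): 200 subjects, of whom 20 (10%) have the event.
records = [{"id": i, "outcome": 1 if i < 20 else 0} for i in range(200)]

dev, val = stratified_split(records, lambda r: r["outcome"], dev_fraction=0.5)

# Each part holds half of the events and half of the non-events.
events_dev = sum(r["outcome"] for r in dev)
events_val = sum(r["outcome"] for r in val)
```

To stratify by a relevant predictor as well, `stratum_of` could return a
tuple such as `(r["outcome"], r["sex"])`, assuming the records carry that
field. With many strata or small samples, some strata may contain too
few subjects to divide evenly.
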
Because of these drawbacks, split-sample validation is often regarded
as an inefficient approach to model validation. According to some
simulation studies [2], its performance is reasonable when the sample
size is large. Nevertheless, more efficient validation procedures are
recommended to obtain reliable results.