Figure 5 - Random Forest model fit and Q-Q plot for Office
buildings. R-squared = 0.275
Model for each cluster
\label{model-for-each-cluster}
The original motivation for finding peer groups was to train a separate
model for each cluster. However, when a model was trained only on the
buildings within a cluster, it was much harder, at least for some
clusters, to keep it from overfitting the training data. For the random
forest model, the in-sample R-squared increased from 0.537 to 0.746, but
the out-of-sample R-squared dropped from 0.24 to 0.18. The out-of-sample
R-squared of the linear model dropped consistently as well. Given these
results, the idea of training a separate model on each cluster was
abandoned, and the peer groups were used only in the scoring step, where
the ratio of actual to predicted EUI is compared within each peer group.
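The overfitting pattern described above can be made concrete as the gap between in-sample and out-of-sample R-squared. The following is a minimal sketch, not the code used in this study: the helper names `r_squared` and `generalization_gap` are hypothetical, and only the four R-squared values are taken from the text.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def generalization_gap(r2_in_sample, r2_out_of_sample):
    """Hypothetical helper: a larger gap suggests stronger overfitting."""
    return r2_in_sample - r2_out_of_sample


# Random forest R-squared values reported in the text.
pooled_gap = generalization_gap(0.537, 0.24)       # one model for all buildings
per_cluster_gap = generalization_gap(0.746, 0.18)  # one model per cluster

print(f"pooled gap: {pooled_gap:.3f}, per-cluster gap: {per_cluster_gap:.3f}")
```

Under this reading, the per-cluster models fit the training data better but generalize worse, which is consistent with the decision to keep a single pooled model.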
Score
\label{score}
Figure 6 presents a scatter plot of scores against weather-normalized
EUI for 1166 office buildings in 2015. The scores are calculated from
the random forest model, in terms of the ratio of actual to predicted EUI.
Generally, buildings with very high EUI receive high grades and
buildings with very low EUI receive low grades, but it is also clear
that for buildings closer to the mean of the EUI distribution (around
200 kBtu/ft$^2$), changes in features can result in changes in the
score.
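One way to picture a peer-group score is sketched below. The text states only that the score is expressed in terms of the actual to predicted EUI ratio and that the ratio is compared within peer groups; the percentile-rank mapping, the function name `peer_group_scores`, the record layout, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def peer_group_scores(buildings):
    """Rank each building's actual/predicted EUI ratio within its peer group.

    Assumption: the score is the percentage of peers whose ratio is less
    than or equal to the building's own ratio (higher ratio, higher score).
    """
    by_cluster = defaultdict(list)
    for b in buildings:
        ratio = b["actual_eui"] / b["predicted_eui"]
        by_cluster[b["cluster"]].append((b["id"], ratio))

    scores = {}
    for members in by_cluster.values():
        ratios = [r for _, r in members]
        for building_id, ratio in members:
            rank = sum(1 for other in ratios if other <= ratio)
            scores[building_id] = 100.0 * rank / len(ratios)
    return scores


# Made-up sample records; EUI values are in kBtu/ft^2.
sample = [
    {"id": "A", "cluster": 0, "actual_eui": 150.0, "predicted_eui": 200.0},
    {"id": "B", "cluster": 0, "actual_eui": 250.0, "predicted_eui": 200.0},
    {"id": "C", "cluster": 1, "actual_eui": 220.0, "predicted_eui": 200.0},
    {"id": "D", "cluster": 1, "actual_eui": 200.0, "predicted_eui": 200.0},
]
scores = peer_group_scores(sample)
```

In this sketch, building C receives the top score in its group even though its ratio is lower than that of building B, because each ratio is ranked only against its own peers.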