Figure 5 - Random Forest model fit and Q-Q plot for Office buildings. R-squared = 0.275

Model for each cluster

\label{model-for-each-cluster}
The original purpose of identifying peer groups was to train a separate model for each cluster. In practice, however, models trained within a single cluster were, at least for some clusters, much harder to keep from overfitting the training data. For the random forest model, the in-sample R-squared increased from 0.537 to 0.746, but the out-of-sample R-squared dropped from 0.24 to 0.18. The out-of-sample R-squared of the linear model dropped consistently as well. Given these results, the idea of training a separate model on each cluster was abandoned, and the peer groups were used only in the scoring step, where the ratio of actual to predicted EUI is compared only among buildings in the same peer group.
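
The overfitting argument above rests on the gap between in-sample and out-of-sample R-squared. The following minimal Python sketch shows how that gap can be computed. The helper names `r_squared` and `generalization_gap` are hypothetical and not taken from the original analysis; the numbers are the random forest values reported above.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def generalization_gap(r2_in_sample, r2_out_of_sample):
    """Hypothetical helper: a larger gap suggests more overfitting."""
    return r2_in_sample - r2_out_of_sample


# Random forest R-squared values reported in the text.
pooled_gap = generalization_gap(0.537, 0.24)       # one model for all buildings
per_cluster_gap = generalization_gap(0.746, 0.18)  # separate model per cluster

print(f"pooled gap: {pooled_gap:.3f}, per-cluster gap: {per_cluster_gap:.3f}")
```

With these values the gap is roughly twice as large for the per-cluster models, which is consistent with the decision to keep a single pooled model.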

Score

\label{score}
Figure 6 presents a scatter plot of scores against weather-normalized EUI for 1166 office buildings in 2015. The scores are based on the random forest model and are expressed in terms of the ratio of actual to predicted EUI. In general, buildings with very high EUI receive high grades and buildings with very low EUI receive low grades. For buildings closer to the mean of the EUI distribution (around 200 kBtu/ft$^2$), however, differences in features can change the score.
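
As an illustration of the scoring step, the sketch below computes the actual-to-predicted EUI ratio for each building and ranks it within its peer group. Converting the ratio to a 0-100 percentile rank is an assumption made here for illustration, since the text does not specify how the ratio is mapped to a grade. The function name, field names, and sample data are likewise hypothetical. Following the text, a higher ratio yields a higher score.

```python
from collections import defaultdict


def peer_group_scores(buildings):
    """Return {building id: score}, where the score is the percentile rank
    (0-100) of the actual/predicted EUI ratio within the peer group.

    The percentile mapping is an assumption, not the authors' stated method.
    """
    groups = defaultdict(list)
    for b in buildings:
        ratio = b["actual_eui"] / b["predicted_eui"]
        groups[b["peer_group"]].append((b["id"], ratio))

    scores = {}
    for members in groups.values():
        ratios = [r for _, r in members]
        for building_id, ratio in members:
            # Share of peers whose ratio is at or below this building's ratio.
            at_or_below = sum(1 for r in ratios if r <= ratio)
            scores[building_id] = 100.0 * at_or_below / len(ratios)
    return scores


# Made-up example data (EUI in kBtu/ft2); not taken from the study.
buildings = [
    {"id": "A", "peer_group": 0, "actual_eui": 300.0, "predicted_eui": 200.0},
    {"id": "B", "peer_group": 0, "actual_eui": 150.0, "predicted_eui": 200.0},
    {"id": "C", "peer_group": 1, "actual_eui": 180.0, "predicted_eui": 200.0},
    {"id": "D", "peer_group": 1, "actual_eui": 220.0, "predicted_eui": 200.0},
]
scores = peer_group_scores(buildings)
print(scores)
```

Because the ranking is done within each peer group, two buildings with similar EUI can receive different scores when their predicted EUI or peer group differs.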