Limitations and future work
\label{limitations-and-future-work}
The limitations of the current analysis are mostly related to the model
selected and the data utilized.
Features from the LL87 data set could probably improve the
predictive power of the model. After exploring some of the
information available in that data set, such as the heating and cooling
systems present in each building, the group decided not to incorporate
those features in this analysis to avoid a reduction in the primary
dataset. Currently, LL87 data is available only for a subset of about
4,500 buildings compliant with LL84 (about 30 percent), corresponding to audits
from 2013 to 2015. Because the model predicts EUI to calculate the score
within a cluster, incorporating these features would have limited
the model: we found no fair way to use the
information about the buildings that were already audited without
compromising the analysis of the buildings for which that information
is not yet available.
Another limitation is that the number of buildings in some categories
is too small to train a model using only those
observations. For example, only one building is classified as food
service. One solution would be to group building types that have
only a few buildings and fit models to groups of similar typologies.
However, this could still be unfair, because even buildings
within the same group might not be fully comparable. For now, we have
decided not to send scorecards to buildings whose type
contains only a few buildings, and to focus on the types that represent most
of the buildings reporting to LL84.
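
The filtering rule described above can be sketched as follows. This is a minimal illustration, not the procedure used in the analysis: the threshold `MIN_BUILDINGS`, the function name `split_by_sample_size`, and the example data are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical threshold: the minimum number of buildings a type needs
# before a model is fitted to it. The value is an assumption.
MIN_BUILDINGS = 30


def split_by_sample_size(building_types, min_buildings=MIN_BUILDINGS):
    """Partition building types into those with enough observations to
    fit a model and those too sparse to score fairly."""
    counts = Counter(building_types)
    eligible = {t: n for t, n in counts.items() if n >= min_buildings}
    sparse = {t: n for t, n in counts.items() if n < min_buildings}
    return eligible, sparse


# Made-up example data, not taken from LL84.
types = ["Multifamily Housing"] * 40 + ["Office"] * 35 + ["Food Service"]
eligible, sparse = split_by_sample_size(types)
print(sorted(eligible))  # types that would receive scorecards
print(sorted(sparse))    # types left out, or candidates for grouping
```

The types returned in `sparse` are the ones that would either be excluded from scorecards or merged into groups of similar typologies before fitting.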
Another feature that could improve the
predictive power of the model is a measure of each
building's exposure to sunlight. Since there is a publicly available
georeferenced 3D model of New York City, it is possible to simulate the
shadows in each part of the city throughout the year and thereby
identify which buildings are likely to need more energy for cooling in
the summer because of higher sun exposure and, conversely, which
buildings might need more energy for heating in the winter because of
lower sun exposure.
Finally, an important limitation of the model is that it is not very
useful for predicting EUI outside the training set. As noted in the results
section, the out-of-sample R-squared was still relatively low even for
the non-linear model. Therefore, this model probably cannot
be applied directly to other cities, or even used to extrapolate the energy
use intensity data available in LL84 to smaller buildings within NYC.
Incorporating other features, such as those from LL87 mentioned above and
sun exposure, could help build a model robust enough to be
extended to other buildings in the city. For the application for which this
model was designed, which is comparing the energy performance of the
buildings complying with LL84, the model achieved satisfactory
performance, even though it could certainly be improved with
further research.
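
For reference, the out-of-sample R-squared mentioned above can be computed on held-out buildings as sketched below. The sketch assumes the common definition 1 - SS_res / SS_tot, with the total sum of squares taken around the mean of the held-out observations. The function name `r_squared` and the numbers are illustrative only and are not results from this analysis.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, where SS_tot
    is computed around the mean of y_true (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up held-out EUI values and model predictions, for illustration only.
held_out_eui = [50.0, 80.0, 110.0, 140.0]
predicted_eui = [90.0, 90.0, 100.0, 100.0]
score = r_squared(held_out_eui, predicted_eui)
print(round(score, 2))  # roughly 0.24 for these made-up numbers
```

A value near 1 means the predictions track the held-out EUI closely, while a value near 0 means the model does little better than predicting the mean, which is the situation that limits extrapolation to other buildings.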