Methodology
\label{methodology}
The goal of the project was to calculate a new score that captures
the specific characteristics of the New York City building market. We
developed three different approaches and compared them to identify the
most robust and fair scoring system before refining the current
methodology. The final procedure used to design the score followed the
steps shown in Figure 3.
After the initial processing and cleaning of the data, an initial pool
of potentially important features was assembled: some were taken
directly from the datasets and others were engineered from the
original data. From that pool, features were selected based on domain
knowledge and used to test linear and nonlinear (random forest) models
that predict the weather-normalized energy use intensity of each
building. Once the energy use intensity had been predicted, a score was
assigned to every building, and a scorecard was printed for each one.
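
To make the final scoring step concrete, the following is a minimal sketch of how a score could be derived once a model has produced a predicted energy use intensity (EUI) for each building. The scoring rule itself is not specified in this section, so the rule used here is an assumption for illustration only: each building is ranked by the ratio of its actual to its predicted EUI, and that rank is mapped to a 1--100 scale on which a higher score means lower consumption than the model expects. The function name \texttt{assign\_scores} and its arguments are hypothetical.

```python
from bisect import bisect_left


def assign_scores(actual_eui, predicted_eui):
    """Return a 1-100 score per building (higher = more efficient than predicted).

    ASSUMPTION (not from the paper): the score is the percentile rank of the
    ratio actual EUI / predicted EUI across all buildings.
    """
    if len(actual_eui) != len(predicted_eui):
        raise ValueError("actual_eui and predicted_eui must have equal length")

    # A ratio below 1 means the building uses less energy than the model expects.
    ratios = [a / p for a, p in zip(actual_eui, predicted_eui)]
    ordered = sorted(ratios)
    n = len(ratios)

    scores = []
    for r in ratios:
        # Number of buildings with a strictly lower (better) ratio.
        better = bisect_left(ordered, r)
        # Share of buildings that perform no better than this one, on a 1..100 scale.
        scores.append(max(1, round(100 * (n - better) / n)))
    return scores


# Four hypothetical buildings, all with the same predicted EUI of 100.
print(assign_scores([50, 100, 150, 200], [100, 100, 100, 100]))
```

Under this assumed rule, buildings with identical ratios receive identical scores, and the predicted EUI could come from either the linear or the random forest model, since the function only consumes the predictions.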