Methodology

\label{methodology}
The goal of the project was to calculate a new score that captures the specific characteristics of the New York City building market. We developed and compared three different approaches in search of a more robust and fair scoring system, refining them until we arrived at the current methodology. The final procedure used to design the score followed the steps shown in Figure 3.
After the initial processing and cleaning of the data, an initial pool of potentially important features was assembled: some were selected directly from the datasets and others were engineered from the original data. From that pool, features were chosen based on domain knowledge and used to test linear and nonlinear (random forest) models that predict the weather-normalized energy use intensity of each building. Once the energy use intensity was predicted, a score was assigned to every building, and the procedure concluded with the generation of a scorecard for each one.