Literature review and related previous work

\label{literature-review-and-related-previous-work}

Academia

\label{academia}
The problem of defining an energy performance metric for buildings has been extensively explored in academia, and many different approaches have been proposed. Borgstein et al. (2016) [3] conducted a comprehensive literature review of these methods and divided them into four main groups: Engineering Calculations, Simulations, Statistical methods and Machine Learning methods. The authors also highlight the differences between Calculated and Measured energy ratings, as defined by ISO 16346:2013, and point to the possible reference systems (baselines) for an energy performance metric: Historical energy performance, Typical performance of similar buildings (empirical), Expected energy performance, Potential energy performance and Required performance (norms or standards). Their work characterizes well the landscape in which the performance metric for the Energy Snapshot is being developed and illustrates the complexity of that goal.
To better understand the main aspects contributing to the energy performance of a building, one can refer to the Energy in Buildings and Communities Programme of the International Energy Agency (IEA EBC). Its Annex 53 (2013) [4] investigated the main factors influencing energy consumption in buildings and grouped them into six categories: Climate, Building envelope, Building systems, Operations and maintenance, Occupant behaviour, and Indoor environmental conditions. These six categories guided the selection and engineering of the features used to predict buildings’ energy use intensity, to identify peer groups and to build the energy performance metric.
Kontokosta et al. (2015) [5] made an important contribution to the understanding of the LL84 data by building an interactive web-based visualization of some of its parameters at the level of individual buildings in New York City. The purpose of that work was to provide a better understanding of energy usage around the city. An even more important contribution was the set of models built to extrapolate and generalize the LL84 data in order to estimate natural gas and electricity consumption for all buildings in NYC. The models’ accuracy was assessed by comparing their results with the aggregate annual natural gas and electricity consumption at the zip code level. The median absolute percent error (median APE) for electricity was 10.75%, meaning that the estimates for half of the zip codes were within 10.75% of the correct value. For natural gas, the median APE was 30%.
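For reference, the median APE quoted above can be written out as follows. This is the standard definition of the statistic, assumed here rather than taken from Kontokosta et al. (2015) [5]; the symbols $y_z$ (reported aggregate consumption in zip code $z$), $\hat{y}_z$ (model estimate for that zip code) and $Z$ (the set of zip codes) are introduced only for illustration.

```latex
% Absolute percent error per zip code, and its median over all zip codes.
% Notation is illustrative and not taken from the cited work.
\begin{equation}
\mathrm{APE}_z = 100 \times \frac{\left| \hat{y}_z - y_z \right|}{\left| y_z \right|},
\qquad
\mathrm{median\ APE} = \mathrm{median}_{z \in Z} \; \mathrm{APE}_z
\end{equation}
```

Because the statistic is a median, it is not sensitive to a small number of zip codes with very large errors, which is worth keeping in mind when comparing it with mean-based error measures.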
Kontokosta (2015) [6] refers to Sartor et al. (2000) [7] to separate the approaches to building energy performance metrics into four groups: simulation models, point systems, end-use metrics, and regression models. Of those, the author argues that the regression approach might be the most appropriate for building an energy performance metric specific to New York City (NYC). The proposed model takes as reference the typical performance of similar buildings (as predicted by a regression model) and compares it to the actual energy use (Energy Use Intensity) of a given building. For Office buildings that reported their annual energy consumption for the year 2010 under NYC LL84, the regression model was able to explain 33% of the variance in the Weather Normalized Energy Use Intensity. In the present work, a similar approach was used not only for Office buildings, but also for Multifamily Housing buildings.
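The share of variance explained corresponds to a coefficient of determination of $R^2 = 0.33$. As a reminder, the usual definition is given below. It is assumed here for illustration and its notation is not taken from [6]: $y_i$ is the observed Weather Normalized EUI of building $i$, $\hat{y}_i$ the regression prediction, $\bar{y}$ the sample mean and $n$ the number of buildings.

```latex
% Coefficient of determination (standard definition, notation illustrative).
\begin{equation}
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
\end{equation}
```

Read this way, an $R^2$ of 0.33 leaves about two thirds of the variance in EUI unexplained by the regressors.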
Hsu (2015) [8] explored different regularization techniques to identify key variables and interactions for a regression model predicting the Site EUI of NYC Office and Multifamily Housing buildings. The model included data from LL84, PLUTO, CoStar, the U.S. Census and the American Community Survey (ACS). The author found that a hierarchical group-lasso regularization significantly outperformed ridge, lasso, elastic net and ordinary least squares approaches in terms of prediction accuracy. The results also showed that some of the most important variables for the Multifamily Housing model were related to the type of energy (percentage of electricity), the age of the building, the use of space and ethnicity in the building’s census tract. For Office buildings, the main effects also came from the type of energy source used (electricity, natural gas, steam, etc.), but an interesting finding was that information from the Census and ACS describing the composition of surrounding multifamily units was also among the main predictors. These findings also informed the feature selection and engineering step for modelling EUI in the present analysis. However, only data from LL84 and PLUTO, which are specific to each building, were used in the model. The aim was to emphasize individual building characteristics rather than characteristics at the neighbourhood, census tract or zip code level, which could compromise the fairness of the scoring model.
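The estimators compared by Hsu (2015) [8] differ mainly in the penalty added to the least-squares objective. The textbook forms are sketched below as an aid to the reader. The notation ($X$ for the feature matrix, $y$ for the response, $\beta$ for the coefficients, $\lambda \geq 0$ and $\alpha \in [0, 1]$ for the tuning parameters) is assumed for illustration and is not taken from [8]; the elastic net is shown in one common parameterization. The hierarchical group-lasso, which extends the lasso penalty to groups of coefficients and to interaction terms, is not reproduced here.

```latex
% Textbook penalized least-squares objectives (requires the amsmath package).
% Notation is illustrative and not taken from the cited work.
\begin{align}
\hat{\beta}_{\mathrm{OLS}} &= \arg\min_{\beta} \; \| y - X\beta \|_2^2 \\
\hat{\beta}_{\mathrm{ridge}} &= \arg\min_{\beta} \; \| y - X\beta \|_2^2 + \lambda \| \beta \|_2^2 \\
\hat{\beta}_{\mathrm{lasso}} &= \arg\min_{\beta} \; \| y - X\beta \|_2^2 + \lambda \| \beta \|_1 \\
\hat{\beta}_{\mathrm{enet}} &= \arg\min_{\beta} \; \| y - X\beta \|_2^2
  + \lambda \left( \alpha \| \beta \|_1 + (1 - \alpha) \| \beta \|_2^2 \right)
\end{align}
```

With $\lambda = 0$ every penalized form reduces to ordinary least squares. The $\ell_1$ term is what can set coefficients exactly to zero, which is why lasso-type penalties are used for variable selection.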
In another effort to identify the main features determining Energy Use Intensity in Multifamily Housing buildings in NYC, Ma and Cheng (2016) [9] used a Random Forest model to predict Average Site EUI at the level of Census Block Groups. The features analysed came from the PLUTO database and the 2013 ACS 5-year estimates from the U.S. Census Bureau, in addition to the LL84 dataset. These features represented seven categories: Building, Economy, Education, Environment, Population and household, Surrounding, and Transportation. The Mean Squared Error (MSE) of the Random Forest model (with optimized parameters) was 0.773, smaller than that of the other models tried by the authors: Multiple Linear Regression (MSE = 0.997), Lasso (MSE = 0.872) and Support Vector Machines (MSE = 0.830). The most influential variables were mostly related to the categories Building, Education and Economy. This work helped not only in defining important features to predict EUI but also in the decision to use a random forest model as a nonlinear approach to that task.
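To put the reported errors in perspective, the usual definition of the MSE is recalled below, followed by the relative reductions implied by the figures quoted above. The definition and the symbols $y_i$, $\hat{y}_i$ and $n$ are assumptions for illustration; only the MSE values themselves come from [9].

```latex
% Standard MSE definition (notation illustrative).
\begin{equation}
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\end{equation}

% Relative reduction of the Random Forest MSE with respect to two baselines,
% computed from the values reported in [9].
\begin{equation}
\frac{0.997 - 0.773}{0.997} \approx 0.225,
\qquad
\frac{0.830 - 0.773}{0.830} \approx 0.069
\end{equation}
```

That is, the Random Forest error is roughly 22% below that of Multiple Linear Regression and about 7% below that of Support Vector Machines.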

Other Cities

\label{other-cities}
In recent years, many city governments have started to implement policies targeting energy efficiency in the built environment, especially in the context of mitigating GHG emissions to fight climate change. Trencher et al. (2016) [10] examined ten programmes to advance energy efficiency and retrofitting of existing private sector buildings in C40 cities in Asia-Pacific (Melbourne, Sydney, Hong Kong, Singapore and Tokyo) and in the U.S. (Houston, New York, San Francisco, Philadelphia and Seattle). The study identified six distinct policy models, four mandatory (Benchmarking, Periodical energy efficiency auditing or retro-commissioning, Energy efficiency standards, Cap-and-trade) and two voluntary (Capacity building, Friendly competition). Overall, the environmental impacts of such policies were found to be particularly slow to emerge and plagued with attribution challenges. The authors found limited evidence that benchmarking programmes reduce energy consumption in the short term, but some indication of mid-term outcomes. Driven by unique local circumstances, the cap-and-trade model stood out by fostering large, sustained and attributable GHG emission reductions and retrofitting. Finally, the authors highlight the complementary nature of voluntary and mandatory programmes and the potential for benchmarking programmes to later transition to models mandating performance improvements, such as cap-and-trade.
Figure 1 shows a map of 24 U.S. cities that had some form of energy benchmarking policy for buildings as of February 2017 [11]. It is worth noting that, depending on the city, not all types of buildings are included in the benchmarking policy, and there may or may not be requirements beyond benchmarking, such as retro-commissioning or auditing. Some of the cities shown in Figure 1, such as Philadelphia, Seattle and Chicago, have web visualization tools similar to New York's that make the data reported by each compliant building publicly available and easily obtainable.