Figure (9). Results from clustering of building characteristics, and the mean of those features for the whole dataset.

6. Discussion

6.1 Conclusions

In conclusion, given the limitations of the available datasets, Random Forest regression performed best among the regression models we tested. Across the regression models, results for residential buildings were better than those for commercial buildings. The low R-squared for commercial buildings may be attributable to differences in occupant density, operating hours, percentage of air-conditioned space, or percentage of space occupied by data centers. The features ranked by the SVM and Random Forest models differ. The SVM selected the building location, whether the building is on an inside or corner lot, whether the building is detached or semi-detached, and the dominant energy type in the building. Random Forest, on the other hand, ranked the building age, the tax lot area, and the building area as important features. Over the four years from 2012 to 2015, we detected two yearly trends in our sample. Both are downward, with one dropping more sharply than the other. In addition, using annual energy consumption and property area, we grouped buildings into 6 clusters according to their energy performance.
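The clustering step summarized above can be sketched as follows. This is a minimal, standard-library-only illustration of k-means (Lloyd's algorithm) with k = 6 on two standardized features. It is not the code used in the study: the building values are synthetic, drawn from assumed log-normal distributions, and the helper names `standardize`, `sq_dist`, and `kmeans` are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k random points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = tuple(mean(c) for c in zip(*members))
    return labels, centroids


# Synthetic stand-in data (assumption): (annual energy use, property area).
data_rng = random.Random(42)
buildings = [
    (data_rng.lognormvariate(14, 1.0), data_rng.lognormvariate(11, 0.8))
    for _ in range(300)
]

scaled = standardize(buildings)
labels, centroids = kmeans(scaled, k=6)
sizes = Counter(labels)
print("cluster sizes:", dict(sorted(sizes.items())))
```

Standardizing first matters here because energy use and area are on very different scales; without it, the larger-valued feature would dominate the distance computation. The result also depends on the random initialization, so a real analysis would typically try several seeds.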

6.2 Limitations and Future Work

Additional data on physical design, occupancy, and energy system management would help to further differentiate buildings with different energy performance and shed light on the major attributes that drive energy efficiency. Because we study building energy at the city-wide scale, our models do not capture variation caused by occupants or by more detailed building construction characteristics. Occupant behavior poses a further challenge because of its qualitative nature: studying it requires primary research that monitors and observes human behavior over a period of time. These are the major limitations of our study, and we would like to address them when we are able to collect more data in the future.
 

Contribution of Each Team Member to the Project
Pooneh Famili
Hongting Chen
Trang Tran Linh Dam
Xinran Yu
Random Forest Regression
Random Forest Feature Importance
K-means Clustering
Data Preprocessing
K-nearest Neighbors
Data Preprocessing
K-means Clustering
Data Preprocessing
SVM Regression
SVM Feature Selection