Figure (2). Results from Random Forest Regression with all features. Each green dot is the ratio of the actual site EUI to the predicted value, and the blue line marks a ratio of 1; the closer the dots lie to this line, the more accurate the prediction.
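As an illustration of how such a ratio plot can be produced, the sketch below plots the actual-to-predicted ratio and draws a reference line at 1. The matplotlib code, the synthetic data, and the choice of building index for the x-axis are all assumptions for illustration, not the exact plotting code behind Figure (2).

```python
# Sketch of an actual/predicted ratio plot; data and axis choice are
# illustrative placeholders, not the values behind Figure (2).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
y_true = rng.uniform(50, 150, size=300)            # placeholder actual site EUI
y_pred = y_true * rng.normal(1.0, 0.1, size=300)   # placeholder RF predictions

ratio = y_true / y_pred
plt.scatter(np.arange(len(ratio)), ratio, color="green", s=10,
            label="actual / predicted site EUI")
plt.axhline(1.0, color="blue", label="ratio = 1 (perfect prediction)")
plt.xlabel("Building index (assumed)")
plt.ylabel("Actual / predicted site EUI")
plt.legend()
plt.show()
```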

5.1.2 Random Forest Feature Importance

We used the feature importance values exposed by scikit-learn's Random Forest implementation and obtained the following rankings for each of our datasets (a sketch of this procedure is given after Figure (3)):
 
Residential Data Set (feature importance):
1. BuiltFAR (0.206668)
2. DOF Property Floor Area (ft²) (0.185848)
3. YearBuilt (0.174020)
4. LotArea (0.158609)
5. NumFloors (0.089812)
6. Electricity (0.086754)
7. LotType (0.036418)
8. ProxCode (0.028474)
9. Gas (0.016607)
10. Oil (0.014222)
11. Water (0.002569)
12. Diesel (0.000000)

Commercial Data Set (feature importance):
1. YearBuilt (0.218060)
2. BuiltFAR (0.184002)
3. LotArea (0.168292)
4. DOF Property Floor Area (ft²) (0.153563)
5. NumFloors (0.088837)
6. LotType (0.051958)
7. Electricity (0.045110)
8. Gas (0.038505)
9. ProxCode (0.036114)
10. Oil (0.013307)
11. Water (0.002252)
12. Diesel (0.000000)
 
Figure (3). Random Forest feature importance rankings for both the residential and commercial datasets.
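The rankings above come from the impurity-based importances that scikit-learn's RandomForestRegressor computes during fitting. The sketch below shows one way to extract and print them; the synthetic placeholder data and the hyperparameters (n_estimators, random_state) are assumptions, not the exact values used in our experiments.

```python
# Sketch of extracting impurity-based feature importances from a fitted
# RandomForestRegressor; the synthetic data and hyperparameters are
# placeholders, not our exact experimental setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = [
    "BuiltFAR", "DOF Property Floor Area (ft²)", "YearBuilt", "LotArea",
    "NumFloors", "Electricity", "LotType", "ProxCode", "Gas", "Oil",
    "Water", "Diesel",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))  # placeholder feature matrix
y = rng.normal(size=500)                        # placeholder site EUI target

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)

# Rank features from most to least important.
order = np.argsort(rf.feature_importances_)[::-1]
for rank, idx in enumerate(order, start=1):
    print(f"{rank}. feature {feature_names[idx]} ({rf.feature_importances_[idx]:.6f})")
```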
  

5.2 K-nearest Neighbors Regression

For the residential dataset, we set the n_neighbors parameter to 250, which gave an accuracy of 0.0227. For the commercial dataset, we set the n_neighbors parameter to 200, which gave an accuracy of 0.0042.
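A minimal sketch of this setup with scikit-learn's KNeighborsRegressor is shown below. The synthetic data, the 80/20 train/test split, and the use of the regressor's default score (the coefficient of determination, R²) as the accuracy measure are assumptions for illustration.

```python
# Sketch of the k-nearest neighbors regression setup; the synthetic data
# and 80/20 split are assumptions, n_neighbors follows the text above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))  # placeholder for the 12 building features
y = rng.normal(size=5000)        # placeholder site EUI target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# n_neighbors = 250 for the residential dataset (200 for commercial).
knn = KNeighborsRegressor(n_neighbors=250)
knn.fit(X_train, y_train)

# score() returns the coefficient of determination (R²) on held-out data.
print(knn.score(X_test, y_test))
```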