Introduction
In NYC, roughly 70 percent of greenhouse gas (GHG) emissions come from energy use in buildings. To help prevent climate change, New York City plans to reduce buildings-based emissions by 30 percent by 2025 from a 2005 baseline, and to achieve this goal the city needs a good benchmark model. NYC has since set a larger goal: reducing building emissions by 80 percent by 2050 from a 2005 baseline \cite{efficiency}. The analytical challenge in building the benchmark model is selecting predictor variables that are not correlated with one another but are highly correlated with energy consumption. The model is intended to support the regulation of building energy consumption and to provide a quick check of energy savings at the building level.
Literature Review
Modeling end-use energy consumption at the residential level requires the physical characteristics of a building, historical data, weather data, and similar inputs. Most of this data is collected by government agencies, but more detailed and more strongly correlated data can be collected through surveys \cite{Swan_2009}. Robust multiple linear regression is one of the popular methods for energy consumption modeling; for example, ENERGY STAR uses gross floor area, the total number of residential living units, the number of residential living units in a low-rise, mid-rise, or high-rise setting, and the number of bedrooms \cite{property} \cite{Howard_2012}. Artificial neural networks can also be used to forecast building energy consumption \cite{Neto_2008}. This paper uses PCA and multivariable regression to build a benchmark model.
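To make the combination of PCA and multivariable regression concrete, the following is a minimal sketch of principal component regression with NumPy. It is not the paper's actual pipeline: the function name `pca_regression`, the synthetic predictors (floor area, units, floors), and the synthetic response are all assumptions introduced for illustration. The point it shows is that correlated predictors are first rotated into uncorrelated component scores, and the regression is then fit on those scores.

```python
import numpy as np


def pca_regression(X, y, n_components):
    """Principal component regression: PCA on standardized X, then OLS on the scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize predictors so that no variable dominates because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # PCA via singular value decomposition of the standardized matrix.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]          # shape (k, p)
    scores = Z @ components.T               # shape (n, k), mutually uncorrelated
    explained_ratio = (S ** 2) / np.sum(S ** 2)

    # Ordinary least squares on the component scores, with an intercept.
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

    return {
        "scores": scores,
        "explained_ratio": explained_ratio,
        "coef": coef,
        "r2": r2,
    }


# Synthetic, made-up data: three deliberately correlated building attributes.
rng = np.random.default_rng(0)
n = 200
floor_area = rng.uniform(10_000, 500_000, n)
units = floor_area / 900 + rng.normal(0, 20, n)
floors = floor_area / 20_000 + rng.normal(0, 2, n)
X = np.column_stack([floor_area, units, floors])
y = 0.5 * (floor_area / 1e5) + rng.normal(0, 0.5, n)  # hypothetical energy response

result = pca_regression(X, y, n_components=2)
print("explained variance ratio:", result["explained_ratio"])
print("R^2 on two components:", result["r2"])
```

Because the component scores are orthogonal by construction, the regression coefficients no longer suffer from multicollinearity among the original predictors, which is the variable-selection concern raised in the Introduction.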
Data and Methods
LL84 is a New York City local law under which a building energy dataset is collected for use with the ENERGY STAR program \cite{ll84}. We look for a correlation between the ENERGY STAR score and source energy use intensity (EUI). PLUTO is geographic data at the tax lot level; its number of floors, recorded at the building level, is merged with the LL84 dataset on BBL (borough, block, lot) \cite{mappluto}. The original dataset has 13223 observations and contains two area variables, DOF-reported area and self-reported area, and two energy variables, source EUI and site EUI. The area variable names are self-explanatory, but the energy variable names are not. Site EUI is total energy consumption, "which is the amount of heat and electricity consumed by a building as reflected in your utility bills," and source EUI is the "total amount of raw fuel to operate the building" \cite{energy}. EPA recommends using source EUI, but the difference between the two area variables and the difference between the two energy variables should be checked first. Fig. 1 and Fig. 2 show histograms of these differences.
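The merge on BBL and the difference check described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records are made up, and the field names (`bbl`, `dof_area`, `self_area`, `source_eui`, `site_eui`, `num_floors`) are hypothetical stand-ins for the actual LL84 and PLUTO column names.

```python
import math
from collections import Counter

# Made-up LL84-style records (field names are hypothetical).
ll84_rows = [
    {"bbl": "1000010001", "dof_area": 100000, "self_area": 98000,
     "source_eui": 150.0, "site_eui": 80.0},
    {"bbl": "1000020002", "dof_area": 50000, "self_area": 50000,
     "source_eui": 120.0, "site_eui": 65.0},
    {"bbl": "2000030003", "dof_area": 75000, "self_area": 80500,
     "source_eui": 200.0, "site_eui": 110.0},
    {"bbl": "3000040004", "dof_area": 30000, "self_area": 30000,
     "source_eui": 95.0, "site_eui": 50.0},
]

# Made-up PLUTO-style records carrying the number of floors.
pluto_rows = [
    {"bbl": "1000010001", "num_floors": 12},
    {"bbl": "1000020002", "num_floors": 6},
    {"bbl": "2000030003", "num_floors": 20},
    {"bbl": "4000050005", "num_floors": 3},
]


def merge_on_bbl(ll84, pluto):
    """Inner join: keep LL84 rows whose BBL also appears in PLUTO."""
    floors_by_bbl = {row["bbl"]: row["num_floors"] for row in pluto}
    return [
        {**row, "num_floors": floors_by_bbl[row["bbl"]]}
        for row in ll84
        if row["bbl"] in floors_by_bbl
    ]


def histogram(values, bin_width):
    """Count values per bin, keyed by the left edge of each bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


merged = merge_on_bbl(ll84_rows, pluto_rows)

# Differences within each pair of variables.
area_diffs = [r["dof_area"] - r["self_area"] for r in merged]
eui_diffs = [r["source_eui"] - r["site_eui"] for r in merged]

area_hist = histogram(area_diffs, bin_width=5000)
eui_hist = histogram(eui_diffs, bin_width=25)

print("merged rows:", len(merged))
print("area difference histogram:", area_hist)
print("EUI difference histogram:", eui_hist)
```

An inner join is used here so that only buildings present in both sources are kept; whether the original analysis kept or dropped unmatched rows is not stated, so that choice is an assumption.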