Previous Approaches
This part of the methodology describes the methods that were studied
but proved insufficiently robust, and that led us to the final
procedure.
Selecting Peer Groups
\label{selecting-peer-groups}
Peer groups are identified by comparing a subset of features that
affect energy consumption and that building owners can easily
understand as characteristics that differentiate their buildings from
others within the same building type (e.g. Office or Multifamily
Housing).
First, we identified variables that could potentially have a
significant correlation with Weather Normalized Source EUI. At this
step, scatter plots and correlation coefficients were analyzed to
identify the best candidates. The buildings were then clustered on
those features using the K-means and Gaussian Mixture methods, and the
silhouette score was used to select the number of clusters among 3, 4,
and 5. The range was limited because too many clusters would make it
difficult to describe the characteristics of each cluster to the
building owners in a simple and easily understandable manner. In
addition, having too many customized and specific groups of buildings
would provide less incentive for improvements in energy efficiency from
the policy standpoint.
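The selection of the number of clusters described above can be sketched as follows. This is a minimal illustration only, not the project's actual code: it uses a hand-written K-means and silhouette score from the Python standard library, synthetic two-dimensional points in place of the building features, and hypothetical names (`kmeans`, `silhouette_score`, `points`). In practice a library implementation would normally be used.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centre in Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[j])  # keep the old centre if empty
        if updated == centers:
            break
        centers = updated
    return labels


def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) is defined as 0 for single-point clusters
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in for the building features: three separated groups.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(20)
]

# Score each candidate number of clusters and keep the best one.
scores = {k: silhouette_score(points, kmeans(points, k)) for k in (3, 4, 5)}
best_k = max(scores, key=scores.get)
```

Restricting the candidates to 3, 4, and 5 mirrors the range used in the text; the silhouette score only ranks the options inside that range.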
The variables selected for the clustering within the Office category
were the following:
- Age: categorical variable to account for the period when the
  building was built;
- Gross floor area (ft²): continuous variable to account for size;
- Computer density (number of computers per 1,000 ft²): continuous
  variable to account for occupancy and how intensely the buildings are
  used;
- Building value ($ per built ft²): continuous variable to account for
  the technological and luxury level of each building;
- Percentage of energy consumption that comes from electricity
  (%electricity): continuous variable to account for the main energy
  source used by each building.
It is important to note that none of the variables is derived from the
energy use intensity; they are all assumed to be predictors of EUI.
Those five variables were then used to identify peer groups through the
K-means and Gaussian Mixture clustering algorithms. The main difference
between the two methods is how distance is taken into account: K-means
assigns each building to the cluster with the nearest centroid in
Euclidean distance, whereas a Gaussian Mixture uses a distance weighted
by the variance of each cluster. During the tests, the K-means
alternative appeared more stable in terms of cluster size and
homogeneity across clusters, which made it more suitable for the
purposes of this project.
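The distinction between the two distance notions can be written out explicitly. The notation below is introduced here for illustration and does not come from the original analysis: $x_i$ is the feature vector of building $i$, $\mu_k$ and $\Sigma_k$ are the mean and covariance of cluster $k$, $\pi_k$ is its mixing weight, and $d$ is the number of features.

```latex
% K-means: hard assignment to the nearest centroid (Euclidean distance)
\begin{equation}
  z_i = \arg\min_{k} \; \lVert x_i - \mu_k \rVert_2^2
\end{equation}

% Gaussian Mixture: soft assignment through the component densities
\begin{equation}
  r_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                {\sum_{j} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
\end{equation}

% The Gaussian log-density contains a variance-weighted (Mahalanobis) distance
\begin{equation}
  -2 \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k)
  = (x_i - \mu_k)^{\top} \Sigma_k^{-1} (x_i - \mu_k)
  + \log \det \Sigma_k + d \log 2\pi
\end{equation}
```

When every $\Sigma_k$ is the same multiple of the identity and the mixing weights are equal, the weighted distance reduces to a scaled Euclidean distance, so the two methods rank clusters for a given building in the same way.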
For the Multifamily Housing category, the same variables were used for
the clustering, except that computer density was replaced by unit
density (number of residential units per 1,000 ft²) as an indicator of
occupancy, and %electricity was treated as a categorical variable split
into 5 bins.
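As an illustration of turning %electricity into a five-level categorical variable, the sketch below assumes equal-width bins of 20 percentage points. The bin boundaries are an assumption made here, since the text does not state them, and the function name `electricity_bin` and the sample values are hypothetical.

```python
def electricity_bin(pct, n_bins=5):
    """Map a percentage in [0, 100] to a bin index from 0 to n_bins - 1.

    Assumes equal-width bins; the actual boundaries are not given in the text.
    """
    if not 0 <= pct <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    width = 100 / n_bins
    # A value of exactly 100 falls in the last bin rather than a new one.
    return min(int(pct // width), n_bins - 1)


# Hypothetical electricity shares (in %) for six buildings.
shares = [5.0, 20.0, 39.9, 55.0, 80.0, 100.0]
bins = [electricity_bin(s) for s in shares]
```

If the distribution of %electricity is strongly skewed, quantile-based bins would be another reasonable choice, at the cost of boundaries that are harder to explain to building owners.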