The two area columns are nearly identical, but the DOF data is more reliable than the self-reported data. The histogram of energy differences shows large outliers, so this paper follows the EPA recommendation and uses source EUI. Source Energy (kBtu), computed by multiplying source EUI (kBtu/ft^2) by DOF area, is also cleaned: records with a zero value in either the DOF area or the Source Energy (kBtu) column are removed. Only records whose energy meter covers the whole building are selected for analysis. Multifamily Housing and Office are the two most frequent building types, so these two types are considered in the further analysis. The cleaned data set contains 7272 Multifamily Housing observations and 1049 Office observations. The total cleaned data set has 9642 observations.
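The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`type`, `dof_area`, `source_eui`, `metered`), the `"Whole Building"` label, and the toy records are hypothetical and do not come from the actual data set, and the order of the filters is a guess at how the steps were applied.

```python
from collections import Counter

# Hypothetical field names and toy records; the real schema may differ.
records = [
    {"type": "Multifamily Housing", "dof_area": 50000.0, "source_eui": 120.0, "metered": "Whole Building"},
    {"type": "Office", "dof_area": 80000.0, "source_eui": 200.0, "metered": "Whole Building"},
    {"type": "Multifamily Housing", "dof_area": 0.0, "source_eui": 110.0, "metered": "Whole Building"},
    {"type": "Office", "dof_area": 60000.0, "source_eui": 0.0, "metered": "Whole Building"},
    {"type": "Hotel", "dof_area": 40000.0, "source_eui": 150.0, "metered": "Whole Building"},
    {"type": "Multifamily Housing", "dof_area": 30000.0, "source_eui": 90.0, "metered": "Partial"},
    {"type": "Multifamily Housing", "dof_area": 20000.0, "source_eui": 100.0, "metered": "Whole Building"},
    {"type": "Office", "dof_area": 10000.0, "source_eui": 250.0, "metered": "Whole Building"},
]


def clean(rows):
    """Drop zero-area, zero-energy, and partially metered records."""
    cleaned = []
    for row in rows:
        # Source energy (kBtu) = source EUI (kBtu/ft^2) * DOF area (ft^2)
        source_energy = row["source_eui"] * row["dof_area"]
        if row["dof_area"] == 0 or source_energy == 0:
            continue
        if row["metered"] != "Whole Building":
            continue
        cleaned.append({**row, "source_energy_kbtu": source_energy})
    return cleaned


def top_types(rows, n=2):
    """Return the n most frequent building types."""
    counts = Counter(row["type"] for row in rows)
    return [building_type for building_type, _ in counts.most_common(n)]


cleaned = clean(records)
selected = top_types(cleaned)
subset = [row for row in cleaned if row["type"] in selected]
```

On the toy records, `cleaned` keeps the five records that pass all three filters, and `subset` keeps only those belonging to the two most frequent types.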