Income integration and Housing market stability
Social Impact Project // Methods and descriptive Analysis
by: Yuwei Lin, Sunglyoung Kim, Fangshu Lin and Dana Chermesh Reshef
Instructor: Prof. Constantine E. Kontokosta
Submission date: Nov 17th 2017
NYU CUSP, Center for Urban Science and Progress, New York
Overview
New York City is well known for its unaffordable housing market, its widening gap between rich and poor, and the displacement associated with rapidly gentrifying areas (Freeman & Braconi, 2004). Economic segregation is strongly correlated with income inequality (Watson, 2009). It has increased over the past three decades across the United States and in 27 of the nation’s 30 largest metropolitan areas (Fry & Taylor, 2012; Pendall & Carruthers, 2010).
This study aims to assess the correlation between income integration level and housing market stability. The analysis proceeded in several steps. First, we computed descriptive statistics of the data. Then we developed linear regression models to analyze the relationships between income integration level and income, rent burden (%) and rent growth, each modeled separately. All data used in the analysis are Census data for the years 1990, 2000 and 2010.
What is housing stability, and how can it be defined? In this study we approach it through rent burden.
Data Inventory
CENSUS Data
The Census data were obtained from the Geolytics Neighborhood Change Database (NCDB). This dataset includes Census data for 1970, 1980, 1990 and 2000 at the census tract level. Population, income and rent data were selected for the years 1990, 2000 and 2010. The 1990 and 2000 data have been recalculated and normalized to the 2010 tract IDs, so that historic data can be compared within exactly the same tract boundaries. Starting in 2010, the Decennial Census dropped the long-form survey and collects only basic demographic and housing tenure information. Therefore, income, house value and rent data for this period were taken from the 2006-2010 American Community Survey (ACS) instead. The data were cleaned and merged to reconcile the different income groups used in the 1990 and 2000 Census data.
The dataset includes counts of both households and families at different income levels. Since the 1990 Census data include no household income distribution, we used the family income distribution for the analysis. We followed the income boundaries of the NCDB income distribution data (14 groups for 1990 and 16 groups for 2000/2010). As Galster (2008) notes, the HUD income boundaries, which are measured relative to Area Median Income (AMI), do not match the NCDB income groups; Galster (2008) therefore used an interpolation method to divide the NCDB income distribution data into six income groups. To avoid controversy, we did not manipulate the NCDB income distribution data; this could be done in future work.
Public Use Microdata Areas (PUMA) data
We chose the PUMA as the main unit for our neighborhood-level analysis, given data availability. PUMAs are Census statistical geographies created by aggregating census tracts and designed to contain at least 100,000 residents each, which amounts to about 40 tracts per PUMA. Aggregating the data for the 2168 census tracts into PUMAs was straightforward, with no boundary errors when merging, because both geographic units are defined by the US Census Bureau. Additionally, the 55 PUMAs in NYC are almost exactly equivalent to the city’s 59 Community Districts; the exception is 8 districts in Manhattan and the Bronx that, because of their small populations, are combined into 4 PUMAs of two districts each. Every Community District corresponds to a community board, the local representative body, and follows the groupings of neighborhoods in NYC. Hence, it was reasonable to use PUMA data for neighborhood-level analysis.
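The tract-to-PUMA aggregation described above can be sketched as follows. This is a minimal illustration and not the project's actual code: the function name, the tract and PUMA identifiers, the bracket labels and the counts are all hypothetical, and it assumes that family counts per income group are simply summed over the tracts belonging to each PUMA.

```python
from collections import defaultdict


def aggregate_to_puma(tract_counts, tract_to_puma):
    """Sum tract-level family counts per income group within each PUMA.

    tract_counts:  {tract_id: {income_group: family_count}}
    tract_to_puma: {tract_id: puma_id}
    Both structures are hypothetical stand-ins for the cleaned NCDB tables.
    """
    puma_counts = defaultdict(lambda: defaultdict(int))
    for tract_id, groups in tract_counts.items():
        puma_id = tract_to_puma[tract_id]
        for group, count in groups.items():
            puma_counts[puma_id][group] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {puma: dict(groups) for puma, groups in puma_counts.items()}


# Toy example with made-up identifiers and counts.
tract_counts = {
    "T1": {"under_5k": 10, "5k_to_10k": 20},
    "T2": {"under_5k": 5, "5k_to_10k": 15},
    "T3": {"under_5k": 8, "5k_to_10k": 2},
}
tract_to_puma = {"T1": "P1", "T2": "P1", "T3": "P2"}

puma_counts = aggregate_to_puma(tract_counts, tract_to_puma)
```

Summing works for counts such as the number of families per income group; medians (income, rent) cannot be aggregated this way and would need to be derived at the PUMA level by some other route.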
The data were cleaned and arranged in a final dataset for analysis. The final dataset contains the following information for each of the 55 PUMAs:
  1. Median Family Income (1990, 2000, 2010) // Annual, by the default income level groups.
  2. Median Rent (1990, 2000, 2010) // Monthly
  3. Rent burden (%) (1990, 2000, 2010) // Calculated by dividing annual Median Rent (monthly Median Rent multiplied by 12) by Median Family Income.
  4. Income integration level (range of 0-1) (1990, 2000, 2010) // Calculated by the Entropy Index, as explained in the following methods section.
  5. Rent growth (%) (1990-2000, 2000-2010)
  6. Income integration level change (%) (1990-2000, 2000-2010)
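The derived variables in items 3, 5 and 6 can be illustrated with a short sketch. The rent burden formula follows the definition in item 3. The growth and change variables are assumed here to be simple percentage changes relative to the starting year, which the list does not state explicitly. Function names and all input values are hypothetical.

```python
def rent_burden(median_monthly_rent, median_annual_income):
    """Annualized median rent as a percentage of median annual family income."""
    return 100.0 * (median_monthly_rent * 12) / median_annual_income


def percent_change(start_value, end_value):
    """Percentage change from start_value to end_value.

    Assumption: growth is measured relative to the starting year.
    """
    return 100.0 * (end_value - start_value) / start_value


# Made-up values for a single PUMA.
rent = {1990: 500.0, 2000: 750.0}            # median monthly rent
income = {1990: 30000.0, 2000: 36000.0}      # median annual family income
integration = {1990: 0.80, 2000: 0.84}       # entropy-based score in [0, 1]

burden_1990 = rent_burden(rent[1990], income[1990])
burden_2000 = rent_burden(rent[2000], income[2000])
rent_growth_90_00 = percent_change(rent[1990], rent[2000])
integration_change_90_00 = percent_change(integration[1990], integration[2000])
```

With these toy numbers, the rent burden rises from 20% to 25% while rent grows by 50% over the decade.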
Methods
Entropy index calculation
The entropy index is a commonly used measure of segregation (Galster, Booza & Cutsinger, 2008; Kontokosta, 2014). To obtain the income integration level of each PUMA over the two decades, we used the entropy index formula, which returns a score in the 0-1 range, where 0 is fully segregated (only one income group is represented) and 1 is fully integrated (all income groups are represented evenly):