Abstract
In this project, I build a multivariable regression model to predict low-income housing density in each NYC census tract over a seven-year span (2009-2015). The results should show how neighborhood characteristics relate to low-income housing density across NYC as a whole.
Data Collection and Data Cleaning
The data I used contain demographic and housing features (available from ACS data on Social Explorer) and the number of low-income housing units in each census tract (available from the LIHTC database website). I also calculated racial diversity in each census tract, based on four racial groups (White, Black, Hispanic, Asian), using the entropy index formula.
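As an illustration of the diversity measure, here is a minimal sketch of an entropy index for one tract. It assumes the common Shannon form, E = -sum(p_i * ln(p_i)) over the group shares p_i, optionally divided by ln(K) so that the value runs from 0 (one group only) to 1 (all K groups equally represented). The function name, the normalization, and the example counts are illustrative assumptions, not necessarily the exact variant used in this project.

```python
import math

def entropy_index(counts, normalize=True):
    """Entropy-based diversity for one tract, given population counts per group.

    Assumed form: E = -sum(p_i * ln(p_i)); groups with zero population
    contribute nothing (0 * ln 0 is taken as 0).
    """
    total = sum(counts)
    if total == 0:
        return float("nan")  # no population, diversity is undefined
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    if normalize and len(counts) > 1:
        entropy /= math.log(len(counts))  # scale to the [0, 1] range
    return entropy

# Hypothetical counts, ordered White, Black, Hispanic, Asian
even_tract = entropy_index([250, 250, 250, 250])   # maximally diverse
single_group = entropy_index([1000, 0, 0, 0])      # no diversity
```

With the normalization, a tract split evenly across the four groups scores close to 1 and a tract with a single group scores 0, which makes values comparable across tracts of different sizes.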
The total housing unit counts from the American FactFinder data were merged with the affordable housing unit counts from the LIHTC database, which yielded the percentage of low-income housing in each census tract. Initially, the census data contained 15,227 observations over the seven-year time frame. After empty rows were eliminated, 13,640 records remained. From these data, the entropy index was calculated for four racial groups: Asian, Black, Hispanic, and White. Next, the census data were merged with the public housing density data, and each feature was averaged over the seven-year time frame. Observations with missing values were then eliminated, leaving 1,140 observations. At this point, each row represents one unique census tract, with its features averaged over 2009-2015.
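The cleaning steps described above can be sketched with pandas roughly as follows. This is a simplified illustration on toy data, not the actual pipeline: the column names (tract, year, total_units, low_income_units), the left join on tract and year, and the toy values are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the tract-by-year census data (hypothetical columns)
census = pd.DataFrame({
    "tract": ["A", "A", "B", "B", "C"],
    "year": [2009, 2010, 2009, 2010, 2009],
    "total_units": [1000, 1100, 500, None, 800],
})
# Toy stand-in for the low-income housing unit counts
lihtc = pd.DataFrame({
    "tract": ["A", "A", "B", "B"],
    "year": [2009, 2010, 2009, 2010],
    "low_income_units": [100, 110, 50, 60],
})

# 1. Drop census rows with empty values
census = census.dropna()

# 2. Merge the two sources and compute the low-income housing share
merged = census.merge(lihtc, on=["tract", "year"], how="left")
merged["low_income_share"] = merged["low_income_units"] / merged["total_units"]

# 3. Average every feature over the years, one row per tract
tract_means = merged.groupby("tract").mean(numeric_only=True).drop(columns="year")

# 4. Drop tracts that still have missing values
tract_level = tract_means.dropna()
```

In this toy example, tract C has no matching low-income housing record, so it is dropped in the last step, which mirrors how the number of observations shrinks once missing values are removed. Whether unmatched tracts should instead be treated as having zero low-income units is a modeling choice this sketch does not settle.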