Data
Data used in this project include:
- Eviction data, available from NYC Open Data here. The dataset lists pending, scheduled, and closed evictions from 2017 to the present. For this analysis, I looked only at 2017 evictions. To filter the data, I converted the date (Case Open Date) to a datetime object with pandas and extracted the year. Then I aggregated evictions at the zip code level to get the number of evictions per zip code. The data include both commercial and residential evictions; commercial evictions make up less than 10% of the data.
- Litigation data, available from NYC Open Data here. The dataset lists properties that HPD’s Housing Litigation Division has initiated action against to enforce compliance with the housing code. It also includes actions tenants have taken against the property owners. The dataset contains lawsuits beginning in 2000 but does not appear to have complete data for all years.
- Median household income in the last twelve months at the zip code level from the 2017 5-year American Community Survey data. Using the API and my Census API key, I downloaded all zip codes for New York state and then filtered by NYC zip codes (>= 10000 & <= 11500).
- Number of people who identified as white per zip code from the 2017 5-year American Community Survey data. I retrieved these data through the Census API and wrangled them the same way as the median income data. I created a ‘non-white fraction’ by subtracting the white fraction of each zip code from 1, which gives a rough proxy for diversity per zip code.
- Primary Land Use Tax Output (PLUTO) data, available from NYC Open Data here. I downloaded and then concatenated the data for all five boroughs, dropping all columns except zip code and total units. Then I summed the units within each zip code to get total units per zip code. The metric includes residential and commercial units.
- NYC zip code shape file, available here. Wrangling was minimal here, but it was necessary to convert zip codes to integers to merge with the other data.
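The wrangling steps described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the exact code used in the project: apart from Case Open Date, the column names (`zipcode`, `units_total`) and the function names are assumptions chosen for the example.

```python
import pandas as pd


def evictions_per_zip(evictions, date_col="Case Open Date",
                      zip_col="zipcode", year=2017):
    """Count evictions per zip code for a single year."""
    # Parse the date column; unparseable values become NaT and are dropped
    # by the year comparison below.
    dates = pd.to_datetime(evictions[date_col], errors="coerce")
    in_year = evictions[dates.dt.year == year]
    return in_year.groupby(zip_col).size().rename("evictions").reset_index()


def units_per_zip(borough_frames, zip_col="zipcode", units_col="units_total"):
    """Concatenate per-borough PLUTO tables and sum units within each zip code."""
    pluto = pd.concat(borough_frames, ignore_index=True)[[zip_col, units_col]]
    return pluto.groupby(zip_col, as_index=False)[units_col].sum()


def filter_nyc_zips(df, zip_col="zipcode", low=10000, high=11500):
    """Keep rows whose zip code lies in [low, high] and store it as an integer."""
    # Non-numeric zip codes become NaN, and NaN never falls inside the range.
    zips = pd.to_numeric(df[zip_col], errors="coerce")
    keep = zips.between(low, high)
    out = df[keep].copy()
    out[zip_col] = zips[keep].astype(int)
    return out


# Toy data, for illustration only (not taken from the real datasets).
toy_evictions = pd.DataFrame({
    "Case Open Date": ["2017-01-15", "2017-03-02", "2018-07-09", "2017-11-30"],
    "zipcode": [10001, 10001, 10002, 11201],
})
toy_counts = evictions_per_zip(toy_evictions)
```

Converting zip codes to integers in every table before merging keeps a string such as "10001" from silently failing to match the integer 10001.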
I created four variables for my analysis:
- Eviction percentage: total evictions per zip code divided by total units per zip code, multiplied by 100.
- Litigations percentage: total litigations per zip code divided by total units per zip code, multiplied by 100.
- Income per person: median household income divided by total population per zip code.
- Non-white fraction: the total number of white households divided by total households, subtracted from one.
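As a rough sketch, the four variables defined above could be computed from a single merged, one-row-per-zip-code table like this. All column names here (`evictions`, `litigations`, `total_units`, `income`, `population`, `white_count`, `total_count`) are hypothetical and would need to match the actual merged data.

```python
import pandas as pd


def add_analysis_variables(df):
    """Return a copy of df with the four analysis variables added."""
    out = df.copy()
    # Evictions and litigations as a percentage of total units in the zip code.
    out["eviction_pct"] = out["evictions"] / out["total_units"] * 100
    out["litigation_pct"] = out["litigations"] / out["total_units"] * 100
    # Income divided by total population of the zip code.
    out["income_per_person"] = out["income"] / out["population"]
    # One minus the white share of the zip code.
    out["nonwhite_fraction"] = 1 - out["white_count"] / out["total_count"]
    return out


# Toy row, for illustration only (not real data).
toy = pd.DataFrame({
    "zipcode": [10001],
    "evictions": [50],
    "litigations": [20],
    "total_units": [10000],
    "income": [60000],
    "population": [30000],
    "white_count": [3000],
    "total_count": [12000],
})
toy_vars = add_analysis_variables(toy)
```

A zip code with zero total units would yield an infinite or missing percentage under this sketch, so such rows may need to be dropped or handled before analysis.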