- Eviction data, available from NYC Open Data here. The dataset lists pending, scheduled, and closed evictions from 2017 to the present. For this analysis, I only looked at 2017 evictions. To filter the data, I converted the date (Case Open Date) to a datetime object with pandas and extracted the year. Then, I aggregated evictions at the zip code level to get the number of evictions per zip code. The data include both commercial and residential evictions; commercial evictions make up less than 10% of the data.
- Litigation data, available from NYC Open Data here. The dataset lists properties that HPD’s Housing Litigation Division has initiated action against to enforce compliance with the housing code. It also includes actions tenants have taken against the property owners. The dataset contains lawsuits beginning in 2000 but does not appear to have complete data for all years.
- Median household income in the last twelve months at the zip code level from the 2017 5-year American Community Survey data. Using the Census API and my API key, I downloaded the data for all zip codes in New York state and then kept only the NYC zip codes (>= 10000 & <= 11500).
- Number of people who identified as white per zip code from the 2017 5-year American Community Survey data. I retrieved the data through the Census API and performed the same wrangling as with median income. I created a ‘non-white fraction’ by subtracting the white fraction of each zip code from 1, which gives a rough proxy for diversity per zip code.
- Primary Land Use Tax Output (PLUTO) data, available from NYC Open Data here. I downloaded and then concatenated the data for all five boroughs, dropping all columns except for zip code and total units. Then, I summed the number of units within each zip code to get total units per zip code. The metric includes residential and commercial units.
- NYC zip code shapefile, available here. Wrangling was minimal here, but it was necessary to convert zip codes to integers to merge with the other data.
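The wrangling steps described in the list above can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: `Case Open Date` and the zip code range come from the text, while the column names `zipcode` and `unitstotal` and all function names are assumptions made for the example.

```python
import pandas as pd


def evictions_per_zip(evictions: pd.DataFrame, year: int = 2017) -> pd.DataFrame:
    """Keep evictions opened in `year` and count them per zip code."""
    # Unparseable dates become NaT and are excluded by the year comparison.
    dates = pd.to_datetime(evictions["Case Open Date"], errors="coerce")
    in_year = evictions[dates.dt.year == year]
    # "zipcode" is an assumed column name.
    return in_year.groupby("zipcode").size().rename("evictions").reset_index()


def filter_nyc_zips(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose zip code falls in the range used in the text."""
    zips = df["zipcode"].astype(int)
    return df[zips.between(10000, 11500)].reset_index(drop=True)


def units_per_zip(borough_frames: list) -> pd.DataFrame:
    """Concatenate per-borough PLUTO tables and sum units per zip code."""
    # "unitstotal" is an assumed name for the total-units column.
    pluto = pd.concat(borough_frames, ignore_index=True)[["zipcode", "unitstotal"]]
    return pluto.groupby("zipcode", as_index=False)["unitstotal"].sum()
```

Each function returns one row per zip code, so the results can later be merged on the zip code column.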
Creation of Variables
I created four variables for my analysis:
- Eviction percentage: total evictions per zip code divided by total units per zip code, multiplied by 100.
- Litigations percentage: total litigations per zip code divided by total units per zip code, multiplied by 100.
- Income per person: income divided by total population per zip code.
- Non-white fraction: the total number of white households divided by total households, subtracted from one.
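As a rough sketch, the four variables defined above could be computed from a merged per-zip-code table like this. The input column names (`evictions`, `litigations`, `units`, `income`, `population`, `white`, `total`) are hypothetical and chosen only for illustration.

```python
import pandas as pd


def add_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Add the four analysis variables to a per-zip-code table."""
    out = df.copy()
    # Evictions and litigations as a percentage of total units.
    out["eviction_pct"] = out["evictions"] / out["units"] * 100
    out["litigation_pct"] = out["litigations"] / out["units"] * 100
    # Income divided by total population.
    out["income_per_person"] = out["income"] / out["population"]
    # One minus the white share of the total count.
    out["nonwhite_fraction"] = 1 - out["white"] / out["total"]
    return out
```

For example, a zip code with 50 evictions and 1,000 units has an eviction percentage of 5, and one where 250 of 1,000 are counted as white has a non-white fraction of 0.75.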
Missing Values
After first merging the eviction data with the zip code shapefile, I merged in the PLUTO, income, litigation, and finally race data. In the end, I was left with 183 zip codes out of 263 NYC zip codes. Some zip codes contained 0 values, or very low numbers, for income or units. I dropped any zip code with income below $500 or with fewer than 100 units. Most significantly, areas like the Rockaways are missing from my datasets, and that section of New York does not appear on my maps.
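The merge-and-filter step might look like the sketch below. Several things here are assumptions rather than details from the analysis: the inner join (which would explain zip codes dropping out when any one source lacks them), the column names `zipcode`, `income`, and `units`, and the reading that a zip code is dropped when either threshold is not met.

```python
from functools import reduce

import pandas as pd


def merge_and_filter(frames: list, min_income: float = 500, min_units: int = 100) -> pd.DataFrame:
    """Merge per-zip-code tables in order, then drop low-income or low-unit zips."""
    # Cast zip codes to integers so the keys match across tables.
    frames = [f.assign(zipcode=f["zipcode"].astype(int)) for f in frames]
    # Assumed inner join: a zip code survives only if every table has it.
    merged = reduce(lambda left, right: left.merge(right, on="zipcode", how="inner"), frames)
    keep = (merged["income"] >= min_income) & (merged["units"] >= min_units)
    return merged[keep].reset_index(drop=True)
```

Passing `indicator=True` to an outer `merge` is one way to check which zip codes each source is missing before deciding how to join.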