PUI2017 Extra Credit Project Proposal <Marium Sultan, MariumS, mas1300>
Problem Description:
I wonder if the median income of a neighborhood correlates with the accessibility of books, measured as the presence of bookstores and/or public libraries. Neighborhoods may be defined as PUMA region, census tract, or community district (not yet decided).
I want to test how strong the correlation is between neighborhood income and the number of bookstores in that neighborhood, and between neighborhood income and the number of public libraries in that neighborhood. Depending on how many of these two institutions there are overall, I may use a binary rather than a count designation of presence, for example 1 if a neighborhood has a library and 0 if it does not.
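To make the count versus binary choice concrete, here is a minimal sketch that computes a Pearson correlation both ways. The income and library numbers are hypothetical toy values, not real data, and the helper name pearson_r is my own. With a 0/1 variable, the Pearson coefficient is the point-biserial correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy values (NOT real data): one entry per neighborhood
median_income = [30000, 45000, 60000, 80000, 100000]
library_count = [0, 1, 1, 2, 3]

# Binary designation: 1 if the neighborhood has at least one library, else 0
library_present = [1 if count > 0 else 0 for count in library_count]

r_count = pearson_r(median_income, library_count)
r_binary = pearson_r(median_income, library_present)
print(f"count: r = {r_count:.3f}, binary: r = {r_binary:.3f}")
```

The binary version throws away the difference between one library and several, so it may only be worth using if most neighborhoods have zero or one.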
I predict that neighborhoods with higher median incomes will be more likely to contain bookstores and libraries. I also wonder if there is a difference in which neighborhoods contain libraries versus which contain bookstores, and if one of these two institutions is more equitably distributed.
Data:
I will gather the income data from either the American Community Survey or Furman Center's CoreData portal. The locations of the public libraries are on NYC OpenData, but I expect the bookstore data will be harder to find. One can search for bookstores on Google Maps, and I may use some form of web scraping to collect the location data. I plan to remove unnecessary columns and do other general cleaning. Then I will group the information in the datasets by neighborhood and join them. The grouping could be done by a function or API that returns a neighborhood from coordinates.
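The grouping and join step could look roughly like the sketch below. It assumes each neighborhood boundary is available as a simple polygon of (x, y) vertices and uses a hand-written ray-casting test; the boundaries, incomes, and bookstore points are all hypothetical placeholders. In practice a geospatial library would likely handle the point-in-polygon step.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) is inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def neighborhood_of(x, y, boundaries):
    """Return the name of the first neighborhood containing the point, or None."""
    for name, polygon in boundaries.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None

# Hypothetical boundaries (two adjacent squares), incomes, and bookstore points
boundaries = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
income = {"A": 40000, "B": 90000}
bookstores = [(0.5, 0.5), (1.5, 0.2), (1.7, 0.8), (3.0, 3.0)]  # last point is outside both

# Group: count bookstores per neighborhood
counts = Counter(neighborhood_of(x, y, boundaries) for x, y in bookstores)

# Join: one record per neighborhood with income and bookstore count
joined = {
    name: {"median_income": income[name], "bookstores": counts.get(name, 0)}
    for name in boundaries
}
print(joined)
```

Points that fall outside every boundary are counted under None and dropped from the join, which is one way to spot bad coordinates during cleaning.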
Analysis:
For my correlation analysis I will run an OLS regression on income vs bookstores, one on income vs public libraries, and one on income vs both grouped together. I will also use Moran's I and Ripley's K for spatial correlation. For these methods I may not need neighborhood boundaries; instead I could test the income of areas surrounding bookstores and libraries and compare it to areas further away from these institutions, and to the city average.
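As a rough illustration of two of these methods, the sketch below fits a one-variable OLS line in closed form and computes global Moran's I with binary adjacency weights. All values, the adjacency list, and the function names are hypothetical; a real analysis would probably rely on a statistics package rather than these hand-written helpers.

```python
def ols_fit(xs, ys):
    """Closed-form simple OLS: returns (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    neighbors maps each index to the indices adjacent to it (w_ij = 1 if adjacent).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(len(adj) for adj in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    denom = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / denom)

# Hypothetical toy data: income in thousands of dollars and bookstore counts
income_k = [30, 50, 70, 90]
bookstore_count = [1, 2, 3, 4]
slope, intercept = ols_fit(income_k, bookstore_count)

# Hypothetical layout: four neighborhoods in a row, each adjacent to the next
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
moran = morans_i(bookstore_count, neighbors)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, Moran's I = {moran:.3f}")
```

A positive Moran's I suggests that neighborhoods with similar values sit next to each other, while a value near zero suggests no spatial pattern.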
References:
What sparked this idea is my memory of reading an article saying that there are no bookstores in the Bronx. The Bronx is known overall as a lower-income borough than the others, so I wondered if that was the key factor in this lack (1). Steven Melendez examined over 60 years of bookstore data and noted the decreases. There is also a point map of bookstores on the page about their disappearance, so I know there are enough sources to make a map of bookstores like I envisioned (4). I wonder how Melendez collected this location data and whether there is a way to replicate his technique.
Deliverable:
The deliverable will be a statistical conclusion and three maps, all choropleth maps of income. One will be overlaid with points mapping bookstores, one with points mapping public libraries, and one with points mapping both (in different colors).