Abstract
The visual appearance of urban streets is strongly connected to citizens’ behavior. How people feel about an outdoor environment is hard to measure and predict; moreover, feeling safe is not the same as being safe. In this paper, I use linear regression to analyze whether citizens’ perceived safety scores reflect the actual amount of crime. I did not find a statistically significant relationship between the perceived safety score and violence-related crime.
Introduction
The MIT Media Lab published a dataset of perceived safety scores for urban streets. My question is whether people can sense potential crime in locations they regard as safe. In other words, I am curious about the relationship between the distribution of violence-related crimes and people’s safety evaluations. I collected the perceived safety scores of urban streets, violence-related crime statistics for 2011, and a New York neighborhood shapefile. I used the Jenks natural breaks classification method to classify the perceived safety scores and calculated the proportion of locations in each class within each neighborhood. The factual “safety” measure is the number of violence-related crimes per square foot in the neighborhood. Because the dataset is small, a machine learning algorithm is inappropriate in this case, so I used linear regression to measure the relationship.
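As a minimal sketch of the regression step, assuming a single predictor (for example, the share of a neighborhood's points in one safety class) and a single response (crime density), ordinary least squares can be written with the standard library alone. The function name `simple_ols`, the variable names, and the toy numbers are illustrative assumptions, not the data or tooling used in this paper.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the slope, intercept, R squared, and the t statistic of the slope.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length and at least 3 points")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares give R squared.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - sse / sst

    # Standard error of the slope, with n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "t_stat": t_stat,
    }


# Toy values only: hypothetical class shares and crime densities.
share_in_class = [0.0, 1.0, 2.0, 3.0]
crime_density = [1.0, 3.0, 2.0, 4.0]
fit = simple_ols(share_in_class, crime_density)
print(f"slope={fit['slope']:.3f}, R^2={fit['r_squared']:.3f}, t={fit['t_stat']:.3f}")
```

The t statistic would then be compared against a t distribution with n - 2 degrees of freedom to judge statistical significance; a statistics package can report that p-value directly.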
Data
I collected Public Use Microdata Areas (PUMA) from NYC Open Data, New York street perceived safety score data from the MIT Media Lab website, and the NYPD Complaint Data Historic dataset from NYC Open Data. Researchers at the MIT Media Lab collected the original data on how safe people judge a street view image to be through an online survey, and then used image features to train an algorithm to predict perceived safety. Their New York dataset contains 290,673 points, each marked with a score. A higher score means that more people consider the location safe, and vice versa (Naik, Philipoom et al. 2014). The NYPD Complaint Data Historic dataset records violence-related crimes from 2006 to 2017, including their time and location.
The first step of data wrangling was converting the DataFrames of perceived safety scores and NYPD complaint data to GeoDataFrames. Because the perceived safety scores were collected in 2011, I used only felony data from 2011.
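The conversion and year filter could look roughly like the sketch below, assuming pandas and geopandas are available. The column names (`complaint_date`, `law_category`, `latitude`, `longitude`), the toy rows, and the WGS 84 coordinate reference system are assumptions made for illustration; the real complaint file uses its own schema.

```python
import pandas as pd
import geopandas as gpd

# Toy rows with hypothetical column names standing in for the complaint data.
complaints = pd.DataFrame(
    {
        "complaint_date": ["12/31/2010", "03/15/2011", "07/04/2011", "01/01/2012"],
        "law_category": ["FELONY", "FELONY", "MISDEMEANOR", "FELONY"],
        "latitude": [40.71, 40.73, 40.75, 40.77],
        "longitude": [-74.00, -73.99, -73.98, -73.97],
    }
)

# Parse the dates so the year can be compared directly.
complaints["complaint_date"] = pd.to_datetime(
    complaints["complaint_date"], format="%m/%d/%Y"
)

# Keep only felonies recorded in 2011.
is_2011 = complaints["complaint_date"].dt.year == 2011
is_felony = complaints["law_category"] == "FELONY"
felonies_2011 = complaints[is_2011 & is_felony].reset_index(drop=True)

# Build point geometries from longitude (x) and latitude (y).
crime_points = gpd.GeoDataFrame(
    felonies_2011,
    geometry=gpd.points_from_xy(
        felonies_2011["longitude"], felonies_2011["latitude"]
    ),
    crs="EPSG:4326",  # assumed: coordinates are WGS 84 longitude/latitude
)

print(crime_points[["complaint_date", "law_category", "geometry"]])
```

The perceived safety points can be converted the same way, after which both layers can be matched against the neighborhood polygons.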
The perceived safety scores range from -4 to 44. Because low-scoring points make up only a small proportion of the whole dataset but might play a leading role, I used Jenks natural breaks to classify the scores. Jenks natural breaks classification is a method designed to determine the best arrangement of values into classes: it seeks to reduce the variance within classes and increase the variance between classes (Jenks 1967). I grouped the original data into five groups, which can be interpreted on a scale from the safest locations, through relatively safe and relatively dangerous locations, to the most dangerous locations.
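To illustrate the idea behind natural breaks, the sketch below finds the class boundaries that minimize the total within-class sum of squared deviations using simple dynamic programming. It is an illustrative implementation with hypothetical function names (`jenks_breaks`, `classify`) and toy scores, not necessarily the routine used for this paper. Its running time grows with the square of the number of values, so a dataset with hundreds of thousands of points would call for an optimized library implementation or a sample.

```python
import bisect


def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for an optimal 1-D partition.

    "Optimal" here means the minimal total within-class sum of squared
    deviations from the class means.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums let us compute the squared deviation of any slice in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] around its own mean."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total deviation when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk the split table backwards to recover each class's largest value.
    upper_bounds = []
    j = n
    for c in range(n_classes, 0, -1):
        upper_bounds.append(data[j - 1])
        j = split[c][j]
    return upper_bounds[::-1]


def classify(value, upper_bounds):
    """Return the 0-based index of the class whose range contains value."""
    return min(bisect.bisect_left(upper_bounds, value), len(upper_bounds) - 1)


# Toy scores only; the real scores range from -4 to 44.
toy_scores = [1, 2, 3, 10, 11, 12, 50, 51]
bounds = jenks_breaks(toy_scores, 3)
labels = [classify(s, bounds) for s in toy_scores]
print(bounds, labels)
```

Counting how many of a neighborhood's points fall into each class, and dividing by the neighborhood's total, gives the class proportions used as predictors.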