Abstract
This study examines the hygiene quality of NYC restaurants. It asks whether there is any pattern or trend in the hygiene grading of NYC restaurants. To answer this question, factors such as cuisine type, violation type, and inspection time are taken into consideration. After examining the dataset, I found that although Manhattan has the highest number of Grade A restaurants among the five boroughs, all five boroughs have almost identical percentage distributions of restaurant grades. Among the top 10 most popular cuisines, the percentage of Grade A restaurants is below 80% for Latin, Caribbean, Mexican, Chinese, and Japanese cuisines, and Latin restaurants have the lowest percentage of Grade A and the highest percentage of Grade C. To understand the spatial pattern of hygiene grading, I also use cluster analysis to identify the areas with the cleanest restaurants. The results show that Midtown and Upper Manhattan, Staten Island, Greenpoint and Bushwick in Brooklyn, and Sunnyside in Queens have the lowest violation scores.
Introduction
Most people in New York City dine out frequently, which makes the hygiene quality of restaurants extremely important. The Department of Health and Mental Hygiene uses a grading system that grades restaurants as A, B, or C so that citizens are aware of the hygiene quality of every restaurant. The grades can influence both restaurants' business and their customers' decision-making. Therefore, it is essential to understand whether restaurants are graded fairly and to investigate whether any factors are related to the grades.
Data
The data used in this project consist of two parts, both obtained from NYC Open Data. The first is a New York City map that includes each ZIP code area and its population. The second is the New York City Restaurant Inspection Results dataset released by the Department of Health and Mental Hygiene. This dataset contains every sustained or not yet adjudicated violation citation from every full or special program inspection conducted up to three years prior to the most recent inspection for restaurants and college cafeterias in an active status. It has a total of 385K rows, and its features include:
- Cuisine Description: this column contains 85 types of cuisine. Because the top 10 cuisines account for 65% of the restaurants in New York City, the project uses only the top 10 types.
- BORO: this column contains six values: the five boroughs and one for missing entries. Because the missing entries make up only a very small portion of the data, they are dropped.
- Violation Code: the violation code is used to categorize the violation descriptions. The project uses the top 5 codes to identify the main reasons restaurants receive a grade lower than A.
- Score and Grade: the grading system assigns each restaurant a score and converts it to a grade according to the rule: A: 0-13 points, B: 14-27 points, C: 28+ points.
- Inspection Date: this column records the month and year of each inspection.
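The grading rule and the filtering steps described in the list above can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: the field names `boro` and `cuisine`, the `"Missing"` label, and the function names are assumptions, and the sketch treats each record as one restaurant, whereas the raw dataset has one row per violation.

```python
from collections import Counter


def score_to_grade(score):
    """Convert an inspection score to a grade: A: 0-13, B: 14-27, C: 28+."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score <= 13:
        return "A"
    if score <= 27:
        return "B"
    return "C"


def top_cuisines(rows, n=10):
    """Return the n most frequent cuisine labels among the given records."""
    counts = Counter(row["cuisine"] for row in rows)
    return [name for name, _ in counts.most_common(n)]


def preprocess(rows, n_cuisines=10):
    """Drop records with a missing borough, then keep only the top cuisines."""
    # "Missing" is an assumed label for the sixth BORO value.
    with_boro = [r for r in rows if r["boro"] not in (None, "", "Missing")]
    keep = set(top_cuisines(with_boro, n_cuisines))
    return [r for r in with_boro if r["cuisine"] in keep]
```

For example, `score_to_grade(13)` falls in the A band and `score_to_grade(14)` in the B band, so the boundaries match the rule as stated.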
Methodology
1. Exploratory Analysis
To get an overall sense of restaurants in New York City and their grading, the percentage of restaurants with each grade is computed for each borough.
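A minimal sketch of this per-borough computation is given below. It assumes each record is a dictionary with hypothetical `boro` and `grade` keys and one record per restaurant. The sample records are made up for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def grade_percentages_by_borough(rows):
    """Return {borough: {grade: percentage of that borough's restaurants}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["boro"]][row["grade"]] += 1

    result = {}
    for boro, grade_counts in counts.items():
        total = sum(grade_counts.values())
        result[boro] = {
            grade: 100.0 * count / total
            for grade, count in grade_counts.items()
        }
    return result


# Made-up records, for illustration only.
sample = [
    {"boro": "Manhattan", "grade": "A"},
    {"boro": "Manhattan", "grade": "A"},
    {"boro": "Manhattan", "grade": "A"},
    {"boro": "Manhattan", "grade": "B"},
    {"boro": "Queens", "grade": "A"},
    {"boro": "Queens", "grade": "C"},
]

percentages = grade_percentages_by_borough(sample)
```

Normalizing within each borough, rather than over the whole city, is what allows boroughs with very different restaurant counts to be compared on the same scale.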