Data
The data used in this project consists of two datasets, both obtained from NYC Open Data. The first is a map of New York City showing each ZIP code area and its population. The second is the New York City Restaurant Inspection Results dataset released by the Department of Health and Mental Hygiene. This dataset contains every sustained or not yet adjudicated violation citation from every full or special program inspection conducted up to three years before the most recent inspection, for restaurants and college cafeterias with an active status. It has a total of 385K rows, and the features used include:
- Cuisine Description: this column contains 85 cuisine types. Because the top 10 cuisines account for 65% of the restaurants in New York City, the project uses only these 10 cuisine types.
- BORO: this column contains six values: the five boroughs and a missing-value entry. Because missing values make up only a very small portion of the rows, those rows will be dropped.
- Violation Code: the violation code is used to categorize the violation descriptions. The project uses the five most frequent codes to identify the main reasons restaurants receive a grade lower than A.
- Score and Grade: the grading system calculates a score for each restaurant and converts it to a grade using the following rule: A: 0-13 points, B: 14-27 points, C: 28 or more points.
- Inspection Date: this column contains the month and year of each inspection.
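The filtering rules and the grading rule above can be sketched with the Python standard library. This is a minimal sketch, assuming each inspection row is a dict keyed by hypothetical column names ("BORO", "CUISINE DESCRIPTION", "SCORE") and that a missing borough or score is stored as an empty string or None; the helper names `score_to_grade`, `top_n_values`, `preprocess` and the added key "GRADE_FROM_SCORE" are illustrative only. The grade function applies only the point ranges stated above.

```python
from collections import Counter


def score_to_grade(score: int) -> str:
    """Map an inspection score to a letter grade using the stated ranges:
    A: 0-13, B: 14-27, C: 28 or more."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score <= 13:
        return "A"
    if score <= 27:
        return "B"
    return "C"


def top_n_values(rows: list[dict], column: str, n: int) -> set:
    """Return the n most frequent values of `column` across `rows`."""
    counts = Counter(row[column] for row in rows)
    return {value for value, _ in counts.most_common(n)}


def preprocess(rows: list[dict], n_cuisines: int = 10) -> list[dict]:
    """Drop rows with a missing borough, keep only the top cuisines,
    and attach a grade derived from the score.

    Assumption: rows use the hypothetical keys "BORO",
    "CUISINE DESCRIPTION" and "SCORE", and a missing borough or
    score is stored as an empty string or None.
    """
    missing = ("", None)
    # Step 1: drop rows whose borough is missing.
    with_boro = [r for r in rows if r.get("BORO") not in missing]
    # Step 2: find the most frequent cuisines among the remaining rows.
    top_cuisines = top_n_values(with_boro, "CUISINE DESCRIPTION", n_cuisines)
    cleaned = []
    for r in with_boro:
        if r["CUISINE DESCRIPTION"] not in top_cuisines:
            continue
        score = r.get("SCORE")
        # Step 3: rows without a score get no derived grade rather than failing.
        grade = score_to_grade(int(score)) if score not in missing else None
        cleaned.append({**r, "GRADE_FROM_SCORE": grade})
    return cleaned
```

For example, `score_to_grade(20)` returns `"B"`. The same `top_n_values` helper could select the five most frequent violation codes among rows graded below A, assuming a hypothetical "VIOLATION CODE" key.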
Methodology
1. Exploratory Analysis
To get an overall sense of New York City restaurants and their grades, the analysis first computes the percentage of restaurants receiving each grade in each borough.
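The per-borough grade breakdown can be sketched as follows. This is a minimal sketch, assuming the input has already been reduced to one (borough, grade) pair per restaurant, for example by keeping each restaurant's most recent graded inspection; that reduction and the function name `grade_percentages` are assumptions, not steps described in this project. Because the raw dataset has one row per violation citation, counting raw rows directly would weight each restaurant by its number of citations.

```python
from collections import Counter, defaultdict
from typing import Iterable


def grade_percentages(pairs: Iterable[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Percentage of restaurants with each grade, per borough.

    `pairs` holds one (borough, grade) tuple per restaurant (an assumption).
    Returns {borough: {grade: percent}}; each borough's percentages sum to 100.
    """
    # Count grades separately for each borough.
    counts: dict[str, Counter] = defaultdict(Counter)
    for boro, grade in pairs:
        counts[boro][grade] += 1

    # Convert each borough's counts into percentages of that borough's total.
    result = {}
    for boro, grade_counts in counts.items():
        total = sum(grade_counts.values())
        result[boro] = {
            grade: 100.0 * count / total
            for grade, count in sorted(grade_counts.items())
        }
    return result
```

For example, `grade_percentages([("Queens", "A"), ("Queens", "B")])` returns `{"Queens": {"A": 50.0, "B": 50.0}}`. Normalizing within each borough, rather than over the whole city, keeps boroughs with many restaurants from dominating the comparison.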