PUI2016 Extra Credit Project Proposal
<Ci He, github username: hcpenguin, NYU ID: ch3183>
Problem Description:
In recent years, app-based for-hire vehicle services such as Uber and Lyft have grown rapidly. Because of their business model, these cars can reach many more neighborhoods outside Manhattan, and cashless transactions make drivers less likely to be targeted by crime. Drivers may therefore be more willing to pick up customers in low-income neighborhoods, and some of those drivers might come from those neighborhoods as well.
So does Uber really provide more transportation accessibility in the other four boroughs, where some people live relatively far from the subway or have less access to traditional cabs? Does Uber pick up more customers in low-income areas?
Raw Data:
- Traditional Taxi Trip Info
The NYC Taxi and Limousine Commission provides trip data for all traditional yellow cab and green cab trips.
*This data source contains Uber trip data for 2014 (April - September), separated by month, with detailed location information, and for 2015 (January - June), with less fine-grained location information. I am still working on finding Uber trip data for more recent years. They are supposed to provide open data on trip dates and pickup zones.
- New York City Area Income Info
SocialExplorer provides US Census and ACS tables, and it offers a more convenient way to identify table codes and filter information for downloading.
- Taxi zone boundary shapefile and zone names
- NYC zipcode boundary shapefile
Data Wrangling:
Combine traditional taxi datasets in different months.
Combine Uber datasets in different months.
Take the 'pickup time' and 'pickup location' information from the traditional taxi data and the Uber data. Convert the time columns to day format, then combine the two datasets on the time column.
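The combining steps above could be sketched as follows. This is only an illustration using the Python standard library: the column names (pickup_datetime, pickup_longitude, pickup_latitude), the timestamp format, and the sample rows are hypothetical placeholders, and the real TLC and Uber files may use different headers that would need renaming first.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical monthly extracts; real files would be opened from disk instead.
taxi_months = [
    "pickup_datetime,pickup_longitude,pickup_latitude\n"
    "2014-04-01 00:11:00,-73.9549,40.7690\n"
    "2014-04-01 17:45:00,-73.9903,40.7316\n",
    "pickup_datetime,pickup_longitude,pickup_latitude\n"
    "2014-05-02 08:30:00,-73.9442,40.6782\n",
]
uber_months = [
    "pickup_datetime,pickup_longitude,pickup_latitude\n"
    "2014-04-01 09:05:00,-73.9211,40.7644\n",
    "pickup_datetime,pickup_longitude,pickup_latitude\n"
    "2014-05-02 21:15:00,-73.8648,40.8448\n"
    "2014-05-02 23:59:00,-73.9857,40.7484\n",
]


def load_pickups(monthly_csv_texts, service):
    """Stack monthly files into one list of (service, pickup day, lon, lat)."""
    rows = []
    for text in monthly_csv_texts:
        for rec in csv.DictReader(io.StringIO(text)):
            ts = datetime.strptime(rec["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
            rows.append(
                (service, ts.date(),
                 float(rec["pickup_longitude"]), float(rec["pickup_latitude"]))
            )
    return rows


# Combine months within each service, then combine the two services.
pickups = load_pickups(taxi_months, "taxi") + load_pickups(uber_months, "uber")

# Count pickups per (day, service) so the two groups line up on the day column.
daily_counts = Counter((day, service) for service, day, _, _ in pickups)
for (day, service), n in sorted(daily_counts.items()):
    print(day, service, n)
```

For the full datasets, a dataframe library would likely be more practical than plain lists, but the steps would stay the same.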
Then replace all pickup location information in coordinate format (latitude and longitude) with taxi zone information, using the taxi zone shapefile.
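To illustrate the idea of mapping a coordinate to a zone, here is a minimal point-in-polygon sketch using ray casting. The two rectangular zones and their names are made up for the example; in the actual project the polygons would come from the taxi zone shapefile, and a spatial join in a GIS library would probably replace this hand-written test.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical zones (simple rectangles), NOT real taxi zone boundaries.
zones = {
    "Zone A": [(-74.00, 40.70), (-73.95, 40.70), (-73.95, 40.75), (-74.00, 40.75)],
    "Zone B": [(-73.95, 40.70), (-73.90, 40.70), (-73.90, 40.75), (-73.95, 40.75)],
}


def assign_zone(lon, lat, zones):
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


print(assign_zone(-73.97, 40.72, zones))
print(assign_zone(-73.80, 40.72, zones))
```

Pickups that fall in no zone return None and would need to be dropped or inspected separately.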
Since a large portion of the Uber pickup locations are formatted as taxi zones, I need income per capita for each taxi zone. I will first get income per capita by zip code from the IRS tax return file, then compare the taxi zone shapefile with the zip code boundary shapefile. Each taxi zone's income will be calculated proportionally from the incomes of the zip code areas it overlaps.
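One way to read "calculated proportionally" is an area-weighted average, sketched below. This interpretation is an assumption, as are the placeholder zip code keys, incomes, and overlap areas; the real overlap areas would come from intersecting the two shapefiles.

```python
def zone_income(overlaps, zip_income):
    """Area-weighted income for one taxi zone.

    overlaps:   {zip_code: area of the zone that falls inside that zip code}
    zip_income: {zip_code: income per capita}
    """
    total_area = sum(overlaps.values())
    weighted = sum(zip_income[z] * area for z, area in overlaps.items())
    return weighted / total_area


# Made-up numbers for illustration only.
zip_income = {"ZIP_A": 80000.0, "ZIP_B": 40000.0}
overlaps = {"ZIP_A": 0.75, "ZIP_B": 0.25}  # zone lies 75% in ZIP_A, 25% in ZIP_B

print(zone_income(overlaps, zip_income))  # 70000.0
```

Weighting by area treats residents as evenly spread within each zip code; weighting by population would be an alternative if that data is available.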
Analysis:
We will use both statistical results and graphics to answer our question: does Uber provide more trips to places outside Manhattan and to lower-income areas? We can also use ArcGIS to compute the local Moran's I of Uber activity and see how it changes from year to year.
First, we will compare traditional taxi pickups and Uber pickups inside and outside Manhattan.
Then we will generate a bar chart showing both groups' pickup counts in each borough outside Manhattan for each month.
We will also show the percentage change in traditional taxi and Uber pickups for every taxi zone over the time period on the New York taxi zone map.
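The per-zone percentage change could be computed along these lines before mapping. The zone names and counts are hypothetical, and comparing the first and last period is an assumption about how "change for the time period" would be measured.

```python
def percent_change(start, end):
    """Percentage change from start to end; None when start is zero."""
    if start == 0:
        return None  # change is undefined for zones with no initial pickups
    return (end - start) / start * 100.0


# Hypothetical pickup counts per zone: (first period, last period).
uber_counts = {"Zone A": (200, 250), "Zone B": (0, 40), "Zone C": (80, 60)}

uber_change = {zone: percent_change(a, b) for zone, (a, b) in uber_counts.items()}
print(uber_change)
```

Zones with no pickups in the first period need separate handling on the map, since their percentage change is undefined.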
We will also use null hypothesis testing to test the following ideas (using each group's inside/outside Manhattan pickup percentage).
1. Uber provides more customer pickups outside Manhattan than traditional taxis do.
Since we are comparing two proportions with large sample sizes, we can use a chi-square test.
2. Uber picks up customers from zones with lower income than the zones where traditional taxis pick up customers.
We can use a z-test for this comparison.
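The two tests might look like the sketch below, written with the standard library only. All counts and incomes are invented for illustration. The chi-square version shown is the Pearson test on a 2x2 table without continuity correction, and the z-test is assumed here to compare the mean zone income per pickup between the two groups; the final analysis may choose different variants.

```python
import math
from statistics import NormalDist, mean, variance


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def two_sample_z(x, y):
    """z statistic and one-sided p-value for H1: mean(x) < mean(y)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (mean(x) - mean(y)) / se
    return z, NormalDist().cdf(z)


# Hypothesis 1, hypothetical counts: rows = Uber, taxi; columns = outside, inside Manhattan.
chi2, p1 = chi_square_2x2(300, 700, 100, 900)
print(chi2, p1)

# Hypothesis 2, hypothetical zone incomes attached to each pickup
# (tiny samples for illustration; a z-test assumes large samples).
uber_zone_income = [30000, 35000, 40000, 45000]
taxi_zone_income = [60000, 65000, 70000, 75000]
z, p2 = two_sample_z(uber_zone_income, taxi_zone_income)
print(z, p2)
```

The chi-square test is two-sided, so the direction of the difference (Uber's outside-Manhattan share being the larger one) still has to be checked from the proportions themselves.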
Deliverables:
- The project will deliver statistical results for the two hypotheses comparing Uber and traditional taxi pickups.
- Provide a graphical tool showing the general pickup trend for taxis and Uber.
- Provide a graphical tool showing each taxi zone's pickup percentage change for taxis and Uber.