https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip
https://s3.amazonaws.com/nyc-tlc/misc/taxi+_zone_lookup.csv
https://data.cityofnewyork.us/Business/Zip-Code-Boundaries/i8iw-xf4u/data

Data Wrangling:

Combine the traditional taxi datasets from the different months into one dataset.
Combine the Uber datasets from the different months into one dataset.
Take the pickup time and pickup location from both the traditional taxi data and the Uber data. Convert the time columns to day format (date only), then combine the two datasets on the time column.
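The combining steps above could be sketched roughly as follows. This is only an illustration using the Python standard library: the column names (pickup_datetime, location), the timestamp format, and the tiny inline samples are assumptions, not the layout of the real files. For the full datasets a data-frame library such as pandas would likely be more practical.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical monthly extracts; real files would be opened from disk instead.
TAXI_2015_04 = "pickup_datetime,location\n2015-04-01 08:15:00,A\n2015-04-01 23:59:59,B\n"
TAXI_2015_05 = "pickup_datetime,location\n2015-05-02 00:00:01,A\n"
UBER_2015_04 = "pickup_datetime,location\n2015-04-01 09:30:00,C\n"


def load_months(sources, service):
    """Concatenate monthly CSV sources into one list of pickup records."""
    rows = []
    for source in sources:
        for record in csv.DictReader(source):
            # Assumed timestamp layout; adjust to the real files.
            ts = datetime.strptime(record["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
            rows.append({
                "service": service,
                "day": ts.date(),  # drop the time of day, keep the date
                "location": record["location"],
            })
    return rows


taxi = load_months([io.StringIO(TAXI_2015_04), io.StringIO(TAXI_2015_05)], "taxi")
uber = load_months([io.StringIO(UBER_2015_04)], "uber")

# Stack both services into one dataset ordered by day.
pickups = sorted(taxi + uber, key=lambda r: (r["day"], r["service"]))

# Daily pickup counts per service, keyed by (day, service).
daily_counts = Counter((r["day"], r["service"]) for r in pickups)
```

Keeping a service column in the combined dataset makes it possible to compare the two groups later without splitting the data again.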
Then replace all pickup locations given as coordinates (latitude and longitude) with the corresponding taxi zone, using the taxi zone shapefile.
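As a rough sketch of the coordinate-to-zone step, the lookup amounts to a point-in-polygon test against each zone boundary. The example below uses a plain ray-casting test on made-up square zones (zone_west, zone_east are hypothetical); it ignores holes and multi-part polygons, and it assumes the pickup coordinates and the zone polygons are already in the same coordinate reference system. In practice a geospatial library reading the shapefile directly would probably do this job.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def zone_for_point(lon, lat, zones):
    """Return the id of the first zone containing the point, or None."""
    for zone_id, ring in zones.items():
        if point_in_polygon(lon, lat, ring):
            return zone_id
    return None


# Hypothetical zones: two adjacent unit squares, not real taxi zones.
ZONES = {
    "zone_west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "zone_east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

west = zone_for_point(0.5, 0.5, ZONES)
east = zone_for_point(1.5, 0.5, ZONES)
outside = zone_for_point(3.0, 3.0, ZONES)
```

Pickups that fall outside every zone come back as None, so they can be counted and inspected rather than silently dropped.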
A large portion of the Uber pickup locations are already recorded as taxi zones, so income per capita is needed for each taxi zone. I will first get income per capita by zip code from the IRS tax return file, then compare the taxi zone shapefile with the zip code boundary shapefile. Each taxi zone's income will be calculated proportionally from the incomes of the zip code areas it overlaps.
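One possible reading of "proportionally" is an area-weighted average: each zip code contributes to a taxi zone's income in proportion to the area of the zone it covers. The sketch below assumes the overlap areas have already been computed from the two shapefiles; the zip codes, areas, and income figures are invented for illustration. Weighting by population instead of area would be a reasonable alternative.

```python
def zone_income(overlaps, zip_income):
    """Area-weighted mean income per capita for one taxi zone.

    overlaps:   {zip_code: area of the zone that lies inside that zip code}
    zip_income: {zip_code: income per capita for that zip code}
    """
    total_area = sum(overlaps.values())
    if total_area == 0:
        raise ValueError("zone does not overlap any zip code")
    weighted = sum(area * zip_income[z] for z, area in overlaps.items())
    return weighted / total_area


# Invented figures: three quarters of the zone lies in one zip code,
# one quarter in another.
ZIP_INCOME = {"10001": 80000.0, "10002": 40000.0}
OVERLAPS = {"10001": 3.0, "10002": 1.0}

income = zone_income(OVERLAPS, ZIP_INCOME)  # (3 * 80000 + 1 * 40000) / 4
```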
Analysis:
I will use both statistical and geographic tools to answer the question: does Uber provide more trips to places outside Manhattan and to lower-income areas?
First, I will compare traditional taxi pickups and Uber pickups at locations inside and outside Manhattan.
Then I will generate a bar chart showing both groups' pickup counts in each borough outside Manhattan for each month.
I will also use graphics to show the percentage change in traditional taxi and Uber pickups in every taxi zone over the time period.
Finally, I will use null hypothesis testing to test the following ideas (based on each group's percentage of pickups inside and outside Manhattan):
1. Uber picks up more customers outside Manhattan than traditional taxis do.
2. Uber picks up customers from zones with lower income than the zones where traditional taxis pick up customers.
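For the first idea, one candidate test is a one-sided two-proportion z-test, with the null hypothesis that the share of pickups outside Manhattan is the same for Uber and traditional taxis. The sketch below is an assumption about how the test might be set up, not a settled choice, and the counts are placeholders. The income comparison in the second idea would need a different test (for example on means or ranks), which is not shown here.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided z-test of H0: p1 == p2 against H1: p1 > p2.

    x1, n1: outside-Manhattan pickups and total pickups for group 1
    x2, n2: the same counts for group 2
    Returns (z statistic, one-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Placeholder counts, not real data: group 1 is Uber, group 2 is taxi.
z, p_value = two_proportion_z_test(300, 1000, 100, 1000)
reject_null = p_value < 0.05
```

With samples as large as the real trip data, even very small differences will come out statistically significant, so the size of the difference should be reported alongside the p-value.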
References: 
http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/#update-2016
Deliverable: