Yellow Cabs and Weather  <Emily Padvorac, ep2247, ep2247>
Abstract:
            This project examines the relationship between yellow taxi cab trips and weather in New York City.  The purpose is to determine whether shorter taxi trips are taken during adverse weather conditions.  Previous work on taxi ridership and weather has mostly concluded that taxi drivers make more money during adverse weather.  This research investigates that relationship further, focusing on trip distance.
Introduction:
            This research explores the relationship between taxi ridership and weather in New York City.  The goal is to understand whether people are more likely to take short-distance taxi trips during adverse weather.  This analysis could be useful to taxi companies and city planners because it can indicate when a higher volume of taxi traffic and ridership should be expected.
            Previous work has examined the relationship between taxi ridership and weather events.  “During adverse weather, taxi drivers tend to make more money” (Kamga et al., 2013).  Kamga et al. also found that on days with adverse weather, demand for taxis was higher in Manhattan than in the other boroughs, and that people were more likely to take shorter taxi trips.
            To test this, the study uses data from the hottest and coldest months, by average temperature, of 2016.  The hypothesis is that people are more likely to take shorter taxi trips on adverse weather days than on days without adverse weather.  In this research, an adverse weather day is defined as a day on which the temperature was below 35 degrees or above 90 degrees Fahrenheit, or on which rainfall was present.
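            The adverse weather definition above can be written as a simple rule.  The following is a minimal sketch, not the code used in this study.  It assumes the temperature compared against the thresholds is the daily high, and the record layout (DailyWeather, max_temp_f, precip_in) and the sample values are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass

# Thresholds from the definition of adverse weather in the text (degrees Fahrenheit).
COLD_LIMIT_F = 35.0
HOT_LIMIT_F = 90.0


@dataclass
class DailyWeather:
    """One day of weather observations (hypothetical record layout)."""
    label: str         # free-text identifier for the day
    max_temp_f: float  # daily high temperature, assumed to be the value tested
    precip_in: float   # daily rainfall total in inches


def is_adverse(day: DailyWeather) -> bool:
    """Return True if the day is too cold, too hot, or had any rainfall."""
    return (
        day.max_temp_f < COLD_LIMIT_F
        or day.max_temp_f > HOT_LIMIT_F
        or day.precip_in > 0.0
    )


# Invented sample values, not real observations.
days = [
    DailyWeather("cold dry day", 29.0, 0.0),
    DailyWeather("mild dry day", 60.0, 0.0),
    DailyWeather("hot dry day", 96.0, 0.0),
    DailyWeather("mild rainy day", 70.0, 0.31),
]
flags = {d.label: is_adverse(d) for d in days}
print(flags)
```

            Under this rule a day exactly at 35 or 90 degrees is not adverse, since the definition says "below" and "above".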
Data:
Weather Data
            The months used in this study were determined by New York City’s climate.  Climate data from the National Weather Service New York City office was consulted online to identify the months with the highest and lowest average temperatures in 2016, as measured by the Central Park weather station.
            The National Climatic Data Center archives climate and historical weather data.  These data are archived from FAA-operated weather stations located in various areas around the country.  KNYC (the Central Park station) was selected for this study because the only other two stations available, LaGuardia and JFK, are in outlying New York City boroughs.  Since Central Park is located in the city, it was judged to be the most representative of weather conditions in the immediate New York City area.  The station provides data at hourly, daily, and monthly time scales.  The daily observations were chosen because hourly data was too fine-grained for the scope of this study and the taxi data was formatted into a daily time scale.  For this research, daily summary reports were obtained for the months of study.  The variables used from this dataset were the daily maximum dry bulb temperature (the daily high temperature), the daily precipitation amount, and the daily snowfall amount.
Taxi Data
            The New York City Taxi & Limousine Commission (TLC) issues annual taxi ridership data.  Taxi data was available for every month of multiple years; 2016 was chosen because it was the most recent year available.  This analysis considers yellow taxi cabs only.  Data was obtained for January 2016 and August 2016, the coldest and hottest months of the year respectively.  The data includes a variety of variables, such as pickup and drop-off times and passenger count.  For the purpose of this research, trip distance was the main variable used from this dataset, since it relates directly to the stated hypothesis.  Trips were divided into “short” and “long” distances, with 2 miles as the threshold between the two.
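            The short/long split described above can be sketched as follows.  This is an illustration rather than the study's actual code.  The function names and sample distances are hypothetical, and treating a trip of exactly 2 miles as "long" is an assumption based on the Data Wrangling step, which filters for trips of less than 2 miles.

```python
# Threshold between "short" and "long" trips, in miles, as stated in the text.
SHORT_TRIP_LIMIT_MILES = 2.0


def classify_trip(distance_miles: float) -> str:
    """Label a trip 'short' if it is under the threshold, otherwise 'long'."""
    return "short" if distance_miles < SHORT_TRIP_LIMIT_MILES else "long"


def short_trip_share(distances: list[float]) -> float:
    """Fraction of trips that are short; 0.0 for an empty list."""
    if not distances:
        return 0.0
    short_count = sum(1 for d in distances if classify_trip(d) == "short")
    return short_count / len(distances)


# Invented trip distances in miles, not taken from the TLC data.
sample = [0.6, 1.4, 2.0, 3.7, 8.2]
labels = [classify_trip(d) for d in sample]
share = short_trip_share(sample)
print(labels, share)
```

            Computing the share of short trips per day, rather than the raw count, would allow adverse and non-adverse days to be compared even when total ridership differs between them.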
Taxi Zone Shapefile
            The TLC also issues a shapefile containing the taxi zones for all five boroughs.  This shapefile was used to plot the pickup locations for August 2016.
Data weaknesses
            The taxi data had several weaknesses and limitations.  The January 2016 data did not include pickup or drop-off location IDs.  As a result, geopandas could not be used to plot the pickup locations for January.  The August 2016 data did not include latitude or longitude information for either the pickup or the drop-off locations.
Data Wrangling
            Multiple merges were needed to get the data into a usable form.  Both taxi datasets were filtered for trip distances of less than 2 miles.  The first merge joined the August taxi data to the taxi zone shapefile, matching the pickup location ID in the taxi data to the location ID in the shapefile.  Geopandas was then used to plot the trip distances, as seen in figure 1.  For the month of August, most of the trips of less than 1 mile took place in Manhattan, while most of the trips between 1 and 2 miles took place in Queens.  Most of the pickup locations were in Manhattan, consistent with previous research.  Geopandas was also used to plot the trip distances in January, as seen in figure 2.  As in August, most of the trips of less than 1 mile took place in Manhattan.
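            The filter and merge steps for the August data can be sketched with pandas.  This is a minimal sketch under stated assumptions, not the study's actual code.  The column names (PULocationID, trip_distance, LocationID, borough) are assumed to match the TLC files, and the small tables below are toy data with made-up zone IDs that stand in for the real trip records and the shapefile attributes.

```python
import pandas as pd

# Toy stand-in for the August trip records (column names are assumptions).
trips = pd.DataFrame({
    "PULocationID": [1, 1, 2, 3, 3],
    "trip_distance": [0.8, 1.5, 0.4, 9.6, 1.7],
})

# Toy stand-in for the taxi zone attributes; IDs are not actual TLC zone numbers.
zones = pd.DataFrame({
    "LocationID": [1, 2, 3],
    "borough": ["Manhattan", "Manhattan", "Queens"],
})

# Keep only trips of less than 2 miles.
short = trips[trips["trip_distance"] < 2.0]

# Attach the zone attributes to each trip by pickup location ID.
merged = short.merge(zones, left_on="PULocationID", right_on="LocationID", how="left")

# Split the short trips into the two distance bands discussed in the text.
merged["band"] = merged["trip_distance"].apply(
    lambda d: "under 1 mile" if d < 1.0 else "1 to 2 miles"
)

# Count trips per distance band and borough.
counts = merged.groupby(["band", "borough"]).size().to_dict()
print(counts)
```

            With the real shapefile, the zone table would be loaded with geopandas.read_file and merged in the same way, so that each zone's geometry is carried along for plotting.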