The next dataset we processed was the yellow taxi dataset, covering the same time period as the traffic data. We had to make several assumptions about it. The data contained no unique taxi identifier, so we assumed that each trip was made by a distinct taxi and gave each one a unique ID corresponding to its position in the dataset. The data also contained no route attributes, so we assumed that each trip took the optimal route to its destination, which we determined with the Google Directions API. Unfortunately, the earliest data we could use is from October 2012, so roads could have changed since then and the routes returned may not match the ones the taxis actually drove.

The Google API also has a limit of 2,500 requests per day. To stay under the limit, we used a random sample of 1,000 yellow taxi trips, and we made sure that every selected trip was located in Manhattan. For each trip, the API returned the route to the trip's destination. Each step in the route was a road that was traveled, and we treated these steps as line segments. From the line segments we created a new dataset in which each row contained the unique taxi ID and one line segment of a trip. We then combined this dataset with the traffic count data in ArcGIS. The result gave the number of taxis that crossed a particular road segment, the time of the trip, and the ground truth data. We could then use the taxis to predict the number of vehicles on a road segment.
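The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration and not the code we ran: the function names, the trip fields, and the fixed seed are hypothetical, and the response is assumed to follow the Directions API JSON layout, in which each step carries a `start_location` and an `end_location` with `lat` and `lng` values. No API request is made here, and the Manhattan filter and the ArcGIS join are left out.

```python
import random


def assign_ids(trips):
    """Give each trip a taxi ID equal to its position in the dataset."""
    return [dict(trip, taxi_id=i) for i, trip in enumerate(trips)]


def sample_trips(trips, n=1000, seed=0):
    """Draw a random sample of at most n trips (seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(trips, min(n, len(trips)))


def steps_to_segments(taxi_id, directions):
    """Turn each step of the first route into one (taxi ID, segment) row."""
    rows = []
    for leg in directions["routes"][0]["legs"]:
        for step in leg["steps"]:
            start, end = step["start_location"], step["end_location"]
            rows.append({
                "taxi_id": taxi_id,
                "start": (start["lat"], start["lng"]),
                "end": (end["lat"], end["lng"]),
            })
    return rows


# Hypothetical response with two steps, used only to show the row layout.
example_response = {
    "routes": [{
        "legs": [{
            "steps": [
                {"start_location": {"lat": 40.75, "lng": -73.99},
                 "end_location": {"lat": 40.76, "lng": -73.98}},
                {"start_location": {"lat": 40.76, "lng": -73.98},
                 "end_location": {"lat": 40.77, "lng": -73.97}},
            ]
        }]
    }]
}

segment_rows = steps_to_segments(0, example_response)
```

Each row pairs one taxi ID with one line segment, which is the shape needed before the segments are matched to road segments carrying traffic counts.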