4. Vehicle Classification Counts
This data set is similar to the traffic volume counts. Although it has significantly fewer records, it contains a vehicle type classification for each traffic count. The October 2012 data will be used to determine the ratio between the different vehicle types, which will serve as an input for improving the model's predictions.
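The ratio between vehicle types comes down to summing the classified counts per type and dividing by the overall total. The sketch below shows this under stated assumptions: the `(vehicle_type, count)` input layout, the function name `vehicle_type_shares`, and the sample numbers are all hypothetical and are not taken from the DOT classification data.

```python
from collections import Counter


def vehicle_type_shares(records):
    """Return each vehicle type's share of all classified vehicles.

    `records` is an iterable of (vehicle_type, count) pairs, e.g. one pair
    per classification column per traffic count (hypothetical layout).
    """
    totals = Counter()
    for vehicle_type, count in records:
        totals[vehicle_type] += count  # accumulate counts per vehicle type
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no classified vehicles in input")
    # Share of each type in the total; the shares sum to 1.
    return {vtype: n / grand_total for vtype, n in totals.items()}


# Illustrative numbers only, not real October 2012 counts.
sample = [("car", 500), ("taxi", 200), ("truck", 80), ("bus", 20), ("car", 200)]
shares = vehicle_type_shares(sample)
print(shares)
```

Because the shares sum to 1, a count known for one type could in principle be scaled to an estimate of all vehicles by dividing by that type's share; whether the model uses the ratio in exactly this way is an assumption.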
Methods:
Before vehicle emissions can be predicted, the amount of traffic in the city has to be estimated. Traffic was estimated by comparing the number of taxis on a road segment with the actual count of vehicles on that segment \cite{m2016}. To do this, the volume count data was first merged with the LION base map to determine where the traffic counts were taken.

Many values in the segment ID field of the volume count data had fewer than 7 digits. Our assumption was that the trailing zeroes had been removed from these IDs in the volume count data. We therefore appended trailing zeroes to the segment IDs and checked several of them against the base map. For the IDs we checked, the roadway name in the traffic counts matched the street name in the base map, so we concluded that the reconstructed segment IDs were correct.

After creating the segment IDs, we found a significant problem with the traffic count data set: much of the data recorded by the DOT was missing. Only a few months of 2012 contained any traffic counts at all, and October alone had 4,422 rows, nearly 75% of the data. To keep the data set sufficiently large, we could therefore only work with this month.

Even within October, however, much of the data was missing. The expected number of records for a month is the number of detectors multiplied by the number of days in the month. October had 364 unique segment IDs, so we would expect 364 × 31 = 11,284 records, or twice that (22,568) if northbound and southbound counts are considered separately. To overcome this issue, we looked for a time period within October that had a large and consistent amount of data. The amount of recorded data is given in Figure 2.
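The three data-preparation steps described above (restoring the stripped zeroes in segment IDs, computing the expected record count, and finding a consistent window within October) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 7-character ID width, the `min_records` threshold, and the sample daily counts are hypothetical. Padding on the right follows the text's assumption that trailing zeroes were dropped.

```python
import calendar
from datetime import date, timedelta

ID_WIDTH = 7  # assumed width of a LION segment ID


def pad_segment_id(raw_id, width=ID_WIDTH):
    """Restore stripped zeroes by padding on the right (trailing zeroes)."""
    return str(raw_id).strip().ljust(width, "0")


def expected_records(n_segments, year, month, directions=1):
    """One record per segment per day, optionally per travel direction."""
    days_in_month = calendar.monthrange(year, month)[1]
    return n_segments * days_in_month * directions


def longest_consistent_run(daily_counts, min_records):
    """Longest run of consecutive days with at least `min_records` records.

    `daily_counts` maps a date to the number of records on that day.
    Returns (start_date, end_date), or None if no day meets the threshold.
    """
    best = None
    run_start = run_end = None
    for day in sorted(daily_counts):
        if daily_counts[day] < min_records:
            run_start = run_end = None  # below threshold: break the run
            continue
        if run_end is not None and day - run_end == timedelta(days=1):
            run_end = day  # next calendar day: extend the current run
        else:
            run_start = run_end = day  # gap or first qualifying day: new run
        if best is None or (run_end - run_start) > (best[1] - best[0]):
            best = (run_start, run_end)
    return best


print(pad_segment_id(12345))
print(expected_records(364, 2012, 10))
print(expected_records(364, 2012, 10, directions=2))

# Hypothetical daily record counts, for illustration only.
counts = {date(2012, 10, d): n for d, n in
          [(1, 50), (2, 300), (3, 310), (4, 305), (5, 20), (6, 290), (7, 295)]}
print(longest_consistent_run(counts, min_records=250))
```

With these inputs the expected counts reproduce the 11,284 and 22,568 figures from the text. If the stripped zeroes turn out to be leading rather than trailing, `str.zfill(width)` is the drop-in alternative to `ljust`.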