The data were also filtered for long trips, defined as trip distances greater than 2 miles. These long-trip records were then matched to the corresponding adverse-weather datasets.
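The filtering step can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: the record fields `pickup_date` and `trip_distance`, the function name `split_trips`, and the sample records and dates are all assumptions made for the example.

```python
from datetime import date

# Threshold taken from the text: short trips are under 2 miles,
# long trips are over 2 miles.
SHORT_TRIP_MAX_MILES = 2.0


def split_trips(trips, adverse_dates):
    """Keep trips on adverse-weather dates and split them by distance.

    `trips` is a list of dicts with hypothetical keys `pickup_date`
    and `trip_distance` (miles). A trip of exactly 2 miles falls in
    neither group, following the wording of the text.
    """
    short_trips, long_trips = [], []
    for trip in trips:
        if trip["pickup_date"] not in adverse_dates:
            continue  # not an adverse-weather day
        if trip["trip_distance"] < SHORT_TRIP_MAX_MILES:
            short_trips.append(trip)
        elif trip["trip_distance"] > SHORT_TRIP_MAX_MILES:
            long_trips.append(trip)
    return short_trips, long_trips


# Invented sample records, not the study data.
sample_trips = [
    {"pickup_date": date(2016, 1, 23), "trip_distance": 1.2},
    {"pickup_date": date(2016, 1, 23), "trip_distance": 5.7},
    {"pickup_date": date(2016, 1, 24), "trip_distance": 0.8},
]
sample_adverse_dates = {date(2016, 1, 23)}

short_trips, long_trips = split_trips(sample_trips, sample_adverse_dates)
```

Using a set for the adverse-weather dates keeps the membership check cheap when the trip list is large.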
Methodology
In this analysis, multiple linear regressions and multivariate regressions were performed on the variables for the two months considered. An alpha of 0.05 was set as the significance threshold. Other methods that should have been considered for this analysis are logistic regression and clustering.
The first set of linear regressions was run on the adverse-weather days of the two months, for short trips only. The first regression was between August temperature and August precipitation, with temperature as the dependent variable and precipitation as the regressor. The R-squared value for this regression was 0.305. The next regression was run on the adverse-weather days in January, between January temperature and January precipitation, again with temperature as the dependent variable and precipitation as the regressor. The R-squared value was 0.203, lower than the R-squared value for August.
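For reference, the R-squared value of a one-regressor fit such as temperature on precipitation can be computed as sketched below. This is a plain ordinary least squares illustration under stated assumptions: the function name `simple_ols` is hypothetical, the sample values are invented rather than taken from the study, and the original analysis may have used a statistics package instead.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Invented daily values (regressor: precipitation, dependent: temperature).
precipitation = [0.1, 0.4, 0.2, 0.8, 0.5]
temperature = [78.0, 74.0, 77.0, 70.0, 75.0]

slope, intercept, r_squared = simple_ols(precipitation, temperature)
```

An R-squared near 0 means the regressor explains almost none of the variation in the dependent variable, while a value near 1 means it explains almost all of it.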
The next set of linear regressions was run between long trips and short trips on the adverse-weather days of the two months. The linear regression between short distances (less than 2 miles) and long distances (greater than 2 miles) gave an R-squared value of 0 for both January and August. There is no linear correlation between the two variables in either of the months studied.
Linear regressions were also performed for both long and short distances to see whether either temperature or precipitation had any effect on taxi use. For long distance and temperature, the R-squared value was 0.001 in August and 0 in January. For short distance and temperature, the R-squared value was 0 in both August and January.
Multivariate regression was performed for both months, with short distance as the dependent variable and temperature and precipitation as the regressors. The R-squared value was 0 in January and 0.002 in August.
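A regression with one dependent variable and two regressors, as described above, is often called multiple linear regression. The sketch below shows one way its R-squared value could be computed by solving the least squares normal equations directly. The function name `two_regressor_ols` and the sample values are assumptions for illustration only; they are not the study's code or data.

```python
def two_regressor_ols(x1, x2, y):
    """Fit y = b0 + b1 * x1 + b2 * x2 by ordinary least squares.

    Returns (b0, b1, b2, r_squared).
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Center every variable so the intercept drops out of the system.
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c = [v - my for v in y]
    s11 = sum(p * p for p in a)
    s22 = sum(q * q for q in b)
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * r for p, r in zip(a, c))
    s2y = sum(q * r for q, r in zip(b, c))
    ss_tot = sum(r * r for r in c)
    det = s11 * s22 - s12 * s12
    if det == 0 or ss_tot == 0:
        raise ValueError("regressors are collinear or y is constant")
    # Solve the 2x2 normal equations with Cramer's rule.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    ss_res = sum(
        (yi - (b0 + b1 * u + b2 * v)) ** 2 for u, v, yi in zip(x1, x2, y)
    )
    return b0, b1, b2, 1.0 - ss_res / ss_tot


# Invented daily values: temperature, precipitation, short-trip distance.
temperature = [30.0, 28.0, 35.0, 25.0, 32.0]
precipitation = [0.4, 0.1, 0.8, 0.3, 0.6]
short_distance = [1.1, 0.9, 1.4, 1.0, 1.3]

b0, b1, b2, r_squared = two_regressor_ols(temperature, precipitation, short_distance)
```

With more than two regressors, a linear algebra or statistics library would be a more practical choice than hand-solving the normal equations.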
Conclusion:
In this analysis, the models showed no indication of any significant relationship or correlation between short taxi trip distances and temperature or precipitation on adverse-weather days. At the start of this project, the hypothesis was that there would be some relationship between adverse weather and people taking taxis, but none was found.
Future work:
For future studies, more variables in the datasets could be considered to improve the analysis. One weather variable to add to the adverse-weather conditions would be wind speed. For the taxi data, it would be helpful to explore whether passenger ridership increases during adverse weather conditions, or whether people pay more money to take shorter trips. For short distances, people in Manhattan were more prone to take taxis, so it would be helpful to study Manhattan alone. It might also be useful to look at Uber and Lyft data to see whether people are more likely to take an Uber or Lyft during adverse weather conditions. Income data would also be worth considering; however, because people spend much of the day away from home for work, school, and other activities, it might not be the best data to use.
References:
Kamga, C., Yazici, M., Singhal, A. 2013. Hailing in the Rain: Temporal and Weather-Related Variations in Taxi Ridership and Taxi Demand-Supply Equilibrium. ResearchGate.
https://www.researchgate.net/publication/255982467