Abstract:
In this project I use the February 2015 (201502) Citi Bike data to run a hypothesis test. The null hypothesis is that the proportion of subscribers biking on weekends is the same as or higher than the proportion of customers biking on weekends. After applying a z-test, I reject the null hypothesis because the p-value is much smaller than the significance level of 0.05. This supports the alternative hypothesis that the proportion of subscribers biking on weekends is lower than the proportion of customers biking on weekends. I also check the robustness of this result by repeating the test on the July 2015 (201507) Citi Bike data to see whether it holds in the summer. The z statistic is even larger in the summer.
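The one-sided test described in the abstract can be sketched as a pooled two-proportion z-test. This is only a minimal illustration under assumptions: the function name `two_proportion_ztest` is hypothetical, the pooled standard error is one common choice and may not be the exact formula used in the project, and the counts in the usage lines are made-up placeholders, not the Citi Bike results.

```python
import math

def two_proportion_ztest(weekend_sub, n_sub, weekend_cust, n_cust):
    """One-sided pooled z-test (hypothetical helper).

    H0: p_sub >= p_cust   versus   Ha: p_sub < p_cust
    Returns (z, p_value), where p_value is the lower-tail probability.
    """
    p_sub = weekend_sub / n_sub
    p_cust = weekend_cust / n_cust
    # Pooled proportion under the boundary case p_sub == p_cust.
    pooled = (weekend_sub + weekend_cust) / (n_sub + n_cust)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_sub + 1 / n_cust))
    z = (p_sub - p_cust) / se
    # Standard normal CDF at z, written with the complementary error function.
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_value

# Placeholder counts for illustration only (NOT the actual data).
z, p_value = two_proportion_ztest(weekend_sub=200, n_sub=1000,
                                  weekend_cust=40, n_cust=100)
reject_null = p_value < 0.05
```

With this sign convention a strongly negative z favors the alternative. If the difference is taken in the opposite order (customers minus subscribers), the statistic is positive and the upper tail is used instead.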
Introduction:
Citi Bike is a public bicycle sharing system serving New York City. Its trip data is public and can be downloaded from the Citi Bike website; it includes trip duration, start time and date, station ID, bike ID, and other fields. Since there are two types of users, subscribers and customers, it is interesting to analyze whether subscribers are more likely than customers to choose biking for commuting on weekends.
Data:
I use the 201502 Citi Bike data and drop all columns except user type and date. Then I count the number of trips by each user type for each day of the week and use a histogram to show the distribution.
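The counting step can be sketched with the standard library alone. This is a rough illustration, not the project's actual code: the helper names `count_by_weekday` and `weekend_share`, the date format string, and the sample rows are all assumptions, and the real file would be read from the downloaded CSV rather than typed in by hand.

```python
from collections import Counter
from datetime import datetime

def count_by_weekday(rows, date_format="%m/%d/%Y %H:%M"):
    """Count trips per (user type, weekday); weekday 0 = Monday, 6 = Sunday.

    `rows` is an iterable of (usertype, starttime) pairs. The default
    date format is an assumption about how the file stores start times.
    """
    counts = Counter()
    for usertype, starttime in rows:
        weekday = datetime.strptime(starttime, date_format).weekday()
        counts[(usertype, weekday)] += 1
    return counts

def weekend_share(counts, usertype):
    """Fraction of a user type's trips that start on Saturday or Sunday."""
    total = sum(n for (u, _), n in counts.items() if u == usertype)
    weekend = sum(n for (u, d), n in counts.items() if u == usertype and d >= 5)
    return weekend / total

# Made-up sample rows for illustration only (NOT taken from the data).
sample_rows = [
    ("Subscriber", "2/2/2015 8:15"),   # Monday
    ("Subscriber", "2/3/2015 18:40"),  # Tuesday
    ("Subscriber", "2/7/2015 11:05"),  # Saturday
    ("Customer", "2/1/2015 13:30"),    # Sunday
    ("Customer", "2/7/2015 15:00"),    # Saturday
    ("Customer", "2/4/2015 9:10"),     # Wednesday
]
counts = count_by_weekday(sample_rows)
subscriber_share = weekend_share(counts, "Subscriber")
customer_share = weekend_share(counts, "Customer")
```

The per-weekday counts in `counts` are what the histogram displays, and the two weekend shares are the proportions compared in the hypothesis test.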