Abstract
This Citi Bike mini project tests whether customers are less likely than subscribers to ride Citi Bike on weekdays in March 2015. The null hypothesis is that the proportion of customers riding Citi Bike on weekdays is the same as or higher than the proportion of subscribers riding Citi Bike on weekdays in March 2015. The significance level for this mini project is 0.05. I used a z-test to test the null hypothesis and obtained an extremely small p-value, so I reject the null hypothesis and conclude that customers are less likely than subscribers to ride Citi Bike on weekdays in March 2015.
Introduction
Citi Bike is a public bike sharing system operated by Motivate and named after its lead sponsor, Citigroup. There are two user types: customers and subscribers. Subscribers have bought an annual membership and can take unlimited 45-minute rides throughout the year. Customers pay each time they ride. Because Citi Bike riders fall into different user types, it is interesting to ask whether the number of rides on weekdays and weekends differs significantly between user types.
Data
The dataset used for the statistical test comes from https://s3.amazonaws.com/tripdata. More specifically, I look at the data for March 2015. I grouped the number of rides by user type and day of the week to get the proportion of rides by customers and subscribers on weekdays and weekends. I also calculated the errors on the counts. To visualize the data, I created a bar plot of the normalized distribution of bikers (Fig. 1) and show the fraction of bikers for each user type in Fig. 2.
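The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for the project. It assumes the trip file has been loaded into a pandas DataFrame with `starttime` and `usertype` columns, and it assumes the count errors are Poisson errors (the square root of the count), which the text does not specify. The function name `weekday_fractions` and the toy rows are hypothetical and are not real Citi Bike data.

```python
import numpy as np
import pandas as pd


def weekday_fractions(trips: pd.DataFrame) -> pd.DataFrame:
    """Count weekday and total rides per user type and derive the weekday fraction."""
    start = pd.to_datetime(trips["starttime"])
    # dayofweek is 0 for Monday through 6 for Sunday, so values below 5 are weekdays.
    is_weekday = start.dt.dayofweek < 5
    counts = (
        trips.assign(is_weekday=is_weekday)
        .groupby("usertype")["is_weekday"]
        .agg(weekday_rides="sum", total_rides="count")
    )
    counts["weekday_fraction"] = counts["weekday_rides"] / counts["total_rides"]
    # Assumption: Poisson counting error, sqrt(N), on the weekday ride counts.
    counts["weekday_count_error"] = np.sqrt(counts["weekday_rides"])
    return counts


# Toy rows for illustration only (made up, not taken from the dataset).
toy_trips = pd.DataFrame(
    {
        "starttime": [
            "2015-03-02 08:00:00",  # Monday
            "2015-03-03 09:00:00",  # Tuesday
            "2015-03-04 12:00:00",  # Wednesday
            "2015-03-07 10:00:00",  # Saturday
            "2015-03-08 11:00:00",  # Sunday
        ],
        "usertype": ["Subscriber", "Subscriber", "Customer", "Customer", "Customer"],
    }
)
toy_summary = weekday_fractions(toy_trips)
print(toy_summary)
```

On the real data, the resulting `weekday_fraction` column would give the per-user-type proportions that the bar plots and the hypothesis test compare.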