Abstract:

Citibike has two user types: subscribers and customers. Subscribers hold a long-term membership (several months or even a year), use Citibike often, and pay their membership fee monthly, while customers make one-off purchases, ride relatively seldom, and pay each time they use a bike. Based on these differences, I want to find out whether the two user types differ in terms of the days of the week on which they ride.
Specifically, this project tests the null hypothesis that the ratio of subscriber rides on weekends to subscriber rides over the whole week is equal to or higher than the ratio of customer rides on weekends to customer rides over the whole week. I use a two-sample proportion z-test to compare the two proportions, obtaining z = -252.09274 and p-value = 0 (significance level = 0.05). Since the p-value is effectively 0, far below the 0.05 significance level, the null hypothesis is rejected in favor of the alternative: the ratio of subscriber rides on weekends to subscriber rides over the whole week is lower than the corresponding ratio for customers.
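As a rough illustration of the test described above, the sketch below implements a one-sided, pooled two-proportion z-test in plain Python. It assumes `x1` and `n1` are the weekend and total ride counts for subscribers and `x2` and `n2` the same counts for customers; the function name and the example counts are hypothetical and are not the May 2017 figures. The pooled-variance form is the standard textbook version and may differ in detail from the computation that produced the z-score reported above.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """One-sided pooled two-proportion z-test (illustrative sketch).

    H0: p1 >= p2  versus  H1: p1 < p2
    x1, n1: weekend rides and total rides for group 1 (e.g. subscribers)
    x2, n2: weekend rides and total rides for group 2 (e.g. customers)
    Returns (z, p_value), where p_value is the lower-tail probability.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion, estimated under the boundary case p1 == p2 of H0
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Lower-tail p-value: standard normal CDF evaluated at z
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


# Hypothetical counts for illustration only (not the May 2017 results)
z, p = two_proportion_ztest(x1=20, n1=100, x2=50, n2=100)
print(f"z = {z:.4f}, p = {p:.2e}")
```

The statsmodels library offers a comparable routine, `statsmodels.stats.proportion.proportions_ztest`, which accepts `alternative="smaller"` for this one-sided case. With a z-score near -252, the lower-tail probability is far smaller than the smallest positive double-precision number, so it evaluates to exactly 0.0; this is why the p-value is reported as 0.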

Introduction:

Citibike is a privately owned public bicycle-sharing system serving New York City and Jersey City, New Jersey. As of March 2016, it had 163,865 annual subscribers. Citibike riders took an average of 38,491 rides per day in 2016, and the system reached a total of 50 million rides in October 2017 (Wikipedia). Citibike users are either subscribers or customers. I assume that subscribers rely more on Citibike as a means of transportation, while customers usually use other modes of transport and ride Citibike only for particular purposes. For example, many people drive or take the subway to work on weekdays and go for a ride only on weekends, so they do not need to be subscribers. It is therefore interesting to find out whether the ratio of subscriber rides on weekends to subscriber rides over the whole week is lower than the same ratio for customers.

Data:

I use the dataset "201705 Citibike" available from the CUSP Data Facility (DF). The dataset contains all ride records in New York City for May 2017, and I process the data in a Jupyter (IPython) notebook.
Step 1: Access the data source, read the data, and list all the columns of the dataset.
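Step 1 might look like the following pandas sketch. To keep it self-contained, it reads a tiny made-up CSV from memory instead of the real file; the column names are assumed to match the 2017 Citi Bike trip-data layout, and the file name mentioned in the comment is an assumption. It also parses `starttime` and computes, per user type, the share of rides that start on a Saturday or Sunday, which is the quantity the hypothesis compares. pandas numbers Monday as 0 in `dayofweek`, so values 5 and 6 are the weekend.

```python
import io

import pandas as pd

# Made-up three-ride sample standing in for the real May 2017 trip file.
# Column names are ASSUMED to follow the 2017 Citi Bike trip-data layout.
SAMPLE_CSV = """\
tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender
1200,2017-05-06 09:15:00,2017-05-06 09:35:00,72,W 52 St & 11 Ave,40.767,-73.993,79,Franklin St & W Broadway,40.719,-74.006,25000,Customer,,0
600,2017-05-08 08:00:00,2017-05-08 08:10:00,79,Franklin St & W Broadway,40.719,-74.006,72,W 52 St & 11 Ave,40.767,-73.993,25001,Subscriber,1985,1
900,2017-05-07 14:00:00,2017-05-07 14:15:00,72,W 52 St & 11 Ave,40.767,-73.993,72,W 52 St & 11 Ave,40.767,-73.993,25002,Subscriber,1990,2
"""

# With the real data this would be something like
# pd.read_csv("201705-citibike-tripdata.csv", ...)  (hypothetical path)
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["starttime", "stoptime"])

# List all columns of the dataset
print(list(df.columns))

# Flag rides that start on Saturday (5) or Sunday (6)
df["weekend"] = df["starttime"].dt.dayofweek >= 5

# Share of each user type's rides that fall on a weekend
shares = df.groupby("usertype")["weekend"].mean()
print(shares)
```

On the real dataset, the weekend counts (`df.groupby("usertype")["weekend"].sum()`) and total ride counts per user type are exactly the inputs a two-proportion z-test needs.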