Time Series Analysis for the Different Citibike Usage Pattern Between Genders in New York
<Hongkai He, hh1827>
Abstract 
This project examines Citibike ridership by gender during different time periods (daytime vs. night). The New York Citibike data set for March 2017 is selected and processed to remove redundant information, and a statistical test, the Z test, is used to decide whether to reject the null hypothesis that men are less likely than women to ride Citibikes during the night.
 Introduction 
Citibike, located in New York City and funded by Citibank, is the largest bike sharing system in the U.S. \cite{nyc}. A growing fleet of specially designed bikes is available at any time for residents in Manhattan, Brooklyn, Queens and Jersey City. People can unlock a bike from a docking station and return it to any station at the end of the trip. Citibike has become a very popular choice for commuting, city sightseeing, and other short-to-medium distance trips in New York.
Safety is a primary concern for any form of transportation. Some characteristics of Citibike, such as availability at night, low speed, and easy blending into the urban fabric, make Citibike riders more susceptible to street crime than users of other forms of transportation. In addition, it is widely believed that street crime poses a greater threat to women than to men. This project explores whether this conventional idea influences Citibike usage by gender, especially at night, when people have a higher chance of encountering crime.
Data
The data set used for this project consists of Citibike ridership records from across New York City. The records are provided on the CUSP Data Facility Platform and can also be obtained from https://s3.amazonaws.com/tripdata/. The data are available as monthly subsets from July 2013 to July 2017. The monthly data for March 2017 are chosen for the project because the bike fleet and the number of Citibike members have been growing over the last several years, so it is reasonable to infer that the latest year has the largest data set and is the most representative. However, much of the information is irrelevant to this project, such as station names, station locations, and user types. Hence the irrelevant columns are dropped and only gender and trip time are retained.
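The column selection described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the project's actual code. The column names `starttime` and `gender` are assumptions (header spelling differs between monthly files), and the small inline sample stands in for the real March 2017 file. The sketch also assumes the gender coding 0 = unknown, 1 = male, 2 = female, and it skips unknown-gender rows, a choice the text above does not specify.

```python
import csv
import io
from datetime import datetime

# Assumed column names; check the header of the actual monthly file.
TIME_COL = "starttime"
GENDER_COL = "gender"

# Assumed gender coding: 0 = unknown, 1 = male, 2 = female.
GENDER_LABELS = {"1": "male", "2": "female"}


def load_gender_and_time(lines, time_format="%Y-%m-%d %H:%M:%S"):
    """Keep only the trip start time and rider gender from a trip CSV.

    Rows with unknown gender are skipped (an assumption of this sketch),
    since they cannot be assigned to either group.
    """
    rows = []
    for record in csv.DictReader(lines):
        label = GENDER_LABELS.get(record[GENDER_COL].strip())
        if label is None:
            continue  # unknown or unexpected gender code
        start = datetime.strptime(record[TIME_COL].strip(), time_format)
        rows.append((start, label))
    return rows


# Made-up sample rows standing in for the real trip file.
sample = io.StringIO(
    "tripduration,starttime,start station name,usertype,gender\n"
    "600,2017-03-01 08:15:00,Station A,Subscriber,1\n"
    "420,2017-03-01 23:40:00,Station B,Subscriber,2\n"
    "900,2017-03-02 13:05:00,Station C,Customer,0\n"
)
trips = load_gender_and_time(sample)
for start, gender in trips:
    print(start.isoformat(), gender)
```

To run this on a downloaded file, pass an open file handle in place of `sample`; the `time_format` argument may need adjusting if the timestamps in that file are written differently.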