Data
Trip histories of Citi Bike riders were obtained through
the NYC Citi Bike System Data portal.
Data from March 2015 was used because its size is
appropriate for this exploratory analysis. Processing the data
required identifying the relevant variables, filtering
for the appropriate records, and then assigning each record to the
correct gender and age group. Python was used to run the analysis,
trimming the dataset to just the following fields: tripduration,
usertype, birth year and gender. The data set was then filtered for
the "Subscriber" user type to remove data from one-time users, who are
identified as "Customer". This step ensures that the analysis focuses
on more frequent users. In order to calculate the ratio of riders by
age and gender, the male and female groups were each further split by
birth year. The age of 45 was selected as the threshold for this
analysis, which placed the birth year cutoff at 1971. Those born before
1971 were counted and labeled as above 45 for both genders. Below are
the Python scripts:
Remove fields not required.
df.drop(['starttime', 'stoptime', 'start station id',
         'start station name', 'start station latitude',
         'start station longitude', 'end station id',
         'end station name', 'end station latitude',
         'end station longitude', 'bikeid'], axis=1, inplace=True)
Filter data set to remove one-time users.
df1 = df[df.usertype != 'Customer']
Identify and count number of male riders above 45
df_m_above45 = (df1['birth year'][df1['gender'] == 1]
                .groupby(df1['birth year'] < 1971.0).count())
Identify and count number of female riders above 45
df_w_above45 = (df1['birth year'][df1['gender'] == 2]
                .groupby(df1['birth year'] < 1971.0).count())
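The counts above can be turned into the above-45 to 45-or-below ratio for each gender. The following is a minimal sketch of one way to do that, not the original analysis code: the toy records, the helper name age_ratio and its cutoff parameter are assumptions introduced for illustration, while the column names and gender codes (1 for male, 2 for female) follow the scripts above.

```python
import pandas as pd

# Hypothetical toy records; column names match the scripts above.
df = pd.DataFrame({
    "usertype": ["Subscriber", "Subscriber", "Customer",
                 "Subscriber", "Subscriber", "Subscriber"],
    "birth year": [1960.0, 1985.0, None, 1965.0, 1990.0, 1992.0],
    "gender": [1, 1, 0, 2, 2, 1],
})

# Keep subscribers only, as in the filtering step above.
subscribers = df[df["usertype"] == "Subscriber"]


def age_ratio(frame, gender_code, cutoff=1971):
    """Return (count above 45, count 45 or below, ratio) for one gender code.

    Records with a missing birth year are dropped before counting.
    """
    years = frame.loc[frame["gender"] == gender_code, "birth year"].dropna()
    above = int((years < cutoff).sum())   # born before the cutoff
    below = int((years >= cutoff).sum())  # born in or after the cutoff
    return above, below, above / below


men = age_ratio(subscribers, 1)
women = age_ratio(subscribers, 2)
print(men, women)
```

The helper divides by the 45-or-below count, so it would need a guard if that group could be empty for a given gender.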
The null hypothesis was set as "the ratio of men above age 45 to men
aged 45 or below riding a bike is the same as or greater than the ratio
of women above age 45 to women aged 45 or below riding a bike." The
alternative hypothesis is that "the ratio of men above age 45 to men
aged 45 or below riding a bike is smaller than the ratio of women
above age 45 to women aged 45 or below riding a bike". Furthermore, the
significance level was set at alpha=0.05.
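The text does not state which statistical test was applied. One possible way to evaluate these hypotheses is a one-sided two-proportion z-test, sketched below under that assumption. Because the ratio of above-45 to 45-or-below riders increases whenever the above-45 share of all riders increases, comparing the two shares is equivalent to comparing the two ratios. The function name and the counts are hypothetical and are not results from the Citi Bike data.

```python
import math


def one_sided_two_proportion_z(above_m, total_m, above_w, total_w):
    """Left-tailed z-test of H0: p_men >= p_women vs. H1: p_men < p_women.

    p_men and p_women are the above-45 shares among male and female riders.
    Returns the z statistic and the one-sided p-value P(Z <= z).
    """
    p_m = above_m / total_m
    p_w = above_w / total_w
    # Pooled share under the boundary case p_men == p_women.
    pooled = (above_m + above_w) / (total_m + total_w)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_m + 1 / total_w))
    z = (p_m - p_w) / se
    # Standard normal CDF evaluated at z.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = one_sided_two_proportion_z(300, 1000, 350, 1000)
alpha = 0.05
reject_null = p_value < alpha
print(z, p_value, reject_null)
```

With this setup, the null hypothesis is rejected only when the male above-45 share is sufficiently smaller than the female share that the p-value falls below alpha.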