Abstract
This analysis investigates whether young CitiBike riders (18 to 35 years old) travel longer distances than their older counterparts (over 35 years old). I first pull one month of data using CitiBike's API, then clean and prepare the data for statistical testing. Next, I use Welch's t-test to compare the two groups. The test reveals that the distances traveled by the two groups differ significantly, but that older riders in fact ride longer distances than young ones.
Introduction
CitiBike is a bike-share service in New York City that has been in operation since 2013. CitiBike gives access to bike trip information through an API, which allows users to pull data month by month going back to its start. The schema includes start and stop times, start and stop locations, date, gender, user type (subscriber or not), and year born; of these, the analysis uses the start and stop locations and year born. After looking at the data, I wondered whether there was a significant difference in the distance traveled between old and young CitiBike riders.
Data
To answer this question, I first pulled the data for January 2018. I then calculated the distance traveled by applying the Haversine formula, which gives the straight-line (great-circle) distance between the start and stop locations. This is one area where the analysis can be improved, since the straight-line distance is not the actual route taken by a bike and will generally understate it. Next, I calculated 'Age' by subtracting 'year born' from 2018. I then separated the data into my two groups (young and old) using the newly created Age field. Finally, I sorted the two data sets by distance traveled so that I could create histograms of each and infer their underlying distributions.
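The steps above can be sketched in Python. This is a minimal illustration rather than the code used for the analysis: the field names (birth_year, start_lat, start_lon, end_lat, end_lon), the function names, and the mean Earth radius of 6371 km are all assumptions made for the example. Riders under 18 fall into neither group, following the age ranges given in the abstract.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def split_by_age(trips, data_year=2018, cutoff=35):
    """Return (young, old) lists of trip distances, each sorted ascending.

    `trips` is an iterable of dicts with hypothetical keys:
    birth_year, start_lat, start_lon, end_lat, end_lon.
    """
    young, old = [], []
    for trip in trips:
        age = data_year - trip["birth_year"]
        distance = haversine_km(trip["start_lat"], trip["start_lon"],
                                trip["end_lat"], trip["end_lon"])
        if 18 <= age <= cutoff:
            young.append(distance)
        elif age > cutoff:
            old.append(distance)
        # Riders under 18 belong to neither group and are skipped.
    return sorted(young), sorted(old)


# Made-up sample trips, for illustration only.
sample_trips = [
    {"birth_year": 1990, "start_lat": 40.75, "start_lon": -73.99,
     "end_lat": 40.76, "end_lon": -73.98},
    {"birth_year": 1970, "start_lat": 40.70, "start_lon": -74.01,
     "end_lat": 40.73, "end_lon": -73.99},
    {"birth_year": 2005, "start_lat": 40.71, "start_lon": -74.00,
     "end_lat": 40.72, "end_lon": -74.00},
]
young_km, old_km = split_by_age(sample_trips)
```

The two sorted lists can then be passed to a histogram routine to inspect each group's distribution. A trip that starts and ends at the same station has a Haversine distance of zero, which is another way the straight-line measure can understate the distance actually ridden.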