Data - CitiBike publishes its trip data openly. Each record includes the user's birth year, the user type (system subscriber or one-off user), the trip start and end times, and the start and end station IDs. We used data from January and August of 2015 and 2016. Using two years smooths out unobserved year-to-year variation, and using a winter month and a summer month smooths out seasonal differences. These are, however, two of the most extreme months of the year, so a further study might draw samples from all twelve months. We did not do so because of computational and time constraints.
After downloading the data, we cleaned it by removing extraneous variables to speed up computation and by creating an age variable, computed as the year the data was recorded minus the user's birth year. A histogram of user age in years already shows that the distribution is almost Poissonian: it heavily favors younger users, peaks around age 30, and declines from there.
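The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the code used in the study. It assumes that each trip is a dictionary keyed by column name, that the column names are those listed in KEEP, and that records without a usable birth year are dropped. The function names and the sample rows are hypothetical, and the five-year bin width is an arbitrary choice.

```python
from collections import Counter

# Columns kept after cleaning. These names are assumptions for illustration.
KEEP = ("starttime", "stoptime", "start station id",
        "end station id", "usertype", "birth year")


def clean_trips(rows, data_year):
    """Drop extraneous columns and add an integer 'age' field.

    Rows with a missing or unparseable birth year are skipped
    (an assumption; the study does not say how these were handled).
    """
    cleaned = []
    for row in rows:
        try:
            birth_year = int(float(row["birth year"]))
        except (KeyError, TypeError, ValueError):
            continue
        slim = {key: row[key] for key in KEEP if key in row}
        slim["age"] = data_year - birth_year  # age = data year - birth year
        cleaned.append(slim)
    return cleaned


def age_histogram(trips, bin_width=5):
    """Count trips per age bin; each key is the lower edge of its bin."""
    counts = Counter((t["age"] // bin_width) * bin_width for t in trips)
    return dict(sorted(counts.items()))


# Made-up sample rows, only to show the shape of the input.
sample = [
    {"birth year": "1986", "usertype": "Subscriber", "bikeid": "1"},
    {"birth year": "1984.0", "usertype": "Subscriber", "bikeid": "2"},
    {"birth year": "1990", "usertype": "Subscriber", "bikeid": "3"},
    {"birth year": "", "usertype": "Customer", "bikeid": "4"},
    {"birth year": "1960", "usertype": "Subscriber", "bikeid": "5"},
]

trips = clean_trips(sample, data_year=2016)
print(age_histogram(trips))
```

With real data, the rows could come from csv.DictReader over one monthly file, with data_year set to that file's year. The resulting bin counts can then be passed to any plotting tool to draw the histogram.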