Intuitively, we might assume that young people have more stamina and are more active than older people. In this experiment, we use Citibike data to examine whether this assumption holds, comparing the average trip distance of younger and older riders.
A t-test showed that the difference is significant. However, the direction is counterintuitive: the older group has a significantly longer average trip distance than the younger group.
Because this result is not intuitive, we repeated the approach on a different dataset, where we did not see a significant difference between the two sample means. To keep the analysis hypothesis-driven, we did not change the initial alternative hypothesis (the younger, the more active).
Introduction –
Generally speaking, young people are more active than older people. However, older people are becoming more active than ever because of rising health consciousness and high-quality healthcare. To contribute useful evidence to this discussion, we examine whether young Citibike riders have a longer average trip distance than older riders.
Citibike is a docked bike-sharing system with stations dotted around NYC, and its usage data is open to the public. Based on the calculated ages of the riders, we set the following hypotheses and significance level.
Null hypothesis – H0:
Older riders (age over 30) have the same or a longer average trip distance than younger riders (age 30 or under).
Alternative hypothesis – H1:
Younger riders have a longer average trip distance than older riders.
Significance level: 0.05
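The hypotheses above describe a one-sided, two-sample comparison of means. The following is a minimal sketch of such a test, not the exact procedure used in this report. It assumes Welch's form of the t statistic (unequal variances allowed) and, to stay within the Python standard library, approximates the p-value with a normal distribution, which is reasonable only for large samples. The function name `one_sided_welch_test` and the toy numbers are hypothetical and are not Citibike data.

```python
import math
from statistics import NormalDist, mean, variance


def one_sided_welch_test(young, old, alpha=0.05):
    """Test H1: mean(young) > mean(old) with Welch's t statistic.

    Assumption: the p-value uses a normal approximation to the
    t distribution, which is only appropriate for large samples.
    """
    n_young, n_old = len(young), len(old)
    # Standard error of the difference in means (variances may differ).
    std_err = math.sqrt(variance(young) / n_young + variance(old) / n_old)
    t_stat = (mean(young) - mean(old)) / std_err
    # Upper-tail probability: small only when younger riders travel farther.
    p_value = 1.0 - NormalDist().cdf(t_stat)
    reject_h0 = p_value < alpha
    return t_stat, p_value, reject_h0


# Toy values for illustration only (not taken from the Citibike data).
young_sample = [0.012, 0.015, 0.014, 0.016, 0.013]
old_sample = [0.015, 0.017, 0.016, 0.018, 0.014]
t_stat, p_value, reject_h0 = one_sided_welch_test(young_sample, old_sample)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

If SciPy is available, `scipy.stats.ttest_ind(young, old, equal_var=False, alternative="greater")` returns the corresponding t-based p-value. Under this one-sided alternative, a negative statistic (older mean larger) always gives a p-value above 0.5.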
Data –
Citibike data includes each user's birth year, the start and end stations, the date, the time, and so on. We use the January 2015 data and split it into two groups: Young (age 30 or under) and Old (age over 30). We dropped records that contain no age information. Because the data gives only the latitude and longitude of the start and end stations rather than the distance traveled, we calculate trip distance with the Pythagorean theorem, so the distances are expressed in degrees of latitude and longitude rather than in miles or kilometers.
The resulting average trip distance is 0.01488 for the Over30 group and 0.01439 for the Under30 group.
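The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the reported averages: the column names (`birth year`, `start station latitude`, and so on) are assumed to match the CSV headers, age is assumed to be 2015 minus the birth year, and the helper names `trip_distance` and `split_by_age` are hypothetical. The sample rows are made up.

```python
import math
from statistics import mean

REFERENCE_YEAR = 2015  # assumption: age is measured relative to the data year


def trip_distance(row):
    """Straight-line distance, in degrees, between start and end stations."""
    d_lat = float(row["end station latitude"]) - float(row["start station latitude"])
    d_lon = float(row["end station longitude"]) - float(row["start station longitude"])
    # Pythagorean theorem applied directly to the coordinate differences.
    return math.hypot(d_lat, d_lon)


def split_by_age(rows, cutoff=30):
    """Return (young, old) lists of trip distances, dropping rows without age."""
    young, old = [], []
    for row in rows:
        try:
            birth_year = int(float(row.get("birth year")))
        except (TypeError, ValueError):
            continue  # missing or non-numeric birth year: drop the record
        age = REFERENCE_YEAR - birth_year
        (young if age <= cutoff else old).append(trip_distance(row))
    return young, old


# Made-up rows for illustration only.
rows = [
    {"birth year": "1990", "start station latitude": "40.0", "start station longitude": "-74.0",
     "end station latitude": "40.03", "end station longitude": "-73.96"},
    {"birth year": "1970", "start station latitude": "40.0", "start station longitude": "-74.0",
     "end station latitude": "40.0", "end station longitude": "-73.99"},
    {"birth year": "", "start station latitude": "40.0", "start station longitude": "-74.0",
     "end station latitude": "40.1", "end station longitude": "-74.1"},
]
young, old = split_by_age(rows)
print(f"Under30 mean: {mean(young):.5f}, Over30 mean: {mean(old):.5f}")
```

Because one degree of longitude covers less ground than one degree of latitude at NYC's latitude, this distance is only a rough proxy for the true trip length.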