Abstract
This report is based on my PUI Assignment 2 for Homework 3, a Citibike analysis in Python. The goal is to explore the difference in Citibike trip duration between one-time customers and subscribers in terms of mean trip duration. The aim is to show that the average trip duration of one-time customers is longer than that of subscribers, and to conclude from this that a one-time customer makes fuller use of a bike than a subscriber. A hypothesis is established below. As the samples are not of equal size, a two-sided t-test is applied; the results support the hypothesis.

Keywords: CitiBike Data, Data Wrangling, Null Hypothesis, Alternative Hypothesis, Statistical Significance Level, Two-sided t-test

Hypothesis

Null Hypothesis
H0: T(customer) <= T(subscriber)
The mean trip duration of one-time customers over a week is less than or equal to the mean trip duration of subscribers over a week.

Alternative Hypothesis
H1: T(customer) > T(subscriber)
The mean trip duration of one-time customers over a week is greater than the mean trip duration of subscribers over a week.

Statistical Significance Level
Significance level: α = 0.05
The significance level α is the threshold below which the p-value of the test is treated as statistically significant.

Data Analysis
The trip duration data for both one-time customers and subscribers was collected from the CitiBike_Data_Website. First, the null and alternative hypotheses were established with a statistical significance level of 0.05; then the data was collected, tabulated, cleaned, and reshaped before being analyzed, plotted, and visualized. The analysis uses pandas DataFrames in Python to compute the mean trip duration for Customers (one-time users) and Subscribers respectively. The figures are plotted with Matplotlib.
Meanwhile, because a t-test is suited to testing the difference between two samples when the variances of the two normal distributions are unknown, as is the case here, the data is subjected to a two-sided t-test.
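The test statistic for such a comparison can be sketched as follows. This is a minimal illustration, not the code from the original notebook: the function name `welch_t` and the sample durations are hypothetical, and the unequal-variance (Welch) form is an assumption, since the report does not say which t-test variant was used.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Uses Welch's unequal-variance form; this choice is an assumption,
    since the report does not state which t-test variant was applied.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator), divided by sample size.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical trip durations in seconds, for illustration only.
customer_durations = [1500, 1800, 2100, 1700, 1900]
subscriber_durations = [700, 900, 800, 750, 850, 950]

t_stat, dof = welch_t(customer_durations, subscriber_durations)
print(f"t = {t_stat:.2f}, dof = {dof:.1f}")
```

A positive t statistic here means the first sample (customers) has the larger mean; the p-value would then be read from a t distribution with the computed degrees of freedom.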
Abstract
This study investigates whether Customer riders use Citibikes on weekends proportionally more often than Subscriber riders. If so, the finding could help the Citibike operations team improve the user experience and provide better infrastructure to riders through dedicated weekend operations. The study uses Usertype and Date data from March 2015, and the hypothesis is evaluated with a z-test.
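The weekend proportions being compared can be sketched as below, assuming each trip has been reduced to a (usertype, date) pair. The function name `weekend_share` and the records are hypothetical and only illustrate the calculation.

```python
from collections import Counter
from datetime import date


def weekend_share(trips):
    """Return {usertype: fraction of that type's trips made on a weekend}.

    `trips` is an iterable of (usertype, date) pairs.
    """
    total = Counter()
    weekend = Counter()
    for usertype, day in trips:
        total[usertype] += 1
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        if day.weekday() >= 5:
            weekend[usertype] += 1
    return {usertype: weekend[usertype] / total[usertype] for usertype in total}


# Hypothetical records for illustration only (March 2015 dates).
trips = [
    ("Customer", date(2015, 3, 7)),     # Saturday
    ("Customer", date(2015, 3, 8)),     # Sunday
    ("Customer", date(2015, 3, 9)),     # Monday
    ("Subscriber", date(2015, 3, 7)),   # Saturday
    ("Subscriber", date(2015, 3, 10)),  # Tuesday
    ("Subscriber", date(2015, 3, 11)),  # Wednesday
    ("Subscriber", date(2015, 3, 12)),  # Thursday
]
shares = weekend_share(trips)
print(shares)
```

The two resulting fractions are the quantities a z-test for proportions would compare.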
Abstract
This report aims to find whether the mean trip duration of young people is longer than that of middle-aged people. A Z-test is performed to determine whether this hypothesis is true. The test indicates that it is very likely that young people ride longer than middle-aged people.

Data
The data source is citibikedata, which contains information on people using Citibikes. The birth year and trip duration columns are used. To classify young and middle-aged people, people aged 21 to 40 are defined as young and people aged 40 to 59 are defined as middle-aged.

Analysis
A null hypothesis test was performed. The null hypothesis is: the total trip duration of people aged (20,40] is equal to or shorter than that of people aged (40,60] in the year 2015 in NYC (at a 5% significance level). The normalized total trip duration of five age ranges is shown below: each age range's total trip duration divided by the total trip duration of all age ranges.
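The normalization described above can be sketched as follows. The bin edges are an assumption, since the report names only the (20,40] and (40,60] ranges; the function names and the sample rides are hypothetical.

```python
from collections import defaultdict

# Assumed bin edges; only (20, 40] and (40, 60] are named in the report.
AGE_EDGES = [0, 20, 40, 60, 80, 100]


def age_bin(age):
    """Return the (low, high] bin containing `age`, or None if out of range."""
    for low, high in zip(AGE_EDGES, AGE_EDGES[1:]):
        if low < age <= high:
            return (low, high)
    return None


def normalized_total_duration(rides, year=2015):
    """Return {age bin: share of total trip duration}.

    `rides` is an iterable of (birth_year, trip_duration) pairs.
    """
    totals = defaultdict(float)
    for birth_year, duration in rides:
        bin_ = age_bin(year - birth_year)
        if bin_ is not None:
            totals[bin_] += duration
    grand_total = sum(totals.values())
    return {bin_: total / grand_total for bin_, total in totals.items()}


# Hypothetical (birth year, trip duration in seconds) pairs.
rides = [(1990, 900), (1980, 600), (1975, 300), (1965, 700), (1950, 500)]
duration_shares = normalized_total_duration(rides)
print(duration_shares)
```

Because every share is divided by the same grand total, the shares sum to 1 and can be compared directly across age ranges.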
Abstract
This project examines whether male and female bikers differ in how much they ride in the day time and the night time. More specifically, my initial hypothesis is that men are more likely than women to bike at night because of safety concerns: women are more sensitive than men to the potential safety risks of traveling at night.

Data
I use the information on bikers in February 2015 as the sample for my study. The time a rider started biking determines the time category of the ride. I divided my sample into two groups, men and women, based on the gender information. The day and night times are categorized as follows:
- Day time: 7am to 7pm
- Night time: 7pm to 7am
Based on the available data, I calculated the normalized ratios of bikers in the day and night times for each gender, for the illustrative graphs and the statistical analysis.

Analysis
Data Overview
The graphs below show that there may be some differences between men and women in the hours at which they are most likely to bike. In figure 1, the fractions of men riding after 6pm and before 8am are higher than those of women. In figure 2, which illustrates the fraction of each gender by day and night, the fraction of female riders is higher than that of male riders during the day and lower at night.
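The day and night categorization and the per-gender normalized ratios can be sketched as below. The function names and sample rides are hypothetical; treating 7am as the first day hour and 7pm as the first night hour is an assumption about how the boundaries were handled.

```python
from collections import Counter


def period(hour):
    """Classify a start hour (0-23) as 'day' (7am to 7pm) or 'night'.

    Assumption: 7am counts as day and 7pm counts as night.
    """
    return "day" if 7 <= hour < 19 else "night"


def period_fractions(rides):
    """Return {gender: {'day': fraction, 'night': fraction}}.

    `rides` is an iterable of (gender, start_hour) pairs.
    """
    counts = {}
    for gender, hour in rides:
        counts.setdefault(gender, Counter())[period(hour)] += 1
    return {
        gender: {p: c[p] / sum(c.values()) for p in ("day", "night")}
        for gender, c in counts.items()
    }


# Hypothetical (gender, start hour) pairs for illustration only.
rides = [
    ("men", 8), ("men", 12), ("men", 20), ("men", 23),
    ("women", 9), ("women", 13), ("women", 17), ("women", 21),
]
fractions = period_fractions(rides)
print(fractions)
```

Normalizing within each gender removes the effect of there being different numbers of male and female riders overall.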
PUI2016 Citibike Project Summary

ABSTRACT: In this project we looked at whether, on average, older individuals (over 40 years old) used Citibikes for shorter trips than younger individuals (under 40 years old). Using information on trip duration and rider age for the month of February 2015, we ran a Z-test for the proportions grouped by trip duration, yielding a statistic of 26.09. We therefore reject the null hypothesis and conclude that older individuals are more likely to take shorter trips.

DATA: We used the zip file on the Citibike website corresponding to the month of February 2015. The data can be downloaded here: https://s3.amazonaws.com/tripdata/201502-citibike-tripdata.zip The corresponding .csv file contained entries for the start and stop station location, trip duration, customer type, birth year and gender of each rider during the month. We extracted age by subtracting the birth year of subscribers from the then-current year, 2015, and dropped all fields except trip duration and age. We split the pandas dataframe into those over and under 40 to create two samples. Then we divided trip duration into two categories: short trips (less than 10 mins) and long trips (more than 10 mins) (see Figure 1). Finally, we normalized the distribution (see Figure 2).
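A Z-test for two proportions of this kind can be sketched as follows. The pooled-proportion form is an assumption, since the summary does not state the formula used; the function name and the counts are hypothetical and are not the project's data.

```python
import math
from statistics import NormalDist


def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Return (z, one-sided p-value) for the alternative p_1 > p_2.

    Uses the pooled proportion for the standard error (an assumption).
    """
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    # Upper-tail probability under the standard normal distribution.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: short trips among riders over 40 and under 40.
z, p_value = two_proportion_z(600, 1000, 500, 1000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

Here "success" means a short trip (under 10 minutes), so a large positive z supports the claim that the first group takes short trips proportionally more often.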
Do men take longer CitiBike trips than women?
The goal of this project was to test whether men take longer CitiBike trips than women. The null hypothesis: the number of women taking longer trips on Citi Bike is the same as or higher than the number of men taking longer Citi Bike trips. The significance level was set at 0.05. The null and alternative hypotheses can be written as:
H0: W(time of the trip) >= M(time of the trip)
H1: W(time of the trip) < M(time of the trip)
ABSTRACT
New York City keeps records of Citi Bike services, including demographics of users and statistics on bike use. Here, we performed a statistical analysis to determine the relationship between biker age and trip duration, testing the alternative hypothesis that Citi Bike users under age 35 are more likely to bike for longer durations than the average user. Through a simple Z-test, we were able to reject our null hypothesis, concluding that the trip duration of bikers under 35 is significantly greater than that of the average user.

DATA
For this project, our research question was: _Are Citi Bike users under 35 years of age significantly more likely to bike for longer durations compared to the average user?_ For this analysis, we formed the following hypotheses:
_Null Hypothesis:_ The mean trip duration of Citi Bike users under the age of 35 is the same as or less than the mean trip duration of an average user, significance level = 0.05.
_Alternative Hypothesis:_ The mean trip duration of Citi Bike users under the age of 35 is more than the mean trip duration of an average user, significance level = 0.05.
To test these hypotheses, we chose Citi Bike data from December 2015. The information downloaded from the data facility contained more variables than needed to compare age and trip duration. Additionally, it was not organized in columns, which could lead to errors, such as interpreting variable names as observations. As such, we first organized our data into columns, then dropped 13 of the 15 categories. We were left with “birth year” as our independent variable, and “trip duration” in seconds as our dependent variable. After plotting both variables, we identified several outliers of impossibly old users, i.e., those born before 1910. Plot 1 shows a scatter plot of the raw data, plotting birth year against trip duration. Histogram 1 shows the raw distribution of age across the data set.
In Histogram 3, the distributions of trip duration for the entire data set (in blue) and for the group of those 35 and under (in green) are compared.

ANALYSIS
Our peer reviewers suggested we perform a Z-test to compare the users under 35 with the total population. This test is possible because we know the population parameters (the dataset itself represents the entire population of Citi Bike users). Given the size of our sample, and the fact that we know the mean and standard deviation for both groups, we chose to test our hypothesis with a Z-test. As such, we first had to calculate the mean and standard deviation of trip duration for the two groups. These values were plugged into the Z-test formula.

RESULTS
From our Z-test, we obtained a Z-statistic of 17.79. From the Z-table, this gave an area of over 0.9998. Thus, our p-value is less than (1 - 0.9998), or 0.0002: if the null hypothesis were true, a difference at least this large between the two groups would occur with a probability below 0.02%. This p-value is much smaller than our alpha level of 0.05, meaning we can reject our null hypothesis and conclude that trip durations of Citi Bike users are longer for those under age 35 compared to the average user.

LINK TO ORIGINAL NOTEBOOK
https://github.com/jc7344/PUI2016_jc7344/blob/master/HW6_jc7344/HW6_Assignment2.ipynb
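The Z-test described in the ANALYSIS section can be sketched as below, assuming the standard one-sample form z = (sample mean - population mean) / (population standard deviation / sqrt(n)). The linked notebook may compute it differently; the function name and the input values are hypothetical and are not the project's figures.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, population_mean, population_std, n):
    """Return (z, one-sided p-value) for the alternative sample mean > population mean."""
    # Standard error of the mean for a sample of size n.
    std_err = population_std / math.sqrt(n)
    z = (sample_mean - population_mean) / std_err
    # Upper-tail probability under the standard normal distribution.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical values in seconds, for illustration only.
z, p_value = one_sample_z(
    sample_mean=900, population_mean=850, population_std=500, n=400
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Computing the tail probability directly avoids the limit of a printed Z-table, which stops reporting digits for large Z-statistics.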
ABSTRACT
In this analysis, we explore whether there is a difference between the number of CitiBike rides during the rush hours of New York City and during non-rush hours. We define the rush hours of New York City to be 7 to 9 A.M. and 4 to 9 P.M. on business days. We state our hypothesis and test it using a two-sided t-test. The test indicates that there is indeed a difference.
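The rush-hour definition above can be sketched as a small classifier. The function name is hypothetical, and two details are assumptions: the end of each window is treated as exclusive, and business days are taken to be Monday to Friday with holidays ignored.

```python
from datetime import datetime


def is_rush_hour(start):
    """Return True if `start` falls in the rush hours defined in the abstract.

    Rush hours: 7 to 9 A.M. and 4 to 9 P.M. on business days.
    Assumptions: window ends are exclusive; holidays are not handled.
    """
    if start.weekday() >= 5:  # Saturday or Sunday
        return False
    return 7 <= start.hour < 9 or 16 <= start.hour < 21


# Hypothetical trip start times.
print(is_rush_hour(datetime(2015, 3, 9, 8, 30)))   # a Monday morning
print(is_rush_hour(datetime(2015, 3, 7, 8, 30)))   # a Saturday morning
```

Counting rides per group with this classifier gives the two samples that the t-test compares.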
INTRODUCTION
We want to interpolate SN lightcurves with Gaussian processes. The interpolation will be used to construct templates, which are then used to construct bolometric lightcurves, filling in for missing data. The lightcurves have diverse sampling and diverse noise, and generally behave smoothly, with more variability at early times than at late times (after the ⁵⁶Ni decay starts dominating). Here is an example of a very good, very well sampled lightcurve, with photometry from two telescopes that is not consistent within the very small, probably underestimated, observational uncertainties:
When we discuss under-represented minorities (URMs) in the academic sciences, we often mention the importance of mentors and of providing role models to minorities. However, beyond anecdotal evidence, there is no measure of whether having a minority role model actually facilitates the academic path. This work tries to answer a simple question: are minorities more likely to co-author papers within their minority circle?