\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage[round]{natbib}
\let\cite\citep
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\begin{document}
\title{Citi Bike Ride Counts for Different User Types}
\author[1]{Tingyu Chang}%
\affil[1]{New York University CUSP}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\sloppy
\textbf{Abstract}
This Citi Bike mini project tests whether customers were less likely
than subscribers to ride Citi Bike on weekdays in March 2015. The null
hypothesis is that the proportion of customers riding Citi Bike on
weekdays is the same as or higher than the proportion of subscribers
riding Citi Bike on weekdays in March 2015. The significance level for
this mini project is 0.05. I used a z-test to test the null hypothesis
and obtained an extremely small p-value, so I reject the null hypothesis
and conclude that customers were less likely than subscribers to ride
Citi Bike on weekdays in March 2015.
\par\null
\textbf{Introduction}
Citi Bike is a public bike sharing system operated by Motivate and named
after its lead sponsor, Citigroup. There are two user types: customers
and subscribers. Subscribers are those who have bought an~annual
membership and can take unlimited 45-minute rides throughout the year.
Customers are those who pay every time they ride. Because Citi Bike
riders fall into different user types, it is interesting to ask whether
the number of rides on weekdays and weekends differs significantly
between user types.
\par\null
\textbf{Data}
The dataset that I used for the~statistical test is
from~\url{https://s3.amazonaws.com/tripdata}. More specifically, I use
the data for March 2015. I grouped the number of rides by user type and
day of the week to get the proportion of rides by customers and
subscribers on weekdays and weekends. I also calculated the errors on
the counts. To visualize the data, I created a bar plot of the
normalized distribution of bikers (Fig.~1) and a plot of the fraction of
bikers for each user type (Fig.~2).
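\par As a sketch of how such count errors are commonly computed, assume
(this is an assumption, not taken from the notebook) that each ride count
$N$ is treated as a Poisson count. The symbols $N_{\mathrm{wd}}$ and
$N_{\mathrm{we}}$ below are introduced here for illustration and denote
the weekday and weekend ride counts of one user type:
\[
\sigma_{N} = \sqrt{N}, \qquad
f = \frac{N_{\mathrm{wd}}}{N_{\mathrm{wd}} + N_{\mathrm{we}}}, \qquad
\sigma_{f} = \sqrt{\frac{f\,(1 - f)}{N_{\mathrm{wd}} + N_{\mathrm{we}}}} .
\]
The last expression follows from propagating the independent errors
$\sqrt{N_{\mathrm{wd}}}$ and $\sqrt{N_{\mathrm{we}}}$ through $f$, and it
coincides with the usual binomial standard error of a proportion.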
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Screen-Shot-2018-11-06-at-9-03-15-PM/Screen-Shot-2018-11-06-at-9-03-15-PM}
\caption{{Normalized Distribution of Citi Bike bikers by user type in March 2015
{\label{980027}}%
}}
\end{center}
\end{figure}
\textbf{Methodology}
The test I chose is the z-test. Because the sample size is quite large,
the central limit theorem implies that the sample proportions are
approximately normally distributed, so a z-test can be used. Furthermore,
since the sample size is larger than 30, I chose the z-test over the
Mann-Whitney U test (Wilcoxon rank-sum test) suggested by urm699 (my
reviewer for HW4).
\selectlanguage{english}
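\par For reference, a z-test for two proportions is usually written in
the pooled form below. This is a sketch under the assumption that the
standard pooled two-proportion statistic applies; the symbols are
introduced here for illustration and are not taken from the notebook.
Let $n_{S}$ and $n_{C}$ be the total numbers of subscriber and customer
rides, and $w_{S}$ and $w_{C}$ the numbers of those rides taken on
weekdays:
\[
\hat{p}_{S} = \frac{w_{S}}{n_{S}}, \qquad
\hat{p}_{C} = \frac{w_{C}}{n_{C}}, \qquad
\hat{p} = \frac{w_{S} + w_{C}}{n_{S} + n_{C}},
\]
\[
z = \frac{\hat{p}_{S} - \hat{p}_{C}}
{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_{S}} + \dfrac{1}{n_{C}}\right)}} .
\]
At the boundary of the null hypothesis (equal weekday proportions), $z$
is approximately standard normal, so a one-sided test at the 0.05 level
rejects the null hypothesis when $z > 1.645$.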
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Screen-Shot-2018-11-06-at-9-03-23-PM/Screen-Shot-2018-11-06-at-9-03-23-PM}
\caption{{Fraction of Citi Bike bikers per user type in March 2015 for weekdays
(left) and weekends (right).
{\label{269271}}%
}}
\end{center}
\end{figure}
\textbf{Conclusions}
The z-test gives a test statistic of 46.42, and the corresponding
p-value is smaller than 0.0002 according to the z-table. My significance
level is 0.05, so I reject the null hypothesis and conclude that
customers are less likely than subscribers to ride Citi Bikes on
weekdays.
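\par To give a sense of how small this p-value is, the standard normal
tail can be bounded for $z > 0$. This is a rough illustration that
assumes a one-sided upper-tail test and that the normal approximation
still holds this far into the tail:
\[
p = 1 - \Phi(z) < \frac{1}{z\sqrt{2\pi}}\, e^{-z^{2}/2} .
\]
For $z = 46.42$ we have $z^{2}/2 \approx 1077.4$, so the bound is roughly
$10^{-470}$. The p-value is therefore far below both the z-table limit
of 0.0002 and the significance level of 0.05.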
\par\null
\href{http://github.com/tingyuc3/PUI2018_tc1767/blob/master/HW8_tc1767/Assignment2_tc1767.ipynb}{My
GitHub notebook}
\selectlanguage{english}
\FloatBarrier
\end{document}