\documentclass{report}
\usepackage{fullpage}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{placeins}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
\usepackage{url}
\usepackage{hyperref}
\hypersetup{colorlinks=false,pdfborder={0 0 0}}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\author{Sunglyoung Kim\\ NYU Center for Urban Science \& Progress }
\title{Spatial Analysis of the Residential Locations of 16- to 34-Year-Old New Yorkers}
\begin{document}
\maketitle
\subsection*{Introduction}
{\label{221780}}
For college students in NYC, choosing where to live is one of the most
important decisions: NYC is one of the most densely populated cities in
the United States, and it contains many distinct neighborhoods. Waldo
Tobler's first law of geography states that ``everything is related to
everything else, but near things are more related than distant
things''~\cite{Tobler_1970}. By this reasoning, the residential
locations of NYC college students should exhibit spatial
autocorrelation. Because it is difficult to extract the student
population alone from census data, this paper uses the residential
locations of 16- to 34-year-old New Yorkers instead. Neighborhood
Tabulation Areas (NTAs) are chosen as the unit of analysis because New
York City created them to track population from 2000 to 2030 as part of
a ``long-term sustainability plan for'' the city. NTAs are smaller than
boroughs but larger than census tracts; they are shown in
Fig.~\ref{655903}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/NYN-neighbor/NYN-neighbor}
\caption{{Neighborhood Tabulation Areas (NTAs)
{\label{655903}}%
}}
\end{center}
\end{figure}
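Spatial autocorrelation is measured with Global Moran's I and Local
Moran's I. The null hypothesis of Global Moran's I is that the data set
has no spatial autocorrelation, i.e., that the values are randomly
distributed across NTAs; a p-value below 0.05 is taken as grounds to
reject it.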
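\par
For reference, Global Moran's I for $n$ areas with values $y_i$,
deviations $z_i = y_i - \bar{y}$, and spatial weights $w_{ij}$ is
\[
I = \frac{n}{S_0}\,\frac{\sum_{i}\sum_{j} w_{ij}\, z_i z_j}{\sum_{i} z_i^{2}},
\qquad S_0 = \sum_{i}\sum_{j} w_{ij}.
\]
Values close to $-1/(n-1)$, the expectation under the null hypothesis,
indicate spatial randomness; larger values indicate that similar values
cluster, and smaller values indicate that neighbors tend to differ. The
paper does not state which spatial weights or which kind of p-value were
used, so the Python sketch below is only an illustration under
assumptions: it takes a dense NumPy weights matrix (for example,
$w_{ij} = 1$ when NTAs $i$ and $j$ share a border) and estimates a
pseudo p-value by randomly permuting the values across areas. It is not
necessarily how the values reported in this paper were computed.

\begin{verbatim}
import numpy as np


def global_morans_i(y, w):
    """Global Moran's I of values y under an n x n spatial weights matrix w.

    w[i, j] > 0 means that area j is a neighbor of area i; the diagonal is 0.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()                 # deviations from the mean
    s0 = w.sum()                     # S_0, the sum of all weights
    cross_products = z @ w @ z       # sum_i sum_j w_ij z_i z_j
    return (y.size / s0) * cross_products / (z @ z)


def morans_i_pseudo_p(y, w, permutations=999, seed=0):
    """Return (I, pseudo p-value) from random permutations of y over the areas."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = global_morans_i(y, w)
    simulated = np.array(
        [global_morans_i(rng.permutation(y), w) for _ in range(permutations)]
    )
    # Count permutations at least as large as the observed I, then fold the
    # count so that it refers to the tail the observed value lies in.
    larger = int((simulated >= observed).sum())
    larger = min(larger, permutations - larger)
    p_value = (larger + 1) / (permutations + 1)
    return observed, p_value
\end{verbatim}

For four areas in a row (each adjacent to the next) with values 1, 2, 3,
4, \texttt{global\_morans\_i} returns $1/3$; alternating values 1, 0, 1,
0 give $-1$, the strongest possible negative autocorrelation for that
layout.
\par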
\subsection*{Population}
{\label{165805}}
The 16- to 34-year-old population and the median income of each NTA
were acquired from the 2010 census, using data downloaded from
GeoLytics, Inc.~\cite{1970-2010} Fig.~\ref{629931} shows the population
at the NTA level. When the census data were joined to the NTAs, 13
neighborhoods had no matching record, so they appear white in
Fig.~\ref{629931}. Spatial randomness was tested with Global Moran's I:
the value is 0.083, which indicates weak positive autocorrelation. The
p-value is 0.032, so the null hypothesis that the population is randomly
distributed across NTAs is rejected at the 0.05 level
(Fig.~\ref{230077}).\selectlanguage{english}
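\par
The white NTAs in the population map are the ones for which the census
join finds no record. A minimal sketch of such a left join, assuming
hypothetical NTA identifiers as dictionary keys, reports which NTAs stay
unmatched:

\begin{verbatim}
def join_on_nta(nta_ids, census_by_nta):
    """Left-join census records onto the list of NTAs by identifier.

    Returns the joined mapping (None where no census record exists) and the
    sorted list of NTA identifiers that found no match.
    """
    joined = {nta: census_by_nta.get(nta) for nta in nta_ids}
    unmatched = sorted(nta for nta, record in joined.items() if record is None)
    return joined, unmatched
\end{verbatim}

Census keys that match no NTA are silently dropped by a left join, so
comparing the two key sets in the other direction as well helps diagnose
identifier mismatches between the two sources.
\par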
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Median-rent-ratio/NYN-16-34-pop}
\caption{{NYC Population Aged 16 to 34
{\label{629931}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/YN-MoranI/YN-MoranI}
\caption{{Global Moran's I of the population aged 16 to 34: $I = 0.083$, p-value $= 0.032$
{\label{230077}}%
}}
\end{center}
\end{figure}
Local Moran's I was also computed; Fig.~\ref{780437} shows the
High-High, High-Low, and Low-Low districts.
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Yn-Local/Yn-Local}
\caption{{Local Moran's I of the NYC Population Aged 16 to 34
{\label{780437}}%
}}
\end{center}
\end{figure}
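\par
Local Moran's I splits the global statistic into one value per area,
$I_i = z_i \sum_j w_{ij} z_j / m_2$ with $m_2 = \sum_i z_i^2 / n$
(Anselin's local indicator of spatial association). The Python sketch
below row-standardizes the weights, computes $I_i$, estimates a pseudo
p-value for each NTA by conditional permutation (holding $z_i$ fixed and
drawing its neighbors' values at random from the other areas), and
assigns the High-High, Low-High, Low-Low, and High-Low labels used in the
maps. The 999 permutations and the 0.05 cutoff are assumptions, since the
paper does not report the settings behind its cluster maps, and the
sketch is not presented as PySAL's exact implementation.

\begin{verbatim}
import numpy as np


def local_morans_i(y, w, permutations=999, alpha=0.05, seed=0):
    """Return local Moran's I_i, pseudo p-values and a cluster label per area.

    w is an n x n weights matrix with a zero diagonal. It is row-standardized
    here, so the spatial lag is the weighted average of the neighbors' values.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

    z = y - y.mean()
    m2 = (z @ z) / n
    lag = w @ z                      # neighbors' average deviation
    local_i = z * lag / m2

    p_values = np.ones(n)
    for i in range(n):
        neighbors = np.flatnonzero(w[i])
        if neighbors.size == 0:
            continue                 # an area without neighbors cannot be tested
        others = np.delete(z, i)     # z_i stays fixed; neighbors are redrawn
        sims = np.empty(permutations)
        for p in range(permutations):
            draw = rng.choice(others, size=neighbors.size, replace=False)
            sims[p] = z[i] * (w[i, neighbors] @ draw) / m2
        larger = int((sims >= local_i[i]).sum())
        larger = min(larger, permutations - larger)   # fold to the nearer tail
        p_values[i] = (larger + 1) / (permutations + 1)

    labels = []
    for zi, lag_i, p in zip(z, lag, p_values):
        if p > alpha:
            labels.append("Not significant")
        elif zi >= 0 and lag_i >= 0:
            labels.append("High-High")
        elif zi < 0 and lag_i >= 0:
            labels.append("Low-High")
        elif zi < 0:
            labels.append("Low-Low")
        else:
            labels.append("High-Low")
    return local_i, p_values, labels
\end{verbatim}

For four areas in a row with values 1, 2, 3, 4, the local values are
0.6, 0.2, 0.2, and 0.6; with row-standardized weights the local values
always sum to $n$ times the global $I$ computed with the same weights.
\par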
\par\null
The spatial pattern of where people live could be modeled with several
explanatory variables, which can be quantified to understand why 16- to
34-year-old New Yorkers choose particular neighborhoods. This study
examines rent, subway station locations, restaurant locations, business
locations, crime, and university locations, and compares them with the
observed population data.
\par\null
\subsection*{Median rent}
{\label{490690}}\par\null
Rent is one of the most important variables when people look for their
next home. The median monthly rent at the NTA level is taken from the
2010 census data.~\cite{1970-2010} The median rent of each neighborhood
is normalized by the maximum median rent across all NTAs
(Fig.~\ref{573096}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Meidan-Rent-ratio/Meidan-Rent-ratio}
\caption{{Normalized Median Rent per NTA
{\label{573096}}%
}}
\end{center}
\end{figure}
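\par
Dividing by the maximum maps every NTA onto the range 0 to 1, with the
most expensive NTA equal to 1, which puts variables measured in
different units (dollars, counts) on a comparable scale. A minimal
Python sketch, assuming the raw values are held in a dictionary keyed by
a hypothetical NTA identifier and that NTAs without census data are
marked \texttt{None} rather than 0:

\begin{verbatim}
def normalize_by_max(values):
    """Divide each NTA's value by the largest value so that the maximum is 1.0.

    values maps a (hypothetical) NTA identifier to a raw number such as median
    rent or a restaurant count; missing data (None) stays None.
    """
    present = [v for v in values.values() if v is not None]
    peak = max(present)
    if peak <= 0:
        raise ValueError("the maximum must be positive to normalize by it")
    return {nta: None if v is None else v / peak for nta, v in values.items()}


# Illustrative values only, not census figures.
print(normalize_by_max({"NTA-A": 2000, "NTA-B": 1000, "NTA-C": None}))
# {'NTA-A': 1.0, 'NTA-B': 0.5, 'NTA-C': None}
\end{verbatim}

Keeping missing NTAs as \texttt{None} instead of 0 prevents them from
being read as the cheapest neighborhoods in later steps.
\par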
For median rent, Global Moran's I is 0.074 with a p-value of 0.053, so
the null hypothesis cannot be rejected at the 0.05 level
(Fig.~\ref{267269}).
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Rent-I-value/Rent-I-value}
\caption{{Global Moran's I of median rent is 0.074 and the p-value is 0.053, so the null hypothesis
cannot be rejected
{\label{267269}}%
}}
\end{center}
\end{figure}
However, Local Moran's I can still reveal clusters: Fig.~\ref{437242}
shows the High-High and Low-Low locations of median rent. Most Low-Low
clusters are located in the Bronx and the High-High areas are in
Manhattan, indicating that rent is expensive in Manhattan and less
expensive in the Bronx than in the other boroughs.
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Local-rent/Local-rent}
\caption{{Local Moran's I of Median Rent per NTA
{\label{437242}}%
}}
\end{center}
\end{figure}
\par\null\par\null
\subsection*{Restaurant}
{\label{118391}}
New York City is known for its restaurants. When looking for a new place
to live, the restaurants nearby are a useful indicator, and more
restaurants tend to offer more choice and better quality within the same
area. Therefore, the number of restaurants in each area is used to
calculate the Live Score. The restaurant data were gathered from Miles
Grimshaw's blog~\cite{database}. The original data are the 2013
restaurant health ratings from NYC Open Data. The addresses were
extracted and geocoded with the Google API. The raw data were then
converted to point data and intersected with the NTAs using the PySAL
package. Fig.~\ref{385371} shows the raw data.
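\par
The paper performs the point-to-NTA intersection with PySAL; the sketch
below only illustrates the underlying idea with a self-contained
even-odd (ray-casting) point-in-polygon test and a per-polygon count. It
assumes simple polygons given as lists of $(x, y)$ vertices in one
planar coordinate system and non-overlapping areas. Real NTA boundaries
include multipolygons, for which a GIS library's spatial join would
normally be used. The area names are hypothetical.

\begin{verbatim}
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through (x, y)?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_area(points, areas):
    """Count how many (x, y) points fall inside each named polygon."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break          # areas do not overlap, so stop at the first match
    return counts
\end{verbatim}

The resulting counts per NTA can then be normalized by the maximum in
the same way as median rent.
\par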
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/R-raw/R-raw}
\caption{{Raw Data of 2013 NYC Restaurants~
{\label{385371}}%
}}
\end{center}
\end{figure}
The restaurant counts are normalized in the same manner as median rent
(Fig.~\ref{257411}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/R-ratio/R-ratio}
\caption{{Ratio of 2013 NYC Restaurants at the NTA Level
{\label{257411}}%
}}
\end{center}
\end{figure}
The restaurant ratio has a Global Moran's I of 0.044, a weak positive
autocorrelation, but the p-value is 0.14, so the null hypothesis cannot
be rejected; Fig.~\ref{964537} shows the Moran's I value and p-value.
However, Local Moran's I can reveal clusters, and the result is shown in
Fig.~\ref{746635}. Interestingly, the Williamsburg area is a High-High
cluster, while most of the other areas are insignificant.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/R-Moran/R-Moran}
\caption{{Restaurants Global Moran's I~
{\label{964537}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Local-R/Local-R}
\caption{{Restaurants Ratio Local Moran's I
{\label{746635}}%
}}
\end{center}
\end{figure}
\subsection*{Subway Station}
{\label{792654}}
The MTA subway is New Yorkers' most common means of public
transportation, so the number of subway stations in each neighborhood is
studied in this paper. The data were acquired from NYC Open
Data~\cite{station} and processed in the same manner as the restaurant
data. Fig.~\ref{609369} and Fig.~\ref{837047} show the raw data set and
the subway station ratio.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Subway-raw/Subway-raw}
\caption{{Raw Data of MTA Subway Stations in NYC
{\label{609369}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/subway/subway}
\caption{{Subway Station Ratio at the NTA Level
{\label{837047}}%
}}
\end{center}
\end{figure}
The Global Moran's I of the subway station ratio is $-0.060$, a weak
negative autocorrelation (Fig.~\ref{323238}). One possible explanation
is the uneven subway coverage: parts of Brooklyn and Queens are poorly
served by subway stations and Staten Island has none, so neighboring
NTAs can differ sharply in station counts. However, the p-value is 0.11,
which is greater than 0.05, so the null hypothesis cannot be rejected.
Local Moran's I nevertheless shows that the subway station ratio forms
some clusters (Fig.~\ref{616809}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Subway-ratio/Subway-ratio}
\caption{{Global Moran's I of Subway Station Ratio
{\label{323238}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Local-subway/Local-subway}
\caption{{Local Moran's I of Subway Station Ratio
{\label{616809}}%
}}
\end{center}
\end{figure}
\subsection*{Business}
{\label{372948}}
The number of businesses operating in a neighborhood could be another
indicator when looking for a new place to live. A list of legally
operating businesses was collected from NYC Open Data.~\cite{data} The
whole data set was converted to points and plotted in
Fig.~\ref{581858}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Business-raw/Business-raw}
\caption{{Legally Operating Businesses as of Nov 17, 2017
{\label{581858}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/B-ratio/B-ratio}
\caption{{Normalized Business ratio per NTA
{\label{118038}}%
}}
\end{center}
\end{figure}
The data were normalized and plotted in Fig.~\ref{118038}. Global
Moran's I is $-0.043$, a weak negative autocorrelation, but the p-value
is 0.187, so the null hypothesis cannot be rejected. The Moran's I value
is shown in Fig.~\ref{994512}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/B-Moran/B-Moran}
\caption{{Global Moran's I value of Business Ratio
{\label{994512}}%
}}
\end{center}
\end{figure}
Local Moran's I shows that most areas are insignificant, but East New
York is a High-High cluster (Fig.~\ref{153718}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Local-B/Local-B}
\caption{{Local Moran's I of Business Ratio
{\label{153718}}%
}}
\end{center}
\end{figure}
\par\null
\subsection*{Crime}
{\label{506148}}
People want to live in safe areas, so NYC crime data were obtained from
NYC Open Data.~\cite{date} Only felony crimes from January 2017 to
October 2017 were collected. The data are plotted in
Fig.~\ref{307104}.\selectlanguage{english}
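\par
Restricting the complaint records to felonies reported between January
and October 2017 amounts to a simple filter. The Python sketch below
assumes each record has already been parsed into a dictionary holding a
category string and a \texttt{datetime.date}; the field names
\texttt{law\_category} and \texttt{complaint\_date} are hypothetical and
would need to be mapped to the actual columns of the NYC Open Data file.

\begin{verbatim}
from datetime import date


def select_felonies(records, start=date(2017, 1, 1), end=date(2017, 10, 31)):
    """Keep felony records dated between start and end, inclusive.

    Each record is a dict with the hypothetical keys 'law_category' (a string)
    and 'complaint_date' (a datetime.date).
    """
    return [
        record
        for record in records
        if record["law_category"].strip().upper() == "FELONY"
        and start <= record["complaint_date"] <= end
    ]
\end{verbatim}

Comparing the category case-insensitively guards against inconsistent
capitalization in the source file.
\par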
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Crime-Raw1/Crime-Raw}
\caption{{Felony Crime in NYC from Jan 2017 to Oct 2017
{\label{307104}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Crime-ratio/Crime-ratio}
\caption{{Felony Crime Ratio from Jan 2017 to Oct 2017
{\label{676584}}%
}}
\end{center}
\end{figure}
The Global Moran's I of the crime ratio (Fig.~\ref{676584}) is
$-0.0036$ with a p-value of 0.448 (Fig.~\ref{777225}), so there is no
evidence that the crime ratio is spatially autocorrelated.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Crime-Moran/Crime-Moran}
\caption{{Crime Ratio Global Moran's I value
{\label{777225}}%
}}
\end{center}
\end{figure}
Local Moran's I of the crime ratio shows Low-Low, High-High, and
Low-High areas, with a cluster of higher crime in the Bronx
(Fig.~\ref{495842}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Local-crime/Local-crime}
\caption{{Local Moran's I Value for Crime Ratio
{\label{495842}}%
}}
\end{center}
\end{figure}
\subsection*{University}
{\label{729766}}
The locations of all colleges and universities were collected from NYC
Open Data.~\cite{universities} Because a university or college affects
several surrounding neighborhoods rather than only the one it is located
in, the point data were converted to buffers. Fig.~\ref{932273} and
Fig.~\ref{195788} show the buffer map and the university and college
ratio map.
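\par
The buffer step can be illustrated by estimating, for each NTA, the
share of its area that lies within a fixed distance of at least one
campus. The paper does not state the buffer radius or how the ratio is
computed, so both the radius and the grid-sampling approximation below
are assumptions; a GIS library would normally compute the exact
buffer intersection. Coordinates are assumed to be planar (projected) so
that Euclidean distances are meaningful.

\begin{verbatim}
import math


def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def buffer_coverage(polygon, campuses, radius, step):
    """Approximate share of a polygon's area within `radius` of any campus.

    The polygon's bounding box is sampled at the centers of square cells of
    side `step`; only centers inside the polygon count toward its area.
    """
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    nx = math.ceil((max(xs) - min(xs)) / step)
    ny = math.ceil((max(ys) - min(ys)) / step)
    inside = covered = 0
    for i in range(nx):
        x = min(xs) + (i + 0.5) * step
        for j in range(ny):
            y = min(ys) + (j + 0.5) * step
            if not point_in_polygon(x, y, polygon):
                continue
            inside += 1
            if any(math.hypot(x - cx, y - cy) <= radius for cx, cy in campuses):
                covered += 1
    return covered / inside if inside else 0.0
\end{verbatim}

For a 10 by 10 square with one campus at a corner and a radius of 10,
the estimate is close to $\pi/4 \approx 0.785$, the share of the square
covered by the quarter disc. A smaller \texttt{step} improves accuracy
at the cost of more point tests.
\par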
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/University/University-cover}
\caption{{University and College Buffer Map
{\label{932273}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/University-cover/University-ratio}
\caption{{University and College Ratio Map
{\label{195788}}%
}}
\end{center}
\end{figure}
The Global Moran's I is 0.046 with a p-value of 0.13, so there is very
weak positive autocorrelation, but the null hypothesis cannot be
rejected. Fig.~\ref{482995} shows the Global Moran's I value and
Fig.~\ref{945835} shows Local Moran's I.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Uni-Moran/Uni-Moran}
\caption{{Global Moran's I Value of Universities and Colleges
{\label{482995}}%
}}
\end{center}
\end{figure}
Local Moran's I shows High-High and Low-High areas in Staten Island and
the Bronx, and a Low-Low area in East New York.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Local-Uni/Local-Uni}
\caption{{Local Moran's I Value of Universities and Colleges
{\label{945835}}%
}}
\end{center}
\end{figure}
\subsection*{Live Score}
{\label{174997}}
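The six variables are combined into a weighted sum, with the weights
chosen so that the resulting Live Score replicates the observed data:
\begin{align*}
\text{Live Score} ={} & -20.0\,\texttt{rent\_ratio} + 4.0\,\texttt{Subway\_ratio} - 3.0\,\texttt{crime\_ratio} \\
& + 1.0\,\texttt{Uni\_Area} + 1.0\,\texttt{R\_ratio} + 1.0\,\texttt{Business\_ratio},
\end{align*}
where each term is the corresponding column of the NTA-level tables
\texttt{NYN\_grouped\_shp}, \texttt{NYN\_subway\_shp},
\texttt{NYN\_crime\_shp}, \texttt{NYN\_U\_shp}, \texttt{NYN\_R\_shp},
and \texttt{NYN\_B\_shp}, respectively. These weights indicate that the
rent ratio is the most important factor in choosing a neighborhood,
followed by the subway station ratio and the crime rate; higher rent and
more crime lower the score. Fig.~\ref{586596} maps the resulting Live
Score.\selectlanguage{english}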
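\par
The weighted sum can be written as a short function. The sketch assumes
that the six normalized columns have already been gathered into one
record per NTA (in the original they live in separate tables such as
\texttt{NYN\_grouped\_shp} and \texttt{NYN\_subway\_shp}); the
dictionary layout and the function name are illustrative.

\begin{verbatim}
# Weights of the Live Score, keyed by the column names used in this paper.
LIVE_SCORE_WEIGHTS = {
    "rent_ratio": -20.0,
    "Subway_ratio": 4.0,
    "crime_ratio": -3.0,
    "Uni_Area": 1.0,
    "R_ratio": 1.0,
    "Business_ratio": 1.0,
}


def live_score(nta_record, weights=LIVE_SCORE_WEIGHTS):
    """Weighted sum of one NTA's variables; nta_record maps column -> value."""
    return sum(weight * nta_record[column] for column, weight in weights.items())
\end{verbatim}

An NTA at the maximum of every variable scores
$-20 + 4 - 3 + 1 + 1 + 1 = -16$, which shows how strongly the rent term
dominates the score.
\par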
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Live-score/Live-score}
\caption{{Live Score on NYC Map
{\label{586596}}%
}}
\end{center}
\end{figure}
The Global Moran's I of the Live Score is 0.073 with a p-value of 0.065,
so there is very weak positive autocorrelation, but the null hypothesis
cannot be rejected (Fig.~\ref{997568}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/LS-Moran/LS-Moran}
\caption{{Global Moran's I value of Live Score
{\label{997568}}%
}}
\end{center}
\end{figure}
Local Moran's I shows High-High areas in the Bronx and East New York,
while most Low-Low areas are in Manhattan (Fig.~\ref{518756}).
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.28\columnwidth]{figures/Local-LS/Local-LS}
\caption{{Local Moran's I value of Live Score
{\label{518756}}%
}}
\end{center}
\end{figure}
\subsection*{Conclusion}
{\label{924383}}
Many variables influence where people choose to stay or live in New York
City. This study generalized common amenities as the number of MTA
subway stations, the number of restaurants, and the number of business
places. However, measuring the distance from each neighborhood's
centroid to each subway line and weighting each line could give a
better picture, because NYC has many MTA lines and not all of them offer
the same convenience for reaching a destination. In addition, I would
like to use spatial regression to estimate the weights of the variables
and thereby quantify the relationship between the residential locations
of 16- to 34-year-old New Yorkers and the six variables.
\selectlanguage{english}
\FloatBarrier
\bibliographystyle{plain}
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}