(Erener et al., 2012).
Exploiting NDVI has enabled researchers to obtain high-frequency estimates of crop productivity to inform farming decisions, and to build early warning systems for deforestation in regions such as the Amazon (Michaelson 1994). As land use classification through machine learning has become increasingly established, researchers have gravitated towards Random Forest as the algorithm of choice. Advantages highlighted in the literature include computational efficiency, which is greater than that of Support Vector Machines. Landsat has also been used alongside night-light data to predict income and poverty levels \cite{Jean_2016}.
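For readers unfamiliar with the index, NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red). The following is a minimal sketch of that calculation with NumPy; the function name `ndvi` and the small reflectance arrays are illustrative assumptions and are not drawn from any of the cited studies.

```python
import numpy as np

def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red) per pixel; NaN where NIR + Red is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical 2 x 2 reflectance values, for illustration only.
nir_band = np.array([[0.50, 0.40], [0.10, 0.00]])
red_band = np.array([[0.10, 0.20], [0.10, 0.00]])
ndvi_image = ndvi(nir_band, red_band)
```

Values close to 1 indicate dense green vegetation, while values near zero or below are typical of bare ground, built-up surfaces and water.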

2.3 Applications to urban extent and density

Building upon land use classification studies in ecology and earth science, a growing research literature applies these methods to urbanization. Multi-spectral imagery is well suited to detecting urban built-up areas: although impervious surfaces lack the distinctive absorption pattern in the visible and near-infrared wavelengths that makes vegetation easy to detect, increased reflectivity at the thermal-infrared end of the spectrum helps to detect surfaces such as concrete and brick \cite{Ward_2000}. Studies in cities such as Kolkata and Ho Chi Minh City have used time series of satellite imagery to track changes in urban extent over time (Goldblatt et al., 2016). These researchers used supervised classification methods.
A key challenge in this literature was to establish the training data required for supervised classification. Goldblatt et al. addressed the challenge in two ways: first, by deriving a land use map from Ho Chi Minh City's property tax database; and second, by hand-classifying a gridded map of the city's extent, pixel by pixel, into the categories "urban residential", "urban non-residential", and "non-urban". The first effort was abandoned because the city's land use database was deemed an insufficiently accurate record of actual land utilization. The second method proved effective, albeit time-consuming. It allowed the researchers to train a classifier on the labeled image, in which each pixel value corresponds to a land use category, and to predict the category of new pixels using Landsat's spectral bands as input features.
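The pixel-wise workflow described above can be sketched as follows. This is an illustration under stated assumptions, not the cited authors' implementation: the data are synthetic random arrays standing in for a Landsat scene and a hand-labeled image, the choice of seven bands is arbitrary, and scikit-learn's Random Forest is used only because it is the algorithm discussed earlier in this section.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a multi-band scene: 20 x 20 pixels, 7 bands (assumed).
height, width, n_bands = 20, 20, 7
bands = rng.random((height, width, n_bands))

# Synthetic stand-in for the hand-labeled image (assumed coding):
# 0 = non-urban, 1 = urban residential, 2 = urban non-residential.
labels = rng.integers(0, 3, size=(height, width))

# Flatten so that each pixel is one sample and each band is one feature.
X = bands.reshape(-1, n_bands)
y = labels.reshape(-1)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Predict classes for a new scene with the same bands, then restore image shape.
new_scene = rng.random((height, width, n_bands))
predicted = clf.predict(new_scene.reshape(-1, n_bands)).reshape(height, width)
```

Because the inputs here are random, the predictions carry no meaning; the sketch only shows how an image and its label raster are reshaped into the sample-by-feature layout that a supervised classifier expects.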

3. Methods

3.1 Reference Data on Land Use

In this research, we evaluated several methods of acquiring reference data for land-use classification of urban extent and density in the United States. We initially constructed a land use map of New York City based on the Department of City Planning's zoning shapefiles. Using GeoPandas, we reclassified all city areas from their detailed zoning codes (e.g., R4 for mid-density residential; P for park) into three categories: residential, urban non-residential, and park/non-urban. However, the resulting classes were unsatisfactory for two reasons: New York's zoning codes are not reliable indicators of actual land utilization, and the three categories have limited utility for planning decisions given the city's largely static boundaries and high prevalence of mixed use.
Having established that land use classification is of particular value for fast-growing cities, we turned our attention to the 10 fastest-growing US cities, which are concentrated in Sun Belt states such as Texas, Nevada and Arizona. Other studies, such as Goldblatt et al., have generated original training data for urban land use classification by hand-labeling large raster files. In our research, we instead searched for existing detailed land-use maps of a fast-growing US region. We acquired a detailed land-use raster covering the greater Houston area for 2015 from the Texas GIS website. In Python, the pixel values were reclassified from the existing 10 categorical values (where 1-10 represented categories from open water through to dense urban areas, including varieties of non-urban land use such as wetlands and forest) to four categorical values: (1) open water; (2) urban: high density; (3) urban: low density; (4) non-urban land.
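The zoning reclassification step can be illustrated with a short sketch. The rule below is an assumption made for illustration: it treats codes beginning with "R" as residential and codes beginning with "P" as park/non-urban, following the two examples given above, and assigns everything else to urban non-residential. The function name `reclassify_zone` and the sample codes "C1" and "M2" are hypothetical.

```python
def reclassify_zone(code):
    """Collapse a detailed zoning code into one of three broad classes.

    Assumed rule (illustrative only): 'R...' -> residential,
    'P...' -> park/non-urban, anything else -> urban non-residential.
    """
    code = code.strip().upper()
    if code.startswith("R"):
        return "residential"
    if code.startswith("P"):
        return "park/non-urban"
    return "urban non-residential"

# Hypothetical sample of zoning codes.
sample_codes = ["R4", "P", "C1", "M2"]
broad_classes = [reclassify_zone(c) for c in sample_codes]
```

With the zoning shapefile loaded as a GeoDataFrame, the same function could be applied to the zoning-code column through the pandas `Series.map` method, since a GeoDataFrame exposes its attribute columns as ordinary pandas Series.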
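A raster reclassification of this kind can be sketched with a NumPy lookup table. The mapping below is hypothetical: the text indicates only that the source codes run from open water (1) to dense urban (10), so the assignments for codes 2 to 9 are placeholders rather than the scheme actually used, and the small array stands in for the Houston raster.

```python
import numpy as np

# Hypothetical mapping from the 10 source codes to the 4 target classes.
# Target classes: 1 = open water, 2 = urban: high density,
# 3 = urban: low density, 4 = non-urban land.
# Assignments for source codes 2-9 are placeholders, not the original scheme.
SOURCE_TO_TARGET = {
    1: 1,
    2: 4, 3: 4, 4: 4, 5: 4, 6: 4, 7: 4,
    8: 3, 9: 3,
    10: 2,
}

# Lookup table indexed by source code; index 0 is left as 0 (treated as no data).
lut = np.zeros(11, dtype=np.uint8)
for src, dst in SOURCE_TO_TARGET.items():
    lut[src] = dst

# Small stand-in for the land-use raster (values 1-10).
land_use = np.array([[1, 2, 10], [9, 5, 8]], dtype=np.uint8)

# Indexing the lookup table with the raster reclassifies every pixel at once.
reclassified = lut[land_use]
```

Indexing the table with the whole array avoids an explicit loop over pixels, which matters for rasters covering a metropolitan region.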