For the analysis, the American Community Survey data was first read into a data frame. The relevant variables were extracted and irrelevant information was dropped. Next, the house price data was read into separate data frames, one per borough. The data was filtered to include only sales of one-family dwellings, two-family dwellings, three-family dwellings, condo elevator apartments, condo walk-up apartments and co-ops. The data also contained missing information and needed cleaning. The borough data frames were then combined into a single data frame. One feature of the data, the building class category, was factorized to convert it into a categorical variable that the machine learning model can use. The American Community Survey data, the rolling sales data and the zip-code data were merged on zip code. Finally, the features of the resulting data frame were scaled. This concludes the data preprocessing and data wrangling part of the analysis.
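The steps above can be sketched in pandas as follows. This is a minimal illustration on toy data, not the code used in the analysis. The column names (`zip_code`, `building_class_category`, `sale_price`, `median_income`), the category labels and the helper names are assumptions. The choice of z-score standardization is also an assumption, since the text does not say which scaling method was applied.

```python
import pandas as pd

# Hypothetical category labels; the labels in the real sales files may differ.
KEEP_CATEGORIES = [
    "ONE FAMILY DWELLINGS",
    "TWO FAMILY DWELLINGS",
    "THREE FAMILY DWELLINGS",
    "CONDOS - ELEVATOR APARTMENTS",
    "CONDOS - WALKUP APARTMENTS",
    "COOPS",
]

# Toy stand-ins for two borough sales files and the survey extract.
borough_a = pd.DataFrame({
    "zip_code": ["10001", "10001", "10002"],
    "building_class_category": ["ONE FAMILY DWELLINGS", "OFFICE BUILDINGS", "COOPS"],
    "sale_price": [500000.0, 900000.0, None],
})
borough_b = pd.DataFrame({
    "zip_code": ["11201", "11201"],
    "building_class_category": ["TWO FAMILY DWELLINGS", "COOPS"],
    "sale_price": [750000.0, 300000.0],
})
acs = pd.DataFrame({
    "zip_code": ["10001", "10002", "11201"],
    "median_income": [60000.0, 45000.0, 80000.0],
})


def clean_sales(df, keep_categories):
    """Keep only the selected building classes and drop rows without a price."""
    kept = df[df["building_class_category"].isin(keep_categories)]
    return kept.dropna(subset=["sale_price"])


def preprocess(borough_frames, acs_frame, keep_categories, feature_cols):
    # Filter and clean each borough, then stack them into one data frame.
    cleaned = [clean_sales(df, keep_categories) for df in borough_frames]
    sales = pd.concat(cleaned, ignore_index=True)

    # Factorize the building class category into integer codes.
    codes, _ = pd.factorize(sales["building_class_category"])
    sales = sales.assign(building_class_code=codes)

    # Merge the sales records with the survey variables on zip code.
    merged = sales.merge(acs_frame, on="zip_code", how="inner")

    # Standardize the chosen feature columns (assumed z-score scaling).
    features = merged[feature_cols]
    merged[feature_cols] = (features - features.mean()) / features.std(ddof=0)
    return merged


result = preprocess([borough_a, borough_b], acs, KEEP_CATEGORIES, ["median_income"])
print(result)
```

In this sketch the zip codes are kept as strings so that leading zeros survive the merge, and an inner join is used so that sales without matching survey data are dropped; a left join would keep them with missing survey values instead.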