Problem Description

Now that 'Winter is Here', another factor adds to the pollution of New York City. Every year, as soon as the weather turns cold, streams of black smoke rise from buildings across the city. Because many buildings still burn heating oil in their boilers, it is worth studying air quality in the winter months to see how these boiler emissions affect the environment. Answering this question is not straightforward, as it depends on multiple factors: the type of oil being burned, which building types use these boilers (different laws apply to different building types), whether residents can afford the fuel prescribed by the government, and so on. For my analysis, I will study the relationships between some of these variables: oil boilers and greenhouse gas (GHG) emissions, oil boilers and the median household income of the building's census tract, and oil boilers and the asthma rate. It would be interesting to see whether pollution from boiler emissions contributes to the asthma rate in the city.

Data

For my analysis, I intend to use the following data:
The Oil Boilers dataset records the type of fuel each boiler burns, its compliance with government standards, the building type, and identifiers such as the BIN (Building Identification Number) and BBL (Borough-Block-Lot) that pin down the building where the boiler is installed. The NYC Building Energy Benchmarking data gives greenhouse gas emissions at the BIN and BBL level in metric tons of CO2 equivalent (CO2e), while the American Community Survey (ACS) data provides median household income by census tract. The last dataset, asthma rates, was collected from the New York State Department of Health website.
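Because the Oil Boilers and Benchmarking datasets share the BBL identifier, they can be linked directly. The sketch below, in plain Python, assumes both have already been loaded as lists of dicts with hypothetical keys `bbl`, `zip`, `fuel` and `ghg_mtco2e`; the real column names will differ, and the sample records are made up for illustration. It inner-joins the two on BBL and then sums emissions per zip code, the level at which the results are to be mapped.

```python
from collections import defaultdict

# Hypothetical sample records; real column names in the Oil Boilers and
# Benchmarking datasets differ and must be mapped to these keys first.
boilers = [
    {"bbl": "1000010001", "fuel": "No. 6", "zip": "10004"},
    {"bbl": "1000020002", "fuel": "No. 4", "zip": "10004"},
    {"bbl": "3000030003", "fuel": "No. 2", "zip": "11201"},
]
benchmarking = [
    {"bbl": "1000010001", "ghg_mtco2e": 1200.0},
    {"bbl": "3000030003", "ghg_mtco2e": 300.0},
    {"bbl": "4000040004", "ghg_mtco2e": 800.0},
]


def join_on_bbl(boilers, benchmarking):
    """Inner join: keep only boiler records with a benchmarking match on BBL."""
    ghg_by_bbl = {row["bbl"]: row["ghg_mtco2e"] for row in benchmarking}
    return [
        {**b, "ghg_mtco2e": ghg_by_bbl[b["bbl"]]}
        for b in boilers
        if b["bbl"] in ghg_by_bbl
    ]


def ghg_by_zip(joined):
    """Count boilers and sum GHG emissions (metric tons CO2e) per zip code."""
    totals = defaultdict(lambda: {"boilers": 0, "ghg_mtco2e": 0.0})
    for row in joined:
        totals[row["zip"]]["boilers"] += 1
        totals[row["zip"]]["ghg_mtco2e"] += row["ghg_mtco2e"]
    return dict(totals)


joined = join_on_bbl(boilers, benchmarking)
print(ghg_by_zip(joined))
```

If a BBL appears more than once in the benchmarking data (for example, several reporting years), the lookup dict keeps only the last row, so duplicates should be resolved before joining.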

Analysis

First, I intend to compute pairwise correlations between the variables to see whether they are linearly related. Once I know how the variables are correlated, I will add features to my regression one at a time to see which variables are good predictors and which can be omitted from the model. After studying the model, I intend to map, at the zip-code level, which areas of New York have clusters of high boiler emissions and high GHG emissions. It would also be interesting to see whether median household income plays any role in boiler emissions, and whether it plays any role in high asthma rates in the city.

References

Several studies have examined the type of fuel that should be used in boilers in major cities such as New York. The most relevant to my project are: