Ben Steers bensteers bs3639
Problem Description
Excessive noise pollution is associated with adverse health effects [citation needed]. Understanding noise levels across a city can therefore provide information about the health of the citizens who live and (attempt to) sleep there. A predictive noise model will be built from traffic and business data to estimate noise levels where no sensors are available to measure them. The model estimates noise levels using only anthrophonic and automotive noise sources, ignoring other sources such as biophonic and geophonic noise.
Data
For ground-truth noise levels, SONYC sound pressure level (SPL) data in the form of LAeq (equivalent continuous A-weighted sound level) will be used. This data is densely available around Washington Square Park and is also available at various other locations around Manhattan and Brooklyn. Historical data is available for approximately the last two years, but the amount available depends on each sensor's deployment date. Exact timeframes for each sensor can be determined once I have access to the SPL data server.
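Because LAeq is an energy average rather than an arithmetic mean of decibel values, short-interval readings have to be combined in the linear (energy) domain before being converted back to decibels. The sketch below shows one way an hourly LAeq could be computed from shorter readings. It assumes every reading covers an equal duration; the function name `leq` is illustrative and not part of any SONYC tooling.

```python
import numpy as np

def leq(levels_db):
    """Combine equal-duration A-weighted levels (dB) into a single LAeq.

    Assumption: every reading covers the same amount of time, so a plain
    mean of the linear energies is the correct weighting.
    """
    levels = np.asarray(levels_db, dtype=float)
    # Convert dB to relative energy, average, then convert back to dB.
    return 10.0 * np.log10(np.mean(10.0 ** (levels / 10.0)))

# Two equal-length readings of 60 dB and 70 dB.
hourly_laeq = leq([60.0, 70.0])
```

The louder reading dominates the result, which is why a simple average of the decibel values would understate the level.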
Traffic counts will be used as a measure of traffic flow and are available via NYC Open Data.
Business locations will be gathered from Yelp data, on the assumption that business density can serve as a proxy for the level of human activity. Because business density is constant regardless of the time of day or year, business customer flows may be incorporated using Google Maps popular times data to give the business activity temporal characteristics.
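One simple way to turn business locations into a density feature is to count the businesses within a fixed radius of each sensor. The sketch below is only an illustration of that idea: the 200 m radius, the function names, and the coordinates are assumptions, not values taken from the Yelp data or the SONYC deployment.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def business_count(sensor, businesses, radius_m=200.0):
    """Count businesses within radius_m of a sensor at (lat, lon).

    radius_m is a hypothetical choice; it would need tuning against the data.
    """
    b = np.asarray(businesses, dtype=float)
    distances = haversine_m(sensor[0], sensor[1], b[:, 0], b[:, 1])
    return int(np.sum(distances <= radius_m))

# Made-up coordinates for illustration only.
sensor = (40.7308, -73.9973)
businesses = [
    (40.7308, -73.9973),  # at the sensor
    (40.7317, -73.9973),  # roughly 100 m north
    (40.7408, -73.9973),  # roughly 1.1 km north
]
nearby = business_count(sensor, businesses)
```

A time-varying activity feature could then be formed by scaling this count with an hourly popularity value, if the popular times data turns out to be usable.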
Analysis
A regression model will be constructed to estimate the sound level on an hourly basis using the data described above. Because the available data may not support accurate estimates at hourly resolution, a second regression will be attempted that predicts only the maximum LAeq for each day.
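The two regressions could be sketched as follows. This is a minimal illustration on synthetic data, not the proposed model: the choice of ordinary least squares, the log-transformed traffic count, the activity feature scaled to 0-1, and all coefficients used to generate the data are assumptions made for the example.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to a feature matrix."""
    return coef[0] + np.asarray(X) @ coef[1:]

# Synthetic stand-in data: 28 days of hourly observations.
rng = np.random.default_rng(0)
days, hours = 28, 28 * 24
traffic = rng.uniform(50, 500, hours)   # hypothetical vehicles per hour
activity = rng.uniform(0, 1, hours)     # hypothetical business activity, 0-1
X = np.column_stack([np.log10(traffic), activity])
laeq = 40 + 10 * np.log10(traffic) + 5 * activity + rng.normal(0, 0.5, hours)

# Regression 1: hourly LAeq from hourly features.
hourly_coef = fit_linear(X, laeq)
hourly_pred = predict(hourly_coef, X)

# Regression 2: daily maximum LAeq from each day's peak feature values.
daily_X = X.reshape(days, 24, 2).max(axis=1)
daily_max = laeq.reshape(days, 24).max(axis=1)
daily_coef = fit_linear(daily_X, daily_max)
daily_pred = predict(daily_coef, daily_X)
```

Summarizing each day by its peak feature values is only one option; daily totals or means may turn out to be better predictors of the daily maximum.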
References
King et al. performed a statistical assessment of road traffic noise, using Monte Carlo analysis to approximate Lmax over a defined time period.
Deliverable
The final deliverable of this project will be a predictive noise model that estimates noise levels from traffic and business data. The model's results will be displayed on a map showing locations of higher and lower noise levels.