Problem Description: This project asks which factors affected New York City students’ academic performance on English tests in 2015, and to what extent. I will first define appropriate proxies for academic performance and collate a list of suitable factors. Using these proxies and factors, I will run a regression to determine whether each factor is significant at α = 5% and to quantify its impact. The factors affecting students’ academic performance are well studied: previous work has examined the impact of poverty (Battistich et al., 1995), teachers’ qualifications (Boyd et al., 2008), and attendance in after-school programs (Shernoff, 2010). It is still worthwhile to revisit the question with newly available data, such as school budgets and iZone programs, to see whether newer policies affected academic performance.
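A minimal sketch of the regression step, to make the significance test concrete. Everything here is illustrative: the data are synthetic, the column names (`absence_rate`, `izone`) are hypothetical, and the p-values use a large-sample normal approximation rather than the t-distribution that a statistics package such as statsmodels would report.

```python
import math
import numpy as np

ALPHA = 0.05  # significance level from the problem description

def ols_with_pvalues(X, y, names):
    """Fit y = b0 + X b by least squares; return {name: (coef, p_value)}."""
    X = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)               # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    z = beta / se
    # Two-sided p-value under a normal approximation (an assumption of this sketch).
    pvals = [math.erfc(abs(v) / math.sqrt(2)) for v in z]
    labels = ["intercept"] + list(names)
    return {lab: (float(b), float(p)) for lab, b, p in zip(labels, beta, pvals)}

# Synthetic data, invented purely for illustration (not the NYC data).
rng = np.random.default_rng(0)
n = 200
absence_rate = rng.uniform(0, 20, n)               # hypothetical % of days absent
izone = rng.integers(0, 2, n).astype(float)        # hypothetical 0/1 dummy
score = 300 - 1.5 * absence_rate + rng.normal(0, 5, n)

results = ols_with_pvalues(
    np.column_stack([absence_rate, izone]), score, ["absence_rate", "izone"]
)
for name, (coef, p) in results.items():
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name}: coef={coef:.3f}, p={p:.4f} ({verdict} at 5%)")
```

Each coefficient estimates the change in mean score associated with a one-unit change in that factor, holding the others fixed; a factor is treated as significant when its p-value falls below 0.05.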
Data:
| Data Name | Why is Data Suitable | Processing conducted |
| --- | --- | --- |
|  | Mean English score for each school serves as a proxy for academic performance | Need to narrow dataset to 2015 only |
|  | An unsafe school might be a more disruptive learning environment | Need to narrow dataset to 2015 only |
|  | Smaller class sizes have been shown by various studies to result in better-quality learning | Need to combine this dataset with the rest of the data using school names |
|  | Schools in this list have access to funds for newer teaching software | This data is likely to be converted to a dummy variable for regression |
|  | Students who are absent from school are likely to learn less | Need to narrow dataset to 2015 only and aggregate monthly figures into a yearly average |
|  | Schools with such literacy programs might see improved English test scores | This data is likely to be converted to a dummy variable for regression |
|  | Schools with more funding might be able to provide a better-quality education | Funding is location-based, not school-based, so it needs to be converted to the latter |
|  | Provides geospatial coordinates to plot all the schools and the factors | Need to merge all cleaned factors into this dataset |
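The processing steps listed above (narrowing to 2015, averaging monthly figures into a yearly value, building a dummy variable, mapping location-level funding to schools, and merging on school names) could be sketched with pandas roughly as follows. All tables, column names, and values are made up for illustration, and treating the funding "location" as a district key is an assumption; the real datasets may be structured differently.

```python
import pandas as pd

# Tiny hypothetical tables standing in for the real datasets.
scores = pd.DataFrame({
    "school": ["PS 1", "PS 1", "PS 2", "PS 3"],
    "year": [2014, 2015, 2015, 2015],
    "mean_english_score": [290.0, 300.0, 310.0, 295.0],
    "district": [1, 1, 2, 2],
})
attendance = pd.DataFrame({
    "school": ["PS 1", "PS 1", "PS 2", "PS 2", "PS 3", "PS 3"],
    "year": [2015] * 6,
    "month": [1, 2, 1, 2, 1, 2],
    "absence_rate": [0.10, 0.12, 0.05, 0.07, 0.08, 0.08],
})
izone_schools = ["ps 2 "]  # a program list with an inconsistently formatted name
funding = pd.DataFrame({"district": [1, 2],
                        "funding_per_pupil": [18000.0, 20000.0]})

def clean_name(names: pd.Series) -> pd.Series:
    """Normalize school names so that merges on name line up."""
    return names.str.strip().str.upper()

# 1. Narrow to 2015 only.
scores_2015 = scores[scores["year"] == 2015].copy()
scores_2015["school"] = clean_name(scores_2015["school"])

# 2. Collapse monthly attendance into one yearly average per school.
attendance_2015 = attendance[attendance["year"] == 2015].copy()
attendance_2015["school"] = clean_name(attendance_2015["school"])
yearly_absence = (attendance_2015
                  .groupby("school", as_index=False)["absence_rate"].mean())

# 3. Merge on school name, then attach location-level funding via the
#    assumed district key.
merged = scores_2015.merge(yearly_absence, on="school", how="left")
merged = merged.merge(funding, on="district", how="left")

# 4. Program membership becomes a 0/1 dummy variable.
izone_names = set(clean_name(pd.Series(izone_schools)))
merged["izone"] = merged["school"].isin(izone_names).astype(int)

print(merged)
```

Left merges keep every school that has a 2015 score, so any school missing from another dataset shows up as a missing value rather than silently dropping out of the regression sample.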