The other limitation lies in DCA's dataset. To identify the PUMA in which each bank is located, the data set must give each bank's latitude and longitude. Some records did not include a geographic location and had to be dropped.
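The location step can be illustrated with a minimal sketch. It is not the code used for this analysis: the field names (`latitude`, `longitude`), the function names, and the two square "PUMAs" are all hypothetical, and a real workflow would more likely rely on a GIS library and the official PUMA boundary files. The sketch drops records with no coordinates and assigns the rest to a polygon using a ray-casting point-in-polygon test.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_puma(banks, pumas):
    """Drop banks without coordinates; tag the rest with a PUMA id (or None)."""
    located, dropped = [], 0
    for bank in banks:
        lat, lon = bank.get("latitude"), bank.get("longitude")
        if lat is None or lon is None:
            dropped += 1  # no geographic location: cannot be placed
            continue
        puma_id = next(
            (pid for pid, poly in pumas.items() if point_in_polygon(lon, lat, poly)),
            None,
        )
        located.append({**bank, "puma": puma_id})
    return located, dropped


# Toy data (hypothetical): two adjacent unit squares stand in for PUMA boundaries.
pumas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
banks = [
    {"name": "b1", "latitude": 0.5, "longitude": 0.5},
    {"name": "b2", "latitude": 0.5, "longitude": 1.5},
    {"name": "b3", "latitude": None, "longitude": None},
]
located, dropped = assign_puma(banks, pumas)
```

Counting the dropped records, as `assign_puma` does, makes it possible to report how much of the data set was lost to missing coordinates.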
Methodology:
After processing the data, I analyzed the correlation between all variables to identify relationship patterns. Then I selected the independent variables with the strongest correlation coefficients and ran a regression analysis against the dependent variable, which is the share of the population that is unbanked.
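The two steps described here can be sketched as follows. This is an illustration under stated assumptions, not the original analysis: the data are synthetic, the variable names and the helper `select_and_fit` are hypothetical, and the number of predictors kept (`k`) is an arbitrary choice. The sketch ranks predictors by the absolute value of their Pearson correlation with the outcome, keeps the strongest ones, and fits an ordinary least squares model.

```python
import numpy as np


def select_and_fit(X, y, names, k=2):
    """Rank predictors by |Pearson r| with y, keep the top k, fit OLS."""
    # Pearson correlation of each predictor column with the outcome
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    top = np.argsort(-np.abs(r))[:k]
    # Design matrix: intercept column plus the selected predictors
    A = np.column_stack([np.ones(len(y)), X[:, top]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return [names[j] for j in top], coef, r2


# Synthetic stand-in data (not the study's data)
rng = np.random.default_rng(0)
n = 60
unemployment = rng.normal(8, 2, n)
median_income = rng.normal(60, 15, n)
unrelated = rng.normal(0, 1, n)
unbanked = 0.2 + 0.02 * unemployment - 0.003 * median_income + rng.normal(0, 0.01, n)

X = np.column_stack([unemployment, median_income, unrelated])
names = ["unemployment", "median_income", "unrelated"]
selected, coef, r2 = select_and_fit(X, unbanked, names, k=2)
print(selected, round(r2, 3))
```

Selecting predictors by their correlation with the outcome is simple, but it ignores how the predictors relate to one another, which matters when they are strongly correlated among themselves.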
The independent variables in this model are strongly correlated with one another, so a model designed for collinear predictors, such as ridge regression, would have been a better option. However, the linear regression model had an R-squared of 0.859, which suggests that it fits the data well, although a high R-squared does not by itself make the individual coefficients reliable when the predictors are collinear. It is noteworthy that regression analysis was also used in the Urban Institute's analysis to create their prediction model.
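To show what the ridge alternative would involve, here is a minimal sketch using the closed-form ridge estimator on standardized predictors. It was not part of this analysis: the data are synthetic, and the function name `ridge_fit` and the penalty value `lam=5.0` are assumptions chosen for illustration. In practice the penalty would be tuned, for example by cross-validation.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge on standardized predictors; the intercept is not penalized."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    # Solve (Xs'Xs + lam * I) beta = Xs'yc; lam = 0 reduces to ordinary least squares
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)
    return beta, y.mean()


# Synthetic data with two nearly identical predictors (not the study's data)
rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # almost a copy of x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2])

beta_ols, intercept = ridge_fit(X, y, lam=0.0)
beta_ridge, _ = ridge_fit(X, y, lam=5.0)
print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

The penalty shrinks the coefficient vector, which stabilizes the estimates when predictors carry nearly the same information. The trade-off is a small bias in exchange for lower variance.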
Conclusions:
The correlation analysis showed a strong positive correlation between the share of unbanked residents and unemployment. Similarly, there is a strong negative correlation between the share of unbanked residents and median income. These findings are in line with the Urban Institute's results. The Urban Institute's report identifies the Bronx as the borough with the highest unbanked population, the highest poverty and unemployment levels, and the lowest median income.