An OLS regression is fitted to predict the violation score from four independent variables: year, month, population per zip code, and cluster label. The R-squared is 30%, and most of that explained variance is attributable to population and cluster label.
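A regression of this kind, and a rough way to see which predictors account for the R-squared, can be sketched as follows. This is only an illustration on synthetic stand-in data, not the original dataset or code: the variable names, the choice to dummy-encode year, month, and cluster label as categorical, and the comparison of nested models are all assumptions made for the example.

```python
import numpy as np

def one_hot(values):
    """Dummy-encode a categorical array, dropping the first level
    so the columns are not collinear with the intercept."""
    levels = np.unique(values)[1:]
    return (values[:, None] == levels[None, :]).astype(float)

def r_squared(X, y):
    """Fit OLS with an intercept by least squares and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (hypothetical; NOT the original dataset).
rng = np.random.default_rng(0)
n = 2000
year = rng.integers(2015, 2020, size=n)
month = rng.integers(1, 13, size=n)
population_k = rng.normal(30.0, 10.0, size=n)  # population per zip code, in thousands
cluster = rng.integers(0, 4, size=n)           # cluster label treated as categorical
violation_score = 0.2 * population_k + 2.0 * cluster + rng.normal(0.0, 5.0, size=n)

# Predictor blocks: time variables vs. population and cluster label.
time_block = np.column_stack([one_hot(year), one_hot(month)])
pop_cluster_block = np.column_stack([population_k, one_hot(cluster)])
full_block = np.column_stack([time_block, pop_cluster_block])

r2_full = r_squared(full_block, violation_score)
r2_time_only = r_squared(time_block, violation_score)
r2_pop_cluster = r_squared(pop_cluster_block, violation_score)

print(f"R^2, all predictors:            {r2_full:.3f}")
print(f"R^2, year + month only:         {r2_time_only:.3f}")
print(f"R^2, population + cluster only: {r2_pop_cluster:.3f}")
```

If the R-squared of the population-plus-cluster model is close to that of the full model, while the year-plus-month model explains very little, that supports the reading that population and cluster label carry most of the explained variance. Comparing nested models is one of several ways to apportion R-squared and may not be how the original figure was derived.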