The random forest method has strengths and weaknesses. Its main strength is that it ranks the attributes by how much each contributes to predicting delayed projects. This is useful for anticipating which attributes are likely to accompany delayed projects in the future. One weakness is that the analysis is only as good as the attributes fed into the model: if irrelevant attributes are entered, the model can produce misleading results. Another weakness is that a random forest can produce different results from run to run, because each decision tree is built from a random sample of the data and attributes.
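The run-to-run variation described above can be illustrated with a minimal sketch. It assumes Python with NumPy and scikit-learn, which may not be the tools used in this study, and it uses synthetic stand-in data rather than the Parks dataset. The column names Budget and Noise, and the relationship between the attributes and the delay, are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: this is NOT the Parks dataset.
rng = np.random.default_rng(0)
n = 300
acreage = rng.uniform(0.1, 50.0, n)   # stand-in for the Acreage attribute
budget = rng.uniform(0.1, 10.0, n)    # hypothetical second attribute
noise_col = rng.normal(size=n)        # deliberately irrelevant attribute

# Assumed toy relationship: delay is driven mostly by acreage.
delay = 2.0 * acreage + 3.0 * budget + rng.normal(scale=5.0, size=n)

X = np.column_stack([acreage, budget, noise_col])
names = ["Acreage", "Budget", "Noise"]


def importances(seed):
    """Fit a forest with the given seed and return its feature importances."""
    model = RandomForestRegressor(n_estimators=25, random_state=seed)
    model.fit(X, delay)
    return model.feature_importances_


# Different seeds build different trees, so the importances shift slightly.
runs = np.array([importances(seed) for seed in range(5)])
mean_imp = runs.mean(axis=0)
spread = runs.std(axis=0)
for name, m, s in zip(names, mean_imp, spread):
    print(f"{name}: mean importance {m:.3f}, std across seeds {s:.3f}")

# Fixing the seed makes a run repeatable.
repeat_a = importances(42)
repeat_b = importances(42)
print("Same seed gives identical importances:", np.allclose(repeat_a, repeat_b))
```

Reporting the average importance over several seeds, along with the spread, is one way to keep a single unlucky run from driving the conclusion. Fixing the seed makes a given result reproducible but does not remove the underlying variability.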
Conclusion:
The random forest regression assigned the size of a park (Acreage) a relative importance of 62 percent with respect to whether a project is delayed. This is a plausible outcome, given that larger parks have correspondingly larger contracts that tend to be more complex. The added complexity means more paperwork and more approvals before these contracts can proceed, as well as a smaller pool of eligible contractors. The smaller pool in turn slows processing, because there is less incentive to get these large contracts started.
The data may still contain many clerical errors, which would need to be systematically weeded out through better data integrity practices.
The factors entered were the cleanest available. However, with more hours dedicated to data munging, other factors could be extracted from the publicly available datasets. Further, much more information exists about contractors, the specific nature of each contract, and a variety of proprietary Parks data, such as ratings, previous funding, and maintenance records, to name a few. The largest opportunity is to pull in information specific to each bid on each contract so that it could be analyzed on an item-by-item basis. For example, each contract that used the standard item for a specific type of bench could be tracked and compared across the city. Similar random forest regressions could then be run to indicate which item in the bid sheet is most strongly associated with delayed projects in the NYC Parks system.
While the finding of this study is enlightening, it merely scratches the surface. Further study is needed into the causes of delays within the capital process.