A natural evolution of these techniques would be to build a convolutional neural network and run each of our images through it.
Results and implications
In this paper, the SQUID Bike project has demonstrated an automated data workflow for assessing the general condition of New York City's bike lane infrastructure. The authors have collected bike lane imagery covering in excess of 50 miles, consisting of over 5,000 distinct images and over 500,000 accelerometer readings. The a
The results show that, with further work, a robust standard for measuring bike lane infrastructure in any city in the world is achievable with a low-cost approach, provided that bike lane imagery is collected in a purposeful manner. Furthermore, advances in computer vision may also lead to a fully automated inspection process.
The implication of a fully automated inspection process is that cities can collect high-quality, ground-truth data about their bike lane infrastructure quickly and cost-effectively. We argue that adopting such a practice empowers a city to enter into an anticipatory paradigm for bike lane maintenance and allows transportation agencies to be more responsive to the two-wheeled commuter.
Limitations and improvements
During our analysis we faced many challenges. From data collection to data analysis, we had to deal with different hardware and software constraints. Although we overcame most of them, we had to accept some limitations. On the data collection side, the main ones were image quality, battery life, and accelerometer discrepancies across phone models. On the data analysis side, the main ones were inaccuracies in snapping points to bike lanes, along with other general GPS inaccuracies. In the following paragraphs we explain how these limitations affected our analysis, how we overcame them and, in the rare cases where we did not, how we plan to solve them in the future.
Limitations to data collection
The most difficult part of our data collection process was the collection of images. Although the mount we used kept the cellphone stable on the bike, we ran into problems when riding on particularly bad stretches of bike lane. On a smooth surface, the phone tended to remain steady and the image quality was good. However, when the bike lane started to deteriorate, issues began to arise. Hitting deep potholes in close succession created strong vibrations in the mount and, because of the weight of the phone, made the mount bend forwards (or backwards) on the handlebars.
This bending changed the phone's shooting position, so that it began taking pictures of surfaces we were not interested in. We called this situation a "black potholes succession": a succession of potholes that could not be photographed. It was not a big problem, since it did not happen very often and, in the rare cases when it did, the mount was always easily repositioned. When it did happen, however, we were left with a few feet of unusable imagery. Consequently, our database contains an extremely small number of pictures in which the bike lane is missing.
We estimate that, of the 100 miles of bike lane imagery we collected, about 1,000 feet were not correctly photographed. Even so, the first pothole of each black potholes succession was always correctly photographed, and the accelerometer data always covered the rest of the missing stretch of bike lane. Nonetheless, we are comfortable saying that with more time and resources we could have built a more stable mount and solved this (small) problem as well.
Once the "black potholes" issue was overcome, we faced a second, currently unsolved problem regarding the battery life of the phones collecting data. During our analysis we realized that recording images through OpenStreetCam was an extremely battery-intensive activity. A 30-minute bike trip drained about half of the battery of a standard phone. This didn't force us to take breaks between one trip and another in order to charge our phones, but it made us think that most people probably won't trade their daily battery life for recording bike lane data. For this reason, the crowdsourcing approach to data collection may be limited by this constraint.
In addition, as we went forward with the data analysis, we realized that different phones on different bikes tend to generate different accelerometer readings for the same pothole. The inconsistencies were not large, and the magnitudes of the accelerometer readings tended to be similar, but at individual points we did measure small discrepancies. This was probably because different bikes may react slightly differently to the same pothole, and different smartphones may have different accelerometer systems. In our case, the discrepancies were not large enough to compromise the analysis. However, if we continue collecting crowdsourced data, it will be necessary to perform controlled tests to measure how and when those discrepancies arise.
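One way such a controlled test could be evaluated is sketched below. This is only an illustration under stated assumptions, not the procedure used in this project: the function names, the idea of keying peak readings by a pothole identifier, and the 20% tolerance are all hypothetical choices made for the example.

```python
import math


def magnitude(ax, ay, az):
    """Magnitude of a single three-axis accelerometer sample."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def relative_discrepancies(peaks_a, peaks_b):
    """Compare peak magnitudes recorded by two devices over the same potholes.

    peaks_a, peaks_b: dicts mapping a (hypothetical) pothole id to the peak
    magnitude that each device recorded there. Only potholes seen by both
    devices are compared. The discrepancy is the absolute difference divided
    by the mean of the two readings.
    """
    result = {}
    for pothole_id in peaks_a.keys() & peaks_b.keys():
        a, b = peaks_a[pothole_id], peaks_b[pothole_id]
        mean = (a + b) / 2
        result[pothole_id] = abs(a - b) / mean if mean else 0.0
    return result


def flag_outliers(discrepancies, tolerance=0.2):
    """Return the pothole ids whose discrepancy exceeds the assumed tolerance."""
    return sorted(pid for pid, d in discrepancies.items() if d > tolerance)
```

Running the same route with two phone and bike combinations and passing the per-pothole peaks to `relative_discrepancies` would give a first indication of where the devices disagree most.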
Limitations to analytics
Once the data was collected, the data analysis was performed with few issues. The only issues we faced arose when bike trips were recorded at high-traffic bike lane intersections or in areas with bad cellphone coverage. In very rare cases, a few GPS points were hard to classify because of large error bars on their GPS positions. These edge cases arose either from GPS errors or because the direction of the bike trip was unclear. We also found a few points that were recorded far from the NYC area (in the middle of the Atlantic), and we filtered these points out of the analysis. This phenomenon was extremely rare and concerned only one or two points out of thousands. We suspect that it was due to brief moments of missing cellular signal, which caused the OpenStreetCam app to record those points at GPS position "0,0".
Even when the whole bike trip was correctly recorded, we found some small problems in snapping particular points to the correct bike lanes. When more than two bike lanes intersected, it became extremely hard to determine which bike lane a single point should be associated with. Our initial approach was simply to snap each point to the closest bike lane. However, when many bike lanes intersected one another, this approach failed. Because of small GPS uncertainties, some points were snapped to the bike lane on the other side of the square instead of the correct one. To overcome this problem, we developed a statistical algorithm that, before snapping a point to a certain bike lane, looks at the whole trip and determines which path the bicyclist most likely followed. The algorithm looks at where most of the points are uniquely associated and, based on that, predicts where the uncertain points should lie. As a result, for trips of about 1,000 points, the average number of misassigned points was less than 1.
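The filtering and trip-level snapping described above can be sketched as follows. This is a minimal illustration and not the project's actual code. It assumes that points and lanes have already been projected to planar coordinates in meters, that each lane is a single straight segment, and that a point counts as "uniquely associated" when its nearest lane is closer than the runner-up by at least `margin`. The names `is_valid_fix`, `snap_trip`, `margin` and `window` are hypothetical.

```python
import math
from collections import Counter


def is_valid_fix(lat, lon, eps=1e-6):
    """Reject the (0, 0) placeholder position recorded when no fix is available."""
    return not (abs(lat) < eps and abs(lon) < eps)


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def snap_trip(points, lanes, margin=5.0, window=3):
    """Assign each trip point to a lane, using trip context for unclear points.

    points: list of (x, y) in trip order.
    lanes:  dict mapping lane id -> (start, end) segment endpoints.
    """
    ranked = [
        sorted((point_segment_distance(p, a, b), lane_id)
               for lane_id, (a, b) in lanes.items())
        for p in points
    ]
    # First pass: keep only points whose nearest lane wins by a clear margin.
    labels = [
        d[0][1] if len(d) == 1 or d[1][0] - d[0][0] >= margin else None
        for d in ranked
    ]
    # Second pass: give each uncertain point the most common lane among the
    # clearly assigned points around it in the trip.
    result = list(labels)
    for i, label in enumerate(labels):
        if label is not None:
            continue
        nearby = [l for l in labels[max(0, i - window): i + window + 1]
                  if l is not None]
        if nearby:
            result[i] = Counter(nearby).most_common(1)[0][0]
        else:
            result[i] = ranked[i][0][1]  # no context: fall back to nearest lane
    return result
```

In this sketch, a point recorded right at an intersection that happens to lie slightly closer to the crossing lane is still assigned to the lane the rest of the trip follows, which is the behavior that a purely nearest-lane rule cannot provide.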
As this project was conceived, we did not face many limitations other than time constraints. The paragraphs above show that there is a need (and a desire) to take this analysis a few steps further. Improving image quality and clarifying the nature of the accelerometer discrepancies are the most important next steps.
Conclusion and future research
Author contributions
Felipe Gonzalez worked on data engineering (data gathering, cleaning, the snapping function, and overall coordination of the Python processes).
Nicola Macchitella worked on data engineering (data gathering, GIS analysis, and overall coordination of the Python processes).
Geoff Perrin worked on the computer vision code (lane markings and defect classification) and the Tableau visualization, and set up the Amazon Web Services database and the code that pushed our data to the database.