* v1 = custom vision
* v2 = opencv

Results and implications

Limitations of the analysis and ways to address these in the future

During our analysis we faced many challenges. From data collection to data analysis, we had to deal with various hardware and software constraints. Although we overcame most of them, we had to accept some limitations. On the data collection side, the main ones were image quality, battery life and accelerometer discrepancies; on the data analysis side, they were point snapping and GPS inaccuracy. In the following paragraphs we explain what these limitations meant for our analysis, how we overcame them and, in the rare cases where we did not, how we plan to solve them in future development.

Data collection side

The most difficult part of our data collection process was image collection. Although the phone was very stable on the bike thanks to the mount we used, we ran into some problems when biking on particularly bad bike lanes. When biking on a smooth surface, the phone tended to remain steady and the image quality was perfect. However, when the bike lane became very deteriorated, some issues arose. Hitting deep potholes in close succession tended to create strong vibrations in our mount and, because of the weight of the phone, made the mount bend forwards (or backwards) on the handlebar. This bending changed the phone's shooting position, so that it started taking pictures of surfaces we were not interested in. We called this situation a "black potholes succession": a succession of potholes that could not be photographed. It was not a big problem and it did not happen often. Furthermore, in the rare cases it happened, the mount was always easily repositioned. However, each time it happened we found ourselves with about 1 meter of unsatisfactory images. Whether blurred or simply not relevant, some images were not usable for our analysis. Consequently, we ended up with a database that is missing pictures of extremely small stretches of bike lane, like black (pot)holes we could not look at. Really small black potholes. We estimated that, of the 100 miles in the database we collected, about 300 m were not correctly photographed.

The overall analysis was not really weakened by this lack of photos. Indeed, the first pothole of a black potholes succession was always correctly photographed, and the accelerometer data always covered the rest of the missing part of the bike lane. Nonetheless, we are comfortable saying that with more time and resources we could have built a more stable mount and solved this (small) problem as well.

Once the "black potholes" issue was overcome, we faced a second, yet not unsolvable, problem regarding the battery life of our phones. During our analysis we realized that recording images through OpenStreetCam was an extremely battery-demanding activity. A 30-minute bike trip tended to drain about half of the battery of a standard smartphone. This did not force us to take breaks between trips to charge our phones, so it was not a problem for us. However, it made us think that most people probably will not trade their daily battery life for recording bike lane data, and for this reason a crowdsourcing approach to data collection would probably be limited by this constraint.

In addition, as we went forward with the data analysis, we realized that different phones on different bikes tended to generate different V-values for the same potholes. The inconsistencies were not large and the average V-values tended to be the same. However, at single points we measured small discrepancies. This was probably because different bikes may react slightly differently to the same pothole, just as different smartphones may have different accelerometer systems. In our case the discrepancies were not big enough to compromise the analysis; however, had the data collection been crowdsourced, it would have been necessary to perform some controlled tests to measure how and when those discrepancies arise.
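
A controlled test of the accelerometer discrepancies could start from a comparison like the one sketched below. This is only an illustration under stated assumptions, not part of the analysis we ran: it assumes that two phone/bike setups rode over the same potholes in the same order and that each pothole yields one numeric V-value per setup. The function name compare_devices and the sample numbers are hypothetical.

```python
from statistics import mean


def compare_devices(v_a, v_b):
    """Compare V-values recorded by two setups over the same potholes.

    v_a, v_b: lists of V-values, one per pothole, in the same order
    (hypothetical inputs, assumed to be already matched pothole by pothole).
    """
    if len(v_a) != len(v_b) or not v_a:
        raise ValueError("both setups need the same, non-empty set of potholes")
    # Signed per-pothole difference between the two setups.
    diffs = [a - b for a, b in zip(v_a, v_b)]
    return {
        # Close to zero when the two setups agree on average.
        "mean_difference": mean(diffs),
        # Largest disagreement observed on a single pothole.
        "max_abs_difference": max(abs(d) for d in diffs),
    }


# Made-up values: the averages agree, single points do not.
report = compare_devices([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

A mean difference near zero together with a non-zero maximum difference would match what we observed: similar averages, with small discrepancies at single points.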

Analysis side

Once the data were collected, the data analysis was performed without significant problems. The only issue arose when bike trips were recorded at high-density bike lane intersections or in areas with poor mobile coverage. In very rare cases, the GPS points of a few pictures were hard, if not impossible, to classify because the position associated with them was unclear. These extreme edge cases were caused either by errors of the phone's GPS or by ambiguity in the direction of the bike trip.

On the one hand, we found a few points that were recorded far from the NYC area (in the middle of the Atlantic), and we had to discard them as useless. This phenomenon was extremely rare and affected only one or two points in trips of thousands of points. We guessed that it was due to short moments in which the phone's GPS lost its signal, and that the OpenStreetCam app recorded those moments at an arbitrary position that we called "position 0,0". Our hypothesis was strengthened by the fact that all those outlier points were at essentially the same geographical position.

On the other hand, even when the whole bike trip was correctly recorded, we found some small problems in snapping particular points to the correct bike lane. When more than two bike lanes intersected at a corner, it became extremely hard to determine which bike lane a single point should be associated with. Our initial approach was simply to snap each point to the closest bike lane. However, where several bike lanes intersected this approach failed: because of small GPS uncertainty, some points were snapped to the bike lane on the other side of the square instead of the correct one. To overcome this problem we developed a statistical algorithm that, before snapping a point to a certain bike lane, looks at the whole trip and determines which path the biker really followed. To do that, it sees where most of the points are unambiguously associated and, based on that, predicts where the doubtful points should be associated. This approach left us with a negligible number of wrongly assigned points and solved the issue. With about 1000 points per trip, the average number of wrongly assigned points is less than one per trip, that is, less than 0.1%.
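
The two fixes described in this section can be illustrated with a minimal sketch. It is not the code used in the project: it assumes that each GPS point already comes with the set of bike lanes it could plausibly belong to, and it replaces the statistical algorithm with a simple majority vote among nearby, unambiguous points of the same trip. The function names, the window size and the tolerance around "position 0,0" are all hypothetical.

```python
from collections import Counter


def drop_position_zero(points, tolerance_deg=1.0):
    """Discard GPS fixes recorded at or near the arbitrary "position 0,0".

    points: list of (latitude, longitude) tuples in degrees.
    """
    return [
        (lat, lon)
        for lat, lon in points
        if abs(lat) > tolerance_deg or abs(lon) > tolerance_deg
    ]


def resolve_lanes(candidates, window=5):
    """Assign one bike lane to every point of a trip.

    candidates: one set of candidate lane ids per point, in trip order.
    Points with a single candidate are taken as they are. Doubtful points
    take the candidate lane most often seen among the unambiguous points
    within `window` positions before and after them in the trip.
    Returns a list with a lane id, or None when no decision is possible.
    """
    resolved = []
    for i, lanes in enumerate(candidates):
        if len(lanes) == 1:
            resolved.append(next(iter(lanes)))
            continue
        start = max(0, i - window)
        stop = min(len(candidates), i + window + 1)
        votes = Counter()
        for neighbour in candidates[start:stop]:
            if len(neighbour) == 1:
                lane = next(iter(neighbour))
                if lane in lanes:  # only vote for lanes this point could be on
                    votes[lane] += 1
        resolved.append(votes.most_common(1)[0][0] if votes else None)
    return resolved


# Made-up trip: the third point sits at an intersection of lanes A and B.
clean = drop_position_zero([(40.70, -74.00), (0.0, 0.0), (40.71, -74.01)])
lanes = resolve_lanes([{"A"}, {"A"}, {"A", "B"}, {"A"}, {"B"}], window=2)
```

In this made-up trip the doubtful third point is assigned to lane A, because most of the unambiguous points around it are on lane A. A vote over neighbouring points is only one possible way to use the rest of the trip; it was chosen here because it needs no geometry library.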

As this project was conceived, it did not have many limitations, and the issues we faced were mostly due to time constraints. However, given the intent of this analysis, some issues have to be faced and solved. The paragraphs above show that there is a need (and a desire) to take this analysis one step further, as well as which steps have to be taken to do so. Solving image quality and clarifying the nature of the accelerometer discrepancies are the most important ones if we want to reach a more appropriate and complete view of what bike lane conditions really are.

Conclusion and future research

Author contributions

Felipe Gonzalez worked on data engineering (data gathering, cleaning, snapping function)