The limitations of our data collection are as follows:
  1. The data being collected is # Instagram data for each restaurant, so it does not reflect the ground reality of the restaurant: it is not a count of the people who actually went there. The data therefore gives only a perspective on “trending areas”.
  2. The TLC data is unrelated to the Instagram data. When we run a regression between the two, we do not aim to measure how many people took a taxi, only to check whether the two data sets are correlated.
  3. The photographs in Instagram posts about these restaurants are usually taken in the restaurants. Sometimes, however, people take photographs and upload them to Instagram later, which can shift the timestamps used in the analysis. This is partly offset by the fact that people usually post on the same night, so the posting time can still be used in the analysis.
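
As a rough illustration of the comparison described in limitation 2, the sketch below computes a Pearson correlation and a simple least-squares line between two equally long series of counts. It is only a sketch under stated assumptions: the function names `pearson_r` and `ols_fit`, the variable names, and the numbers are hypothetical placeholders, not our actual data, results, or code. It assumes both series have already been aggregated over the same areas or time windows.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder values only (NOT real measurements): one entry per area or time window.
instagram_counts = [1, 2, 3, 4, 5]
taxi_counts = [2, 1, 4, 3, 5]

r = pearson_r(instagram_counts, taxi_counts)
slope, intercept = ols_fit(instagram_counts, taxi_counts)
print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A coefficient near zero would suggest that Instagram activity and taxi activity are not linearly related, while a value near 1 or -1 would suggest that they move together. In either case, as noted above, the result says nothing about whether the people posting actually took a taxi.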