The limitations of my data collection are as follows:
- The data being collected is the number of Instagram posts for each
restaurant. It does not reflect the ground reality of the restaurant:
a post count is not the number of people who actually went there. This
data therefore only gives a perspective on “trending areas”.
- The TLC data is not directly related to the Instagram data. So, when
we run a regression between the Instagram data and the TLC data, we do
not aim to measure how many people took a taxi to a restaurant, but
only to check whether the two are correlated.
- The Instagram posts for these restaurants are usually made from
inside the restaurant. But sometimes people take photographs and
upload them to Instagram later. This can shift the timestamps used in
the analysis. However, people usually post on the same night, so the
timing is still close enough for the analysis to be used.
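
The correlation check and the delayed-posting issue described above can be sketched roughly as follows. This is only an illustration, not the actual analysis code: the names `pearson`, `lagged_pearson`, `posts`, and `dropoffs` are hypothetical, and the hourly counts are toy placeholder numbers, not values from the Instagram or TLC data. The idea is to compare taxi drop-offs in one hour with Instagram posts a few hours later, to see whether allowing for a posting delay changes the correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_pearson(posts, dropoffs, lag):
    """Correlate drop-offs at hour t with posts at hour t + lag.

    A lag of 0 compares the same hour; a positive lag assumes people
    post `lag` hours after they arrive (an assumption, not a finding).
    """
    if lag < 0 or lag > len(posts) - 2:
        raise ValueError("lag must leave at least two overlapping hours")
    if lag == 0:
        return pearson(posts, dropoffs)
    return pearson(posts[lag:], dropoffs[:-lag])


# Toy hourly counts for one evening (placeholders, NOT real data).
dropoffs = [4, 8, 12, 6, 2, 1]   # hypothetical taxi drop-offs near a restaurant
posts = [0, 2, 4, 6, 3, 1]       # hypothetical Instagram posts tagged there

for lag in range(3):
    print(f"lag={lag}h  r={lagged_pearson(posts, dropoffs, lag):.3f}")
```

Even a high correlation here would only say that the two series move together. As noted in the limitations, it would not show how many people actually took a taxi to the restaurant.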