All the data is contained in a single ~70GB text file, and the goal of processing it is to visualize it as graphs. Querying the data efficiently is therefore crucial to keeping the visualization process fast, which is why I use a database. I chose MongoDB for its flexibility and because all the intermediate results are also persisted to the database. I communicate with the database from Python using the PyMongo library.
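As a rough illustration of what talking to MongoDB through PyMongo can look like for a file this size, here is a minimal sketch that streams lines and inserts them in batches, so the whole file never has to fit in memory. The database name `mydata`, the collection name `raw_lines`, the `raw` field, the batch size and the connection URI are all assumptions made for the example, not the actual schema used in this project.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_lines(lines, collection, batch_size=1000):
    """Insert each line as one document, batch by batch; return the count."""
    total = 0
    for batch in iter_batches(lines, batch_size):
        # "raw" is a placeholder field; real code would parse the line here.
        docs = [{"raw": line.rstrip("\n")} for line in batch]
        collection.insert_many(docs, ordered=False)
        total += len(docs)
    return total


def main(path):
    # Imported here so the helpers above work without PyMongo installed.
    from pymongo import MongoClient

    # Assumed: a local MongoDB server listening on the default port.
    client = MongoClient("mongodb://localhost:27017")
    collection = client["mydata"]["raw_lines"]  # hypothetical names
    with open(path, encoding="utf-8") as f:
        # A file object yields lines lazily, so memory use stays bounded.
        print(load_lines(f, collection), "documents inserted")
```

Calling `main` with the path to the text file runs the import against a local server. Batching with `insert_many` avoids one network round trip per line, which matters at this scale.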
MongoDB installation
I installed MongoDB on the Windows Subsystem for Linux on Windows 10. The installation process is identical to installing it on Ubuntu 16.04, which is done using the following terminal commands: