I added some print statements to verify that the array was initialized correctly.
Unfortunately, upon inspecting the GIFs I realized that something had gone wrong. Although the local worlds were initialized correctly, the gather operation on rank 0 had not assembled them correctly. The fact that the incoming local arrays must be placed non-contiguously in the global array proved extremely difficult to handle. I tried to solve this with custom MPI datatypes (MPI_Type_create_subarray) and MPI_Gatherv, using displacements to place the small arrays into the large array's memory, but in the end I could not make it work. This was frustrating, as the ghostline communication itself should have been fairly easy once everything was set up. Moreover, it is generally not recommended to gather data onto a single process (rank 0) when the output can instead be written in parallel with MPI-IO, but I assumed the gather operation was intended to be used in this assignment.
The improvements in running time from running the 1D-parallelized code on MODI are shown in Figure 1. The weak-scaling results are fairly disappointing. Despite the frustrations with C++, and despite failing to make the 2D parallelization work because of the gather operation, this was a great learning experience!