Attempt 2 - Parallelization in both dimensions
When parallelizing in both dimensions with MPI (the 2D grid is divided into both subrows and subcolumns), things are complicated by the fact that the data to be communicated is no longer contiguous in memory: the outer columns of a local block are strided, so they cannot be handed to a plain MPI_Isend as a single contiguous buffer.
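One common way around this (a sketch, not necessarily what is used here) is to describe a strided column with a derived datatype such as MPI_Type_vector, after which the whole column can still be sent with a single MPI_Isend. The names grid, local_rows, local_cols and west_rank below are illustrative assumptions:

```c
#include <mpi.h>

/* Sketch: send the westmost column of a row-major (local_rows x local_cols)
 * block to the west neighbour. All variable names are assumptions. */
void send_west_column(double *grid, int local_rows, int local_cols,
                      int west_rank, MPI_Comm comm)
{
    /* One double per row, separated by a stride of local_cols doubles,
       i.e. one non-contiguous column described as a single datatype. */
    MPI_Datatype column_type;
    MPI_Type_vector(local_rows, 1, local_cols, MPI_DOUBLE, &column_type);
    MPI_Type_commit(&column_type);

    MPI_Request req;
    /* grid[0] is the first element of the westmost column in row-major storage */
    MPI_Isend(&grid[0], 1, column_type, west_rank, /*tag=*/0, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Type_free(&column_type);
}
```

The alternative would be to copy the column into a contiguous scratch buffer before sending; the derived datatype simply pushes that packing work into the MPI library.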
For the 2D case, we can use a Cartesian topology, which is built-in functionality in MPI. This has at least two potential benefits: 1) finding each process's N/S/W/E neighbours is automatic, and 2) it gives MPI a chance to optimize which processes end up as neighbours. Admittedly, 1) is trivial to implement manually, and 2) is unlikely to yield tangible improvements on DAG, since this is presumably a homogeneous cluster where all the machines can talk to each other with similar latency, but I decided to get acquainted with the Cartesian communicator in MPI anyway.
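A minimal sketch of creating such a Cartesian communicator and querying the neighbours could look as follows; the exact dimensions are left for MPI_Dims_create to choose, and the specific parameters are assumptions rather than the actual setup used here:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_size;
    int dims[2] = {0, 0};             /* 0 = let MPI choose this dimension */
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    /* Factor the process count into a 2D grid, e.g. 12 processes -> 4 x 3 */
    MPI_Dims_create(world_size, 2, dims);

    int periods[2] = {0, 0};          /* non-periodic boundaries */
    MPI_Comm cart_comm;
    /* reorder = 1 allows MPI to renumber ranks for a (possibly) better mapping */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart_comm);

    /* Neighbour lookup is automatic: shift by one along each dimension.
       Processes on the boundary get MPI_PROC_NULL, so sends/receives to
       them become no-ops without any special-casing. */
    int north, south, west, east;
    MPI_Cart_shift(cart_comm, 0, 1, &north, &south);  /* along rows    */
    MPI_Cart_shift(cart_comm, 1, 1, &west, &east);    /* along columns */

    MPI_Finalize();
    return 0;
}
```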
The initialization of local worlds remains fairly trivial: