The authors are grateful for the thoughts of colleagues at Nesta, the Economic Statistics Centre of Excellence and the Office for National Statistics on this work. Particular thanks are due to Hasan Bakhshi for his comments on early drafts.
Appendices
Appendix 1: Overview of community detection algorithms
Stochastic Block Modelling (SBM) involves fitting a generative model of a graph to data \citep*{peixoto_bayesian_2017}. Under SBM, nonparametric statistical inference is applied to partition the graph in such a way as to maximise the explanatory power of a fitted model given the observed edges. From the candidates, the minimum description length model (i.e. the simplest model) is selected to prevent overfitting. SBMs have been found to produce some of the best results on real-life networks and are capable of identifying several types of network structures in addition to communities. SBMs can also detect hierarchical structures in networks and can be extended to overlapping communities. For the purposes of our analysis, we use a degree-corrected SBM that employs a Markov chain Monte Carlo (MCMC) algorithm \citep*{peixoto_efficient_2014} as implemented in the python igraph library.
The Louvain multilevel community detection algorithm identifies communities in the network that maximise the quality of the partitioning \citep*{blondel2008fast}. The established metric used to measure the quality of communities is modularity. Modularity varies between [-1,1] and refers to the concentration of edges within communities as opposed to the distribution of edges that would be observed in a random graph with the same vertex degree distribution. The Louvain algorithm is hierarchical; it starts with individual vertices belonging to their own communities and then iteratively groups the vertices in such a way as to increase the overall modularity score. This algorithm is intuitive and one of the most commonly used for identifying network communities. Louvain was found to be the second best-performing method in the comparative analysis of algorithms conducted by Lancichinetti and Fortunato \citep*{fortunato2016community}. The criticism of the modularity optimisation algorithms focuses on the limitations of these methods in identifying an appropriate level of resolution. The algorithms may split large communities or merge smaller ones. They may also underperform as compared to other methods if the true number of clusters is not known.
Infomap is a dynamics based community detection algorithm, which identifies communities in the network by measuring the flow of information through the network using random walks \citep*{rosvall2008maps}. The rationale behind the method is that due to the higher density of edges within communities, the random walkers will be trapped and spend a longer time inside communities. Infomap further improved the early implementations of the dynamics based algorithms by using information theory to define the most parsimonious way to describe graph community structure. Infomap is especially effective when applied to directed networks, where it can identify communities that would not be detected by modularity optimisation algorithms.
Stochastic Block Modelling (SBM) involves fitting a generative model of a graph to data \citep*{peixoto_bayesian_2017}. Under SBM nonparametric statistical inference is applied to partition the graph in such a way as to maximise the explanatory power of a fitted model given the observed edges. From the candidates the minimum description length model (i.e. the simplest model) is selected to prevent overfitting. SBMs have been found to produce some of the best results on real-life networks and are capable of identifying several types of network structures in addition to communities. SBMs can also detect hierarchical structures in networks and can be extended to overlapping communities. For the purposes of our analysis, we use a degree corrected SBM that employs a Markov chain Monte Carlo (MCMC) algorithm \citep*{peixoto_efficient_2014} as implemented in python igraph library.
Appendix 2: Tables