Reconstructing developmental trajectories from micro-clusters
Using the new micro-cluster dataset, we next sought to reconstruct a developmental hierarchy based on gene expression. A hematopoietic stem cell can give rise to cells of every lineage, so differentiation is expected to form a branching hierarchy. We therefore used the minimum spanning tree (MST) algorithm for hierarchical reconstruction. An MST is a subgraph of a connected, edge-weighted graph that connects all of its vertices (here, micro-clusters) without cycles and with the minimum possible sum of edge lengths. MSTs have previously been used in trajectory-finding methods such as SPADE (Qiu et al., 2011) and Monocle (Trapnell et al., 2014).
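The MST step can be illustrated with Prim's algorithm on a dense, symmetric distance matrix. This is a generic sketch under that assumption. It is not the implementation used in this study, in Monocle, or in SPADE, and `prim_mst` is a hypothetical helper name. The O(n²) form of Prim's algorithm suits a complete graph such as all pairwise micro-cluster distances. The tree it returns always has exactly n − 1 edges.

```python
import math


def prim_mst(dist):
    """Hypothetical helper: MST of a complete graph given as a symmetric
    distance matrix (list of lists). Returns (parent, child, weight) edges."""
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from each vertex to the tree
    parent = [-1] * n       # tree vertex at the other end of that edge
    best[0] = 0.0           # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Add the outside vertex with the cheapest connecting edge.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Update connection costs for vertices still outside the tree.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


# Four points on a line at positions 0, 1, 3 and 7.
pos = [0, 1, 3, 7]
dist = [[abs(a - b) for b in pos] for a in pos]
edges = prim_mst(dist)
print(edges)  # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
```

The start vertex is arbitrary. Choosing a different one can change the order in which edges are added, but it does not change the total length of the tree.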
Before constructing the MST, we pre-processed the micro-cluster dataset with the same variable gene selection, normalization, and cell-cycle regression strategy used for the original single-cell dataset. We reduced the dimensionality of the resulting 5,000 × 997 micro-cluster profile with diffusion maps, as implemented in the diffusionMap R package, using one minus the Pearson correlation as the initial distance metric. We then computed a distance matrix between micro-clusters from the diffusion distance across 12 dimensions, although in practice we obtained very similar results with as few as five dimensions. To lay out the MST, we computed a t-distributed stochastic neighbor embedding (t-SNE) on the same distance matrix used for MST construction; here, t-SNE serves only to visualize the hierarchy. Figure S2C presents an alternative visualization of the MST hierarchy, using a layout based on multidimensional scaling (MDS) that makes the tree structure easier to see.
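The distance computation can be sketched in Python with NumPy. This is a minimal sketch and does not reproduce the diffusionMap package. The Gaussian kernel, the median bandwidth heuristic (`sigma`), the diffusion time `t`, and the orientation of the input matrix (rows as genes, columns as micro-clusters) are all our assumptions. The helper names `one_minus_pearson`, `diffusion_coordinates`, and `pairwise_euclidean` are hypothetical. The diffusion coordinates are eigenvectors of the Markov transition matrix, scaled by their eigenvalues. If all nontrivial eigenvectors are kept, Euclidean distance between these coordinates equals the diffusion distance up to a constant factor. Keeping only `n_dims` of them gives an approximation.

```python
import numpy as np


def one_minus_pearson(X):
    """Hypothetical helper: distance between columns of X
    (assumed layout: genes x micro-clusters)."""
    # rowvar=False makes np.corrcoef correlate columns with each other.
    return 1.0 - np.corrcoef(X, rowvar=False)


def diffusion_coordinates(D, n_dims=12, sigma=None, t=1):
    """Hypothetical helper: diffusion-map coordinates from a distance matrix D."""
    if sigma is None:
        # Assumed bandwidth heuristic: median of the positive distances.
        sigma = np.median(D[D > 0])
    # Gaussian affinity between every pair of micro-clusters.
    K = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    d = K.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = diag(1/d) @ K; it has the
    # same eigenvalues as P and can be diagonalized with eigh.
    A = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    # Convert to right eigenvectors of P.
    psi = evecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # next n_dims eigenvectors by eigenvalue**t.
    return psi[:, 1:n_dims + 1] * evals[1:n_dims + 1] ** t


def pairwise_euclidean(Y):
    """Euclidean distances between rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Usage on random data standing in for the real expression profile.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # 200 "genes" x 30 "micro-clusters"
D0 = one_minus_pearson(X)
Y = diffusion_coordinates(D0, n_dims=12)
D_diff = pairwise_euclidean(Y)            # input for MST construction
```

The resulting `D_diff` is a symmetric micro-cluster × micro-cluster matrix. It could be passed to an MST routine and, separately, to a t-SNE or MDS implementation that accepts precomputed distances for layout.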