Reconstructing developmental trajectories from micro-clusters
Having constructed a new dataset of micro-clusters, we sought to build a
developmental hierarchy based on gene expression. Given that a
hematopoietic stem cell can differentiate into cells of all possible
lineages, we used the minimum spanning tree (MST) algorithm for
hierarchical reconstruction. An MST is the subgraph of a connected,
weighted graph that spans all of its vertices (in this case,
micro-clusters) with the minimum sum of edge lengths. MSTs have
previously been used in several trajectory-finding methods, such as
SPADE and Monocle (Qiu et al., 2011; Trapnell et al., 2014).
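
As an illustration of the MST definition above, the following minimal
sketch builds a spanning tree from a dense distance matrix using Prim's
algorithm. It is not the implementation used in this study: the function
name, the toy distance matrix, and the choice of Prim's algorithm are
assumptions made for the example, and the four "micro-clusters" are
hypothetical.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    dist[i][j] is the distance between vertices i and j (hypothetical
    micro-clusters in this example). Returns a list of
    (parent, child, length) edges; a graph with n vertices yields n - 1.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best_len = [float("inf")] * n   # shortest known edge into the tree
    best_parent = [-1] * n          # tree vertex at the other end of it
    best_len[0] = 0.0               # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_len[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, dist[best_parent[u]][u]))
        # Update the cheapest connection for every vertex still outside.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best_len[v]:
                best_len[v] = dist[u][v]
                best_parent[v] = u
    return edges


# Toy distance matrix for four hypothetical micro-clusters.
toy_dist = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]
tree_edges = minimum_spanning_tree(toy_dist)
total_length = sum(length for _, _, length in tree_edges)
print(tree_edges, total_length)
```

In this toy case the tree is a simple chain through the four vertices.
On real data, branch points in the tree are what would be read as
candidate lineage bifurcations.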
Prior to MST construction, we pre-processed our micro-cluster dataset
using the same variable gene selection, normalization, and cell-cycle
regression strategy as for our original single-cell dataset. We reduced
the dimensionality of this 5,000 x 997 micro-cluster profile using
diffusion maps, implemented in the diffusionMap R package (using one
minus Pearson correlation as an initial distance metric for the
diffusion map). We then constructed a distance matrix between
micro-clusters, based on diffusion distance across 12 dimensions,
although in practice we obtained very similar results even with as few
as five dimensions. We laid out the MST using t-distributed stochastic
neighbor embedding (t-SNE), computed on the same distance matrix that
was used for MST construction. Notably, t-SNE is used here only to
visualize the hierarchy. In Figure S2C, we present an
alternative visualization of the MST hierarchy, with a modified layout
based on multidimensional scaling (MDS), that allows for easy
visualization of the tree structure.
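
Two of the distance computations described above can be sketched as
follows: the initial metric (one minus Pearson correlation between
expression profiles) and a distance restricted to the first few
diffusion coordinates. This is a hedged illustration only. The diffusion
map itself is not reproduced here (the study used the diffusionMap R
package), all function names and toy values are hypothetical, and
treating each row as one micro-cluster profile is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance_matrix(profiles):
    """Pairwise (1 - Pearson correlation) between rows of `profiles`."""
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - pearson(profiles[i], profiles[j])
            dist[i][j] = dist[j][i] = d
    return dist


def truncated_distance_matrix(coords, n_dims):
    """Euclidean distance over the first `n_dims` coordinates of each row.

    `coords` stands in for precomputed diffusion coordinates; this
    sketch does not compute them.
    """
    n = len(coords)
    return [[math.dist(coords[i][:n_dims], coords[j][:n_dims])
             for j in range(n)] for i in range(n)]


# Hypothetical profiles: rows are micro-clusters, columns are genes.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first row
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the first row
]
toy_corr_dist = correlation_distance_matrix(toy_profiles)

# Hypothetical coordinates; only the first two columns are used below.
toy_coords = [[0.0, 0.0, 9.0], [3.0, 4.0, 1.0]]
toy_trunc_dist = truncated_distance_matrix(toy_coords, n_dims=2)
print(toy_corr_dist[0], toy_trunc_dist[0])
```

The correlation distance ranges from 0 (perfectly correlated profiles)
to 2 (perfectly anti-correlated). Truncating to a fixed number of
coordinates mirrors the choice of 12, or as few as five, dimensions
described above.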