Identifying branch points in the developmental hierarchy
The MST computed on diffusion distances represents an un-rooted and
multi-branched developmental hierarchy. To parse this model, we first
assigned a root node to the MST, choosing the micro-cluster with the
highest expression of AVP, a gene most highly upregulated in our
stem cell cluster (C6, Table S2). We note that our downstream results are
highly robust to the exact choice of root, as long as the root
node corresponds to an HSC cluster. After root assignment, the
‘developmental progression’ of each downstream node can be represented
as the length of the shortest path connecting it to the root.
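The rooting step can be illustrated with a minimal sketch. It assumes the MST is available as a list of weighted edges and that mean AVP expression per micro-cluster is stored in a dictionary; the names `root_distances`, `toy_edges`, and `avp_expression`, and all values, are hypothetical. Path length is taken here as the sum of edge weights, which is an assumption; setting every weight to 1 would give a hop count instead.

```python
from collections import deque

def root_distances(edges, root):
    """Return (parent, dist) for a weighted tree given as (u, v, weight) edges."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    parent = {root: None}
    dist = {root: 0.0}
    queue = deque([root])
    # In a tree the path between two nodes is unique, so a single traversal
    # from the root yields the shortest path length to every node.
    while queue:
        node = queue.popleft()
        for nbr, w in adj[node]:
            if nbr not in dist:
                parent[nbr] = node
                dist[nbr] = dist[node] + w
                queue.append(nbr)
    return parent, dist

# Hypothetical toy MST and per-micro-cluster AVP expression (illustrative only).
toy_edges = [("a", "b", 1.0), ("b", "c", 2.0), ("b", "d", 0.5), ("d", "e", 1.5)]
avp_expression = {"a": 5.2, "b": 3.1, "c": 0.2, "d": 0.4, "e": 0.1}

# Root = micro-cluster with the highest AVP expression.
root = max(avp_expression, key=avp_expression.get)
parent, progression = root_distances(toy_edges, root)
```

Here `progression` maps each node to its distance from the root, and `parent` records the orientation of the tree away from the root.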
Next, we identified terminal leaves in the tree, that is, nodes that
have a parent but no children. We calculated the path lengths from the
root node to all terminal nodes, and selected the four terminal nodes
with the longest path length to represent the four ‘endpoints’ for
hematopoietic differentiation into distinct hematopoietic lineages. Each
of these terminal nodes represented a micro-cluster corresponding to a
distinct cell state as determined in Figure 1, specifically erythroid,
eosinophil/basophil/mast, neutrophil/monocyte, and lymphoid progenitors.
Therefore, we can treat the four terminal nodes as ‘endpoints’ of
developmental progression towards four distinct hematopoietic lineages.
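The leaf selection described above could be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the function `farthest_leaves`, the toy edge list, and the node names are hypothetical, and path length is again assumed to be the sum of edge weights.

```python
from collections import deque

def farthest_leaves(edges, root, k=4):
    """Return the k leaves of a rooted weighted tree farthest from the root."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {root: 0.0}
    children = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr, w in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + w
                children[node].append(nbr)
                children[nbr] = []
                queue.append(nbr)
    # Terminal leaves are nodes without children once the tree is rooted.
    leaves = [n for n, kids in children.items() if not kids]
    return sorted(leaves, key=lambda n: dist[n], reverse=True)[:k]

# Hypothetical toy tree with five leaves (L1..L5); weights are illustrative.
toy_edges = [
    ("r", "a", 1.0), ("a", "b", 1.0), ("a", "c", 1.0),
    ("b", "L1", 3.0), ("b", "L2", 2.0),
    ("c", "L3", 4.0), ("c", "L4", 1.0), ("c", "L5", 0.5),
]
endpoints = farthest_leaves(toy_edges, "r", k=4)
```

In this toy tree the short side twig `L5` is excluded, which mirrors the idea that only the four most distant leaves are treated as lineage endpoints.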
We next identified the ‘branch points’ in our proposed hierarchy, which
can be directly determined from the MST structure. To obtain these, we
identified the shortest path along the MST between all pairs of terminal
nodes. The point on each shortest path that is closest to the root node
represents a transcriptomic bifurcation, or ‘branch point’, in the
model. Across all pairwise comparisons, we identified three transcriptomic
‘branch points’, which together define the developmental hierarchy summarized in
Figure 2C.
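In a rooted tree, the node on the path between two nodes that lies closest to the root is their lowest common ancestor, so the branch-point search can be sketched as a pairwise ancestor lookup. The code below is a minimal illustration under that reading; the function name `branch_points`, the toy topology, and the node names (`bp1`, `emp`, and so on) are hypothetical and chosen only to produce three branch points from four terminal nodes.

```python
from collections import deque
from itertools import combinations

def branch_points(edges, root, terminals):
    """Return the set of nodes closest to the root on each terminal-to-terminal path."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Orient the tree away from the root by recording each node's parent.
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)

    def ancestors(node):
        # The node itself plus every node on its path up to the root.
        chain = set()
        while node is not None:
            chain.add(node)
            node = parent[node]
        return chain

    found = set()
    for a, b in combinations(terminals, 2):
        anc_a = ancestors(a)
        node = b
        # Walk up from b until reaching a node that is also above a.
        while node not in anc_a:
            node = parent[node]
        found.add(node)
    return found

# Hypothetical toy hierarchy (edge weights are not needed for this step).
toy_edges = [
    ("r", "h"), ("h", "bp1"),
    ("bp1", "emp"), ("emp", "bp2"), ("bp2", "Er"), ("bp2", "EoBaMa"),
    ("bp1", "lmpp"), ("lmpp", "bp3"), ("bp3", "Lym"), ("bp3", "NeuMo"),
]
bps = branch_points(toy_edges, "r", ["Er", "EoBaMa", "Lym", "NeuMo"])
```

Six pairwise comparisons among four terminal nodes collapse to three distinct branch points in this toy example, because several pairs share the same common ancestor.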
Lastly, we assigned each ‘micro-cluster’ a branch identity. To do this,
we divided the MST into a series of ‘segments’, which are shown in
Figure S2C. This figure shows the same MST structure as
Figure 2B, but with a different layout based on multidimensional
scaling of the MST-based distance matrix, so that the different
segments of the MST, and the branch points which connect them, can be
easily visualized.
Cells located prior to the first bifurcation (the branch point closest
to the HSC) were annotated as HSC/MPP, and segments leading to terminal
nodes were named based on their downstream lineage (i.e. ‘Er’,
‘Eo/Ba/Ma’, ‘Lym’, ‘Neu/Mo’). For intermediate segments, which were
downstream of the first bifurcation but did not lead to terminal nodes,
we assigned names based on the lineage potential of cells downstream,
including an ‘erythro-myeloid’ progenitor (EMP), which gives rise to the
first two lineages (Er and Eo/Ba/Ma), and a lymphoid-primed multipotent progenitor (LMPP),
based on previous knowledge of this cell state, which can give rise to
both lymphoid and select myeloid populations (Kohn et al., 2012).
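One way to express the segment assignment is to label each node by the set of terminal nodes reachable downstream of it. The sketch below is an illustration under several assumptions that are not stated in the text: a branch-point node is assigned to the segment upstream of it, side twigs that do not reach any of the four terminal nodes inherit the label of their parent, and at least one terminal node is supplied. The function `assign_segments`, the toy topology, and the node names are hypothetical.

```python
from collections import deque

def assign_segments(edges, root, terminals):
    """Map each node to the frozenset of terminal nodes reachable downstream."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Breadth-first traversal gives parent pointers and a root-first order.
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    # Propagate terminal membership upward, visiting children before parents.
    downstream = {n: set() for n in order}
    for t in terminals:
        downstream[t].add(t)
    for node in reversed(order):
        if parent[node] is not None:
            downstream[parent[node]] |= downstream[node]
    # Side twigs with no terminal downstream inherit their parent's segment
    # (an assumption made for this sketch).
    segment = {}
    for node in order:
        reach = downstream[node]
        segment[node] = frozenset(reach) if reach else segment[parent[node]]
    return segment

# Hypothetical toy hierarchy, including one side twig off the Neu/Mo segment.
toy_edges = [
    ("r", "h"), ("h", "bp1"),
    ("bp1", "emp"), ("emp", "bp2"), ("bp2", "e1"), ("e1", "Er"),
    ("bp2", "EoBaMa"),
    ("bp1", "lmpp"), ("lmpp", "bp3"), ("bp3", "Lym"),
    ("bp3", "n1"), ("n1", "NeuMo"), ("n1", "twig"),
]
terminals = ["Er", "EoBaMa", "Lym", "NeuMo"]
segment = assign_segments(toy_edges, "r", terminals)

# Segment names follow the text; the mapping itself is illustrative.
names = {
    frozenset(terminals): "HSC/MPP",
    frozenset(["Er", "EoBaMa"]): "EMP",
    frozenset(["Lym", "NeuMo"]): "LMPP",
    frozenset(["Er"]): "Er",
    frozenset(["EoBaMa"]): "Eo/Ba/Ma",
    frozenset(["Lym"]): "Lym",
    frozenset(["NeuMo"]): "Neu/Mo",
}
labels = {node: names[reach] for node, reach in segment.items()}
```

With this rule, nodes upstream of the first bifurcation reach all four terminal nodes and are labelled HSC/MPP, intermediate segments reach exactly two, and lineage segments reach one.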