Identifying branch points in the developmental hierarchy
The MST computed on diffusion distances represents an unrooted, multi-branched developmental hierarchy. To parse this model, we first assigned a root node to the MST, choosing the micro-cluster with the highest expression of AVP, a gene most highly upregulated in our stem cell cluster (C6, Table S2). We note that our downstream results are highly robust to the exact choice of root, as long as the root node corresponds to an HSC cluster. After root assignment, the ‘developmental progression’ of each downstream node can be represented as the length of the shortest path connecting it to the root.
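The root assignment and path-length calculation can be illustrated with a minimal sketch. The edge list, the micro-cluster names (`m0` to `m4`), and the AVP expression values below are invented toy data, and the helper names are hypothetical; this is not the code used in the analysis. Because a tree contains exactly one path between any two nodes, a single traversal from the root is enough to obtain the shortest path lengths.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v, weight) tree edges."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def path_lengths_from_root(adj, root):
    """Return the path length from the root to every node of a tree.

    Paths in a tree are unique, so one depth-first traversal suffices.
    """
    dist = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for nbr, w in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + w
                stack.append(nbr)
    return dist


# Toy MST edges (weights stand in for diffusion distances) -- assumed data.
edges = [("m0", "m1", 1.0), ("m1", "m2", 2.0), ("m1", "m3", 0.5), ("m3", "m4", 1.5)]
# Toy mean AVP expression per micro-cluster -- assumed data.
avp_expression = {"m0": 5.2, "m1": 3.1, "m2": 0.1, "m3": 0.4, "m4": 0.0}

adj = build_adjacency(edges)
# Root = micro-cluster with the highest AVP expression.
root = max(avp_expression, key=avp_expression.get)
# 'Developmental progression' = path length from the root.
progression = path_lengths_from_root(adj, root)
```

In this toy example the root is `m0`, and `progression` maps every micro-cluster to its distance from that root along the tree.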
Next, we identified terminal leaves in the tree, i.e. nodes that have a parent but no children. We calculated the path lengths from the root node to all terminal nodes, and selected the four terminal nodes with the longest path lengths as the ‘endpoints’ of hematopoietic differentiation. Each of these terminal nodes was a micro-cluster corresponding to a distinct cell state as determined in Figure 1: erythroid, eosinophil/basophil/mast, neutrophil/monocyte, and lymphoid progenitors. We therefore treat the four terminal nodes as ‘end points’ of developmental progression towards four distinct hematopoietic lineages.
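As an illustration of the endpoint selection, the following sketch roots a toy tree, lists its terminal leaves, and keeps the leaves farthest from the root. The tree, its node names, and the function names are assumptions made for the example, not the original data or code.

```python
from collections import defaultdict


def root_tree(edges, root):
    """Orient an undirected weighted tree away from the root.

    Returns (parent, dist): each node's parent and its path length from the root.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    parent = {root: None}
    dist = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for nbr, w in adj[node].items():
            if nbr not in parent:
                parent[nbr] = node
                dist[nbr] = dist[node] + w
                stack.append(nbr)
    return parent, dist


def terminal_nodes(parent):
    """Nodes that have a parent but no children."""
    has_child = {p for p in parent.values() if p is not None}
    return [n for n in parent if n not in has_child]


def select_endpoints(parent, dist, k):
    """The k terminal nodes with the longest path length from the root."""
    leaves = terminal_nodes(parent)
    return sorted(leaves, key=lambda n: dist[n], reverse=True)[:k]


# Toy tree with five leaves; 'x' is a short side twig -- assumed data.
edges = [
    ("hsc", "a", 1.0), ("a", "b", 1.0), ("b", "er", 3.0), ("b", "eo", 2.0),
    ("a", "c", 1.0), ("c", "neu", 2.5), ("c", "lym", 4.0), ("c", "x", 0.2),
]
parent, dist = root_tree(edges, "hsc")
endpoints = select_endpoints(parent, dist, k=4)
```

Here the short twig `x` is a terminal leaf but is not selected, because four other leaves lie farther from the root.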
We next identified the ‘branch points’ in our proposed hierarchy, which can be directly determined from the MST structure. To obtain these, we identified the shortest path along the MST between all pairs of terminal nodes. The point on each shortest path that is closest to the root node represents a transcriptomic bifurcation, or ‘branch point’, in the model. Across all comparisons, we identified three transcriptomic ‘branch points’, representing a developmental hierarchy summarized in Figure 2C.
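In a rooted tree, the point on the path between two terminal nodes that lies closest to the root is their lowest common ancestor. The sketch below uses that equivalence to collect branch points across all pairs of terminal nodes. The `parent` map is an invented toy tree and the function names are hypothetical; this is one possible way to implement the step, not the original implementation.

```python
from itertools import combinations


def ancestors(parent, node):
    """Return the chain [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def branch_point(parent, a, b):
    """Node on the tree path between a and b that is closest to the root.

    This is the first ancestor of b that is also an ancestor of a.
    """
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same rooted tree")


def all_branch_points(parent, terminals):
    """Distinct branch points over all pairs of terminal nodes."""
    return {branch_point(parent, a, b) for a, b in combinations(terminals, 2)}


# Toy rooted tree as a child -> parent map (root has parent None) -- assumed data.
parent = {
    "hsc": None, "a": "hsc", "b": "a", "c": "a",
    "er": "b", "eo": "b", "neu": "c", "lym": "c",
}
terminals = ["er", "eo", "neu", "lym"]
branch_points = all_branch_points(parent, terminals)
```

For this toy tree, the six pairwise comparisons collapse to three distinct branch points (`a`, `b`, and `c`).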
Lastly, we assigned each ‘micro-cluster’ a branch identity. To do this, we divided the MST into a series of ‘segments’, shown in Figure S2C. This figure shows the same MST structure as Figure 2B in a different layout, based on multidimensional scaling of the MST-based distance matrix, so that the segments of the MST and the branch points that connect them can be easily distinguished.
Cells located prior to the first bifurcation (the branch point closest to the HSC) are annotated as HSC/MPP, and segments leading to terminal nodes were named after their downstream lineage (i.e. ‘Er’, ‘Eo/Ba/Ma’, ‘Lym’, ‘Neu/Mo’). For intermediate segments, which were downstream of the first bifurcation but did not lead directly to terminal nodes, we assigned names based on the lineage potential of the downstream cells. These include an ‘erythro-myeloid’ progenitor (EMP), which gives rise to the first two lineages, and a lymphoid-primed multipotent progenitor (LMPP), named on the basis of previous knowledge of this cell state, which can give rise to both lymphoid and select myeloid populations (Kohn et al., 2012).
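One way to express this segment labelling in code is to record, for every node, which of the four endpoints lie downstream of it, and to name the node by that set. The sketch below makes two assumptions that are not stated in the text: a branch-point node is grouped with the segment upstream of it, and a side twig that leads to none of the four endpoints inherits the label of its nearest labelled ancestor. The toy tree and the function name are likewise hypothetical.

```python
def assign_segments(parent, endpoints, names):
    """Label every node of a rooted tree with a segment name.

    parent:    child -> parent map (root maps to None)
    endpoints: terminal node -> lineage label
    names:     frozenset of downstream lineage labels -> segment name
    """
    # Record which endpoint lineages are reachable downstream of each node.
    downstream = {node: set() for node in parent}
    for leaf, lineage in endpoints.items():
        node = leaf
        while node is not None:
            downstream[node].add(lineage)
            node = parent[node]

    labels = {}
    for node in parent:
        # Assumption: side twigs with no endpoint downstream inherit the
        # label of their nearest ancestor that does have one.
        anchor = node
        while not downstream[anchor]:
            anchor = parent[anchor]
        labels[node] = names[frozenset(downstream[anchor])]
    return labels


# Toy rooted tree -- assumed data; 'x' is a side twig off node 'c'.
parent = {
    "hsc": None, "s1": "hsc", "a": "s1", "b": "a", "c": "a",
    "e1": "b", "er": "e1", "eo": "b", "neu": "c", "lym": "c", "x": "c",
}
endpoints = {"er": "Er", "eo": "Eo/Ba/Ma", "neu": "Neu/Mo", "lym": "Lym"}
names = {
    frozenset(["Er", "Eo/Ba/Ma", "Neu/Mo", "Lym"]): "HSC/MPP",
    frozenset(["Er", "Eo/Ba/Ma"]): "EMP",
    frozenset(["Neu/Mo", "Lym"]): "LMPP",
    frozenset(["Er"]): "Er",
    frozenset(["Eo/Ba/Ma"]): "Eo/Ba/Ma",
    frozenset(["Neu/Mo"]): "Neu/Mo",
    frozenset(["Lym"]): "Lym",
}
labels = assign_segments(parent, endpoints, names)
```

Under these assumptions, nodes up to and including the first bifurcation (`hsc`, `s1`, `a`) are labelled HSC/MPP, `b` is EMP, `c` and its twig `x` are LMPP, and the remaining nodes take the name of the single lineage they lead to.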