To derive cell state annotations, we performed graph-based Leiden clustering across a range of resolutions (0.5 to 20). Unbiased metrics for selecting the optimal number of clusters, such as the silhouette score, were highly correlated with the number of clusters itself and were therefore unsuitable for choosing a resolution. Instead, we employed a flexible cluster assignment approach: Leiden clusters obtained at different resolutions were evaluated for their concordance with biological priors and assigned to specific cell states accordingly, with the boundaries between cell states informed by this prior information. The priors depended on the lineage being annotated:
● For differentiated cell populations, including T cells, NK cells, B cells, plasma cells, stromal cells, megakaryocytes, eosinophil/basophil/mast cells, pDCs, and cDCs, we leveraged existing annotations from bulk bone marrow references, including Hay et al (Exp Hematol 2018), Seurat Azimuth (Cell 2021), Granja et al (Nat Biotech 2019), and van Galen et al (Cell 2019).
● For hematopoietic stem and progenitor cell (HSPC) populations, we performed SingleR scoring against a bulk RNA-seq reference dataset of stringently purified HSPC populations (Xie et al, Blood Cancer Discov 2021), alongside AUCell scoring of gene expression signatures of quiescent and activated HSCs (Garcia-Prat et al, Cell Stem Cell 2021). HSPC annotations were further refined using external scRNA-seq validation datasets of sorted HSPCs and an in-house dataset of sorted immunophenotypic LT-HSCs.
● For populations along B cell development, we performed SingleR scoring against an in-house bulk RNA-seq reference dataset (Iacobucci et al, Nat Cancer 2025) of purified B cell precursors (CLP: CD34+CD38+CD10+CD19-; Pre-Pro-B: CD34+CD38+CD10-CD19+; Pro-B: CD34+CD38+CD10+CD19+; Pre-B: CD34-CD38+CD10+CD19+; Mature B) and used previously reported marker genes for each stage of B cell development.
● For populations along erythroid development, we performed SingleR scoring against a bulk RNA-seq reference dataset (An et al, Blood 2014) of purified erythroid populations (Pro-Erythroblast, Basophilic Erythroblast, Polychromatic Erythroblast, Orthochromatic Erythroblast) and used previously reported marker genes for each stage of erythroid development.
● For populations along myeloid development, established bulk RNA-seq profiles from sorted human populations between GMPs and monocytes or cDCs are not available, because these intermediate populations (e.g. early and late pro-monocytes) are defined primarily by morphology. In the absence of clear ground-truth reference profiles, we relied on existing transcriptional annotations of myeloid development, evaluating the labels from Granja et al (Nat Biotech 2019) and projecting labeled BM data from van Galen et al (Cell 2019).
● As an additional layer of validation, whole-transcriptome plus Abseq single-cell profiles from Triana et al (Nat Immunol 2021) were projected onto the reference map to evaluate the concordance between assigned cell types and surface marker profiles.
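
The concordance step described above, in which clusters from several resolutions are compared against prior labels, can be illustrated with a minimal sketch. It is not the procedure used in this study. It assumes that each cell already carries a single prior label (for example, a reference-derived call), that concordance is summarized as the fraction of a cluster's cells sharing its majority prior label, and that each cell takes the label of the coarsest cluster exceeding a purity threshold. The function names `cluster_purity` and `assign_states`, the `min_purity` parameter, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, prior_labels):
    """Return {cluster: (majority_prior_label, purity)} for one clustering.

    Purity is the fraction of cells in the cluster that carry the
    majority prior label (an assumed, simplified concordance measure).
    """
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, prior_labels):
        members[cluster].append(label)
    summary = {}
    for cluster, labels in members.items():
        label, count = Counter(labels).most_common(1)[0]
        summary[cluster] = (label, count / len(labels))
    return summary


def assign_states(clusterings, prior_labels, min_purity=0.75):
    """Assign one cell state per cell from multi-resolution clusterings.

    clusterings: {resolution: [cluster id per cell]} (hypothetical input).
    Resolutions are visited from coarsest to finest. A cell receives the
    majority label of the first cluster whose purity is >= min_purity;
    cells never covered by such a cluster remain "unresolved".
    """
    assigned = ["unresolved"] * len(prior_labels)
    for resolution in sorted(clusterings):
        cluster_ids = clusterings[resolution]
        summary = cluster_purity(cluster_ids, prior_labels)
        for i, cluster in enumerate(cluster_ids):
            if assigned[i] != "unresolved":
                continue  # already labeled at a coarser resolution
            label, purity = summary[cluster]
            if purity >= min_purity:
                assigned[i] = label
    return assigned


# Toy data: 10 cells with prior labels and two clustering resolutions.
prior = ["HSC", "HSC", "HSC", "HSC", "MPP", "MPP", "MPP", "MPP", "B", "B"]
clusterings = {
    0.5: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # HSC and MPP merged
    2.0: [0, 0, 0, 0, 0, 1, 1, 1, 2, 2],  # HSC and MPP separated
}
states = assign_states(clusterings, prior)
print(states)
```

In this toy case the B cells are labeled at the coarse resolution, whereas the HSC and MPP clusters only become sufficiently pure at the finer resolution. The fifth cell, whose prior label is MPP, is assigned to HSC because it falls within a predominantly HSC cluster, which is one way a cluster boundary can override an individual prior call.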