Expression of myosin heavy chain 7 (MYH7) is a canonical marker of type I muscle fibers. Because this cluster expressed MYH7, it was annotated as type I muscle cells.
Raw BCL files were demultiplexed with Cell Ranger mkfastq (v4.0.0) (Zheng et al., 2017). The demultiplexed libraries were then aligned to the human genome assembly GRCh38.p13 (National Library of Medicine, 2019) and counted with Cell Ranger, including reads mapping to pre-mRNA intronic regions (10X Genomics, 2020).
The Cell Ranger count matrices were corrected for ambient RNA contamination with SoupX (Young & Behjati, 2020), using its automatically estimated contamination fraction. For quality control, we removed cells identified as outliers for a high percentage of mitochondrial gene counts, a low number of detected features, or low total counts. Quality control was run in R (v4.1.0) using scran (v1.22.1; Lun et al., 2016), scater (v1.22.0) and scuttle (v1.4.0) (McCarthy et al., 2017). The corrected and filtered count matrices were then used for downstream analysis in R (v4.2.0) with Seurat (v4) (Hao et al., 2021). During initial clustering, two clusters co-expressed marker genes of multiple cell types and were therefore judged to contain most of the doublets; these two clusters were removed manually before further analysis.
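The outlier-based filtering above follows the approach of scuttle's isOutlier: a cell is flagged when a QC metric lies more than a fixed number of median absolute deviations (MADs) from the median. The text does not state the threshold, so the pure-Python sketch below assumes scuttle's default of 3 MADs, the MAD scaling constant 1.4826 used by R's mad(), and log-transformed total counts and feature counts as in scuttle's quickPerCellQC. The function names are hypothetical; this illustrates the technique and is not the code that was run.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation, scaled by 1.4826 as in R's mad()."""
    m = median(values)
    return 1.4826 * median([abs(v - m) for v in values])


def is_outlier(values, nmads=3.0, direction="both", log=False):
    """Flag values lying more than `nmads` scaled MADs from the median.

    direction is "lower", "higher" or "both". With log=True a natural log is
    applied first (values must be positive); the log base does not change
    which cells are flagged, because the MAD rescales with it.
    """
    x = [math.log(v) for v in values] if log else list(values)
    centre, spread = median(x), mad(x)
    low, high = centre - nmads * spread, centre + nmads * spread
    flags = []
    for v in x:
        too_low = direction in ("lower", "both") and v < low
        too_high = direction in ("higher", "both") and v > high
        flags.append(too_low or too_high)
    return flags


def qc_discard(total_counts, n_features, pct_mito, nmads=3.0):
    """Discard a cell if it is an outlier on any of the three QC metrics."""
    low_counts = is_outlier(total_counts, nmads, "lower", log=True)
    low_features = is_outlier(n_features, nmads, "lower", log=True)
    high_mito = is_outlier(pct_mito, nmads, "higher")
    return [a or b or c for a, b, c in zip(low_counts, low_features, high_mito)]


# Hypothetical metrics for six cells; the last cell has very few counts.
flags = qc_discard(
    total_counts=[5000, 5200, 4800, 5100, 4900, 300],
    n_features=[2000, 2100, 1900, 2050, 1950, 1980],
    pct_mito=[1.0, 1.2, 0.8, 1.1, 0.9, 1.0],
)
print(flags)  # [False, False, False, False, False, True]
```

Using MADs rather than fixed cut-offs lets the thresholds adapt to each sample's distribution, which is why the filter is usually run per sample library.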
For each sample library, we ran normalization (NormalizeData), feature selection (FindVariableFeatures), scaling (ScaleData) and PCA (RunPCA) separately before integrating the datasets hierarchically with the Seurat functions FindIntegrationAnchors and IntegrateData. First, samples were integrated within groups defined by training status and by group (control or diabetes). Next, the control groups were integrated with each other, and the diabetes groups were integrated with each other. Finally, all data were integrated for joint downstream analysis.
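The per-sample normalization and scaling steps can be illustrated with a short sketch. Seurat's default LogNormalize method divides each count by the cell's total count, multiplies by a scale factor and applies log1p; ScaleData then centers each gene to mean 0 and unit variance across cells and caps values at scale.max. The text does not state the parameters used, so this pure-Python sketch assumes Seurat's defaults (scale factor 10,000; scale.max 10). Unlike Seurat, which by default scales only the variable features, it scales every gene, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style normalization of a cells x genes count matrix:
    count / cell total * scale_factor, then log1p."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized


def scale_genes(matrix, scale_max=10.0):
    """Z-score each gene across cells (needs at least two cells) and cap
    values above scale_max. Genes with zero variance are set to 0."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_genes for _ in range(n_cells)]
    for g in range(n_genes):
        column = [row[g] for row in matrix]
        mu, sd = mean(column), stdev(column)
        for i, value in enumerate(column):
            z = (value - mu) / sd if sd > 0 else 0.0
            scaled[i][g] = min(z, scale_max)
    return scaled


# Two hypothetical cells with five genes each.
counts = [[1, 3, 0, 6, 5], [2, 2, 2, 4, 5]]
scaled = scale_genes(log_normalize(counts))
```

Normalizing by each cell's total removes differences in sequencing depth between nuclei, and the cap on scaled values limits the influence of genes expressed in only a few cells on the subsequent PCA.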
The integrated dataset was clustered with the Seurat functions FindNeighbors and FindClusters and visualized as a UMAP embedding with RunUMAP and DimPlot. Marker genes for each cluster were identified with FindAllMarkers, and cluster labels were then assigned by manual curation of these markers.
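FindAllMarkers compares each cluster against all remaining cells. As a simplified illustration, the sketch below ranks genes per cluster by a one-vs-rest log2 fold change computed on the linear scale (expm1 of log-normalized values, pseudocount 1). Seurat additionally runs a statistical test (Wilcoxon rank-sum by default) and applies detection-rate filters, which are omitted here. The gene name GENE_B, the lookup table and the functions top_markers and annotate are hypothetical; the lookup only stands in for the manual curation step, using the MYH7 marker named above.

```python
import math
from statistics import mean


def top_markers(expr, genes, labels, n_top=1):
    """Rank genes per cluster by one-vs-rest log2 fold change.

    expr: cells x genes log-normalized values; labels: cluster per cell.
    Requires at least two clusters so that "rest" is non-empty.
    """
    result = {}
    for cluster in sorted(set(labels)):
        inside = [row for row, lab in zip(expr, labels) if lab == cluster]
        outside = [row for row, lab in zip(expr, labels) if lab != cluster]
        scores = []
        for g, gene in enumerate(genes):
            # Average on the linear scale, then log2 with a pseudocount of 1.
            m_in = mean(math.expm1(r[g]) for r in inside)
            m_out = mean(math.expm1(r[g]) for r in outside)
            scores.append((math.log2(m_in + 1) - math.log2(m_out + 1), gene))
        scores.sort(reverse=True)
        result[cluster] = [gene for _, gene in scores[:n_top]]
    return result


def annotate(markers, canonical):
    """Label a cluster by the first of its top markers found in a lookup table."""
    return {
        cluster: next((canonical[g] for g in gs if g in canonical), "unassigned")
        for cluster, gs in markers.items()
    }


# Hypothetical data: cluster 0 expresses MYH7, cluster 1 expresses GENE_B.
genes = ["MYH7", "GENE_B"]
expr = [[3.0, 0.1], [2.8, 0.0], [0.1, 2.9], [0.0, 3.1]]
labels = [0, 0, 1, 1]
markers = top_markers(expr, genes, labels)
print(annotate(markers, {"MYH7": "type I muscle cell"}))
# {0: 'type I muscle cell', 1: 'unassigned'}
```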