Fig. 3 | Genome Biology

From: SCA: recovering single-cell heterogeneity through information-based dimensionality reduction

SCA recovers subtly defined cellular populations from a set of 307 cytotoxic T cells profiled using Smart-seq3 [14]. a UMAP embedding computed from a 20-dimensional SCA representation using Euclidean nearest neighbors, with Leiden clusters (left) and inferred cell types (right). Gamma-delta, MAIT, and T helper populations separate cleanly. b Dot plot of key marker gene groups in each SCA-derived Leiden cluster. Gamma-delta, MAIT, and T helper cells are clearly identifiable from their known marker genes. c Scatter plots of the leading principal, independent, and surprisal components, colored by log-TPM (transcripts per million) expression of key marker genes: the delta-chain receptor gene TRDV2 marks gamma-delta T cells [11], and SLC4A10 marks MAIT cells [26]. The leading surprisal components cleanly separate the gamma-delta and MAIT subpopulations, whereas the leading PCs and ICs blur these distinctions. d UMAP plots derived from 20-dimensional PCA, ICA, scVI, SCA, and diffusion map embeddings of the data, together with the PHATE embedding (Methods). CD8+ T cells, CD4+ T helper cells, TRDV2+ gamma-delta T cells, and SLC4A10+ MAIT cells occupy distinct regions of the SCA-derived UMAP plot. e F1 scores for recovery of the major T cell populations by various clustering schemes (Methods). For PCA, ICA, SCA, scVI, and diffusion maps, we assess Leiden clusters computed on the Euclidean 15-nearest-neighbor graph at resolution 1. Leiden clusters computed on the SCA representation consistently capture these cell types with the highest accuracy. f Robustness of cell type recovery with respect to the neighborhood size used to compute SCA’s surprisal scores and to the number of iterations. Performance improves with more iterations and is stable across a wide range of neighborhood sizes.
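The evaluation in panel e can be illustrated with a minimal plain-Python sketch under stated assumptions. `knn_graph` builds the undirected Euclidean k-nearest-neighbor graph (k = 15 in the figure) that a Leiden implementation would then partition, and `best_match_f1` scores one annotated population by the single cluster with the highest F1. Both function names are hypothetical, the Leiden step itself is not shown, and the best-single-cluster matching rule is an assumption for illustration; the exact scoring procedure is described in the paper's Methods.

```python
import math
from collections import defaultdict


def knn_graph(points, k=15):
    """Undirected Euclidean k-nearest-neighbor graph as a set of (i, j) edges, i < j.

    Hypothetical helper: a Leiden implementation would partition this graph.
    Ties in distance are broken by the lower neighbor index.
    """
    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort candidate neighbors by (distance, index) and keep the k closest.
        nearest = sorted(others, key=lambda j: (math.dist(p, points[j]), j))[:k]
        for j in nearest:
            # Store each edge once, regardless of direction.
            edges.add((min(i, j), max(i, j)))
    return edges


def best_match_f1(labels, clusters, cell_type):
    """F1 score of the single cluster that best recovers `cell_type`.

    Assumption: each annotated population is matched to the one cluster that
    maximizes F1; this matching rule is illustrative, not taken from the paper.
    """
    truth = {i for i, lab in enumerate(labels) if lab == cell_type}
    if not truth:
        raise ValueError(f"no cells labeled {cell_type!r}")
    # Group cell indices by cluster id.
    members = defaultdict(set)
    for i, c in enumerate(clusters):
        members[c].add(i)
    best = 0.0
    for pred in members.values():
        tp = len(truth & pred)  # cells of this type that fall in the cluster
        if tp == 0:
            continue
        precision = tp / len(pred)
        recall = tp / len(truth)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy usage with made-up labels and cluster assignments.
labels = ["gd", "gd", "MAIT", "MAIT", "CD8", "CD8"]
clusters = [0, 0, 1, 1, 1, 2]
for cell_type in ("gd", "MAIT", "CD8"):
    print(cell_type, round(best_match_f1(labels, clusters, cell_type), 3))
```

Matching each population to its best single cluster rewards clusterings that isolate a cell type in one group, which is the behavior the figure compares across representations; a scheme that splits a population across several clusters, or merges it with another, scores lower.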