Fig. 2 | Genome Biology

Fig. 2

From: Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities

Synthesis of RNA-seq and ATAC-seq information leads to more accurate cell type inference. a Leiden clustering [34] of per-cell profiles results in greater agreement (measured as the adjusted Rand index, ARI) with ground truth cell type labels when featurizing cells by RNA-seq profiles alone than when featurizing by ATAC-seq profiles alone. ATAC-seq does, however, provide relatively more information when distinguishing PT cells. b Ground truth labels from Cao et al. [6]. c–e To assess the ground truth accuracy of Leiden clustering, we assigned each cluster to the cell type most frequently seen in the ground truth labels of its members. Clusters where labels are more mixed will thus have lower accuracy. Clustering on RNA-seq profiles alone (c, d) results in many PT cells being assigned to such clusters. Schema synthesis of RNA- and ATAC-seq features, followed by Leiden clustering (e), results in significantly greater concordance with ground truth on PT cell types when compared to Leiden clustering on the RNA-seq features alone (one-sided binomial test, p = 6.7 × 10⁻¹⁵). f ARIs of clusters from Schema-synthesized data are higher, especially for PT cells. Synthesizing the modalities using canonical correlation analysis (CCA), totalVI (an autoencoder-based deep learning approach), or a “pseudocell” approach described in the original study (see Methods) results in lower ARI scores.
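
The two agreement measures named in the caption, majority-label cluster accuracy (panels c–e) and the adjusted Rand index (panels a, f), can be illustrated with a short sketch. This is not the authors' code: the function names, the toy cluster assignments, and the labels "typeA" and "typeB" are assumptions made purely for illustration, and only the Python standard library is used.

```python
from collections import Counter, defaultdict
from math import comb


def majority_label_accuracy(cluster_ids, true_labels):
    """Map each cluster to its most frequent true label and score the result.

    Returns (mapping, accuracy). Ties are broken by first occurrence.
    """
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1
    mapping = {c: counts.most_common(1)[0][0] for c, counts in members.items()}
    # A cell counts as correct when its true label matches its cluster's label.
    correct = sum(counts[mapping[c]] for c, counts in members.items())
    return mapping, correct / len(true_labels)


def adjusted_rand_index(cluster_ids, true_labels):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings agree trivially
    sum_cells = sum(comb(v, 2) for v in Counter(zip(cluster_ids, true_labels)).values())
    sum_clusters = sum(comb(v, 2) for v in Counter(cluster_ids).values())
    sum_labels = sum(comb(v, 2) for v in Counter(true_labels).values())
    expected = sum_clusters * sum_labels / total_pairs
    max_index = 0.5 * (sum_clusters + sum_labels)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster and a single label
    return (sum_cells - expected) / (max_index - expected)


# Toy data (made up): eight cells in three clusters; cluster 0 is mixed.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["PT", "PT", "typeA", "typeA", "typeA", "typeA", "typeB", "typeB"]

mapping, accuracy = majority_label_accuracy(clusters, labels)
print(mapping)
print(accuracy)
print(adjusted_rand_index(clusters, labels))
```

In this toy case the mixed cluster 0 is assigned to "PT", so its one "typeA" cell counts as an error; this is the sense in which clusters with more mixed labels lower the accuracy.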
