Fig. 5 | Genome Biology

From: Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities

Schema reveals the locations and amino acids important in preserving binding specificity of T cell receptor CDR3 regions (https://help.biorender.com/en/articles/3619405-how-do-i-cite-biorender). a We analyzed a multimodal dataset from 10x Genomics [5] to understand how a T cell receptor’s binding specificity relates to the sequence variability in the CDR3 regions of its α and β chains. The primary modality consisted of CDR3 peptide sequence data, which we correlated with the secondary modality, the binding specificity of the cell against a panel of 44 epitopes. We optionally synthesized an additional modality, proteomic measurements of 12 cell-surface marker proteins, as a use-case of incorporating additional information (Methods). b We performed two Schema analyses: (B.1) To infer location-wise selection pressure, each feature of the primary modality corresponded to a location in the CDR3 sequence; (B.2) To infer amino acid selection pressure, the primary modality was the Boolean vector of residues observed at a specific sequence location, and we aggregated over an ensemble of Schema runs across various locations. c, d Schema identifies sequence locations 3–9 (α chain) and 5–12 (β chain) as regions where sequences can vary with a comparatively modest impact on binding specificity. We compared Schema’s scores to statistics computed from motifs in VDJdb. Here, we have inverted the orientation of Schema’s weights to align them with the direction of VDJdb weights. e Schema and VDJdb agree on the relative importance of amino acids in preserving binding specificity (Spearman rank correlation = 0.74, two-sided t-test p = 2 × 10⁻⁴). The low weight assigned to cysteine is likely due to its infrequent occurrence in the data.
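
Analysis B.2 treats the residues observed at one CDR3 location as a Boolean vector. The following is a minimal sketch of that encoding, not Schema's own code. It assumes the 20 standard one-letter amino-acid codes, 1-based positions counted from the start of each CDR3 sequence (the caption does not say how sequences of different lengths are aligned), and an all-False row for sequences too short to have that position. The helpers `residue_indicator_matrix` and `aggregate_residue_weights` are hypothetical, and averaging per-location weights is only one plausible way to aggregate an ensemble of runs.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_indicator_matrix(cdr3_seqs: Sequence[str], position: int) -> List[List[bool]]:
    """Hypothetical helper: one Boolean row per cell, one column per amino acid,
    True where that cell's CDR3 carries the residue at the 1-based `position`.
    Sequences shorter than `position` (or with a non-standard letter there)
    yield an all-False row."""
    column = {aa: j for j, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for seq in cdr3_seqs:
        row = [False] * len(AMINO_ACIDS)
        if 1 <= position <= len(seq) and seq[position - 1] in column:
            row[column[seq[position - 1]]] = True
        matrix.append(row)
    return matrix


def aggregate_residue_weights(runs: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical helper: average each amino acid's weight over an ensemble of
    per-location runs, using only the runs in which that residue has a weight."""
    residues = sorted({aa for run in runs for aa in run})
    return {aa: mean(run[aa] for run in runs if aa in run) for aa in residues}


# Toy usage with made-up sequences, not data from the study.
X = residue_indicator_matrix(["CASSL", "CAW", "CSARD"], position=4)
print([AMINO_ACIDS[row.index(True)] if any(row) else None for row in X])  # ['S', None, 'R']
```

Each matrix row has at most one True entry, so a per-location run sees exactly which residue each cell carries at that position; repeating this over positions gives the ensemble whose weights are then combined per amino acid.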

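The panel e statistic can be checked in outline. Spearman's ρ is the Pearson correlation of the two weight vectors' ranks, and the usual two-sided t-test uses t = ρ√((n − 2)/(1 − ρ²)) with n − 2 degrees of freedom. The sketch below obtains the same tail probability by integrating the null density of the correlation, which is proportional to (1 − r²)^((n − 4)/2), with Simpson's rule, so it needs only the standard library. The function names are illustrative, and n = 20 (the standard amino acids) is an assumption, since the caption does not state how many residues entered the comparison.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def two_sided_p(rho: float, n: int, steps: int = 2000) -> float:
    """P(|R| >= |rho|) for n pairs under the null, via Simpson's rule on the null
    density (1 - r^2)^((n - 4) / 2); this equals the two-sided t-test with n - 2 df."""
    if n < 4:
        raise ValueError("need at least 4 pairs")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    k = (n - 4) / 2

    def density(r: float) -> float:
        return (1.0 - r * r) ** k

    def simpson(a: float, b: float) -> float:
        h = (b - a) / steps
        total = density(a) + density(b)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(a + i * h)
        return total * h / 3

    # Ratio of the tail mass beyond |rho| to the mass on [0, 1]; symmetry makes it two-sided.
    return simpson(abs(rho), 1.0) / simpson(0.0, 1.0)


# Panel e's reported rho with the assumed n = 20; the result is close to the reported 2e-4.
print(f"{two_sided_p(0.74, 20):.1e}")
```

Because the caption inverts the orientation of Schema's weights before comparing them with VDJdb, note that negating one input vector flips the sign of ρ but leaves the two-sided p unchanged.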