Fig. 5 | Genome Biology

Fig. 5

From: Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing


AM motifs perform better at predicting splicing, eCLIP peaks, MPRA activity, and knockdown data than FM motifs. The SAM model improves substantially on existing RBNS-derived FM motifs at end-to-end prediction, though it underperforms SpliceAI. In addition, the AM model, unlike SpliceAI, can also perform other motif-specific tasks, in all cases better than the FM motifs. a End-to-End Accuracy: Bars represent top-k accuracy of prediction of 3′SS (dark green) and 5′SS (light green) for different models trained on the SpliceAI training set and tested on the last 45% of the SpliceAI test set (the rest is used for validation; this 45% constitutes 37.14M nt across 730 genes). Error bars indicate the minimum and maximum performance among 5 replicates. The MaxEnt model is deterministic, so it has no error bar. b eCLIP peak prediction: (i) Schematic of a genomic region showing real eCLIP peaks (cyan) and randomly generated control peaks (purple). The control peaks are generated by shifting the locations of peaks for a given RBP to random positions in the same transcript. AM motifs (red) and sparsity-matched FM motifs on the sequence are also shown. (ii) We count peak overlap for each of the four comparisons depicted (in practice, we collect this separately per motif and average all four values over the set of motifs). We compute enrichments for each model, and then calculate the % increase from one to the other. (iii) Plotted are the mean relative enrichments across motifs, separately for exons and introns. AM-E models are provided as a reference for the maximum improvement we could hope to achieve using a motif model on eCLIP data. By this standard, the AM models achieve about 10% of the theoretical maximum performance improvement over FMs on introns and about 50% of the theoretical maximum performance improvement on exons. c MPRA activity prediction: (i) Schematic of the experimental setup from [36]. 
The reporters contain either a pair of alternative 5′ (donor) splice sites or a pair of alternative 3′ (acceptor) splice sites, such that the degenerate region in red is either included in the exon or spliced out (triangular lines depict possible introns). We ignore the other degenerate region (in orange). To compute the Relative Intron Inclusion (RII) score, we subtract the read count of the splice site that indicates the red region is in the intron from the read count of the splice site that indicates it is not. (ii) We compute the Baseline Motif Activity by looking at just FM sites and non-sites, as we are confident that FMs indicate binding. We compute the advantage gap for AMs by looking at the difference of uniquely AM and uniquely FM slices; note that this is a symmetric measurement. (iii) The relationship between these two metrics is significantly positive on both the 3′ and 5′ data. We highlight the three well-established motifs from Fig. 6 in blue. d Knockdown modeling: (i) We use an existing dataset of skipped exons and corresponding in vivo knockdown results. (ii) We filter for reliability by requiring both low FDR and high counts. (iii) We run our own model to compute in silico knockdowns. (iv) Our metric of interest is the accuracy with which the in silico values predict the knockdown values, in terms of both sign and magnitude (above or below the median). (v) Results at predicting experimental sign and magnitude from in silico sign and magnitude. Different AM models indicate different replicates; error bars indicate 95% bootstrap confidence intervals.
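
As an illustration of two quantities named in panels a and b, the following minimal Python sketch computes a top-k accuracy and a percent increase in enrichment. It is not the authors' code. It assumes the SpliceAI-style convention in which k equals the number of true splice sites, and it assumes that enrichment is the ratio of overlap with real peaks to overlap with control peaks; the caption states neither definition explicitly. The function names `top_k_accuracy` and `relative_enrichment_percent` are hypothetical.

```python
import numpy as np


def top_k_accuracy(scores, is_site):
    """Fraction of the k highest-scoring positions that are true sites.

    Assumption: k is set to the number of true sites (SpliceAI-style).
    """
    scores = np.asarray(scores, dtype=float)
    is_site = np.asarray(is_site, dtype=bool)
    k = int(is_site.sum())
    if k == 0:
        raise ValueError("top-k accuracy is undefined without any true site")
    # Indices of the k largest scores (stable sort keeps ties in input order).
    top = np.argsort(-scores, kind="stable")[:k]
    return float(is_site[top].mean())


def relative_enrichment_percent(am_real, am_control, fm_real, fm_control):
    """Percent increase of AM enrichment over FM enrichment.

    Assumption: enrichment = (overlap with real peaks) / (overlap with
    control peaks), computed separately for AM and FM motifs.
    """
    am_enrichment = am_real / am_control
    fm_enrichment = fm_real / fm_control
    return 100.0 * (am_enrichment / fm_enrichment - 1.0)


if __name__ == "__main__":
    # Toy values only; these are not data from the figure.
    print(top_k_accuracy([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1]))
    print(relative_enrichment_percent(30, 10, 20, 10))
```

In the toy call, two of the four positions are true sites, so k = 2 and the score ranking is checked against those two sites only.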
