Fig. 2 | Genome Biology

Fig. 2

From: Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing

Overview of the Sparse Adjusted Motif architecture: components and flow of information. Panels a through e summarize our splicing architecture using the baseline RBNS PSAM models (FM model); panels d and e describe the “Aggregator” component of our model, inspired by [28]; panels f and g summarize training of the “AM” models, which then replace the FM motifs shown in panel b. a The LSSI model processes the sequence and annotates the core 3′ and 5′ motifs. b The motif model processes the sequence and estimates RBP binding affinity at each site. c We enforce sparsity on the motif binding affinities, letting through only high-affinity sites. d We compute influence scores for each position in the sequence; multiplying them by the sparse input ensures that these influence values can only increase or decrease the strengths of known binding sites. e We then run a long-range processor across the sequence to score potential splice sites and multiply these scores by the outputs of the core motifs to produce our final predictions. We use an LSTM here because the structure of an LSTM’s dataflow graph is identical to that of the forward-backward algorithm, which is commonly used to find the marginal probabilities of states in an HSMM. f In our AM model, we first run our FM model and then sparsify its output to a density k times higher than the density we intend to output (typically k = 2). g We also predict increase/decrease scores at each position. These scores are added only to sites that were plausible binding sites, and the result is then sparsified again. This allows both arbitrary changes to the magnitudes of sites and changes to which sites are selected, while guaranteeing that every AM site produced is among the sites scored highly by the FM. Specifically, we compute the AM output as $\mathrm{AM}(x) = \mathrm{FM}(x) \oplus \mathrm{Adj}(x)$, where $u \oplus v = \mathbb{1}(u \neq 0)\,(u + v)$. h High-level summary of the two accuracy metrics we consider.
Both the FM and AM models are novel, although in the FM model only the Aggregator is novel; its motifs were previously described [27]. Our AM model improves on the FM model across all features, trading some accuracy relative to SpliceAI for the ability to predict relevant RBP binding positions, and is the best at binding motif prediction. N/A indicates that SpliceAI cannot predict binding motifs.
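The sparsify, masked-add and resparsify steps described in panels c, f and g can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: scores for a single motif on a single sequence are passed in as plain lists (the FM and Adj networks are not modeled), sparsification is assumed to keep the top `density` fraction of strictly positive positions (the model's actual sparsity layer may differ), and the helper names `sparsify`, `masked_add` and `adjusted_motif` are hypothetical.

```python
import math
from typing import List


def sparsify(scores: List[float], density: float) -> List[float]:
    """Assumed sparsity rule: keep the top `density` fraction of positions,
    and only if their score is strictly positive; zero out everything else."""
    n_keep = math.floor(density * len(scores))
    # Rank positions from highest to lowest score; ties go to the lower index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = {i for i in order[:n_keep] if scores[i] > 0}
    return [s if i in keep else 0.0 for i, s in enumerate(scores)]


def masked_add(u: List[float], v: List[float]) -> List[float]:
    """u (+) v = 1(u != 0) * (u + v): the adjustment v only touches
    positions that are already nonzero in u."""
    return [(a + b) if a != 0 else 0.0 for a, b in zip(u, v)]


def adjusted_motif(fm_scores: List[float], adj_scores: List[float],
                   density: float, k: float = 2.0) -> List[float]:
    """Hypothetical AM step: sparsify FM to k times the target density,
    apply the increase/decrease scores via masked_add, then resparsify."""
    candidates = sparsify(fm_scores, k * density)
    adjusted = masked_add(candidates, adj_scores)
    return sparsify(adjusted, density)


# Invented example scores (not from the paper) for an 8-position sequence.
fm = [0.875, 0.125, 0.5, 0.0, 0.75, 0.25, 0.625, 0.375]
adj = [-0.75, 4.0, 0.5, 0.0, 0.0, 0.0, -0.5, 0.0]
print(adjusted_motif(fm, adj, density=0.25))
# prints [0.0, 0.0, 1.0, 0.0, 0.75, 0.0, 0.0, 0.0]
```

In this toy run, position 1 receives a large positive adjustment but stays at zero because it was not among the FM candidates, while position 2 is promoted past position 0. Because masked positions are zero and the assumed `sparsify` keeps only strictly positive values, every nonzero AM output is necessarily one of the FM candidate sites, which is the guarantee described for panel g.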
