Fig. 6 | Genome Biology

Fig. 6

From: QAPA: a new method for the systematic analysis of alternative polyadenylation from RNA-seq data

Modeling the APA regulatory code using random forests. a Hexbin scatterplot comparing PPAU predictions made by the random forests model on genes in the ND RNA-seq dataset [29] to the observed QAPA-assigned PPAU values. Only data on held-out genes not used in training the model are shown here. Higher values indicate increased usage and vice versa. Bins are colored by number of data points. The dashed line indicates the reference diagonal. The blue line represents a polynomial spline of best fit to the data. b Dot plot showing the top six features from the model. The x-axis indicates the importance of each feature (see “Methods”), scaled between 0 and 100. Higher values indicate stronger predictive value. Note that the Conservation, Cis RBP motifs, and Upstream AAUAAA-like cis RBP motifs features shown are each the sum of the importances from all the corresponding binned conservation-related and motif-related features. c Zoom-in dot plot showing the importances of the top eight motif features from the Cis RBP motifs set. This set consists of RBP motifs that are not similar to the AAUAAA poly(A) signal. Each motif is labeled according to the corresponding RBP, IUPAC motif, and bin region. d Zoom-in dot plot showing the importances of individual Upstream AAUAAA-like RBP motifs. These features are likely predictive due to their similarity to the canonical poly(A) signal AAUAAA. e Distribution of 18 poly(A) signals in mouse, grouped by poly(A) site type: proximal (poly(A) site closest to stop codon), distal, and single (genes with one poly(A) site). f Similar to e, distribution of 16 poly(A) site dinucleotides, grouped by poly(A) site type.
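
The aggregation described for panel b (summing the importances of binned features into one grouped feature, then showing importances on a 0 to 100 scale) can be illustrated with a minimal Python sketch. This is not the authors' code. The feature names and raw values are hypothetical, and two details are assumptions not stated in the legend: that the sum is taken before scaling, and that the scaling is min-max.

```python
from collections import defaultdict


def group_and_scale(importances, groups):
    """Sum raw importances per group, then min-max scale the totals to 0-100.

    Assumption: summing happens before scaling, and scaling is min-max.
    """
    totals = defaultdict(float)
    for feature, value in importances.items():
        # A feature with no group entry stands alone under its own name.
        totals[groups.get(feature, feature)] += value

    lo, hi = min(totals.values()), max(totals.values())
    if hi == lo:
        # All totals are equal, so min-max scaling is undefined; report 100 for each.
        return {name: 100.0 for name in totals}
    return {name: 100.0 * (v - lo) / (hi - lo) for name, v in totals.items()}


# Hypothetical raw importances for illustration only.
raw = {
    "conservation_bin1": 0.10,
    "conservation_bin2": 0.20,
    "motif_A_bin1": 0.05,
    "motif_B_bin1": 0.05,
    "ungrouped_feature": 0.50,
}

# Hypothetical mapping from binned features to the grouped labels in panel b.
groups = {
    "conservation_bin1": "Conservation",
    "conservation_bin2": "Conservation",
    "motif_A_bin1": "Cis RBP motifs",
    "motif_B_bin1": "Cis RBP motifs",
}

scaled = group_and_scale(raw, groups)
for name, value in sorted(scaled.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f}")
```

Under these assumptions, the grouped feature with the largest summed importance sits at 100 and the smallest at 0, which matches the axis range described for the dot plot.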
