Fig. 1 | Genome Biology

Fig. 1

From: Statistical learning quantifies transposable element-mediated cis-regulation

craTEs uncovers cis-regulatory TE subfamilies from RNA-seq. A Overview of the craTEs model. Differences in expression [log(TPM)] for protein-coding genes between treatment and control samples (columns of matrix E) are modeled as a linear combination of the per-subfamily TE counts found in the cis-regulatory region (shaded beige) of each gene (columns of N). Differences in cis-regulatory activities for each treatment vs. control experiment (columns of A) are estimated by least squares. The cis-regulatory regions of each gene are defined as 50-kb-long stretches of DNA 5′ and 3′ from promoter regions. Cis-regulatory regions exclude the exons (gray boxes) and promoters (orange boxes) of the genes they are assigned to. Gray bold lines: gene introns. Sequences of introns and exons: transcripts. B Proportion of integrants remaining at each step of the construction of N with respect to the original number of TEs present in the annotation (indicated in gray). “All TEs” refers to all integrants found in the TE database “Repeatmasker RELEASE 20170127” (the number of unmerged TEs is indicated in gray). “cis-TEs” refers to integrants found in cis-regulatory regions before (“unfiltered”) and after (“filtered”, numbers indicated in red) removing those overlapping exons and promoters of the corresponding gene. C Seven case studies exemplifying the estimation of the cis-regulatory activities of TE subfamilies from RNA-seq data. Black dots are TE subfamilies with statistically significant (BH-adj. p-value < 0.05, t-test) differences in activities between the treatment and control groups. 95% confidence intervals for the estimated cis-regulatory activities are shown as gray bars. Gray dots are TE subfamilies with non-significant differences in activities. Subtitle: p-value from the F-test of overall significance in regression. From left to right: CRISPRi-mediated repression of LTR5-Hs and SVA integrants in naïve hESCs, gRNA #1 (g#1), \(n=3\) (3 treatment samples vs. 3 control samples) [33]; CRISPRi-mediated repression of LTR5-Hs and SVA integrants in naïve hESCs, gRNA #1 (g#1), \(n=3\); CRISPRi-mediated repression of LTR5-Hs/A/B integrants in an embryonal carcinoma cell line (NCCIT), \(n=2\) [35]; CRISPRa-mediated activation of LTR5-Hs/A/B integrants in NCCIT, \(n=2\); CRISPRi-mediated repression of LTR2B integrants in K562, \(n=2\) [34]; overexpression of the pluripotency TF KLF4 in primed hESCs, \(n=4\) [33]; overexpression of the SVA-targeting KZFP ZNF611 in naïve hESCs, \(n=2\) [33]. D Proportion of variance of E explained by craTEs for each experiment in C.
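
The linear model in panel A and the explained-variance proportion in panel D can be illustrated with a minimal sketch. This is not the craTEs implementation: the function name `fit_activities`, the simulated count matrix, the noise level, and the intercept column are all assumptions made for illustration. It only shows how activities (A) might be estimated by ordinary least squares from a gene-by-subfamily count matrix (N) and a column of expression differences (E).

```python
import numpy as np

def fit_activities(N, E):
    """Ordinary least-squares fit of E ~ intercept + N @ A.

    N: (genes x subfamilies) TE counts per cis-regulatory region.
    E: (genes x experiments) differences in log expression.
    Returns the estimated activities A and R^2 per experiment.
    The intercept is an assumption of this sketch, not taken from the caption.
    """
    X = np.column_stack([np.ones(N.shape[0]), N])
    coef, *_ = np.linalg.lstsq(X, E, rcond=None)
    fitted = X @ coef
    ss_res = ((E - fitted) ** 2).sum(axis=0)
    ss_tot = ((E - E.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1.0 - ss_res / ss_tot  # proportion of variance explained
    return coef[1:], r2

# Simulated toy data (hypothetical): 500 genes, 4 TE subfamilies, 1 experiment.
rng = np.random.default_rng(0)
N = rng.poisson(2.0, size=(500, 4)).astype(float)
A_true = np.array([[0.0], [-0.5], [0.0], [0.3]])  # two "active" subfamilies
E = N @ A_true + rng.normal(0.0, 0.1, size=(500, 1))

A_hat, r2 = fit_activities(N, E)
print(np.round(A_hat.ravel(), 2), np.round(r2, 3))
```

In this toy setting the second and fourth subfamilies carry nonzero simulated activities, so their estimates should be recovered with the corresponding signs. Real expression data are far noisier than the simulation, so explained-variance values from it say nothing about those reported in panel D.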
