Fig. 3 | Genome Biology

Fig. 3

From: Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies

a CpG overlaps. The three 4-way Venn diagrams on top indicate the overlap in CpGs for each of the individual cohorts. These are based on the base model, using Bonferroni correction. The four diagrams below indicate the overlap between the strategies for each step, shown here for age, BMI, and smoking. These are the same strategies as shown in Fig. 2a. Yellow always represents the base model, and the green, red, blue, and purple colors belong to alternative strategies. (A) Beta values dataset in green, M-3IQR in blue, M in red, and RIN in purple. (B) LMM in green and RLMM in red. (C) Houseman6 imputed cell counts in green, Houseman3 imputed cell counts in red, and no cell count correction in blue. (D) Hidden confounder (HC) correction: model 1 (HCs independent of the exposure of interest, age, sex, known technical covariates, but not measured differential cell counts) in purple, model 2 (HCs independent of the exposure of interest, age, sex, measured differential cell counts, but not known technical covariates) in green, and model 3 (HCs independent of the exposure of interest, age, sex, known technical covariates, and measured differential cell counts) in red.

b Gene overlaps. The three 4-way Venn diagrams on top indicate the overlap in genes for each of the individual cohorts. These are based on the base model, using Bonferroni correction. The four diagrams below indicate the overlap between the strategies for each step, shown here for age, BMI, and smoking. These are the same strategies as shown in Fig. 2b. Yellow always represents the base model, and the blue, green, red, and purple colors belong to alternative strategies. (A) DESeq normalization in blue and edgeR in red. (B) Removing very low-expressed genes (blue), low-expressed genes (red), or medium-expressed genes (green). (C) A limma linear model fit in red, a standard GLM in blue, and the edgeR GLM adaptation in green. (D) Correcting for only technical covariates (blue) and only cell counts (red), adding five hidden confounders (purple), or replacing both with the first five principal components (green).
