Fig. 1 | Genome Biology

Fig. 1

From: Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data

Workflow for generating and evaluating EnTDefs. A Enhancers were defined by ENCODE ChromHMM UCSC tracks, ENCODE DNase-seq hypersensitive sites (DHSs), Cap Analysis Gene Expression (CAGE) experiment-derived enhancers from the FANTOM5 project, and/or distal and non-promoter DHSs within 500 kb of the correlated promoter DHSs from Thurman et al. B The enhancer-target gene links were defined by ChIA-PET interactions from ENCODE ChIA-PET data (ChIA), DNase signal correlation-based links from Thurman et al., expression correlation-based interactions from FANTOM5, and/or interactions between enhancers and genes within loop (L) boundaries of ChIA-PET with convergent CTCF motifs (L1 [one gene], L2 [≤ two genes], or L3 [≤ three genes] were allowed). An enhancer can be assigned to multiple genes. To increase the genome coverage, we allowed the extension of enhancers to 1 kb (i.e., enhancer extension), and assigned other regions more than 5 kb from a TSS to the nearest gene (i.e., “nearest_All” additional links). All combinations of the above, allowing multiple at a time, defined the possible enhancer-to-target gene definitions (EnTDefs). C Left: 1860 EnTDefs were generated and GOBP GSE testing was performed on 87 ENCODE TF ChIP-seq datasets using each of the EnTDefs. By comparing the significant GOBP terms identified by GSE with each EnTDef to those assigned to the TF by the GO database (“GO annotation”), the F1 score was calculated for each EnTDef-TF pair. Right: the EnTDefs were ranked by average F1 score across TFs in descending order. A TF-paired Wilcoxon sum-rank test was performed between the top-ranked EnTDef and each of the subsequent ones to identify the set of best EnTDefs (top 1 until the rank with p-value < 0.01).
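The scoring and ranking step in panel C can be illustrated with a minimal sketch. It assumes that the significant GOBP terms for each EnTDef-TF pair and the GO-annotated terms for each TF are already available as sets. The function names, the toy GO identifiers, and the data layout are hypothetical; the GSE testing and the Wilcoxon comparison are not reproduced here.

```python
from statistics import mean


def f1_score(predicted, annotated):
    """F1 between significant GOBP terms (predicted) and GO-annotated terms."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    if true_positives == 0:
        return 0.0  # also covers empty predicted or annotated sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(annotated)
    return 2 * precision * recall / (precision + recall)


def rank_entdefs(significant, annotation):
    """Rank EnTDefs by their average F1 score across TFs, in descending order.

    significant: {entdef_name: {tf_name: set of significant GOBP terms}}
    annotation:  {tf_name: set of GO-annotated GOBP terms}
    """
    average_f1 = {
        entdef: mean(f1_score(terms, annotation[tf]) for tf, terms in per_tf.items())
        for entdef, per_tf in significant.items()
    }
    return sorted(average_f1.items(), key=lambda item: item[1], reverse=True)


# Toy inputs (hypothetical identifiers, not data from the study).
annotation = {"TF_A": {"GO:1", "GO:2"}, "TF_B": {"GO:3"}}
significant = {
    "EnTDef_1": {"TF_A": {"GO:1", "GO:2"}, "TF_B": {"GO:3", "GO:4"}},
    "EnTDef_2": {"TF_A": {"GO:1", "GO:5"}, "TF_B": set()},
}
ranking = rank_entdefs(significant, annotation)
```

In this toy case `ranking` lists `EnTDef_1` first, because it recovers more of the annotated terms for both TFs. In the workflow described above, each lower-ranked EnTDef would then be compared with the top-ranked one using per-TF paired F1 scores.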
