Fig. 1 | Genome Biology

Fig. 1

From: CREaTor: zero-shot cis-regulatory pattern modeling with attention mechanisms

Accurate gene expression prediction with CREaTor. a Schema of CREaTor. The model predicts target gene expression from the flanking cCREs using a hierarchical transformer structure. cCRE locations were obtained from the ENCODE consortium. A combination of genomic sequences, chromatin accessibility, and a collection (3–13 types) of ChIP-seq profiles was used as input features for each cCRE. b Visualization of the data split strategy. The model was trained on gene expression from 19 autosomes across 19 different cell lines and tissues. Genes on chr16 from the 19 cell lines and tissues were used for parameter tuning (validation), while genes on chr8 and chr9 were used for model evaluation (in-cell type test chromosomes). Genes from all autosomes in K562 (cross-cell type test chromosomes) were evaluated in detail to demonstrate the model’s ability to model gene expression and regulation across cell types (Additional file 1: Fig. S2). c Pearson r between observed and predicted gene expression. Left: genes on cross-cell type test chromosomes. Right: genes on in-cell type test chromosomes. Green and blue dots indicate chr8 and chr9, respectively. See Additional file 2: Table S3 for results with different random seeds. d Clustering map of predicted and observed expression of K562-specific genes (calculated with RMSE; see the “Methods” section) in different cell types. The leftmost column shows the predicted values, which are clustered together with the observed K562 gene expression data by hierarchical clustering. Expression values were transformed with log1p. Observed gene expression profiles from different sources (with different experiment IDs on ENCODE) for the same cell type are calculated independently.
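
The chromosome hold-out in panel b and the Pearson r in panel c can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `split_by_chromosome` and `pearson_r` are hypothetical, and applying log1p before computing the correlation is an assumption here (the caption states the log1p transform only for panel d).

```python
import numpy as np

# Hold-out chromosomes as described in the caption (panel b).
VALIDATION_CHROMS = {"chr16"}
TEST_CHROMS = {"chr8", "chr9"}


def split_by_chromosome(gene_chroms):
    """Assign each gene index to train, validation, or test by its chromosome.

    Hypothetical helper: `gene_chroms` is a sequence of chromosome names,
    one per gene, e.g. ["chr1", "chr8", ...].
    """
    splits = {"train": [], "validation": [], "test": []}
    for index, chrom in enumerate(gene_chroms):
        if chrom in TEST_CHROMS:
            splits["test"].append(index)
        elif chrom in VALIDATION_CHROMS:
            splits["validation"].append(index)
        else:
            splits["train"].append(index)
    return splits


def pearson_r(observed, predicted, log1p=True):
    """Pearson correlation between observed and predicted expression.

    Assumption: values are log1p-transformed first when `log1p` is True.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if log1p:
        obs, pred = np.log1p(obs), np.log1p(pred)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(obs, pred)[0, 1])
```

With the 22 human autosomes as input, this split leaves 19 chromosomes for training, one for validation, and two for testing, which matches the counts given in the caption.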
