Table 5. Known and novel predicted regulatory elements obtained by applying FastCompare to H. sapiens and M. musculus

From: Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach

| Sequence | Rank | DATG | WATG | Orientation | U/C | Experiment | TRANSFAC | Comments |
|---|---|---|---|---|---|---|---|---|
| (a) Known regulatory sequences | | | | | | | | |
| CCCGCCC | 1 | 256 | - | - | 2.26 | 8 (7/1) | Sp1, GC box | Known Sp1 site, transcription from pol II promoter (p < 10⁻⁵) |
| GCCCCGCCC | 2 | 165 | - | - | 4.64 | 9 (9/0) | Sp1, GC box | Known Sp1 site, variant from above |
| CCGGAAG | 4 | 160.5 | [0;700] | - | 2.37 | - | Ets1, Elk1 | Known Ets site, RNA metabolism (p < 10⁻⁶) |
| CACGTGAC | 18 | 122.5 | [0;600] | - | 4.90 | - | USF, GBP, SREBP-1 | Known Myc/Max site |
| TGACGTCA | 19 | 107 | [0;1000] | - | 4.24 | - | CREB | Known CREB site |
| CGCATGCG | 24 | 132 | [0;1600] | - | 4.26 | - | - | Known palindromic octamer sequence (POS) |
| CCAATCAG | 37 | 239 | [0;700] | - | 2.85 | 4 (0/4) | NF-Y, CCAAT | Known CAAT box and CCAAT enhancer binding protein site |
| CGGAAGTGA | 51 | 94 | [0;1000] | - | 3.96 | - | STAT3 | Known GA-binding protein (GAB) site |
| CCGCCTC | 78 | 632 | [0;500] | - | 4.26 | 9 (8/1) | - | Known insulin response element |
| CACGTGG | 82 | 429.5 | [0;300] | - | 2.09 | - | USF, Myc-Max | Known Myc/Max site, different from above |
| TAATCCCAG | 119 | 1258 | [100;2000] | ← (p < 10⁻¹⁴) | 7.06 | 3 (1/2) | - | Similar to Bicoid (Drosophila), RNA processing (p < 10⁻⁵) |
| CACCTGC | 227 | 925 | [0;600] | - | 1.64 | 1 (1/0) | E47, Lmo2 | Known ZEB site in vertebrates, Zfh-1 in Drosophila |
| ATTTGCAT | 234 | 729 | [0;300] | - | 1.95 | - | Oct-1 | Known Oct-1 site, chromatin assembly/disassembly (p < 10⁻⁸) |
| CCAAGGTCA | 242 | 801 | [0;1800] | - | 1.59 | - | - | Known HRE site |
| GGAAGTCCC | 253 | 124.5 | [0;300] | - | 2.60 | - | NFκB | Known NFκB site |
| CAGCTGC | 256 | 850 | [0;1600] | - | 1.03 | - | AP-4, HEN1 | Known AP-4, MyoD site |
| TTTCGCGC | 275 | 245 | - | | 2.42 | - | E2F | Known E2F site |
| (b) Novel predicted regulatory sequences | | | | | | | | |
| CGCAGGCGC | 6 | 127 | - | - | 2.76 | - | - | Unknown site |
| GCGCCGC | 13 | 311 | [0;1900] | ← (p < 10⁻⁵) | 1.41 | - | - | Unknown site |
| TCTCGCGA | 17 | 116 | [0;1700] | - | 4.45 | - | StuAp | Unknown site, similar to E2F |
| TTAAAAA | 52 | 1142 | [100;2000] | - | 2.19 | 21 (0/21) | - | Unknown site |
| CTCCGCCC | 60 | 242.5 | [0;1300] | - | 3.85 | - | - | Unknown site, similar to Sp1 |
| CCCCTCCC | 67 | 563 | [0;500] | → (p < 10⁻⁴) | 5.12 | 1 (0/1) | - | Unknown site, regulation of transcription, DNA-dependent (p < 10⁻⁵) |
| AAGATGGCG | 76 | 334 | [0;1300] | - | 1.14 | - | - | Unknown site |
| CTGCGCA | 89 | 199 | [0;300] | - | 3.63 | - | - | Unknown site |
| CCAGCCTGG | 123 | 1245 | [200;2000] | - | 4.42 | - | - | Unknown site |
| CCTGCCC | 162 | 788 | [0;1800] | - | 1.55 | 21 (20/1) | E47/Sp1 | Unknown site |
| CCCTTTAAG | 166 | 230 | [0;800] | → (p < 10⁻¹⁰) | 3.45 | - | - | Unknown site |
| CCCCAGC | 207 | 785 | - | - | 1.42 | 22 (22/0) | - | Unknown site |
| TACAACTCC | 225 | 154 | [0;700] | - | 2.51 | - | - | Unknown site |
| GTGAGCCAC | 248 | 1208 | - | → (p < 10⁻⁶) | 6.28 | - | - | Unknown site |
(a) For each known regulatory element, we show the best k-mer; its rank within the set of 284 highest-scoring k-mers; the median distance to ATG (DATG, for occurrences upstream of genes within the conserved set); the optimal window (WATG); the orientation bias; the corrected upstream/coding (U/C) over-representation ratio; the total (upregulated/downregulated) number of microarray conditions in which the k-mer was found (Experiment; see Materials and methods); TRANSFAC matches; and the best GO enrichment.
(b) Novel predicted regulatory elements. The k-mers shown here were selected from the list of 284 highest-scoring k-mers based on a short median distance to ATG, a short optimal window, a significant orientation bias, a strong over-representation ratio (U/C), presence in upstream regions of over- or underexpressed genes in several microarray conditions, palindromicity, or resemblance to known sites in other species.
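Several of the per-k-mer statistics described above (median distance to ATG, orientation bias, palindromicity) reduce to simple counts over k-mer occurrences in upstream regions. The following is a minimal sketch under stated assumptions, not FastCompare's implementation: the function names are hypothetical, each upstream region is assumed to end immediately before the ATG, distance is measured from the ATG-proximal end of each hit, and the orientation p-value uses a two-sided exact binomial test against a 50/50 strand split, which may differ from the test used to produce the table.

```python
from math import comb
from statistics import median

# Translation table that complements uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(kmer: str) -> bool:
    """True if the k-mer equals its reverse complement (e.g. TGACGTCA)."""
    return kmer == reverse_complement(kmer)


def occurrences(upstream: str, kmer: str) -> list[tuple[int, str]]:
    """Return (distance_to_atg, strand) for each hit of `kmer` in `upstream`.

    Assumption (hypothetical convention): `upstream` ends immediately before
    the ATG, so a hit ending at the last base has distance 0. Reverse-strand
    hits are matches of the reverse complement; palindromic k-mers are
    reported once, on '+'.
    """
    k = len(kmer)
    rc = reverse_complement(kmer)
    hits = []
    for i in range(len(upstream) - k + 1):
        window = upstream[i:i + k]
        distance = len(upstream) - (i + k)
        if window == kmer:
            hits.append((distance, "+"))
        elif window == rc:
            hits.append((distance, "-"))
    return hits


def summarize(upstreams: list[str], kmer: str) -> dict:
    """Median distance to ATG and strand counts over a set of upstream regions."""
    distances = []
    plus = minus = 0
    for region in upstreams:
        for distance, strand in occurrences(region, kmer):
            distances.append(distance)
            if strand == "+":
                plus += 1
            else:
                minus += 1
    return {
        "median_distance_to_atg": median(distances) if distances else None,
        "plus": plus,
        "minus": minus,
        "palindromic": is_palindromic(kmer),
    }


def orientation_p_value(plus: int, minus: int) -> float:
    """Two-sided exact binomial test of plus vs minus counts against 50/50."""
    n = plus + minus
    if n == 0:
        return 1.0
    # Double the smaller tail; valid because Binomial(n, 0.5) is symmetric.
    tail = sum(comb(n, i) for i in range(min(plus, minus) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    regions = ["CCGGAAGTTTT", "TTCTTCCGG"]  # toy sequences, not real data
    print(summarize(regions, "CCGGAAG"))
    # prints {'median_distance_to_atg': 2.0, 'plus': 1, 'minus': 1, 'palindromic': False}
    print(orientation_p_value(10, 0))
    # prints 0.001953125
```

For palindromic k-mers such as TGACGTCA or CGCATGCG, every hit is reported on the same strand, so an orientation test is not meaningful for them; this is one reason palindromicity is listed as a separate selection criterion.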