Table 1 The 44 selected sequences within the ENCODE region

From: EGASP: the human ENCODE Genome Annotation Assessment Project

  

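| Sequence set | Manual picks | Random picks, low mouse homology | Random picks, medium mouse homology | Random picks, high mouse homology | Gene density |
|---|---|---|---|---|---|
| Training | ENm006 | ENr132 | ENr231, ENr232 | ENr333, ENr334 | High |
| Training | ENm004 | - | ENr222, ENr223 | ENr323, ENr324 | Medium |
| Training | - | ENr111, ENr114 | - | - | Low |
| Test | ENm002, ENm005, ENm007, ENm008, ENm009, ENm010, ENm011 | ENr131, ENr133 | ENr233 | ENr331, ENr332 | High |
| Test | ENm001, ENm003, ENm012, ENm013, ENm014 | ENr121, ENr122, ENr123 | ENr221 | ENr321, ENr322 | Medium |
| Test | - | ENr112, ENr113 | ENr211, ENr212, ENr213 | ENr311, ENr312, ENr313 | Low |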
 
  1. ENCODE sequences were assigned to either the training or the test set based on annotation data availability (see the section 'The EGASP experiment'). Only the test set sequences were used for the performance evaluation. In the names of the randomly picked sequences, the three digits correspond to the non-exonic conservation with the mouse genome, the density of previously identified genes, and the sequence number, respectively; the conservation and density codes range from 1 (low) to 3 (high). Manually selected sequences range in size from 500 kbp to 2 Mbp, while random regions are 500 kbp. The selection and stratification criteria for all the sequences are described at the ENCODE project web site [34].
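
The naming scheme in the footnote can be illustrated with a short decoder. This is only a sketch: the function name `decode_encode_name` and the returned field names are hypothetical, and treating the digits of manually picked (`ENm`) names as a plain serial number is an assumption, since the footnote describes the code for random picks only.

```python
import re

# Stratum codes used in the names of randomly picked regions (from the footnote).
LEVELS = {1: "low", 2: "medium", 3: "high"}


def decode_encode_name(name):
    """Decode an ENCODE region name such as 'ENr132' or 'ENm006' (illustrative helper)."""
    match = re.fullmatch(r"EN([mr])(\d{3})", name)
    if match is None:
        raise ValueError(f"not an ENCODE region name: {name!r}")
    kind, digits = match.groups()

    if kind == "m":
        # Assumption: manual picks carry only a serial number, no stratum codes.
        return {"name": name, "pick": "manual", "number": int(digits)}

    conservation, density, number = (int(d) for d in digits)
    if conservation not in LEVELS or density not in LEVELS:
        raise ValueError(f"stratum codes must be 1, 2 or 3: {name!r}")
    return {
        "name": name,
        "pick": "random",
        "mouse_homology": LEVELS[conservation],  # first digit
        "gene_density": LEVELS[density],         # second digit
        "number": number,                        # third digit
    }
```

For example, `decode_encode_name("ENr132")` reports low mouse homology, high gene density, and sequence number 2, which matches the cell in which ENr132 appears in the table.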