Table 2 Open reading frame predictionᵃ

From: Separating homeologs by phasing in the tetraploid wheat transcriptome

| | T. turgidum | T. urartu |
|---|---|---|
| Contigs (n) | 140,118 | 86,247 |
| Non-wheat sequencesᵇ (eliminated) (n) | 558 | 518 |
| Wheat protein coding sequences | | |
| BLASTX, E-value cutoff 1e-3 | 96,244 | 59,439 |
| Contigs with a Pfam domain (1e-3) | 59,917 | 39,965 |
| Contig sequences without BLASTX (1e-3) or Pfam (1e-3) | 42,999 | 26,070 |
| Predicted open reading frames | | |
| Predicted ORFs (non-redundant, >30 amino acids) | 76,570 | 43,014 |
| Full-length | 32,548 | 22,868 |
| Missing 5' end | 26,723 | 12,225 |
| Missing 3' end | 12,792 | 5,376 |
| Missing 5' and 3' end | 4,507 | 2,545 |
| Putative pseudogenes (frameshift and/or premature stop codon) | 9,937 | 5,208 |
| Putative fused transcripts | | |
| Contigs with BLASTX on inconsistent strand | 4,376 | 3,628 |
| Contigs with >1 predicted ORFs (>30 amino acids, no repetitive elements, not a pseudogene) | 2,164 | 1,349 |
| Putative fused transcripts (excluding overlaps) (n) | 6,409 | 4,866 |

ᵃ Open reading frames were predicted with a comparative genomics approach using the findorf program and BLASTX alignments (E-value cutoff 1e-5) between contigs and proteomes of barley, Brachypodium, rice, maize, sorghum, and Arabidopsis.
ᵇ Non-wheat sequences were identified based on the taxonomic distribution of the top 10 BLASTX hits against nr.
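
The counts in Table 2 can be cross-checked with simple arithmetic. The sketch below is not part of the original analysis. It copies the ORF and fused-transcript counts from the table into a dictionary whose layout and key names are illustrative, then checks that the four completeness categories add up to the total number of predicted ORFs. It also estimates how many contigs were flagged by both fused-transcript criteria, assuming that "excluding overlaps" means such contigs were counted only once.

```python
# Counts copied from Table 2. The dictionary layout and key names are
# illustrative assumptions, not taken from the paper or from findorf.
TABLE2 = {
    "T. turgidum": {
        "orfs_total": 76_570,
        "full_length": 32_548,
        "missing_5p": 26_723,
        "missing_3p": 12_792,
        "missing_both": 4_507,
        "inconsistent_strand": 4_376,
        "multiple_orfs": 2_164,
        "fused_total": 6_409,
    },
    "T. urartu": {
        "orfs_total": 43_014,
        "full_length": 22_868,
        "missing_5p": 12_225,
        "missing_3p": 5_376,
        "missing_both": 2_545,
        "inconsistent_strand": 3_628,
        "multiple_orfs": 1_349,
        "fused_total": 4_866,
    },
}


def orf_categories_sum(row):
    """Sum the four ORF completeness categories for one species."""
    return (
        row["full_length"]
        + row["missing_5p"]
        + row["missing_3p"]
        + row["missing_both"]
    )


def fused_overlap(row):
    """Contigs flagged by both fused-transcript criteria.

    Inclusion-exclusion, under the assumption that the "excluding
    overlaps" total counts doubly flagged contigs once.
    """
    return row["inconsistent_strand"] + row["multiple_orfs"] - row["fused_total"]


for species, row in TABLE2.items():
    consistent = orf_categories_sum(row) == row["orfs_total"]
    print(f"{species}: ORF categories sum to total: {consistent}; "
          f"contigs flagged by both criteria: {fused_overlap(row)}")
```

For both species the four completeness categories sum exactly to the reported ORF total, and under the stated assumption the overlap between the two fused-transcript criteria is 131 contigs for T. turgidum and 111 for T. urartu.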