
Table 1 Progress in crop genome sequencing

From: Genomics reveals new landscapes for crop improvement

| Species (common name) | Genome size | Ploidy | Sequence strategy | Publication date | Assembly features | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| Oryza sativa (rice) | 389 Mb | 2n = 2x = 24 | BAC physical map, Sanger sequencing | Aug 2005 | Essentially complete chromosome arm coverage | [2] |
| Populus trichocarpa (black cottonwood) | 550 Mb | 2n = 2x = 38 | BAC physical map, WGS, Sanger sequencing | Sep 2006 | 2,447 scaffoldsᶜ containing 410 Mb, 82% of sequence genetically anchored | [3] |
| Vitis vinifera (Pinot Noir grape) | 475 Mb | 2n = 2x = 36 | WGS, Sanger sequencing | Sep 2007 | 3,514 supercontigsᶜ containing 487 Mb, 69% of sequence genetically anchored | [5] |
| Sorghum bicolor (sorghum) | 700 Mb | 2n = 2x = 20 | WGS, Sanger sequencing | Jan 2009 | 229 scaffolds containing 97% of the genome, 88% of sequence genetically anchored | [6] |
| Zea mays (maize) | 2,300 Mb | 2n = 2x = 20; one WGDᵃ; allotetraploid | BAC physical map, BAC sequence 4-6x deep | Nov 2009 | 2,048 Mb in 125,325 contigsᵇ forming 61,161 scaffolds | [4] |
| Glycine max (soybean) | 1,115 Mb | 2n = 2x = 40; two WGDs; allopolyploid | WGS, Sanger sequencing | Jan 2010 | 397 scaffolds containing 85% of the genome, 98% of sequence genetically anchored | [7] |
| Malus × domestica (apple) | 750 Mb | 2n = 2x = 34; one WGD | WGS, Sanger, Roche 454 | Oct 2010 | 1,629 metacontigsᶜ containing 80% of the genome, 71% of sequence genetically anchored | [10] |
| Theobroma cacao (cacao) | 430 Mb | 2n = 2x = 20 | WGS, Sanger, Illumina, Roche 454 | Dec 2010 | 524 scaffolds containing 80% of the genome, 67% of sequence genetically anchored | [11] |
| Fragaria vesca (woodland strawberry) | 240 Mb | 2n = 2x = 14 | WGS, Roche 454, Illumina, SOLiD | Dec 2010 | 272 scaffolds containing 95% of the genome, 94% of sequence genetically anchored | [13] |
| Phoenix dactylifera (date palm) | 658 Mb | 2n = 2x = 36 | WGS, Illumina | Jun 2011 | 57,277 scaffolds containing 60% of the genome | [12] |
| Solanum tuberosum (potato) | 844 Mb | 2n = 4x = 48 | Doubled monoploid DM and diploid RH; WGS, Illumina, Roche 454 | Jul 2011 | 443 superscaffolds containing 78% of the genome, 86% of the assembly genetically anchored | [14] |
| Brassica rapa (Chinese cabbage) | 485 Mb | 2n = 2x = 20; three WGDs | WGS, Illumina, BAC end Sanger sequencing | Aug 2011 | 288 Mb in scaffolds, 90% of the assembly genetically anchored | [15] |
| Medicago truncatula (alfalfa relative) | 375 Mb | 2n = 2x = 16; WGD | BAC physical map, Sanger, Illumina | Dec 2011 | 8 pseudomoleculesᵈ containing 70% of the genome, 100% in optical map | [16] |
| Manihot esculenta (cassava) | 770 Mb | 2n = 2x = 36 | WGS, Roche 454, BAC end Sanger sequencing | Jan 2012 | 12,977 scaffolds containing 80% of the genome | [19] |
| Cajanus cajan (pigeonpea) | 833 Mb | 2n = 2x = 22 | WGS, Illumina | Jan 2012 | 137,542 scaffolds containing 73% of the genome | [20] |
| Setaria italica (foxtail millet) | 500 Mb | 2n = 2x = 18 | WGS, Sanger, Illumina, BAC end sequencing | May 2012 | 597 scaffolds containing 80% of the genome, 99% of the assembly genetically anchored | [21] |
| Solanum lycopersicum (tomato) | 900 Mb | 2n = 2x = 24 | WGS, Roche 454, Illumina and SOLiD, BAC end Sanger sequencing | May 2012 | 91 scaffolds containing 85% of the genome, 99% of the assembly genetically anchored | [17] |
| Cucumis melo (melon) | 312 Mb | 2n = 2x = 24; three WGDs | WGS, Roche 454, BAC end sequencing | Jul 2012 | 1,584 scaffolds containing 83% of the genome, 88% of the assembly genetically anchored | [22] |
| Musa acuminata (Cavendish banana) | 523 Mb | 2n = 2x = 22 | WGS, Roche 454, Sanger, Illumina | Aug 2012 | 24,425 contigs containing 90% of the genome, 70% of the assembly genetically anchored | [33] |
| Citrus sinensis (Valencia sweet orange) | 367 Mb | 2n = 2x = 18 | Dihaploid; WGS, Illumina | Jan 2013 | 4,811 scaffolds containing 82% of the genome, 73% of the assembly genetically anchored | [23] |
| Gossypium raimondii (D genome cotton) | 880 Mb | 2n = 2x = 26 | WGS, Illumina | Aug 2012 | 4,715 scaffolds containing 85% of the genome, 73% of the assembly genetically anchored | [24] |
| Hordeum vulgare (barley) | 5,100 Mb | 2n = 2x = 14 | WGS, Illumina, BAC physical map, BAC sequence (Roche 454, Illumina) | Nov 2012 | Physical map (4.98 Gb), BAC sequence (1.13 Gb), WGS assemblies (1.9 Gb); integrated by physical map and syntenic order | [26] |
| Triticum aestivum (bread wheat) | 17,000 Mb | 2n = 6x = 42; allopolyploid | WGS, Roche 454 | Nov 2012 | Orthologous group assembly, 437 Mb | [27] |
| Gossypium raimondii (D genome cotton); G. hirsutum (upland cotton) | 880 Mb | 2n = 2x = 26; AtDt allopolyploid | WGS, Sanger, Roche 454, Illumina; Illumina | Dec 2012 | 1,084 scaffolds containing 86% of the genome, 98% anchored and oriented to genetic map; 82x coverage | [25] |
| Cicer arietinum (chickpea) | 738 Mb | 2n = 2x = 16 | WGS, Illumina, BAC end sequence | Jan 2013 | 7,163 scaffolds containing 64% of the genome | [31] |
| Phyllostachys heterocycla (bamboo) | 2,000 Mb | 2n = 2x = 48 | WGS, Illumina, BAC end sequence | Apr 2013 | 80% of the 2.05 Gb assembly maps to 5,499 scaffolds of less than 62 kb | [34] |
| Picea abies (Norway spruce) | 20,000 Mb | 2n = 2x = 24 | Fosmid pools with both haploid (megagametophyte) and diploid WGS | May 2013 | Merged assembly 12.0 Gb, with 4.3 Gb in ≥10 kb scaffolds | [42] |
| Pinus taeda (loblolly pine) | 24,000 Mb | 2n = 2x = 24 | WGS, single haploid megagametophyte assembly | In progress | | |
| Miscanthus sp. (elephant grass) | 1,500 Mb | 2n = 2x = 38; one WGD, diploid progenitors | WGS | In progress | | |
| Elaeis guineensis, Elaeis oleifera (oil palm) | 1,890 Mb | 2n = 2x = 32 | Commercial F1 hybrids; WGS, BAC physical maps | In progress | | |
| Saccharum officinarum × S. spontaneum (sugar cane) | >15,000 Mb | Diploid progenitors; x = 10, 2n = 80; x = 8, 2n = 40-128 | WGS | In progress | | |

ᵃ WGD: whole-genome duplication. Alloploids have a whole-genome duplication in their recent lineage.
ᵇ A contig is an unambiguous linear assembly of sequences with no physical gaps in coverage, but it can contain errors.
ᶜ The terms supercontig, scaffold and metacontig are used interchangeably to describe a set of contigs that are linked by a known physical distance but that contain sequence gaps. These scaffolds are usually created using mate-pair reads and BAC end sequences.
ᵈ Pseudomolecule is a term applied to a chromosome-scale assembly of contigs and scaffolds that is anchored to a long-range framework using genetic markers and other chromosome features, including cytogenetic features and deletions.
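Footnotes b and c distinguish contigs, which are gap-free sequence, from scaffolds, which are contigs joined across sequence gaps. The Python sketch below illustrates that relationship. It assumes that each gap inside a scaffold is written as a run of `N` characters. That convention is common in FASTA assemblies, but the table does not state it. The function name `split_scaffold` and its `min_gap` parameter are hypothetical and used only for illustration.

```python
import re

# Assumption (not stated in the table): scaffold gaps are encoded as runs of N/n.
GAP_RUN = re.compile(r"[Nn]+")


def split_scaffold(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into its contigs, i.e. maximal stretches without gaps.

    N runs shorter than min_gap (hypothetical threshold) are not treated as
    gaps and stay inside the surrounding contig.
    """
    contigs = []
    start = 0  # start of the contig currently being collected
    for run in GAP_RUN.finditer(scaffold):
        if run.end() - run.start() < min_gap:
            continue  # too short to count as a gap; keep extending the contig
        if run.start() > start:
            contigs.append(scaffold[start:run.start()])
        start = run.end()  # next contig begins after this gap
    if start < len(scaffold):
        contigs.append(scaffold[start:])
    return contigs


if __name__ == "__main__":
    print(split_scaffold("ACGTNNNNGGCCNNTTA"))             # ['ACGT', 'GGCC', 'TTA']
    print(split_scaffold("ACGTNNNNGGCCNNTTA", min_gap=3))  # ['ACGT', 'GGCCNNTTA']
```

Under this sketch's assumptions, a scaffold count and a contig count describe the same sequence at two levels. For example, the maize row reports 125,325 contigs forming 61,161 scaffolds. Real assemblies may encode gaps differently, for example in AGP files, so the threshold and gap symbol here are illustrative only.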