
Table 2 Sequence and gene family discovery rates for various complete and partial genome datasets

From: The global landscape of sequence diversity

  

| Dataset* | No. of complete/partial genomes | OSDR (sequence rate, %) | CSDR (sequence rate, %) | OGDR (family rate, %) | CGDR (family rate, %) |
|---|---|---|---|---|---|
| CG Archaea | 19 | 37.8 | - | 38.7 | - |
| CG Bacteria | 161 | 19.5 | 11.8 (± 1.5) | 22.4 | 15.4 (± 1.8) |
| CG Bacteria strains filtered | 127 | 28.4 | 15.9 (± 1.5) | 26.6 | 20.6 (± 1.7) |
| CG Bacteria | 127 |  | 13.4 (± 1.7) |  | 17.0 (± 2.0) |
| CG Bacteria species filtered | 86 | 23.2 | 20.9 (± 1.6) | 31.5 | 26.1 (± 1.6) |
| CG Bacteria | 86 |  | 16.3 (± 1.8) |  | 19.9 (± 2.1) |
| CG Eukarya | 19 | 39.0 | - | 30.8 | - |
| PG All | 193 | 53.7 | 40.3 (± 2.9) | 47.7 | 42.8 (± 2.8) |
| PG Arthropods | 16 | 74.7 | - | 66.4 | - |
| PG Deuterostomes | 21 | 71.7 | - | 60.8 | - |
| PG Fungi | 27 | 70.2 | - | 60.2 | - |
| PG Nematodes | 31 | 62.8 | - | 47.0 | - |
| PG Protists | 17 | 88.1 | - | 71.5 | - |
| PG Viridiplantae | 76 | 48.3 | - | 37.8 | - |
| CG Bacteria sequences > 100 residues | 161 | - | 8.6 (± 1.4) | - | - |
| PG Sequences > 300 bp | 193 | - | 35.6 (± 2.8) | - | - |

*CG, complete genome datasets; PG, partial genome datasets. 'Strains filtered' indicates that only a single representative of each species was included in the analysis; 'species filtered' indicates that only a single representative of each genus was included. OSDR, overall sequence discovery rate (total number of distinct sequences/total number of sequences); CSDR, current sequence discovery rate (obtained from Figure 1d, e); OGDR, overall gene family discovery rate (total number of families/total number of sequences); CGDR, current gene family discovery rate (obtained from Figure 1d, e).
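
The two overall rates defined in the footnote are simple ratios expressed as percentages. The sketch below illustrates that arithmetic only. The function name `discovery_rate` and all counts are hypothetical and not taken from the paper. The current rates (CSDR, CGDR) are read from Figure 1d, e and are not reproduced here.

```python
def discovery_rate(n_discovered: int, n_total: int) -> float:
    """Return n_discovered as a percentage of n_total.

    Hypothetical helper: n_discovered is the number of distinct sequences
    (for OSDR) or gene families (for OGDR); n_total is the total number of
    sequences in the dataset.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_discovered / n_total


# Illustrative counts only; these are assumptions, not values from the paper.
total_sequences = 1000
distinct_sequences = 250
gene_families = 300

osdr = discovery_rate(distinct_sequences, total_sequences)
ogdr = discovery_rate(gene_families, total_sequences)
print(f"OSDR = {osdr:.1f}%, OGDR = {ogdr:.1f}%")  # prints "OSDR = 25.0%, OGDR = 30.0%"
```

Both rates share the same denominator (total sequences), as the footnote states, so they can be compared directly within a dataset.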