Table 2 KOGs represented by exactly one ortholog in seven analyzed eukaryotic genomes (examples)

From: A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes

KOG number

(Predicted) function

Multiprotein complex

Functional class*

Prokaryotic homologs

Fitness class†, yeast‡

Fitness class†, worm§

Comments

Genes experimentally or computationally characterized previously

0392

SNF2 family DNA-dependent ATPase

TBP-DNA complex

 

Many bacteria and archaea (COG0553)

0

1

Involved in regulation of transcription from POL II promoters [104]

0121

Nuclear cap-binding protein complex, subunit CBP20 (RRM-domain-containing RNA-binding protein)

Cap-binding complex

A

Several bacteria (COG0724)

1

X

RRM-domain proteins show scattered presence in bacteria and might have been horizontally transferred from eukaryotes

0213

U2-snRNP associated splicing factor 3b, subunit 1

Spliceosome

A

None

0

0

 

0227

snRNA-associated protein, splicing factor 3a, subunit b (Prp11p)

Spliceosome

A

None

0

0

 

2268

Predicted nucleic-acid-binding protein kinase of the RIO1 family; 40S ribosomal subunit biogenesis/18S rRNA processing

Pre-40S subunit

A

Orthologs in most archaea but not in bacteria (COG0478)

0

X

One of the very small number of protein kinases that show a clear-cut orthologous relationship between all eukaryotes and most archaea, and, apparently, the only one containing a helix-turn-helix nucleic-acid-binding domain. [105] Associated with yeast pre-40S subunit and required for its maturation. [106]

3031

Protein required for 60S ribosomal subunit biogenesis; [107] contains the IMP4 domain, which is involved in rRNA processing [108]; paralog of KOG3095 and KOG3292, which are also represented in all analyzed genomes.

Processosome

A

Distantly related to COG2136, represented by orthologs in most archaea, but not in bacteria (KSM, unpublished)

0

X

The COG2136 proteins appear to be subunits of the predicted archaeal exosome [109]. Apparently, this gene has undergone at least two ancient duplications in eukaryotes

3045

Predicted RNA methylase involved in rRNA processing

Processosome?

A

Distantly related to numerous Rossmann-fold methylases but prokaryotic orthologs could not be confidently identified

1

1

This protein (Rrp8p in yeast) has been shown to participate in the processing of rRNA and sequence analysis reveals the presence of a Rossmann-fold methylase domain [110]. Therefore Rrp8p probably methylates either snoRNA or rRNA itself.

3064

RNA-binding nuclear protein containing a distinct C4 Zn-finger; implicated in the biogenesis of 60S ribosomal subunits [111]

Processosome

A

None

0

0

Initially identified in yeast as the MAK16 protein required for dsRNA virus reproduction [112]

0291, 0302, 0306, 0310, 0319, 0650, 1272

WD40-repeat proteins, subunits of rRNA processing complexes [69, 70]

Processosome

A

WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)

all 0

X,X,1,X,1,1,1

 

0284

Polyadenylation factor I complex, subunit PFS2, WD40-repeat protein

Poly-adenylation complex

A

Same as above (COG2319)

0

X

 

0337

RNA helicase involved in 28S rRNA processing

Processosome

A

Most of the archaea and bacteria (COG0513)

0

X

 

0343

RNA helicase involved in 28S rRNA processing

Processosome

A

Most of the archaea and bacteria (COG0513)

0

X

 

1069

3'-5' exoribonuclease (RNAse PH), exosome subunit Rrp46

Exosome

A

Most bacteria and archaea (COG0689)

0

1

 

1070

Exosome subunit Rrp5 (RNA-binding S1 domain fused to TPR repeats)

Exosome

A

Most bacteria (COG0539, COG0457)

0

1

 

1135

mRNA cleavage and polyadenylation complex subunit CFT2 (CPSF)

Cleavage and polyadenylation complex

A

Most archaea and some bacteria (COG1236)

0

0

 

1914

mRNA cleavage and polyadenylation factor I complex, subunit RNA14

Cleavage and polyadenylation complex

A

None

0

X

 

1975

RNA (guanine-7-) methyltransferase (capping enzyme subunit)

Capping enzyme

A

Numerous methyltransferases (COG0500) but no ortholog

0

1

 

2051

Nonsense-mediated mRNA decay complex, subunit 2

NMD complex

A

None

1

X

 

2554

Pseudouridylate synthase

?

A

Most archaea and bacteria (COG0101)

1

1

 

2613

Upf1p-interacting protein, NMD complex subunit Nmd3p

NMD complex

A

All archaea, no bacteria (COG1499)

0

X

 

2771

tRNA-specific adenosine-34 deaminase subunit Tad3p

Heterodimeric RNA-specific deaminase

A

Most bacteria and some archaea (COG0590)

0

X

 

2780

Protein involved in ribosomal large subunit assembly (RPF1), contains IMP4 domain

Processosome

A

Most archaea, no bacteria (COG2136)

0

1

 

2781

Subunit of the small (ribosomal) subunit (SSU) processosome (snoRNP), IMP4

Processosome

A

Most archaea, no bacteria (COG2136)

0

1

 

2874

Protein involved in rRNA processing and ribosomal assembly

?

A

All archaea, no bacteria (COG1094)

0

1

Predicted RNA-binding protein containing KH domain

3013

Exosome subunit Rrp4

Exosome

A

Most archaea, no bacteria (COG1097)

0

X

 

3031

Protein involved in large ribosome subunit assembly and 28S rRNA processing (Rrf2)

Processosome

A

None

0

X

Contains the BRIX domain

3322

RNAse P/MRP subunit, involved in processing of pre-tRNAs and the 5.8S rRNA

RNAse P/MRP holoenzyme

A

None

0

1

 

3448

Predicted snRNP core protein

Spliceosome

A

All archaea, no bacteria (COG1958)

0

1

 

3482

Small nuclear ribonucleoprotein (snRNP) SMF subunit

Spliceosome

A

All archaea, no bacteria (COG1958)

0

0

 

2463

Predicted RNA-binding protein, consisting of a PIN domain and a Zn-ribbon. Involved in 26S proteasome assembly

26S proteasome, pre-40S subunit

A,O

Represented by orthologs in all archaea but no bacteria (COG1349)

0

X

PIN domain has been detected in exosome subunits and is thought to have RNA-binding properties or even nuclease activity [113, 114]. The demonstration of the role of this protein (Nob1p) in proteasome assembly [115], 40S ribosome subunit assembly, and the processing of 18S rRNA 3'-end [116] supports the connection between degradation of RNA and proteins that seems to have been established already in archaea [109].

3273

Predicted RNA-binding protein containing KH domain, interacts with Nob1p

26S proteasome, pre-40S subunit

A,O

Orthologs in all archaea but no bacteria (COG1094)

0

0

This is the second predicted RNA-binding protein involved in proteasome assembly, [115] which emphasizes the aforementioned link between RNA and protein processing

1831

Deadenylating 3'-5' exonuclease, negative regulator of PolII transcription

CCR4-NOT core complex

AK

None

0

0

 

1159

NADP-dependent flavoprotein reductase, probably sulfite reductase subunit

?

CL

Many bacteria (COG0369)

0

X

Genetic evidence of a role in DNA replication [117]

1800

Ferredoxin/adrenodoxin reductase

?

C

Most bacteria and some archaea (COG0493)

0

X

 

1173

Anaphase-promoting complex (APC), Cdc16 subunit (TPR-repeat protein)

APC

D

Most of archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of Cdc16

0

0

 

3437

Anaphase-promoting complex (APC), subunit 10

APC

D

None

1

1

 

1358

Serine palmitoyltransferase

?

I

Most bacteria and some archaea (COG0156)

0

0

 

1511

Mevalonate kinase

?

I

Most archaea and some bacteria (COG1577)

0

X

 

3059

N-acetylglucosaminyltransferase complex, subunit PIG-C/GPI2, involved in phosphatidylinositol biosynthesis

N-acetylglucosaminyltransferase complex

I

None

0

1

 

0467

Translation elongation factor 2 paralog (GTPase)

?

J

All (COG0480)

0

X

Involved in 60S ribosomal subunit maturation [118]

1147

Glutamyl-tRNA synthetase

Multispecificity aminoacyl-tRNA synthetase complex

J

All (COG0008)

0

X

 

2784

Phenylalanyl-tRNA synthetase, beta subunit

Heterodimeric phenylalanyl-tRNA synthetase

J

All (COG0016)

0

X

 

3123

Diphthamide synthase (methyltransferase)

?

J

All archaea, no bacteria (COG1798)

1

1

 

0261

RNA polymerase III, largest subunit

RNAPIII holoenzyme

K

All (COG0086)

0

X

 

0262

RNA polymerase I, largest subunit

RNAPI holoenzyme

K

All (COG0086)

0

X

 

0215

RNA polymerase III, second largest subunit

RNAPIII holoenzyme

K

All (COG0085)

0

X

 

0216

RNA polymerase I, second largest subunit

RNAPI holoenzyme

K

All (COG0085)

0

X

 

1063

RNA polymerase II elongator complex, subunit ELP2, WD repeat protein

RNA polymerase II elongator complex

K

WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)

1

X

 

1131

RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, 5'-3' helicase subunit RAD3

RNAPII holoenzyme

K

Most archaea and bacteria (COG1199)

0

X

 

1920

RNA polymerase II Elongator subunit

RNAP II elongator complex

K

None

1

X

 

1932

TBP-associated factor (Taf2p)

TFIID complex

K

None

0

X

 

2009

Transcription initiation factor TFIIIB, Bdp1 subunit (Myb domain)

TFIIIB

K

None

0

0

 

2076

RNA polymerase III transcription factor TFIIIC, TPR-repeat-containing protein

TFIIIC

K

Most archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of TFIIIC

0

X

 

2487

RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB4

TFIIH

K

None

0

1

 

2691

RNA polymerase II subunit 9

RNAP II holoenzyme

K

Most archaea, no bacteria (COG1594)

1

X

 

2807

RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, SSL1 subunit

TFIIH

K

No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins

0

0

Consists of a von Willebrand A domain most closely related to those in the proteasome subunit RPN10 [119] and a Zn-finger domain

2907

RNA polymerase I transcription factor TFIIS, subunit A12.2/RPA12

TFIIS

K

All archaea, no bacteria (COG1594)

1

0

 

3169

RNA polymerase II transcriptional regulation mediator

Mediator complex [120]

K

None

0

X

 

3233

RNA polymerase III subunit C34

RNAP III holoenzyme

K

None

0

1

 

3297

RNA polymerase III subunit C25

RNAP III holoenzyme

K

All archaea, no bacteria (COG1095)

0

0

 

3438

Subunit common to RNA polymerases I (A) and III (C); Rpc19p

RNAP I and III holoenzymes

K

 

0

1

 

3471

RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB2

TFIIH

K

None

0

X

 

3490

Transcription elongation factor SPT4, Zn-ribbon protein

Chromatin-associated transcription complexes

K

None

1

1

 

3497

RNA polymerase II subunit; Rpb10p

RNAP II holoenzyme

K

All archaea, no bacteria (COG1644)

0

X

 

3901

Transcription initiation factor IID subunit (Taf13p)

TFIID

K

None

0

X

 

3949

RNA polymerase II elongator complex, subunit ELP4

RNAP II elongator complex

K

None

1

1

 

4086

SOH1 protein potentially involved in Pol II transcription regulation and repair

SMCC complex [121]

K

None

1

X

 

1532

Predicted GTPase of the XAB1 family [122]

TBP-free TAF(II) complex

L

All archaea and several bacteria (COG1100)

0

0

XP-A-binding protein in humans, thus implicated in repair ([122] and references therein).

1533

Predicted GTPase of the XAB1 family (paralog of KOG1757) [122]

TBP-free TAF(II) complex?

L

All archaea and several bacteria (COG1100)

0

X

Might have a function in repair given the paralogous relationship with KOG1757.

1625

DNA polymerase α processivity subunit, inactivated phosphatase

DNA polymerase α holoenzyme

L

Small subunit of archaeal DNA polymerase II (COG1311)

0

0

The small, regulatory subunit of DNA polymerase α also forms a pan-eukaryotic KOG3044, which is a paralog of KOG0861 (the only recent duplication in KOG3044 is seen in vertebrates). In contrast, another paralog, the small subunit of DNA polymerase ε, is represented in animals, fungi and the early-branching protozoan Plasmodium, but not in plants or Microsporidia. Thus, the history of this polymerase subunit apparently involved inactivation of the phosphatase (or nuclease) inherited from archaea, with subsequent duplications at early stages of eukaryotic evolution [123]

0479

DNA replication licensing factor MCM3

Pre-replication complex

L

All archaea, no bacteria (COG1241)

0

X

 

0481

DNA replication licensing factor MCM5

Pre-replication complex

L

All archaea, no bacteria (COG1241)

0

X

 

0482

DNA replication licensing factor MCM7

Pre-replication complex

L

All archaea, no bacteria (COG1241)

0

0

 

0964

Structural maintenance of chromosome protein 3 (cohesin subunit SMC3)

Sister chromatid cohesion complex

L

Many archaea and bacteria (COG1196)

0

X

 

0979

Structural maintenance of chromosome protein 5 (cohesin subunit SMC5)

Sister chromatid cohesion complex

L

Many archaea and bacteria (COG1196)

0

X

 

1942

TBP-interacting protein TIP49 (DNA helicase)

Chromatin remodeling complex

L

Most of the archaea, no bacteria (COG1224)

0

0

 

1979

DNA mismatch repair ATPase, MLH1

Mismatch repair complex

L

Most bacteria and some archaea (COG0323)

1

1

 

2267

DNA primase, large subunit

DNA polymerase α:primase complex

L

All archaea, no bacteria (COG2219)

0

0

 

2299

Ribonuclease HI

Replisome

L

All archaea, most bacteria (COG0164)

1

X

 

2310

DNA repair exonuclease MRE11

MRN complex involved in double-strand break repair

L

All archaea, most bacteria (COG0420)

1

1

 

2929

Origin recognition complex, subunit 2 (ORC2)

ORC

L

None

1

1

 

0179

20S proteasome, regulatory subunit beta type PSMB1/PRE7 (paralog of KOG0185)

20S proteasome

O

All archaea but only actinomycetes among bacteria (COG0638)

0

0

 

0185

20S proteasome, regulatory subunit beta type PSMB4/PRE4 (paralog of KOG0179)

20S proteasome

O

All archaea but only actinomycetes among bacteria (COG0638)

0

0

 

2708

Predicted metalloprotease with chaperone activity (RNAse H/HSP70 fold) [124]

Putative complex involved in translation regulation [125]

O

Represented by orthologs in all archaea and bacteria (COG0533)

0

X

One of the few remaining uncharacterized proteins that are universally conserved in all cellular life forms. The only experimentally demonstrated activity is that of sialoglycoprotease but fusion with a distinct protein kinase in several archaea and analysis of gene neighborhood suggest a fundamental role in signal transduction, possibly translation regulation. [125]

0301

Protein required for normal rates of ubiquitin-dependent proteolysis, contains WD40 repeats

Proteasome?

O

Same as above (COG2319)

1

X

 

0358

Chaperonin complex component, TCP-1 delta subunit (CCT4)

TCP-1

O

All archaea and nearly all bacteria (COG0459)

0

0

 

0363

Chaperonin complex component, TCP-1 beta subunit (CCT2)

TCP-1

O

All archaea and nearly all bacteria (COG0459)

0

0

 

0687

26S proteasome regulatory complex, subunit RPN7/PSMD6

26S proteasome

O

None

0

0

 

1299

Vacuolar sorting protein VPS45/Stt10 (Sec1 family)

t-SNARE complex

O

None

1

X

Involved in t-SNARE complex assembly [126]

1349

GPI-anchor transamidase complex, GPI8 subunit

GPI-anchor transamidase complex

O

Distantly related proteases in some bacteria (no COG)

0

1

 

1943

Beta-tubulin folding cofactor D, involved in chromosome segregation

?

O

None

1

1

 

2015

NEDD8-activating complex, UBA3 subunit

NEDD8-activating complex

O

Most bacteria and some archaea (COG0476)

1

1

 

2126

Phosphoethanolamine N-methyltransferase involved in GPI-anchor biosynthesis

?

O

Several bacteria and archaea (COG1524)

0

X

 

2884

26S proteasome regulatory complex, subunit RPN10/PSMD4

26S proteasome regulatory complex

O

No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins

1

1

Contains von Willebrand A domain

2908

26S proteasome regulatory complex, subunit RPN9/PSMD13

26S proteasome regulatory complex

O

None

0

0

Contains PINT domain

0209

Endoplasmic reticulum membrane P-type ATPase

?

P

Many bacteria and some archaea (COG0474)

1

X

 

3379

Uncharacterized member of the histidine triad superfamily of nucleotide hydrolases

?

R

Most archaea and bacteria (COG0537)

1

X

Only biochemical function predicted.

2635

Coatomer (COPI) complex delta subunit

COPI complex

U

None

0

0

 

2927

Membrane component of ER protein translocation apparatus (Sec62)

Sec complex

U

None

0

1

 

2978

Dolichol-phosphate mannosyltransferase

?

U

All archaea, most bacteria (COG0463)

0

X

 

3198

Signal recognition particle, subunit Srp19

Signal recognition particle

U

All archaea, no bacteria (COG1400)

0

X

 

3315

Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking

TRAPP

U

None

0

X

 

3369

Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking

TRAPP

U

None

0

X

 

1992

Nuclear export receptor CSE1/CAS (importin beta)

?

YU

None

0

X

 

New functional predictions

      

2316

PP-loop family ATP pyrophosphatase domain, which in fungi, plants and insects is fused to a duplicated translation inhibitor domain. The fusion, along with the phyletic pattern of the PP-ATPase domain, suggests an essential function in translation regulation

?

A

Orthologs of the PP-loop domain are present in all archaea (COG2102) but not in bacteria. Orthologs of the translation inhibitor domain are found in most bacteria and several archaea (COG0251)

1

X

PP-loop ATPases have previously been implicated in base thiolation in various RNAs [127], and proteins in this K/COG might have a similar function, which is likely to be conserved in eukaryotes and archaea. However, the fusion with the translation inhibitor domain, which has been reported to have endoribonuclease activity [128], is a eukaryote-specific feature

2523

Predicted RNA-binding protein containing a PUA domain, probable role in RNA modification [129]

Putative novel RNA modification complex

A

Orthologs present in all archaea (COG2016) but not in bacteria

1

X

Several of the archaeal orthologs of this protein form fusions with a PP-loop ATPase domain implicated in base thiolation [127]. Thus, the proteins of this KOG might interact with those of KOG2840 (pan-eukaryotic, duplications in Arabidopsis and worm) or KOG2594 (missing in humans and microsporidia) to form a novel enzymatic complex involved in RNA modification

0270, 0271, 1539

WD40-repeat proteins

Processosome

A

WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)

all 0

X,1,X

By analogy with other conserved WD40-repeat proteins, predicted to be subunits of rRNA processing/ribosome assembly complexes

2321

Nucleolar protein, contains WD40 repeats

rRNA processosome?

A

WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)

0

1

Probable subunit of an rRNA-processing complex

1763

Uncharacterized conserved protein containing a CCCH Zn-finger; possible role in RNA processing or splicing

?

A

None

1

1

CCCH fingers have been shown to bind 3' untranslated regions in various mRNAs [130, 131]

2837

Protein containing a U1-type, RNA-binding C2H2 Zn-finger. Probable role in RNA splicing/processing

Spliceosome?

A

None

0

0

U1-type fingers are essential for the assembly of U1 RNP [132]

3073

Predicted RNA-binding protein containing PIN domain and involved in 18S rRNA processing

Pre-40S subunit

A

Most archaea, no bacteria (COG1412)

0

1

Interacts with Nop14p and is required for 40S subunit biogenesis and 18S rRNA maturation (PubMed ID 11694595). The presence of the PIN domain suggests RNA-binding and, possibly, RNAse activity

3154

Uncharacterized protein with potential function in translation or ribosomal biogenesis

Pre-40S subunit?

A?

Most archaea, no bacteria (COG2042)

1

X

The general functional prediction stems from the observation that the gene for this protein forms a predicted conserved operon with the gene for ribosomal protein L40E in several archaeal genomes

3214

Small protein containing a Zn-ribbon, possibly RNA-binding; potential role in RNA processing or transcription regulation

?

A?

Conserved in Crenarchaeota (COG4888)

1

1

 

3800

Predicted E3 ubiquitin ligase containing RING finger, subunit of transcription/repair factor TFIIH and CDK-activating kinase assembly factor

TFIIH

KO

None

0

X

 

3176

Predicted α-helical protein, possibly involved in replication/repair; paralog of KOG3636

A novel complex with PCNA involved in replication?

L?

Conserved in most (possibly all) archaea but not in bacteria (COG1711)

0

X

A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)

3303

Predicted α-helical protein, possibly involved in replication/repair or transcription; paralog of KOG3508

A novel complex with PCNA involved in replication?

L?

Conserved in most (possibly all) archaea but not in bacteria (COG1711)

0

0

A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)

0396

Predicted E3 ubiquitin ligase

Ub ligase

O

None

1

1

The proteins in this KOG contain a modified RING domain which, like the U-box domain [133], might not be capable of metal binding; the U-box domain has been shown to function as an E3 [134]

1443

Multitransmembrane protein, predicted drug/metabolite transporter

?

R

Most archaea and bacteria (COG0697)

1

X

 

2647

Multitransmembrane protein, potential transporter

?

R

Most bacteria and some archaea (COG0628)

0

1

 

2488

Predicted N-acetyltransferase

?

R

Most archaea and bacteria (COG0454)

1

X

Putative role in ribosomal maturation?

3347

Predicted nucleotide kinase; nuclear protein (Fap7p)

?

R

Conserved in all archaea but not in bacteria (COG1936)

0

1

Involved in oxidative stress response in yeast [135]

3974

Predicted sugar kinase

Putative novel complex with KOG2585 proteins

R

All archaea and most bacteria (COG0063)

1

1

Based on fusions seen in prokaryotes, predicted to interact functionally and, possibly, physically with uncharacterized proteins of KOG2585 (represented in all eukaryotes but includes paralogs in some species)

No functional prediction

      

2318

Uncharacterized conserved protein

?

S

None

0

1

 

3237

Uncharacterized conserved protein containing coiled-coil domain

?

S

None

0

1

Coiled-coil domains are often involved in complex assembly; this could be an uncharacterized component of the chromatin or the spliceosome

*Abbreviations for the functional categories are as in Figure 3. †0, essential gene (lethal knockout); 1, non-essential gene (non-lethal knockout); X, no data available for the given gene. ‡Data from [85]. §Data from [86].
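For readers who process this table programmatically, the fitness codes defined in the footnote can be decoded along the lines of the following minimal Python sketch. The function name `expand_fitness` and the label strings are illustrative assumptions and are not part of the original publication. The sketch also assumes that a row grouping several KOGs gives either one comma-separated code per KOG, in the same order as the KOG numbers, or the shorthand "all 0".

```python
# Fitness codes as defined in the table footnote.
# The label wording is illustrative, not taken from the source.
FITNESS_LABELS = {
    "0": "essential",
    "1": "non-essential",
    "X": "no data",
}


def expand_fitness(kog_cell, fitness_cell):
    """Pair each KOG number in a (possibly grouped) row with its fitness label.

    kog_cell     -- e.g. "0270, 0271, 1539" or a single number such as "0392"
    fitness_cell -- e.g. "X,1,X", "all 0", or a single code such as "0"
    """
    kogs = [k.strip() for k in kog_cell.split(",")]
    cell = fitness_cell.strip()

    if cell.lower().startswith("all "):
        # Shorthand such as "all 0": the same code applies to every KOG in the row.
        codes = [cell.split()[1]] * len(kogs)
    else:
        codes = [c.strip() for c in cell.split(",")]

    if len(codes) != len(kogs):
        raise ValueError(
            f"{len(kogs)} KOG number(s) but {len(codes)} fitness code(s)"
        )

    # An unknown code raises KeyError rather than being silently guessed.
    return {kog: FITNESS_LABELS[code] for kog, code in zip(kogs, codes)}


if __name__ == "__main__":
    # Worm column of the grouped WD40-repeat row under "New functional predictions".
    for kog, label in expand_fitness("0270, 0271, 1539", "X,1,X").items():
        print(f"KOG{kog}: {label}")
```

Keeping the KOG numbers as strings preserves their leading zeros, which would be lost if they were parsed as integers.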