Table 1 KOGs and TWOGs with unexpected phyletic patterns (examples)

From: A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes

| KOG/TWOG number | Phyletic pattern* | (Predicted) structure and function | Prokaryotic homologs | Comments |
| --- | --- | --- | --- | --- |
| TWOG0892 | ---H--E | Discoidin domain protein, potential regulator of proteasome activity | Detected in a few phylogenetically scattered bacteria, no COG so far [69] | |
| TWOG0263 | A-----E | ATP/ADP translocase | ATP/ADP translocases of chlamydia, rickettsia and Xylella fastidiosa | ATP/ADP translocase is a hallmark of intracellular parasites and symbionts, allowing them to scavenge ATP from the host cell; in plants, it is a chloroplast protein. Could have been acquired by plants and microsporidia via independent HGT from bacteria [58]. |
| TWOG0689 | ---HY-- | Uncharacterized protein essential for propionate metabolism | PrpD protein of several bacteria and archaea (COG2079) | The yeast and human proteins (and the orthologs from other vertebrates) show the greatest similarity to different subsets of bacterial orthologs, which might suggest independent HGT events. |
| TWOG0871 | ---H-P- | Uncharacterized conserved protein, probably enzyme | COG4336, sporadic representation in several bacterial lineages | The human (and mouse) protein has an additional domain conserved in the archaeon Pyrococcus. Human and S. pombe proteins are most similar to different subsets of bacterial homologs, which suggests the possibility of independent HGT events. |
| TWOG0788 | A----P- | Urease | Ureases of many bacterial species | Highly conserved enzyme present in plants and many fungi but not S. cerevisiae. Plant and fungal ureases have a common domain architecture distinct from that of bacterial orthologs, which suggests monophyletic origin. Might have evolved via early HGT from bacteria (proto-mitochondria?) with subsequent loss in animals and some fungi. |
| KOG4751 | A--H--E | Recombination repair protein BRCA2, contains varying number of BRCA2 repeats | None | Although sequence conservation is limited to the BRC repeats [101], the number of which varies substantially, the statistical significance of the observed sequence similarity and the absence of other homologs suggest that the proteins in this KOG are true orthologs. Apparent orthologs of BRCA2 are also detectable in other species from the taxa represented in the KOGs (mosquito Anopheles gambiae, fungus Ustilago maydis) [102] and in early-branching eukaryotes (Leishmania, Trypanosoma; E.V.K., unpublished work), suggesting that evolution of BRCA2 involved multiple gene losses. |
| KOG4597 | A--H--E | TATA-binding protein 1-interacting protein | None | Probable multiple gene losses |
| KOG4486 | A--H--E | 3-methyl-adenine DNA glycosylase | Orthologs in many bacteria (COG2094) | The plant protein and those from mammals and microsporidia show the greatest similarity to different subsets of bacterial orthologs. Evolution might have included a combination of gene loss and independent HGT events. |
| KOG1594 | A-D-Y-- | Predicted epimerase related to aldose 1-epimerase | Bacterial orthologs, primarily proteobacteria (COG0676) | Eukaryotic proteins are more closely related to each other than to bacterial orthologs, indicating monophyletic origin. Function remains unknown; might be involved in a distinct and still uncharacterized pathway of polysaccharide biosynthesis. LSE in Arabidopsis (seven paralogs). |
| KOG4141 | ---HYPE | Rad52/22, protein involved in double-strand break repair | None | Probable gene loss in plants, insects and nematodes |
| KOG4528 | -CDH--E | Uncharacterized predicted enzyme, possibly a polynucleotide kinase (the structure of the ortholog from the bacterium Thermotoga maritima has been determined; PDB code 1J5U) | Conserved in all archaea and several bacteria (COG1371) | Context analysis of archaeal and bacterial genomes suggests functional interaction between proteins of KOG4528 and KOG3833, RNA 3'-terminal phosphate cyclase (KOG4398, COG0430), and tRNA/rRNA cytosine C5-methylase (KOG1299/COG0144) ([103] and E.V.K., unpublished observations). Taken together, the observations appear to implicate KOG4528 and KOG3833 in a still uncharacterized pathway of rRNA and/or tRNA processing and modification. Conservation of these proteins in archaea and early-branching eukaryotes suggests lineage-specific gene loss in plants and fungi. |
| KOG3833 | -CDH--E | Uncharacterized predicted enzyme, possibly a polynucleotide phosphatase | Conserved in all archaea and several bacteria (COG1690) | See comment for KOG4528 |

*Abbreviations: A, thale cress A. thaliana; C, nematode C. elegans; D, fruit fly D. melanogaster; E, microsporidian Encephalitozoon cuniculi; H, Homo sapiens; Y, budding yeast S. cerevisiae; P, fission yeast S. pombe. A letter indicates the presence of the respective species in the given KOG and a dash indicates its absence.
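
The presence/absence notation described in the footnote can be decoded mechanically. The sketch below is only an illustration, not part of the original analysis: the function name `decode_pattern` is hypothetical, and the position order A, C, D, H, Y, P, E is an assumption inferred from the patterns listed in the table (the footnote does not state the order explicitly).

```python
# Assumed position order of the seven species in a phyletic pattern,
# inferred from the patterns shown in Table 1 (not stated in the footnote).
SPECIES_ORDER = [
    ("A", "Arabidopsis thaliana"),
    ("C", "Caenorhabditis elegans"),
    ("D", "Drosophila melanogaster"),
    ("H", "Homo sapiens"),
    ("Y", "Saccharomyces cerevisiae"),
    ("P", "Schizosaccharomyces pombe"),
    ("E", "Encephalitozoon cuniculi"),
]


def decode_pattern(pattern):
    """Split a seven-character phyletic pattern into present and absent species.

    A letter at a position means the species is represented in the KOG;
    a dash means it is absent.
    """
    if len(pattern) != len(SPECIES_ORDER):
        raise ValueError(f"expected {len(SPECIES_ORDER)} characters, got {len(pattern)}")
    present, absent = [], []
    for symbol, (letter, species) in zip(pattern, SPECIES_ORDER):
        if symbol == letter:
            present.append(species)
        elif symbol == "-":
            absent.append(species)
        else:
            raise ValueError(f"unexpected symbol {symbol!r} at the position reserved for {letter!r}")
    return present, absent


# Example: the Rad52/22 row, pattern ---HYPE
present, absent = decode_pattern("---HYPE")
print("present:", present)
print("absent:", absent)
```

For the pattern `---HYPE`, the absent species are the plant, the nematode and the fruit fly, which agrees with the "probable gene loss in plants, insects and nematodes" comment in that row.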