- Protein family review
Higher plant glycosyltransferases
© BioMed Central Ltd 2001
Published: 7 February 2001
Uridine diphosphate (UDP) glycosyltransferases (UGTs) mediate the transfer of glycosyl residues from activated nucleotide sugars to acceptor molecules (aglycones), thus regulating properties of the acceptors such as their bioactivity, solubility and transport within the cell and throughout the organism. A superfamily of over 100 genes encoding UGTs, each containing a 42 amino acid consensus sequence, has been identified in the model plant Arabidopsis thaliana. A phylogenetic analysis of the conserved amino acids encoded by these Arabidopsis genes reveals the presence of 14 distinct groups of UGTs in this organism. Genes encoding UGTs have also been identified in several other higher plant species. Very little is yet known about the regulation of plant UGT genes or the localization of the enzymes they encode at the cellular and subcellular levels. The substrate specificities of these UGTs are now beginning to be established and will provide a foundation for further analysis of this large enzyme superfamily as well as a platform for future biotechnological applications.
Gene organization and evolutionary history
The sequencing of the genome of the model plant species A. thaliana has recently been completed. Using the UGT amino acid consensus sequence shown in Figure 1 as a search tool, we have screened this genome and identified a very large glycosyltransferase superfamily containing 107 putative UGT genes and 10 UGT pseudogenes. Analysis of this superfamily has allowed the first genome-level characterization of higher plant UGTs. Information available on other plant UGTs can now begin to be integrated with the results from this genomic analysis.
Using programs capable of detecting more distantly related sequences, such as PSI-BLAST, two additional A. thaliana genes have recently been identified that contain amino acid sequences similar to the UGT consensus sequence. These genes encode proteins 100 residues longer than those encoded by any of the previously identified A. thaliana UGT genes, and each contains 13 introns (see the Gene organization section of this article for details of the intron organization of UGT genes). One of these genes has been previously identified as a UDP-glucose sterol β-D-glucosyltransferase.
No comparable analysis has yet been carried out for other plant species. Given that so many UGT sequences can be found in the comparatively small genome of A. thaliana, however, it is probable that large numbers will also be detected in species throughout the plant kingdom. Similarly, large numbers can be detected in species of the animal kingdom - such as the 60-gene UGT superfamily in Caenorhabditis elegans. A complete list of UGTs currently annotated can be found at the UDP Glucuronosyltransferase home page.
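As a rough illustration of what screening predicted proteins with a consensus sequence can look like in practice, the sketch below scans a set of protein sequences for a short pattern using a regular expression. The pattern `TOY_MOTIF`, the function name `find_motif_hits` and the example sequences are all hypothetical and chosen only for illustration; the pattern is not the UGT consensus sequence of Figure 1, and this is not necessarily the search method used for the analysis described above.

```python
import re

# HYPOTHETICAL toy pattern for illustration only; it is NOT the UGT
# consensus sequence from Figure 1. "." matches any residue and
# "[IVL]" matches any one of the listed residues.
TOY_MOTIF = re.compile(r"W.PQ.{2}[IVL]L.H")


def find_motif_hits(proteins, motif=TOY_MOTIF):
    """Return {name: [(start, matched_text), ...]} for sequences with a match.

    `proteins` maps a sequence name to a one-letter amino acid string.
    Start positions are 0-based.
    """
    hits = {}
    for name, seq in proteins.items():
        found = [(m.start(), m.group()) for m in motif.finditer(seq.upper())]
        if found:
            hits[name] = found
    return hits


# Made-up sequences, not real Arabidopsis proteins.
example_proteins = {
    "candidate_1": "MKTAWAPQLEILAHSS",
    "candidate_2": "MKTAGGSLLNNPQ",
}

if __name__ == "__main__":
    for name, found in find_motif_hits(example_proteins).items():
        print(name, found)
```

An exact pattern match of this kind only recovers sequences that fit the pattern closely; detecting more distantly related sequences generally calls for profile-based tools such as PSI-BLAST.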
Characteristic structural features
The amino acid sequences encoded by the UGT genes containing the consensus sequence shown in Figure 1, which vary in length from 435 to 507 amino acids, have all been found to possess nine conserved regions, including the UGT-defining consensus sequence (Figure 4). The level of similarity between these UGT amino acid sequences varies from over 95% to lower than 30% identity. The amino-terminal regions are more variable than the carboxy-terminal regions, supporting the suggestion that the domain involved in the recognition and binding of the diverse aglycone substrates is located towards the amino terminus of the protein, whereas the carboxy-terminal region contains a domain involved in binding the nucleotide sugar substrate.
To date, none of the proteins encoded by the UGT superfamily has been crystallized and their three-dimensional structures are not known. Six glycosyltransferases from other superfamilies have been analyzed structurally, however, and these analyses suggest that, although they were previously thought to be unrelated, they may fall into just two superfamilies. The first of these contains bacteriophage T4 β-glucosyltransferase (BGT) and the Escherichia coli N-acetylglucosaminyltransferase MurG, and the second contains Bacillus subtilis glycosyltransferase SpsA, bovine β-1,4-galactosyltransferase 1, rabbit N-acetylglucosaminyltransferase I and the catalytic fragment of the human glucuronyltransferase I. Interestingly, an approximately 30 amino acid sequence motif in MurG, suggested by the structure to be involved in nucleotide-sugar binding, has been shown to be similar to the UGT consensus sequence described above (Figure 1) [2,11]. Further insight into UGT structure and structure-function relationships now awaits the resolution of a three-dimensional structure for an enzyme from this superfamily.
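To make the notion of percent identity concrete, the following minimal sketch computes it for two sequences that have already been aligned. The function name `percent_identity` and the choice of denominator (only columns in which neither sequence has a gap) are assumptions made for illustration; other conventions exist, and the convention behind the figures quoted above is not stated in the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are skipped, so the
    denominator is the number of columns in which both sequences have a
    residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Made-up aligned fragments, not real UGT sequences.
    print(percent_identity("ACDE-G", "ACDFHG"))
```

Because the result depends on both the alignment and the denominator, identity values are only comparable when they are computed in the same way.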
Localization and function
Mammalian UGTs, which transfer glucuronic acid to hydrophobic substrates, are membrane-bound enzymes localized in the endoplasmic reticulum with their catalytic sites facing the lumen. These enzymes contain an amino-terminal leader sequence that is cleaved on cotranslational segregation into the rough endoplasmic reticulum, and a hydrophobic carboxy-terminal halt sequence that anchors the enzyme to the membrane. Our analyses of A. thaliana UGTs using the TopPred2, SignalP and Psort programs have not identified either of these motifs, supporting the widely held belief that plant UGTs are cytoplasmic enzymes.
Very little information is available from plants regarding the expression of UGT genes. Tomato and tobacco UGTs have been shown to respond rapidly to signals from wounds and pathogen attack [12,13]. There are also now significant data available from the Stanford microarray website on the expression of 14 of the 106 A. thaliana UGTs. The high level of sequence homology between family members suggests, however, that expression data obtained using either expressed sequence tag (EST) or full-length cDNA probes should be treated with caution, as full-length probes may well hybridize to several closely related UGTs and produce misleading expression profiles. No data are yet available to evaluate whether UGT expression is regulated principally at the DNA, RNA or protein level.
The task of comprehensively assaying UGT substrate specificity is a formidable one and much work remains to be done. Nevertheless, the identification of substrate specificity of higher plant UGTs is beginning to allow some conclusions to be drawn and some interesting relationships between different UGTs to be detected. For example: enzymes that catalyze the formation of salicylic glucose ester and indole-3-acetic acid glucose ester share the highest sequence homology with Group L from Arabidopsis, which contains enzymes that produce hydroxycinnamoyl glucose ester [15,16,17]; three UGTs known to be involved in the 3-O-glucosylation of anthocyanidin in both monocotyledons and dicotyledons are all clustered with the Arabidopsis Group F; and two highly homologous sequences encoding enzymes that glycosylate the plant hormone zeatin are distinct from all the major UGT groups of Arabidopsis, suggesting the possible presence of Arabidopsis zeatin glycosyltransferases that have not been identified in the A. thaliana UGT superfamily.
These data, taken together, provide a useful foundation for starting to understand the structure-activity relationships of the UGT family. It will be interesting to compare the catalytic specificity in vitro with the consequences of changing the level of individual enzymes in vivo. A broad specificity of recombinant enzymes in vitro may not provide insight into the activity in planta, because substrate availability will also be relevant in the cellular context.
It has been suggested that many UGTs may not exhibit high substrate specificity at all, but rather recognize individual hydroxyl groups present on a wide range of different aglycones. Our substrate-specificity data do not seem to support this suggestion, as screening of 36 Arabidopsis UGTs revealed only one enzyme capable of glucosylating indole-3-acetic acid. Thus, for at least certain UGTs, reactions may be directed by substrate specificity rather than regiospecificity. A much clearer picture will emerge when substrates of more Arabidopsis enzymes have been identified and these data are considered within the context of temporal and spatial expression profiles in planta.
UGTs transfer nucleotide-diphosphate-activated sugars to low-molecular-weight aglycone substrates. In plants, the activated sugar is usually UDP-glucose but other sugars such as UDP-xylose are also found. The conjugation of the sugar can lead to the formation of a range of glycosylated molecules including glucose esters, cyanogenic glucosides, phenolic glucosides and glucosinolates containing a β-thioglucose moiety. Many aglycones, such as the flavonols, can also accept more than one sugar if a number of sites are available for glycosylation. The exact catalytic mechanism used by UGTs is not yet known. As discussed above, the enzymes are generally thought to contain an aglycone-binding amino terminus and a UDP-sugar-binding carboxyl terminus, but any conclusions regarding enzymatic mechanism await determination of the crystal structure.
It will be essential to integrate data from in vitro and in vivo studies to gain a more complete picture of the potential biological roles of UGTs in plants. This is now feasible with current technology: microarray data, details of the catalytic activities of specific recombinant proteins, metabolite profiles of plants over-expressing or lacking individual UGTs, as well as information on the cell- and tissue-specificity of gene expression, can all be accessed and integrated. Similarly, once the three-dimensional structure of one UGT has been determined, molecular modeling will provide very rapid insights into the structural relatedness of other superfamily members and how this relatedness is reflected in catalytic activities.
The recent realization that Arabidopsis, with such a small genome relative to other species in the plant kingdom, has so many UGTs opens up a whole range of new frontiers, both for the fundamental understanding of UGT functions and for the many strategic applications of the UGT superfamily.
Additional data file
An Excel file containing the accession numbers of the Arabidopsis BAC clones that contain UGT genes and the location of each gene in the clone is included (file added on 17 July 2001).
- Campbell JA, Davies GJ, Bulone V, Henrissat B: A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem J. 1998, 326: 929-939. Describes a comprehensive classification of all known NDP-sugar hexosyltransferases and presents the use of this system to group the known glycosyltransferases into 26 families. A more recent version of this analysis is available at the Introduction to Glycosyltransferase website.
- Kapitonov D, Yu RK: Conserved domains of glycosyltransferases. Glycobiology. 1999, 9: 961-978. 10.1093/glycob/9.10.961. Identifies and aligns three glycosyltransferase conserved domains. The evolutionary relationship of each of these domains is presented along with a potential mechanism for the glycosyltransferase catalytic reaction.
- Mackenzie P, Owens I, Burchell B, Bock K, Bairoch A, Bélanger A, Fournel-Gigleux S, Green M, Hum D, Iyanagi T, et al: The UDP glycosyltransferase gene superfamily: recommended nomenclature update based on evolutionary divergence. Pharmacogenetics. 1997, 7: 255-269. An update of the nomenclature system for UDP glycosyltransferases. Amino acid sequences of proteins from animal, yeast, plant and bacteria are compared to define 33 families.
- UDP Glucuronosyltransferase home page. This is the home page of the committee for naming UDP glucuronosyltransferase. The site has links to relevant databases and information resources. [http://www.unisa.edu.au/pharm_medsci/Gluc_trans/Gt_ttl.htm]
- Introduction to Glycosyltransferase. Site further describing the classification of glycosyltransferases that use nucleotide diphospho-sugars, nucleotide monophospho-sugars and sugar phosphates (EC 2.4.1.x). Enzymes are grouped into distinct sequence-based families. [http://afmb.cnrs-mrs.fr/~pedro/CAZY/GT.html]
- The Arabidopsis Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692. Report describing the sequencing of the complete genome of Arabidopsis thaliana.
- Li Y, Baldauf S, Lim E-K, Bowles DJ: Phylogenetic analysis of the UDP-glycosyltransferase multigene family of Arabidopsis thaliana. J Biol Chem. 2001. Phylogenetic analysis of 88 UGT amino acid sequences, defining 12 major groups of glycosyltransferases.
- Warnecke DC, Baltrusch M, Buck F, Wolter FP, Heinz E: UDP-glucose:sterol glucosyltransferase: cloning and functional expression in Escherichia coli. Plant Mol Biol. 1997, 35: 597-603. 10.1023/A:1005806119807. A description of the properties of a membrane-bound UDP-glucosyltransferase that glucosylates plant sterols.
- Meech R, MacKenzie PI: Structure and function of uridine diphosphate glucuronosyltransferases. Clin Exp Pharmacol Physiol. 1997, 24: 907-915. Discusses the conceptual division of UGT proteins into two domains, an amino-terminal half containing the aglycone binding site and a carboxy-terminal half believed to contain a UDP-sugar binding site.
- Unligil UM, Rini JM: Glycosyltransferase structure and mechanism. Curr Opin Struct Biol. 2000, 10: 510-517. 10.1016/S0959-440X(00)00124-X. Summary of the six known glycosyltransferase three-dimensional structures and discussion of the grouping of these enzymes into two superfamilies on the basis of their structural similarities.
- Radominska-Pandya A, Czernik PJ, Little JM, Battaglia E, MacKenzie PI: Structural and functional studies of UDP-glucuronosyltransferases. Drug Metab Rev. 1999, 31: 817-899. 10.1081/DMR-100101944. Review describing current information on substrate specificity, structure and topology of UGT1A and 2B family glucuronosyltransferases.
- O'Donnell PJ, Truesdale MR, Calvert CM, Dorans A, Roberts MR, Bowles DJ: A novel tomato gene that rapidly responds to wound- and pathogen-related signals. Plant J. 1998, 14: 137-142. 10.1046/j.1365-313X.1998.00110.x. Characterization of a wound-induced glucosyltransferase gene.
- Roberts MR, Warner SAJ, Darby R, Lim EK, Draper J, Bowles DJ: Differential regulation of a glucosyl transferase gene homologue during defence responses in tobacco. J Exp Bot. 1999, 50: 407-410. 10.1093/jexbot/50.332.405. Investigation of the expression profile of a glucosyltransferase that is rapidly induced during the defence response.
- Stanford Microarray Database. Site providing raw and normalized data from microarray experiments as well as their corresponding image files. [http://genomewww4.stanford.edu/MicroArray/SMD/]
- Vogt T, Jones P: Glycosyltransferases in plant natural product synthesis: characterization of a supergene family. Trends Plant Sci. 2000, 5: 380-386. 10.1016/S1360-1385(00)01720-9. A recent review of glycosyltransferases of plant secondary metabolism.
- Jackson R, Lim E-K, Li Y, Kowalczyk M, Sandberg G, Hoggett J, Ashford DA, Bowles DJ: Identification and biochemical characterisation of an Arabidopsis indole-3-acetic acid glucosyltransferase. J Biol Chem. 2001. This report describes the in vitro substrate specificity of a member of the A. thaliana UGT superfamily that shows activity to IAA.
- Lim E-K, Li Y, Parr A, Jackson J, Ashford DA, Bowles DJ: Identification of glucosyltransferase genes involved in sinapate metabolism and lignin synthesis in Arabidopsis. J Biol Chem. 2001. An analysis of the in vitro expression of 36 recombinant A. thaliana UGTs and the identification of enzymes that produce hydroxycinnamoyl glucose conjugates.
- Ford CM, Boss PK, Hoj PB: Cloning and characterization of Vitis vinifera UDP-glucose:flavonoid 3-O-glucosyltransferase, a homologue of the enzyme encoded by the maize Bronze-1 locus that may primarily serve to glucosylate anthocyanidins in vivo. J Biol Chem. 1998, 273: 9224-9233. 10.1074/jbc.273.15.9224. Describes the identification of a UGT with activity to anthocyanidin substrates.
- Martin RC, Mok MC, Mok DW: A gene encoding the cytokinin enzyme zeatin O-xylosyltransferase of Phaseolus vulgaris. Plant Physiol. 1999, 120: 553-558. 10.1104/pp.120.2.553. Description of a zeatin glycosyltransferase that uses UDP-xylose as the nucleotide sugar donor.