Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles
Genome Biology volume 12, Article number: R109 (2011)
It is widely acknowledged that synonymous codons are used unevenly among genes in a genome. In organisms under translational selection, genes encoding highly expressed proteins are enriched with specific codons. This phenomenon, termed codon usage bias, is common to many organisms and has been recognized as influencing cellular fitness. This suggests that the global extent of codon usage bias of an organism might be associated with its phenotypic traits.
To test this hypothesis we used a simple measure for assessing the extent of codon bias of an organism, and applied it to hundreds of sequenced prokaryotes. Our analysis revealed a large variability in this measure: there are organisms showing very high degrees of codon usage bias and organisms exhibiting almost no differential use of synonymous codons among different genes. Remarkably, we found that the extent of codon usage bias corresponds to the lifestyle of the organism. In particular, organisms able to live in a wide range of habitats exhibit high extents of codon usage bias, consistent with their need to adapt efficiently to different environments. Pathogenic prokaryotes also demonstrate higher extents of codon usage bias than non-pathogenic prokaryotes, in accord with the multiple environments that many pathogens occupy. Our results show that the previously observed correlation between growth rate and metabolic variability can be attributed to their individual associations with codon usage bias.
Our results suggest that the extent of codon usage bias of an organism plays a role in the adaptation of prokaryotes to their environments.
The genetic code comprises 64 nucleotide triplets, built from four nucleotide types, that encode 20 amino acids. This redundancy implies the use of synonymous codons - different codons encoding the same amino acid. Synonymous codons may differ in their frequency of occurrence among different genes within an organism, a phenomenon known as 'codon usage bias'. It was demonstrated that many bacteria and yeast undergo translational selection, with highly expressed genes preferentially using codons assumed to be translated faster and/or more accurately by the ribosome [2, 3]. Previous works suggested that these codons are the ones matching abundant tRNAs, which are organism-specific [3–7]. Other works demonstrated additional factors affecting the frequencies of the synonymous codons in an organism, such as the genome GC content [8, 9]. Thus, the preferred codons per amino acid may vary between different organisms, based on their tRNA repertoire and other factors.
While it was claimed that most prokaryotes undergo translational selection, it is conceivable that various organisms may differ in the extent of codon usage bias across their genes and in the forces determining it. For several organisms, such as Escherichia coli and Saccharomyces cerevisiae, a positive correlation between codon bias of genes and their protein levels was demonstrated (for example, [11, 12]), suggesting that in those organisms translational selection is predominant. Other organisms, such as Helicobacter pylori, show almost no differential use of synonymous codons among different genes. While in the former organisms a mixture of mutation bias and natural selection underlies codon usage bias, in the latter organisms codon usage bias is mostly explained by mutation, with very weak, if any, translational selection. It was suggested by Andersson and Kurland and recently substantiated by Kudla et al. that selection towards highly adapted codons in highly expressed proteins has a global effect on the cell. According to this view, high expression of certain genes is mainly achieved by various mechanisms, such as regulation of transcription and/or translation initiation, and the use of well adapted codons in these genes guarantees efficient recycling of the ribosomes, resulting in an increase in cellular fitness. This might be reflected in a relatively enhanced translation of the whole proteome. It is thus intriguing to examine the association between the extent of codon usage bias of various organisms and their phenotypic traits, towards understanding the environmental conditions in which a high extent of codon usage bias is advantageous. Of note, we do not address here differences in synonymous codon usage that are uniform across all genes in a genome and probably arise from non-selective processes; we address only the differential usage of codons among genes within a genome.
We regard an organism as 'biased' or 'unbiased' with respect to its codon usage according to whether the distribution of synonymous codons in its highly expressed genes differs from that in other genes in the genome. Several measures were proposed to estimate the extent of codon usage bias at an organism scale, enabling the classification of an organism as biased or unbiased [14, 17–19]. Here we present such a measure based on the Codon Adaptation Index (CAI) of individual genes. The CAI of a gene ranges between 0 and 1, with higher values indicating the use of more preferred codons. The extent of preference of each codon is determined by its frequency among the codons in genes encoding ribosomal proteins, using the latter as proxy for highly expressed genes. An organism-scale measure is obtained by computing CAIave, the average of the CAI values of all genes in a genome. Since by definition highly expressed genes are assigned high CAI values, low or high CAIave values indicate whether there is a difference in codon usage between highly expressed genes and the rest of the genes in a genome. A low CAIave of an organism implies that there are many genes with low CAI values, and therefore preferred codons are assigned only to a small group of highly expressed genes. Accordingly, low CAIave values are indicative of biased organisms. A high CAIave of an organism implies that the CAI values of most genes are similar to those of genes encoding the ribosomal proteins, and therefore there is no differential use of synonymous codons among the genes encoded in that organism. Such organisms are unbiased in regard to their codon usage. Another measure for the extent of codon usage bias of an organism is based on a different measure of gene codon usage bias, the Nc (the effective number of codons). This gene measure ranges between 20 and 61, with lower values indicating the use of fewer codon types per amino acid along a protein-coding gene, which most often are the more preferred codons.
To evaluate the extent of codon usage bias of an organism, a measure based on the difference between the average Nc values of the ribosomal genes and the average Nc values of the rest of the genes in the genome, Ncdiff, is computed. Organisms with high values of Ncdiff exhibit large extents of codon usage bias, and vice versa. The two genomic measures, CAIave and Ncdiff, are highly correlated (Figure S1 in Additional file 1; Pearson r = -0.91, P ≈ 0, n = 1,169).
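The construction of CAIave described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the standard genetic code is assumed, single-codon amino acids (Met, Trp) are skipped, and adding a pseudocount of 0.5 to every reference count is an assumption made here to avoid zero weights. Each codon's weight is its count in the reference (ribosomal) genes relative to the most frequent synonymous codon, the CAI of a gene is the geometric mean of the weights of its codons, and CAIave is the plain average over genes.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into consecutive triplets (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    The reference set stands in for highly expressed (ribosomal) genes.
    The pseudocount is an assumption of this sketch.
    """
    counts = Counter(c for g in reference_genes for c in codons(g))
    weights = {}
    for aa in set(AMINO) - {"*", "M", "W"}:  # skip stops and one-codon amino acids
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        top = max(counts[c] + pseudocount for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / top
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights along one gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def cai_ave(genes, weights):
    """Average CAI over all genes of a genome (genes without scorable codons skipped)."""
    values = [v for v in (cai(g, weights) for g in genes) if not math.isnan(v)]
    return sum(values) / len(values)
```

With weights derived from a toy reference such as `["GCTGCTGCTGCC"]`, a gene made only of `GCT` codons scores 1, while a gene using the rarer `GCC` scores lower, which pulls CAIave down in the way the text describes for biased genomes.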
Here we used these measures to characterize the extent of codon usage bias in 773 prokaryotic species representing 1,169 sequenced genomes. We demonstrate a wide range in the extent of codon usage bias among the various organisms, and trace the possible sources of this variation. Our results indicate that similarity in the extents of codon usage bias of different organisms reflects a similar ecological strategy they share.
Organisms differ in their extent of codon usage bias
We analyzed the extent of codon usage bias in 773 organisms: 699 bacteria and 74 archaea, representing 1,169 genomes (Materials and methods). Since the results for CAIave and Ncdiff are highly correlated, we present here the results for the first measure and in Additional file 1 the results for the second measure (Figures S2, S4, S5 and S6 in Additional file 1).
We first examined the association between the genomic values of CAIave and GC content (Figure 1), and observed that prokaryotes with extreme GC contents exhibit a narrower range of CAIave values than the other prokaryotes. Such organisms cannot achieve low CAIave values because their extreme GC content defines a limited repertoire of codons that are shared by both ribosomal genes and the rest of the genes, thus resulting in unbiased genomes (high CAIave). To avoid bias in our conclusions due to the inclusion of organisms with extreme GC content, these organisms were excluded from further analyses. The next analyses were carried out for prokaryotes with GC contents between 35% and 65%, a total of 518 organisms.
As shown in Figure 2a, there is a wide distribution of CAIave values across the genomes, ranging between 0.35 (most biased organisms) and 0.82 (least biased organisms), with an average value of 0.59, median of 0.59 and standard deviation of 0.097. Another informative genomic measure is the standard deviation of CAI values of genes in a genome, which indicates the breadth of values used in calculating the genomic CAIave measures. To get an impression of this property across genomes we computed for each genome the coefficient of variation of CAI values of genes (standard deviation divided by CAIave). As expected, we find a strong inverse correlation between the CAIave values and their coefficients of variation (Figure 2b; Pearson r = -0.81, P = 4.186E-122, n = 518), indicating that the distribution of CAI values of genes in biased organisms (low CAIave) is broader than that of CAI values of genes in unbiased organisms.
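The coefficient of variation used here is simply the standard deviation of the per-gene CAI values divided by their mean (CAIave). A minimal sketch follows; the function name is hypothetical, and the use of the population standard deviation rather than the sample one is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def cai_coefficient_of_variation(cai_values):
    """Standard deviation of per-gene CAI values divided by their mean (CAIave)."""
    return pstdev(cai_values) / mean(cai_values)


# Two toy genomes with the same spread of CAI values around different averages:
broad = cai_coefficient_of_variation([0.2, 0.4, 0.6])   # low CAIave
narrow = cai_coefficient_of_variation([0.6, 0.8, 1.0])  # high CAIave
```

In this toy case the genome with the lower average has the larger coefficient of variation, which is the direction of the inverse correlation reported above.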
To substantiate this result we investigated in each genome the difference in the usage of each codon between highly expressed genes and the rest of the genes encoded in the genome. For each organism we calculated the frequency of each codon out of the total codons encoding the corresponding amino acid, once in the ribosomal genes and once in the rest of the genes. We then calculated the average difference between codon frequencies in the ribosomal genes and in the rest of the genes. Comparison of these average differences and CAIave values across genomes revealed a strong negative correlation (Pearson r = -0.8593, P = 2.2639E-152, n = 518). This result reinforces our view that organisms with low CAIave are under stronger translational selection, resulting in differences between the codons in the highly expressed genes and the codons in the rest of the genes, while in organisms with high CAIave there is almost no differential use of codons between highly and weakly expressed genes. The latter unbiased organisms can be of two types: either the synonymous codons of both ribosomal and other genes are determined based on the GC content of the genome, or most genes use the preferred codons (that fit the abundant tRNAs or follow other rules that determine codon preference). To distinguish between these two types we looked at the association between CAIave and the average Nc of a genome in all 1,169 organisms. Low average Nc values imply that all genes in the genome use a specific set of codons, and high average Nc values represent genomes where all genes use a mixture of codons. Indeed, we find both types among genomes with high CAIave (Figure S3a in Additional file 1). As to the biased genomes (with CAIave below the average of 0.59), most of them use relatively many codons across their genes (average Nc values above 40), but apparently they use specific codons in their highly expressed genes, represented by the ribosomal genes.
In general, it seems that the average values of Nc are strongly determined by the GC content of the genome (Figure S3b in Additional file 1). In particular, as we noted above, in genomes with extreme GC content, where the possibilities of codon variation are severely restricted by the specific nucleotide repertoire, the average Nc values are low.
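The per-codon comparison between ribosomal genes and the rest of the genome, described earlier in this section, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the standard genetic code is assumed, and the "average difference" is taken here as the mean absolute difference over codons whose amino acid occurs in both gene sets.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_frequencies(genes):
    """Frequency of each codon among all codons encoding the same amino acid."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - len(gene) % 3, 3):
            codon = gene[i:i + 3]
            if CODON_TO_AA.get(codon, "*") != "*":
                counts[codon] += 1
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TO_AA[codon]] += n
    # Report every codon of an observed amino acid, including unused ones (frequency 0).
    return {
        codon: counts[codon] / aa_totals[aa]
        for codon, aa in CODON_TO_AA.items()
        if aa != "*" and aa_totals[aa] > 0
    }


def mean_frequency_difference(ribosomal_genes, other_genes):
    """Mean absolute difference in synonymous codon frequencies (assumed form)."""
    rib = synonymous_frequencies(ribosomal_genes)
    rest = synonymous_frequencies(other_genes)
    shared = [c for c in rib if c in rest]
    if not shared:
        return float("nan")
    return sum(abs(rib[c] - rest[c]) for c in shared) / len(shared)
```

A genome whose ribosomal genes use one alanine codon exclusively while the remaining genes use all four equally yields a large value, whereas identical usage in both sets yields zero.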
Organisms at the right tail of the CAIave distribution are the least biased. The prokaryote with the smallest extent of codon usage bias (highest CAIave value, 0.82) is Syntrophothermus lipocalidus, a thermophilic, syntrophic, fatty-acid-oxidizing anaerobe that belongs to the Clostridia class (Figure 2c). Other prokaryotes with almost as high CAIave values are quite diverse: Geobacter metallireducens (CAIave value of 0.79) is an anaerobic bacterium that uses iron oxides as the electron acceptor in the oxidation of organic compounds to carbon dioxide, Nitrosococcus watsonii (CAIave value of 0.78) is an aerobic marine bacterium, and Coxiella burnetii (CAIave value of 0.78) is a facultative, intracellular pathogenic bacterium that causes Q fever.
Organisms at the left tail of the distribution are the most biased. The prokaryote that exhibited the greatest extent of codon usage bias (lowest CAIave value, 0.35) is Vibrio vulnificus, a human pathogen of the Gammaproteobacteria class (Figure 2d). Interestingly, other organisms that have such large extents of codon usage bias are also pathogenic: Streptococcus suis is a swine pathogen, Vibrio cholerae causes cholera in humans and Corynebacterium diphtheriae causes diphtheria, an upper respiratory tract illness (CAIave values of 0.36, 0.37, and 0.38, respectively). This finding led us to examine the distribution of CAIave values among pathogenic versus non-pathogenic prokaryotes (Figure 3a). As shown in Figure 3a, the distribution among pathogenic bacteria is shifted to the left compared to non-pathogenic bacteria, with pathogenic prokaryotes having statistically significantly lower CAIave values (P = 1.26E-15 by Mann-Whitney test), indicating they are more biased.
Biased organisms can be classified by their phenotypic traits
We annotated each genome with its CAIave, Ncdiff and several phenotypic traits, such as its oxygen requirement and range of growth temperatures (Additional file 2). This annotation system (Table 1) has enabled us to compare the distributions of the CAIave values of different groups of organisms, tagged according to a particular phenotypic trait.
This analysis (Figure 3) indicated that groups of prokaryotes classified by their oxygen requirement differ statistically significantly in the distributions of CAIave values (P = 1.99E-23 by Kruskal-Wallis test). Facultative organisms exhibited the largest extent of codon usage bias and anaerobic organisms showed the smallest values (P = 2.92E-12, Mann-Whitney test between facultative and aerobic; P ≈ 0 between facultative and anaerobic; P = 3.14E-6 between aerobic and anaerobic). Examining groups of prokaryotes that live in environments that differ in their salinity levels demonstrated statistically significant differences among them (P = 2.13E-5 by Mann-Whitney test). Organisms that live in different temperature ranges showed statistically significant differences in their CAIave values (P = 5.52E-21 by Mann-Whitney test): thermophiles demonstrated statistically significantly higher CAIave values than mesophiles. Intriguingly, we found that organisms living in multiple habitats have statistically significantly lower CAIave values than organisms living in specialized habitats (Figure 4; P = 6.3E-10 by Mann-Whitney test). This result is consistent with the results presented above for the other phenotypic traits and generalizes them. Pathogenic bacteria often live in multiple environments outside and within their host, and facultative organisms live in environments with and without oxygen. On the other hand, thermophiles (found above to be less biased than mesophiles) are usually restricted to a specific environment with a specific temperature. The consistency between these results is also implied by the interdependence between these different phenotypic traits (as shown by χ2 test; Table S2 in Additional file 1).
Phenotypic traits rather than phylogenetic relatedness underlie the similarities in codon usage bias between organisms
It is possible that biased organisms are evolutionarily related and their similar values of CAIave stem from their phylogenetic relatedness. To test this, we computed the correlation coefficient between the phylogenetic distance (Materials and methods) and difference in CAIave values of pairs of organisms. The pairwise measures were computed for pairs of bacteria and pairs of archaea, and the correlation analysis was carried out for all pairs together (Figure 5). No correlation was found between the phylogenetic distances and the differences in CAIave (Pearson r = 0.078), implying no noticeable influence of the phylogeny on the extent of codon usage bias. As shown in Figure 5, organisms with very small differences in their CAIave values can be distantly separated on the evolutionary tree. Of note, organisms that are extremely close on the phylogenetic tree do not exhibit differences in CAIave that are larger than 0.15. Thus, in these cases the similarity in CAIave values may stem from the close phylogenetic relatedness. However, beyond a certain (low) threshold, there is no dependence between the phylogenetic distance and extent of codon usage bias. These results strengthen our previous conclusion that the extent of codon usage bias is associated with the phenotypic traits of the organism.
The empirical association between growth rate and metabolic variability is attributed to their individual associations with codon usage bias
Previous studies showed that there is an association between codon bias and growth rate [17, 24] and between growth rate and metabolic variability. To verify that our result is not indirectly inferred from these two associations, we computed the correlation coefficient between pairs of properties (CAIave, growth rate, and type of habitat (multiple or specific)). This analysis included 82 organisms for which we had information on their growth rate and habitat type. We performed the analysis in two ways: first we simply computed the correlation coefficient of two variables, and then we performed partial correlation, controlling for the third variable (Table 2a). This analysis demonstrates that our conclusion about an association between CAIave and the type of habitat is independent of the correlations with the growth rate, as the two correlation coefficients obtained in the two computations, with and without taking into account the growth rate, were very similar (r = 0.46, P = 1.25E-5 and r = 0.43, P = 6.9E-5, respectively). Intriguingly, the correlation between growth rate and habitat type was shown to be highly dependent on CAIave. While the correlation between these two variables was found to be approximately 0.2 and nearly statistically significant (P = 0.07), the partial correlation, controlling for CAIave, dropped substantially to 0.04 (P = 0.6). Our results suggest that the empirical association observed between growth rate and metabolic variability can be attributed to their individual associations with codon usage bias. Of note, the correlation between CAIave and habitat type was the highest obtained and it is highly statistically significant. To verify that our conclusions are not affected by inclusion of closely related organisms, which might introduce redundancy in the data, we repeated the analysis with a subset of the 82 organisms, including 24 organisms that are phylogenetically remote (Materials and methods; Table 2, Analysis B).
Using this dataset, the correlations between growth rate and both habitat type and codon bias were not statistically significant. The correlation and partial correlation between CAIave and habitat type were consistent with the results for the whole dataset.
It is widely acknowledged that synonymous codons are used unevenly among genes in a genome, with genes encoding highly expressed proteins being enriched with specific codons. It is still under debate whether the biased use of certain codons in highly expressed genes is one of the causes or the result of the high expression level. On the one hand, it was shown that the use of certain codons directly affects the speed of translation and its accuracy, and codon optimization is even used to elevate the levels of proteins expressed outside their original context [28, 29]. On the other hand, Kudla et al. showed that the variation in the levels of proteins translated from synthetic green fluorescent protein constructs, varying only at synonymous sites, was not correlated with the codon usage. They found that high expression was not associated with specific codons but with avoidance of secondary structure at the translation initiation site. This supports the proposition that selection for well adapted codons in highly expressed genes does not directly affect the level of individual proteins, but provides a global benefit to the cell, as it assures efficient recycling of the ribosomes, which leads to an increase in cellular fitness. It should be noted that avoidance of secondary structure at the 5' end of the mRNA is only one mechanistic strategy that may lead to high levels of gene expression, and other mechanisms could underlie high levels of expression as well. These include high transcription level from strong promoters, high stability of the mRNA and/or efficient translation initiation by optimal Shine-Dalgarno sequences. Thus, different highly expressed genes might use well adapted codons to improve cellular fitness, independent of the molecular mechanism underlying their high expression.
The premise that translational selection for preferred codons in highly expressed genes has a global effect motivated us to investigate its association with the phenotypic traits of a wide range of organisms. It should be noted that our conclusions are not affected by whether or not the suggested global effect is accompanied by local effects on the translation efficiency.
There have been various attempts to explain what makes certain codons preferred over others in highly expressed genes, from correspondence to abundant tRNAs to physical considerations regarding the optimal stability of codon-anticodon interaction (summarized in ). Here we have not dealt with these various possible types of preferred codons but regarded them as the codons used by highly expressed genes in a genome, based on the premise that selection favored the most adapted codons in highly expressed genes. Hence, our analysis was based on measures that compare the codon usage of all genes to the codon usage of highly expressed genes, represented by the set of ribosomal genes in each genome (CAIave and Ncdiff). Such a comparative measure should provide us information on the selection forces that act on individual genes and on the whole genome. Indeed, we find a wide variation in these measures across genomes (Figure 2a; Figure S2 in Additional file 1), where some organisms are highly biased and others show only very slight, if any, difference between the codon usage in ribosomal genes and other genes. It is possible that the lack of selection implied for some of the unbiased genomes actually reflects their small population size, but in the absence of a reliable measure of effective population size in bacteria, we are unable to assess this further. In biased genomes selection acts only on genes that are highly expressed, to ensure that the overall translation machinery operates efficiently. Thus, our analysis is in line with previous observations that there are organisms where translational selection is operational (biased genomes) and others where it is not (unbiased). In our study we used the set of ribosomal genes as a representative set of highly expressed genes. When data of gene expression in many prokaryotes become available it should be possible to extend our study by using sets of highly expressed genes based on their measured expression levels.
It would be possible then to divide the organisms into those with high and low variation in gene expression, and to examine how the level of variation in gene expression is reflected in CAIave values.
We found that pathogenic prokaryotes have statistically significantly lower CAIave than non-pathogenic prokaryotes, but with a substantial overlap between the histograms of those two groups. This result might be surprising in view of previous studies that found that pathogenic lifestyle is linked to relaxation of selection [32–34], which should be linked to reduced codon bias. It is possible to reconcile the discrepancy by dividing the pathogenic and the non-pathogenic prokaryotes into two subtypes: some are capable of living in multiple environments and some stay mainly host-associated. When we compared pathogenic and non-pathogenic groups of host-associated prokaryotes there was no statistically significant difference in their CAIave values (P = 0.06 by Mann-Whitney test), but when we compared pathogenic and non-pathogenic groups of prokaryotes living in multiple habitats, the pathogenic prokaryotes showed statistically significantly lower CAIave values (P = 5.624E-7 by Mann-Whitney test). Therefore, it seems that pathogenic host-associated microbes like C. burnetii, an intracellular pathogen with an extremely high CAIave value of 0.78, are not exposed to strong selection on codon usage, while pathogens that are able to live in multiple habitats are under even stronger selection on codon usage than non-pathogenic prokaryotes living in multiple habitats.
Previous studies discussed the correspondence between ecological preferences and codon adaptation [35, 36]. Our results suggest that organisms may adjust to metabolic variability by maintaining a high extent of codon usage bias (reflected by their low CAIave values). Previous studies analyzed the association between codon adaptation and growth rate and between growth rate and metabolic variability [14, 17, 24, 25]. One study showed that most bacterial organisms choose one of two alternative ecological strategies: living in multiple habitats with a large extent of co-habitation or living in a specialized niche in which the co-habitation is limited. It was shown that growth rate is statistically significantly correlated with metabolic variability encountered by an organism, suggesting a universal principle by which metabolic flexibility is associated with a need to grow fast, perhaps because of the greater extent of competition. Independently, Rocha demonstrated an association between bacteria with large extents of codon usage bias and fast growth, and also an association with the number of tRNA genes [17, 24]. It was demonstrated that fast growing bacteria have more tRNA genes of fewer types, and suggested that the translation in those organisms depends on fast tRNA diffusion to the ribosome. That study proposed that co-evolution of the tRNA pool and the codon usage bias allows more efficient translation of highly expressed genes, and that the codon usage bias in highly expressed genes relative to the rest of the genome is predicted to be under stronger selection in fast growing organisms. Our results tie these two results together and suggest that translational selection towards most adapted codons in highly expressed genes is operational in organisms that live in variable environments, enabling them to efficiently address the metabolic variability and the competition.
Codon usage bias in highly expressed genes was suggested to have a global effect on the cell, increasing cellular fitness. Here we perform the first large-scale study that examines the relationship between codon usage bias and the phenotypic traits of prokaryotic organisms. Our analysis revealed a large variation in the global extent of codon bias among prokaryotic genomes, which is associated with their lifestyles. In particular, we discovered that organisms living in multiple habitats, including facultative organisms, mesophiles and pathogenic bacteria, exhibit high extents of codon usage bias, consistent with their need to adapt efficiently to different environments.
Materials and methods
We retrieved 1,169 genome sequences from the NCBI Entrez Genome Project database. For species that had sequenced genomes for more than one subspecies, we maintained one representative subspecies, which was chosen randomly. This resulted in a data set of 773 prokaryotes. From this list, only prokaryotes with GC content larger than 35% and smaller than 65% were included for further analyses, resulting in a data set of 518 prokaryotes.
Computation of codon bias measures at an organism scale
Ncdiff: for each organism we calculated the Nc value for each gene in the genome, as described in . We next computed the average Nc value of the genes encoding ribosomal proteins (Nc (rib)) and the average Nc value of the rest of the genome (Nc (all)). Ncdiff was obtained by:
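The expression itself is not reproduced above. As a minimal sketch, assuming the simplest form consistent with the description in the text, namely the raw difference Nc (all) minus Nc (rib) with no normalization (this form is an assumption, as are the function name and the input layout), the computation could look like this. Per-gene Nc values are taken as already computed.

```python
from statistics import mean


def nc_diff(nc_by_gene, ribosomal_ids):
    """Average Nc of non-ribosomal genes minus average Nc of ribosomal genes.

    Assumed form: an unnormalized difference. Positive, large values mean the
    ribosomal genes use a narrower set of codons than the rest of the genome.
    """
    rib = [nc for gene_id, nc in nc_by_gene.items() if gene_id in ribosomal_ids]
    rest = [nc for gene_id, nc in nc_by_gene.items() if gene_id not in ribosomal_ids]
    return mean(rest) - mean(rib)


# Hypothetical per-gene Nc values for a small toy genome.
toy = {"rpsA": 35.0, "rplB": 37.0, "geneX": 50.0, "geneY": 54.0}
toy_diff = nc_diff(toy, {"rpsA", "rplB"})
```

The sign convention matches the statement in the text that organisms with high Ncdiff values exhibit large extents of codon usage bias.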
Comparison between groups of organisms
The prokaryotes were classified according to their environmental characteristics (oxygen requirement, salinity, temperature range and habitat) and also whether they are pathogenic, based on the documentation in the NCBI Entrez Genome Project database (detailed in Additional file 2). The number of organisms annotated with each property is detailed in Table 1. Comparisons between the value distributions of organism measures (CAIave or Ncdiff) were performed by Mann-Whitney or Kruskal-Wallis tests.
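For readers unfamiliar with the Mann-Whitney test, a minimal pure-Python sketch of a two-group comparison follows. It is not the authors' procedure: the function names are hypothetical, and the P value uses the large-sample normal approximation without tie or continuity correction, which is an assumption made for brevity (a statistics package would normally be used instead).

```python
import math


def mann_whitney_u(a, b):
    """U statistic for sample a: number of (a, b) pairs with a > b, ties counting 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def two_sided_p(a, b):
    """Two-sided P value from the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical CAIave values for two groups of organisms.
multiple_habitats = [0.40, 0.50, 0.60]
specialized = [0.70, 0.80, 0.90]
p_value = two_sided_p(multiple_habitats, specialized)
```

With samples this small the normal approximation is crude; it is shown only to make the logic of the rank-based comparison concrete.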
Phylogenetic distances between organisms
The pairwise distances across the phylogenetic tree were based on the tree generated in and computed as the path length between two organisms through the most recent common ancestor. We used 2,016 possible pairs of 64 bacteria and 78 possible pairs of 13 archaea in this analysis.
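The pair counts quoted above follow directly from the number of unordered pairs among n organisms, as this short check shows:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{64}{2} = \frac{64 \cdot 63}{2} = 2016, \qquad
\binom{13}{2} = \frac{13 \cdot 12}{2} = 78 .
```

Together these give 2,016 + 78 = 2,094 within-domain pairs.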
Correlations and partial correlations between growth rate, habitat type and codon usage bias
This analysis included 82 prokaryotes, for which we had information for both their growth rates and habitat types. The growth rates were obtained from . The prokaryotes' environmental annotations were obtained from the NCBI Entrez Genome Project database (multiple habitat-living organisms were annotated as 0, and specialized organisms were annotated as 1). The extents of codon usage bias were represented as CAIave values. We repeated the analysis using a subset of 24 organisms, which are all phylogenetically remote from each other, based on the phylogenetic tree used above.
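A first-order partial correlation of the kind used here can be sketched as below. The source does not state how the partial correlations were computed, so the standard formula based on the three pairwise Pearson coefficients is an assumption, as are the function names. With habitat coded as 0 or 1, the Pearson coefficient is the point-biserial correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z (first-order partial correlation)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: z is uncorrelated with both x and y, so controlling for it changes nothing.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
z = [1, -1, -1, 1]
r_raw = pearson(x, y)
r_partial = partial_correlation(x, y, z)
```

When the controlled variable is correlated with both of the others, the partial coefficient moves away from the raw one, which is the pattern reported for growth rate and habitat type once CAIave is controlled for.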
Codon Adaptation Index.
Grantham R, Gautier C, Gouy M, Mercier R, Pave A: Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 1980, 8: r49-r62.
Gouy M, Gautier C: Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 1982, 10: 7055-7074. 10.1093/nar/10.22.7055.
Bennetzen JL, Hall BD: Codon selection in yeast. J Biol Chem. 1982, 257: 3026-3031.
Post LE, Nomura M: DNA sequences from the str operon of Escherichia coli. J Biol Chem. 1980, 255: 4660-4666.
Ikemura T: Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol. 1981, 151: 389-409. 10.1016/0022-2836(81)90003-6.
Ikemura T: Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985, 2: 13-34.
Dong H, Nilsson L, Kurland CG: Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J Mol Biol. 1996, 260: 649-663. 10.1006/jmbi.1996.0428.
Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH: Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci USA. 2004, 101: 3480-3485. 10.1073/pnas.0307827100.
Hershberg R, Petrov DA: General rules for optimal codon choice. PLoS Genet. 2009, 5: e1000556. 10.1371/journal.pgen.1000556.
Supek F, Skunca N, Repar J, Vlahovicek K, Smuc T: Translational selection is ubiquitous in prokaryotes. PLoS Genet. 2010, 6: e1001004. 10.1371/journal.pgen.1001004.
Lithwick G, Margalit H: Hierarchy of sequence-dependent features associated with prokaryotic translation. Genome Res. 2003, 13: 2665-2673. 10.1101/gr.1485203.
Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O'Shea EK, Weissman JS: Global analysis of protein expression in yeast. Nature. 2003, 425: 737-741. 10.1038/nature02046.
Lafay B, Atherton JC, Sharp PM: Absence of translationally selected synonymous codon usage bias in Helicobacter pylori. Microbiology. 2000, 146: 851-860.
Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE: Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res. 2005, 33: 1141-1153. 10.1093/nar/gki242.
Andersson SG, Kurland CG: Codon preferences in free-living microorganisms. Microbiol Rev. 1990, 54: 198-210.
Kudla G, Murray AW, Tollervey D, Plotkin JB: Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009, 324: 255-258. 10.1126/science.1170160.
Rocha EP: Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 2004, 14: 2279-2286. 10.1101/gr.2896904.
dos Reis M, Savva R, Wernisch L: Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004, 32: 5036-5044. 10.1093/nar/gkh834.
Suzuki H, Saito R, Tomita M: Measure of synonymous codon usage diversity among genes in bacteria. BMC Bioinformatics. 2009, 10: 167-10.1186/1471-2105-10-167.
Sharp PM, Li WH: The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15: 1281-1295. 10.1093/nar/15.3.1281.
Wright F: The 'effective number of codons' used in a gene. Gene. 1990, 87: 23-29. 10.1016/0378-1119(90)90491-9.
Sekiguchi Y, Kamagata Y, Nakamura K, Ohashi A, Harada H: Syntrophothermus lipocalidus gen. nov., sp. nov., a novel thermophilic, syntrophic, fatty-acid-oxidizing anaerobe which utilizes isobutyrate. Int J Syst Evol Microbiol. 2000, 50: 771-779. 10.1099/00207713-50-2-771.
Lovley DR, Phillips EJ: Novel mode of microbial energy metabolism: organic carbon oxidation coupled to dissimilatory reduction of iron or manganese. Appl Environ Microbiol. 1988, 54: 1472-1480.
Vieira-Silva S, Rocha EP: The systemic imprint of growth and its uses in ecological (meta)genomics. PLoS Genet. 2010, 6: e1000808-10.1371/journal.pgen.1000808.
Freilich S, Kreimer A, Borenstein E, Yosef N, Sharan R, Gophna U, Ruppin E: Metabolic-network-driven analysis of bacterial ecological strategies. Genome Biol. 2009, 10: R61-10.1186/gb-2009-10-6-r61.
Tuller T, Carmi A, Vestsigian K, Navon S, Dorfan Y, Zaborske J, Pan T, Dahan O, Furman I, Pilpel Y: An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell. 2010, 141: 344-354. 10.1016/j.cell.2010.03.031.
Akashi H: Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics. 1994, 136: 927-935.
Deng T: Bacterial expression and purification of biologically active mouse c-Fos proteins by selective codon optimization. FEBS Lett. 1997, 409: 269-272. 10.1016/S0014-5793(97)00522-X.
Gustafsson C, Govindarajan S, Minshull J: Codon bias and heterologous protein expression. Trends Biotechnol. 2004, 22: 346-353. 10.1016/j.tibtech.2004.04.006.
Gu W, Zhou T, Wilke CO: A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput Biol. 2010, 6: e1000664-10.1371/journal.pcbi.1000664.
dos Reis M, Wernisch L: Estimating translational selection in eukaryotic genomes. Mol Biol Evol. 2009, 26: 451-461. 10.1093/molbev/msn272.
Hershberg R, Lipatov M, Small PM, Sheffer H, Niemann S, Homolka S, Roach JC, Kremer K, Petrov DA, Feldman MW, Gagneux S: High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol. 2008, 6: e311-10.1371/journal.pbio.0060311.
Hershberg R, Tang H, Petrov DA: Reduced selection leads to accelerated gene loss in Shigella. Genome Biol. 2007, 8: R164-10.1186/gb-2007-8-8-r164.
Ochman H, Moran NA: Genes lost and genes found: evolution of bacterial pathogenesis and symbiosis. Science. 2001, 292: 1096-1099. 10.1126/science.1058543.
Man O, Pilpel Y: Differential translation efficiency of orthologous genes is involved in phenotypic divergence of yeast species. Nat Genet. 2007, 39: 415-421. 10.1038/ng1967.
Jiang H, Guan W, Pinney D, Wang W, Gu Z: Relaxation of yeast mitochondrial functions after whole-genome duplication. Genome Res. 2008, 18: 1466-1471. 10.1101/gr.074674.107.
Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2007, 35: D26-31. 10.1093/nar/gkl993.
Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P: Toward automatic reconstruction of a highly resolved tree of life. Science. 2006, 311: 1283-1287. 10.1126/science.1123061.
This study was supported by the Israel Science Foundation administered by the Israeli Academy for Sciences and Humanities. We thank Ruth Hershberg for her critical comments on the manuscript. We are grateful to all our lab members for fruitful discussions.
The authors declare that they have no competing interests.
MB carried out all analyses, conceived the work, analyzed the data and wrote the paper. HM conceived the work, analyzed the data and wrote the paper. All authors have read and approved the manuscript for publication.
Electronic supplementary material
Additional file 2: Table S1 - features and characteristics of the prokaryotes included in this study. This file contains data and annotations for the organisms included in the analysis (name, tax ID, CAIave, median CAI, coefficient of variation of CAI, Ncdiff, average Nc, environmental properties, whether it is the representative subspecies of the species, superkingdom, GC content). (XLS 330 KB)
Botzman, M., Margalit, H. Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles. Genome Biol 12, R109 (2011) doi:10.1186/gb-2011-12-10-r109
- Codon Usage
- Phenotypic Trait
- Ribosomal Gene
- Synonymous Codon