Fig. 3 | Genome Biology

Fig. 3

From: Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses

Comparison of core gene sets obtained from different OGC building methods. a Distribution of the mean copy number per core OGC per genome. b Fraction of all single-copy core (scc) gene families that are exclusively recovered through orthology- or synteny-based paralog splitting. c Distribution of synteny-exclusive, single-copy core OGC among orthology-based OGC. The large bar at “1” implies that most synteny-exclusive single-copy core OGC are subsets of larger orthology-based OGC that are core but not single-copy. d Distribution of orthology-exclusive, single-copy core OGC among synteny-based OGC. Most of the single-copy core OGC exclusively supported by orthology combine two accessory synteny-based OGC. In c and d, gray bars denote method-exclusive single-copy core OGC that include sequences that could not be processed by the other method. e Nucleotide sequence divergence in single-copy core OGC supported exclusively by orthology, exclusively by synteny, or by both criteria. To account for between-species differences in evolutionary rates and phylogenetic tree spans, values were normalized by the species-level mean divergence (calculated from all single-copy core OGC that were supported by both synteny and orthology). Vertical lines indicate the distribution means for each group of single-copy core OGC. f Functional differences among single-copy core OGC supported by different criteria. Each set of box plots represents the balance (measured as the isometric log ratio, ILR) between the relative frequency of a given functional category (x-axis) and the relative frequencies of the remaining categories not considered in previous sets (e.g., the second set of box plots corresponds to the balance between functional category N and all the rest except X). The figure shows the 4 ILR balances with the greatest variation across methods. Abbreviations of functional categories: X, mobilome; N, cell motility; Q, biosynthesis, transport, and catabolism of secondary metabolites; J, translation, ribosomal structure and biogenesis.
In b and f, each data point corresponds to one species; boxes span the 25th to 75th percentiles; the central line indicates the median; whiskers extend to the most extreme data points that are not outliers; isolated points denote outliers; notches (only in f) show the 95% confidence interval of the median.
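
The sequential balances described for panel f can be illustrated with a minimal sketch. It assumes the standard ILR "pivot" form, in which each category is compared with the geometric mean of the categories that follow it in a chosen order, scaled by sqrt(r / (r + 1)) for r remaining categories. The caption does not state the exact normalization or how zero frequencies were handled, so both are assumptions here; the function name `ilr_balances` and the example frequencies are hypothetical and not taken from the article.

```python
import math


def ilr_balances(freqs, order):
    """Sequential ILR balances (assumed pivot form).

    freqs: dict mapping category label -> strictly positive relative frequency.
    order: list of labels; each label is balanced against all labels after it.
    Returns a dict with one balance per label except the last.
    """
    parts = [freqs[label] for label in order]
    if any(value <= 0 for value in parts):
        # Zero replacement is not described in the caption, so it is left to the caller.
        raise ValueError("all frequencies must be strictly positive")

    balances = {}
    for i in range(len(parts) - 1):
        rest = parts[i + 1:]
        r = len(rest)
        # Log of the geometric mean of the categories not yet considered.
        log_gmean_rest = sum(math.log(value) for value in rest) / r
        scale = math.sqrt(r / (r + 1))
        balances[order[i]] = scale * (math.log(parts[i]) - log_gmean_rest)
    return balances


# Hypothetical frequencies for one species and one set of single-copy core OGC.
example = {"X": 0.05, "N": 0.10, "Q": 0.15, "J": 0.30, "other": 0.40}
result = ilr_balances(example, ["X", "N", "Q", "J", "other"])
```

With this ordering, the balance for N compares N with Q, J, and the remaining categories but not X, matching the example given in the caption. Because balances are log ratios, they do not change when all frequencies are multiplied by the same constant.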
