Fig. 4 | Genome Biology

Fig. 4

From: Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses

Systematic and specific biases in functional profiles associated with paralog splitting criteria. a Inconsistency of ORF assignments to OGC (normalized variation of information, top), and fraction of OGC that contain exactly the same ORFs (bottom) under synteny and orthology splitting criteria, stratified by functional category. b Absolute number (top) and relative fraction (bottom) of synteny- and orthology-based OGC associated with each functional category. c Balances (quantified as isometric log ratios) for the functional categories that show the greatest systematic variation between paralog splitting criteria. Each set of boxplots represents the balance between the relative abundances of a group of functional categories (shown below) and all the remaining categories not considered in previous sets. Each data point corresponds to the pangenome of one species; boxes span the 25th–75th percentiles; the central line indicates the median; whiskers extend to the most extreme data points that are not outliers; isolated points denote outliers. d Standardized residuals (Z-scores) of the linear mixed-effects model used to infer the systematic differences shown in c. Each row corresponds to the pangenome of one species, sorted according to the GTDB species tree [48] (phyla colored as in Fig. 1). Colored cells indicate a significant excess of synteny-based (green) or orthology-based (purple) OGC from a given category in a specific pangenome that is not explained by the general trends in c.
Abbreviations of functional categories: C, energy production and conversion; D, cell cycle control, cell division, chromosome partitioning; E, amino acid transport and metabolism; F, nucleotide transport and metabolism; G, carbohydrate transport and metabolism; H, coenzyme transport and metabolism; I, lipid transport and metabolism; J, translation, ribosomal structure and biogenesis; K, transcription; L, replication, recombination and repair; M, cell wall/membrane/envelope biogenesis; N, cell motility; O, posttranslational modification, protein turnover, chaperones; P, inorganic ion transport and metabolism; Q, biosynthesis, transport, and catabolism of secondary metabolites; R, general function prediction only; S, function unknown; T, signal transduction mechanisms; U, intracellular trafficking, secretion, and vesicular transport; V, defense mechanisms; X, mobilome (prophages, transposons)
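
The balances in panel c are isometric log ratios between two groups of functional categories. As a minimal sketch of the standard balance formula from compositional data analysis (not the authors' code), the Python function below computes one such balance. The function name `ilr_balance` and the example counts are hypothetical and only serve as illustration; the grouping of categories used in the figure is not reproduced here.

```python
import math


def ilr_balance(abundances, numerator, denominator):
    """Isometric log-ratio balance between two groups of categories.

    abundances:  dict mapping category label -> strictly positive abundance
                 (counts or relative fractions; the balance is scale-invariant)
    numerator:   labels of the categories in the first group
    denominator: labels of the categories in the second group
    """
    r, s = len(numerator), len(denominator)
    if r == 0 or s == 0:
        raise ValueError("both groups must contain at least one category")
    values = [abundances[k] for k in list(numerator) + list(denominator)]
    if any(v <= 0 for v in values):
        # Zeros must be handled beforehand (e.g. with a pseudocount);
        # that choice is an assumption left to the user.
        raise ValueError("abundances must be strictly positive")

    # The log of a geometric mean equals the mean of the logs.
    mean_log_num = sum(math.log(abundances[k]) for k in numerator) / r
    mean_log_den = sum(math.log(abundances[k]) for k in denominator) / s

    # Normalizing factor sqrt(r*s / (r+s)) of the standard balance.
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)


# Hypothetical OGC counts per functional category in one pangenome.
counts = {"L": 40, "X": 10, "J": 20, "K": 20}
balance = ilr_balance(counts, ["L", "X"], ["J", "K"])
```

A positive balance means the first group is, on a geometric-mean basis, more abundant than the second; a value of zero means the two geometric means coincide. Because only ratios enter the formula, the result is the same whether raw counts or relative fractions are supplied.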
