Fig. 4 | Genome Biology

Fig. 4

From: Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox

Large-scale application of the SIAMCAT machine learning workflow to human gut metagenomic disease association studies. a Application of SIAMCAT machine learning workflows to taxonomic profiles generated from fecal shotgun metagenomes using the mOTUs2 profiler. Cross-validation performance for discriminating between diseased patients and controls, quantified by the area under the ROC curve (AUROC), is indicated by diamonds (horizontal lines denote 95% confidence intervals). The sample size per dataset is shown in an additional panel (bars are cut at N = 250, with the sample size given as a number instead; see Table 1 and Additional file 2: Table S1 for information about the included datasets and the key for disease abbreviations). b Application of SIAMCAT machine learning workflows to functional profiles generated with eggNOG 4.5 for the same datasets as in a (see Additional file 1: Figure S4, S7 for additional types of, and a comparison between, taxonomic and functional input data). c Cross-validation accuracy of SIAMCAT machine learning workflows as applied to 16S rRNA gene amplicon data for human gut microbiome case-control studies [20] (see a for definitions). d Influence of different parameter choices on the resulting classification accuracy. After training a linear model to predict the AUROC values for each classification task, the variance explained by each parameter was assessed using an ANOVA (see the “Methods” section; see Fig. 1 for the definition of boxplots). e Performance comparison of machine learning algorithms on gut microbial disease association studies. For each machine learning algorithm, the best AUROC values for each task are shown as boxplots (defined as in d). Generally, the choice of algorithm has only a small effect on classification accuracy, but both the Elastic Net and LASSO performance gains are statistically significant (paired Wilcoxon test: LASSO vs Elastic Net, P = 0.001; LASSO vs random forest, P = 1e−08; Elastic Net vs random forest, P = 4e−14).
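
The variance decomposition described for panel d can be illustrated with a small sketch. This is not the authors' code (SIAMCAT is an R package, and the caption does not list the exact model terms); it is a simplified main-effects decomposition in plain Python, assuming a balanced design in which every combination of factor levels occurs once. The function name `variance_explained`, the factor names, and all AUROC values in `toy_runs` are hypothetical and invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def variance_explained(records, factors, response="auroc"):
    """Share of the total sum of squares attributable to each factor.

    Main effects only; interactions and noise end up in "residual".
    Assumes a balanced design, so the main-effect sums of squares
    are orthogonal and can simply be added up.
    """
    values = [r[response] for r in records]
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    shares = {}
    for factor in factors:
        # Group the response values by the level of this factor.
        groups = defaultdict(list)
        for r in records:
            groups[r[factor]].append(r[response])
        # Between-group sum of squares for this factor.
        ss_factor = sum(
            len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
        )
        shares[factor] = ss_factor / ss_total

    shares["residual"] = 1.0 - sum(shares.values())
    return shares


# Invented AUROC values for 2 tasks x 2 algorithms x 2 normalizations.
toy_runs = [
    {"task": "A", "algorithm": "lasso", "normalization": "log", "auroc": 0.80},
    {"task": "A", "algorithm": "lasso", "normalization": "rank", "auroc": 0.78},
    {"task": "A", "algorithm": "rf", "normalization": "log", "auroc": 0.82},
    {"task": "A", "algorithm": "rf", "normalization": "rank", "auroc": 0.81},
    {"task": "B", "algorithm": "lasso", "normalization": "log", "auroc": 0.65},
    {"task": "B", "algorithm": "lasso", "normalization": "rank", "auroc": 0.62},
    {"task": "B", "algorithm": "rf", "normalization": "log", "auroc": 0.66},
    {"task": "B", "algorithm": "rf", "normalization": "rank", "auroc": 0.66},
]

shares = variance_explained(toy_runs, ["task", "algorithm", "normalization"])
for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

In this made-up table the task factor accounts for most of the variance, while the algorithm and normalization choices contribute only a few percent. A real analysis of an unbalanced design would need a proper linear-model ANOVA rather than this shortcut.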
