Fig. 1 | Genome Biology

Fig. 1

From: Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes

Real-world metagenomic data benchmarking pipeline. A Samples from three different biomes were size-fractionated through 0.22 μm filters to obtain microbial-enriched (> 0.22 μm) and viral-enriched (< 0.22 μm) fractions. B Viral-enriched fractions were treated with DNase to remove free DNA before viral lysis. C DNA was extracted, purified, and sequenced separately from the microbial- and viral-enriched fractions to obtain microbial and viral datasets. D Sequenced DNA reads were quality-controlled and assembled into longer contigs. Contigs shorter than 1500 bp were excluded from downstream analysis. E Homologous contigs shared between the viral and microbial datasets were identified using minimap2 and removed. The remaining unique viral-fraction contigs and unique microbial-fraction contigs were used as ground-truth positives and negatives, respectively. F Nine bioinformatic virus identification tools were applied to these datasets. Tool names are colored by algorithm type: convolutional neural network tools (red), other machine learning tools (green), and homology-only tools (blue). Viral-fraction contigs identified as viral or non-viral are counted as true positives or false negatives, respectively. Microbial-fraction contigs identified as viral or non-viral are counted as false positives or true negatives, respectively.
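
The length filter in panel D and the labeling scheme in panel F can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`keep_contig`, `confusion_label`, `tally`) and the tuple layout `(fraction, length_bp, predicted_viral)` are hypothetical, and it assumes homologous contigs have already been removed (panel E) and that each tool's output has been reduced to a single viral/non-viral call per contig.

```python
from collections import Counter

# Panel D: contigs shorter than 1500 bp are excluded from downstream analysis.
MIN_CONTIG_LENGTH = 1500


def keep_contig(length_bp: int) -> bool:
    """Return True if the contig is long enough to be analyzed."""
    return length_bp >= MIN_CONTIG_LENGTH


def confusion_label(fraction: str, predicted_viral: bool) -> str:
    """Map a contig's source fraction and a tool's call to TP/FN/FP/TN.

    Viral-fraction contigs are ground-truth positives and
    microbial-fraction contigs are ground-truth negatives (panel E).
    """
    if fraction == "viral":
        return "TP" if predicted_viral else "FN"
    if fraction == "microbial":
        return "FP" if predicted_viral else "TN"
    raise ValueError(f"unknown fraction: {fraction!r}")


def tally(contigs):
    """Count confusion-matrix cells for (fraction, length_bp, predicted_viral) tuples."""
    counts = Counter({"TP": 0, "FN": 0, "FP": 0, "TN": 0})
    for fraction, length_bp, predicted_viral in contigs:
        if keep_contig(length_bp):
            counts[confusion_label(fraction, predicted_viral)] += 1
    return dict(counts)
```

Calling `tally` on the contigs scored by one tool gives that tool's four confusion-matrix counts; a contig below the length cutoff contributes to none of them.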