Alignment-free sequence comparison: benefits, applications, and tools

Zielezinski, Andrzej; Vinga, Susana; Almeida, Jonas; Karlowski, Wojciech M.

doi:10.1186/s13059-017-1319-7

Table 1 Alignment-free sequence comparison tools available for next-generation sequencing data analysis

Category	Analysis	Tool	Primary features	Implementation	Reference	URL
Mapping	Transcript quantification	kallisto	Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets)	Software (C++)	[69]	https://pachterlab.github.io/kallisto/
		Sailfish	Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based)	Software (C++)	[67]	http://www.cs.cmu.edu/~ckingsf/software/sailfish/
		Salmon	Quantification of the expression of transcripts using RNA-seq data (uses k-mers)	Software (C++)	[70]	https://combine-lab.github.io/salmon/
		RNA-Skim	RNA-seq quantification at transcript-level (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mers)	Software (C++)	[68]	http://www.csbio.unc.edu/rs/
	Variant calling	ChimeRScope	Fusion transcript prediction using gene k-mer profiles of RNA-seq paired-end reads	Software (Java)	[74]	https://github.com/ChimeRScope/ChimeRScope/wiki
		FastGT	Genotyping of known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers	Software (C)	[73]	https://github.com/bioinfo-ut/GenomeTester4/
		Phy-Mer	Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based)	Software (Python)	[157]	https://github.com/danielnavarrogomez/phy-mer
		LAVA	Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based)	Software (C)	[71]	http://lava.csail.mit.edu/
		MICADo	Detection of mutations in targeted third-generation NGS data (can distinguish patient-specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs)	Software (Python)	[72]	http://github.com/cbib/MICADo
	General mapper	Minimap	Lightweight and fast read mapper and read overlap detector (uses the concept of “minimizers”, a special type of k-mer)	Software (C)	[77]	https://github.com/lh3/minimap
Assembly	De novo genome assembly	MHAP	Produces highly continuous assemblies (fully resolved chromosome arms) from long, noisy third-generation reads (10 kbp) using the MinHash dimensionality-reduction technique	Software (Java)	[76]	https://github.com/marbl/MHAP
		Miniasm	Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout-Consensus (OLC) approach without the need for an error correction stage (uses minimap)	Software (C)	[77]	https://github.com/lh3/miniasm
		LINKS	Scaffolding of genome assemblies with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes)	Software (Perl)	[75]	https://github.com/warrenlr/LINKS/
	Read clustering	afcluster	Clustering of reads from different genes and different species based on k-mer counts	Software (C++)	[158]	https://github.com/luscinius/afcluster
		QCluster	Clustering of reads with alignment-free measures (k-mer based) and quality values	Software (C++)	[159]	http://www.dei.unipd.it/~ciompin/main/qcluster.html
	Read error correction	Lighter	Correction of sequencing errors in raw whole-genome sequencing reads (k-mer based)	Software (C++)	[94]	https://github.com/mourisl/Lighter
		QuorUM	Error corrector for Illumina reads using k-mers	Software (C++)	[93]	https://github.com/gmarcais/Quorum
		Trowel	Error corrector for Illumina reads using k-mers	Software (C++)	[95]	https://sourceforge.net/projects/trowel-ec/
Metagenomics	Assembly-free phylogenomics	AAF	Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based)	Software (Python)	[78]	https://github.com/fanhuan/AAF
		kSNP v3	Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis)	Software (C)	[80, 81]	https://sourceforge.net/projects/ksnp/files/
		NGS-MC	Phylogeny of species based on NGS reads using alignment-free sequence dissimilarity measures d₂* and d₂^S under different Markov chain models (using k-words)	R package	[79, 160]	http://www-rcf.usc.edu/~fsun/Programs/NGS-MC/NGS-MC.html
	Species identification/taxonomic profiling	CLARK	Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment	Software (C++)	[84]	http://clark.cs.ucr.edu/
		FOCUS	Reports organisms present in metagenomic samples and profiles their abundances (uses a composition-based approach and non-negative least squares for prediction)	Web service; Software (Python)	[161]	http://edwards.sdsu.edu/FOCUS/
		GSM	Estimation of abundances of microbial genomes in metagenomic samples (k-mer based)	Software (Go)	[162]	https://github.com/pdtrang/GSM
		Mash	Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on MinHash dimensionality-reduction technique)	Software (C++)	[163]	https://github.com/marbl/mash
		Kraken	Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database	Software (C++)	[83]	https://ccb.jhu.edu/software/kraken/
		LMAT	Assignment of taxonomic labels to reads by k-mer searches in a precomputed database	Software (C++/Python)	[82]	https://sourceforge.net/projects/lmat/
		stringMLST	k-mer-based tool for MLST directly from the genome sequencing reads	Software (Python)	[86]	http://jordan.biology.gatech.edu/page/software/stringMLST
		Taxonomer	k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples	Web service	[164]	http://taxonomer.iobio.io/
	Other	d2-tools	Word-based (k-tuple) comparison (pairwise dissimilarity matrix using the d₂^S measure) of metatranscriptomic samples from NGS reads	Software (Python/R)	[56, 165]	https://code.google.com/p/d2-tools/
		VirHostMatcher	Prediction of hosts from metagenomic viral sequences based on ONF using various distance measures (e.g., d₂)	Software (C++)	[153]	https://github.com/jessieren/VirHostMatcher
		MetaFast	Statistics calculation of metagenome sequences and the distances between them based on assembly using de Bruijn graphs and Bray–Curtis dissimilarity measure	Software (Java)	[166]	https://github.com/ctlab/metafast

The up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/. Accessed 23 August 2017
LCA lowest common ancestor, NGS next-generation sequencing, SNP single-nucleotide polymorphism, SNV single-nucleotide variant
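
Many of the tools in Table 1 are described as "k-mer based" or as using MinHash. The following is a minimal Python sketch of those two ideas, not the implementation used by any listed tool. The function names, the use of SHA-1 as the hash function, and the choice of a squared Euclidean distance between k-mer count vectors are assumptions made for illustration; in particular, the distance shown is not the d₂* or d₂^S measure named in the table.

```python
import hashlib
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer (word of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_euclidean_sq(a, b, k):
    """Squared Euclidean distance between two k-mer count vectors.

    Illustrative word-count dissimilarity only (an assumption, not the
    measure implemented by any tool in the table).
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))


def minhash_sketch(seq, k, size):
    """Bottom-`size` sketch: the smallest hash values over distinct k-mers."""
    hashes = {
        # First 8 bytes of SHA-1 as a 64-bit integer (hash choice is arbitrary).
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmer_counts(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    # Take the smallest `size` hashes of the merged sketches, then count
    # how many of them occur in both inputs.
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

For example, `jaccard_estimate(minhash_sketch(x, 4, 100), minhash_sketch(y, 4, 100), 100)` compares two sequences `x` and `y` using only their fixed-size sketches, which is what makes this kind of comparison cheap for large read sets.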

ISSN: 1474-760X

Genome Biology