From: Alignment-free sequence comparison: benefits, applications, and tools
Category | Analysis | Tool | Primary features | Implementation | Reference | URL |
---|---|---|---|---|---|---|
Mapping | Transcript quantification | kallisto | Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets) | Software (C++) | [69] | |
Mapping | Transcript quantification | Sailfish | Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based) | Software (C++) | [67] | |
Mapping | Transcript quantification | Salmon | Quantification of transcript expression from RNA-seq data (uses k-mers) | Software (C++) | [70] | |
Mapping | Transcript quantification | RNA-Skim | Transcript-level RNA-seq quantification (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mers) | Software (C++) | [68] | |
Mapping | Variant calling | ChimeRScope | Fusion transcript prediction using gene k-mer profiles of RNA-seq paired-end reads | Software (Java) | [74] | |
Mapping | Variant calling | FastGT | Genotyping of known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers | Software (C) | [73] | |
Mapping | Variant calling | Phy-Mer | Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based) | Software (Python) | [157] | |
Mapping | Variant calling | LAVA | Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based) | Software (C) | [71] | |
Mapping | Variant calling | MICADo | Detection of mutations in targeted third-generation NGS data (can distinguish patient-specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs) | Software (Python) | [72] | |
Mapping | General mapper | Minimap | Lightweight and fast read mapper and read overlap detector (uses the concept of “minimizers”, a special type of k-mers) | Software (C) | [77] | |
Assembly | De novo genome assembly | MHAP | Produces highly continuous assemblies (fully resolved chromosome arms) from long, noisy third-generation reads (10 kbp) using the MinHash dimensionality-reduction technique | Software (Java) | [76] | |
Assembly | De novo genome assembly | Miniasm | Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout-Consensus (OLC) approach without requiring an error correction stage (uses minimap) | Software (C) | [77] | |
Assembly | De novo genome assembly | LINKS | Scaffolding of genome assemblies with error-containing long sequences (e.g., ONT or PacBio reads, draft genomes) | Software (Perl) | [75] | |
Assembly | Read clustering | afcluster | Clustering of reads from different genes and different species based on k-mer counts | Software (C++) | [158] | |
Assembly | Read clustering | QCluster | Clustering of reads with alignment-free measures (k-mer based) and quality values | Software (C++) | [159] | |
Assembly | Reads error correction | Lighter | Correction of sequencing errors in raw, whole genome sequencing reads (k-mer based) | Software (C++) | [94] | |
Assembly | Reads error correction | QuorUM | Error corrector for Illumina reads using k-mers | Software (C++) | [93] | |
Assembly | Reads error correction | Trowel | Error corrector for Illumina reads using k-mers | Software (C++) | [95] | |
Metagenomics | Assembly-free phylogenomics | AAF | Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based) | Software (Python) | [78] | |
Metagenomics | Assembly-free phylogenomics | kSNP v3 | Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis) | Software (C) | | |
Metagenomics | Assembly-free phylogenomics | NGS-MC | Phylogeny of species based on NGS reads using the alignment-free sequence dissimilarity measures d2* and d2S under different Markov chain models (using k-words) | R package | | |
Metagenomics | Species identification/taxonomic profiling | CLARK | Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment | Software (C++) | [84] | |
Metagenomics | Species identification/taxonomic profiling | FOCUS | Reports organisms present in metagenomic samples and profiles their abundances (uses a composition-based approach and non-negative least squares for prediction) | Web service; Software (Python) | [161] | |
Metagenomics | Species identification/taxonomic profiling | GSM | Estimation of abundances of microbial genomes in metagenomic samples (k-mer based) | Software (Go) | [162] | |
Metagenomics | Species identification/taxonomic profiling | Mash | Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on the MinHash dimensionality-reduction technique) | Software (C++) | [163] | |
Metagenomics | Species identification/taxonomic profiling | Kraken | Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database | Software (C++) | [83] | |
Metagenomics | Species identification/taxonomic profiling | LMAT | Assignment of taxonomic labels to reads by k-mer searches in a precomputed database | Software (C++/Python) | [82] | |
Metagenomics | Species identification/taxonomic profiling | stringMLST | k-mer-based tool for MLST directly from genome sequencing reads | Software (Python) | [86] | |
Metagenomics | Species identification/taxonomic profiling | Taxonomer | k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples | Web service | [164] | |
Metagenomics | Other | d2-tools | Word-based (k-tuple) comparison (pairwise dissimilarity matrix using the d2S measure) of metatranscriptomic samples from NGS reads | Software (Python/R) | | |
Metagenomics | Other | VirHostMatcher | Prediction of hosts from metagenomic viral sequences based on oligonucleotide frequencies (ONF) using various distance measures (e.g., d2) | Software (C++) | [153] | |
Metagenomics | Other | MetaFast | Statistics calculation of metagenome sequences and the distances between them based on assembly using de Bruijn graphs and the Bray–Curtis dissimilarity measure | Software (Java) | [166] | |
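
Most tools in the table rest on two ideas: decomposing sequences into k-mers, and (for MHAP and Mash) compressing the k-mer set with MinHash. The Python sketch below illustrates both on toy sequences. It is not the implementation of any listed tool: the function names are hypothetical, SHA-1 from the standard library stands in for the faster hash functions real tools typically use, and reverse complements are ignored for brevity.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_counts(seq, k):
    """Count how often each k-mer occurs in seq."""
    return Counter(kmers(seq, k))


def jaccard(a, b, k):
    """Exact Jaccard index of the k-mer sets of sequences a and b."""
    sa, sb = set(kmers(a, k)), set(kmers(b, k))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def _hash64(kmer):
    # Assumption: first 8 bytes of SHA-1 serve as a stable 64-bit hash.
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def minhash_sketch(seq, k, s):
    """Bottom-s sketch: the s smallest distinct k-mer hash values."""
    return sorted({_hash64(x) for x in kmers(seq, k)})[:s]


def minhash_jaccard(sketch_a, sketch_b, s):
    """Estimate the Jaccard index from two bottom-s sketches."""
    sa, sb = set(sketch_a), set(sketch_b)
    merged = sorted(sa | sb)[:s]  # bottom-s sketch of the union
    if not merged:
        return 1.0
    shared = sum(1 for h in merged if h in sa and h in sb)
    return shared / len(merged)


if __name__ == "__main__":
    a, b = "ACGTACGTGA", "ACGTACGTCA"
    print(kmer_counts(a, 3)["ACG"])  # 2
    print(jaccard(a, b, 3))          # 0.5
    sk_a, sk_b = minhash_sketch(a, 3, 100), minhash_sketch(b, 3, 100)
    print(minhash_jaccard(sk_a, sk_b, 100))
```

With a sketch size larger than the number of distinct k-mers, as in this toy run, the estimate coincides with the exact Jaccard index. On genome-scale input the sketch size `s` trades accuracy for memory and speed.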