
Table 2 Alignment-free sequence comparison tools available for research purposes

From: Alignment-free sequence comparison: benefits, applications, and tools

| Category | Name | Features | Implementation | Reference | URL |
|---|---|---|---|---|---|
| Pairwise and multiple sequence comparison | ALF | Calculation of pairwise similarity scores (using the N2 measure) for sequences in a FASTA file | Software (C++) | [101] | https://github.com/seqan/seqan/tree/master/apps/alf |
| | Alfree | 25 word-based measures, 8 IT-based measures, 3 graph-based measures, W-metric | Web service; Software (Python) | This article | http://www.combio.pl/alfree |
| | decaf+py | 13 word-based measures, Lempel–Ziv complexity-based measure, average common substring distance, W-metric | Software (Python) | [52, 53] | http://bioinformatics.org.au/tools/decaf+py/ |
| | multiAlignFree | Multiple alignment-free sequence comparison using five word-based statistics | R package | [167] | http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/ |
| | NASC | Non-aligned sequence comparison: four word-based measures and two IT-based measures | Matlab framework | [38] | http://web.ist.utl.pt/susanavinga/NASC/ |
| Whole-genome phylogeny | ALFRED, ALFRED-G | Phylogenetic tree reconstruction based on the average common substring approach | Software (C++) | [168, 169] | http://alurulab.cc.gatech.edu/phylo |
| | andi | Computation of evolutionary distances between closely related genomes by approximation of local alignments (k-mer based da measure); scalable to thousands of bacterial genomes | Software (C) | [170] | https://github.com/evolbioinf/andi/ |
| | CAFE | Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) | Software (C) | [171] | https://github.com/younglululu/CAFE |
| | CVTree3 | Phylogeny reconstruction from whole-genome sequences based on word composition | Web service | [172, 173] | http://tlife.fudan.edu.cn/cvtree3 |
| | DLTree | Automated whole-genome/proteome-based phylogenetic analysis based on the alignment-free dynamical language method | Web service | [174] | http://dltree.xtu.edu.cn |
| | FFP | Feature frequency profile-based measures for whole-genome/proteome comparisons (from viral to mammalian scale) | Software (C/Perl) | [34, 55, 112] | https://sourceforge.net/projects/ffp-phylogeny/ |
| | jD2Stat (JIWA) | Generation of a distance matrix using D2 statistics computed on k-mers extracted from large-scale unaligned genome sequences | Software (Java) | [54] | http://bioinformatics.org.au/tools/jD2Stat/ |
| | kr | Efficient word-based estimation of mutation distances from unaligned genomes | Software (C) | [175] | http://guanine.evolbio.mpg.de/cgi-bin/kr2/kr.cgi.pl |
| | FSWM/kmacs/Spaced | Three tools for alignment-free sequence comparison based on inexact word matches | Software (C++); Web service | [36, 176] | Software currently unavailable |
| | SlopeTree | Whole-genome phylogeny that corrects for HGT | Software (C++) | | http://prodata.swmed.edu/download/pub/slopetree_v1/ |
| | Underlying Approach | Phylogeny of whole genomes using the composition of subwords | Software (Java) | [139] | http://www.dei.unipd.it/~ciompin/main/underlying.html |
| Sequence similarity search tool | RAFTS3 | Search for similar protein sequences in a protein database (>300 times faster than BLAST) | Matlab | [177] | https://sourceforge.net/projects/rafts3/ |
| Annotation of long non-coding RNA | FEELnc | Prediction of lncRNAs from RNA-seq samples based on word frequencies and relaxed open reading frames | Software (Perl/R) | [178] | https://github.com/tderrien/FEELnc |
| | lncScore | Identification of long non-coding RNAs from assembled novel transcripts | Software (Python) | [152] | https://github.com/WGLab/lncScore |
| Horizontal gene transfer | alfy | Alignment-free local homology calculation for detecting horizontal gene transfer | Software (C) | [104, 109] | http://guanine.evolbio.mpg.de/alfy/ |
| | rush | Detection of recombination between two unaligned DNA sequences | Software (C) | [105] | http://guanine.evolbio.mpg.de/rush/ |
| | Smash | Identification and visualization of DNA rearrangements between pairs of sequences | Software (C) | [179] | http://bioinformatics.ua.pt/software/smash/ |
| | TF-IDF | Detection of HGT regions and the transfer direction in nucleotide/protein sequences | Software (C++) | [110, 180] | https://github.com/congyingnan/TF-IDF |
| Regulatory elements | D2Z | Identification of functionally related homologous regulatory elements | Software (Perl) | [102] | http://veda.cs.uiuc.edu/d2z/ |
| | MatrixREDUCE | Prediction of functional regulatory targets of transcription factors (TFs) from the predicted total affinity of each promoter and its orthologous promoters | Software (Python) | [181] | https://systemsbiology.columbia.edu/matrixreduce |
| | RRS | Detection of functionally similar groups of enhancers and their regions | Software (Perl/C) | [182] | http://goo.gl/7gW578 |
| Sequence clustering | d2_cluster | Word-based clustering of EST and full-length cDNA sequences | Software (C) | [123] | https://github.com/shaze/wcdest/ |
| | d2-vlmc | Word-based clustering of metatranscriptomic samples using variable-length Markov chains | Software (Python) | [183] | https://d2vlmc.codeplex.com/ |
| | mBKM | Clustering of DNA sequences using Shannon entropy and Euclidean distance | Software (Java) | [124] | https://github.com/Huiyang520/DMk-BKmeans |
| | kClust | Large-scale clustering of protein sequences (down to 20–30% sequence identity) | Software (C++) | [125] | https://github.com/soedinglab/kClust |
| Other | COMET | Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression | Web service | [184] | https://comet.lih.lu/ |
| | PPI | Identification of protein–protein interactions by coevolution analysis using the discrete Fourier transform | Software (Python) | [185] | https://github.com/cyinbox/PPI |
| | VaxiJen | Antigen prediction based on uniform vectors of principal amino acid properties | Web service | [127] | http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html |

  1. The up-to-date list of currently available programs can be found at http://www.combio.pl/alfree/tools/. Accessed 23 August 2017
  2. HGT horizontal gene transfer, IT information theory
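Several tools in this table (for example jD2Stat, D2Z, d2_cluster and d2-vlmc) build on the D2 statistic, which counts the k-mer matches between two sequences: for every word w of length k, the number of occurrences of w in one sequence is multiplied by its number of occurrences in the other, and the products are summed. The Python sketch below illustrates that basic computation and one way to turn it into a pairwise distance matrix. It is not the implementation of any listed tool: the function names are hypothetical, and the cosine-style normalization used for the distance is an illustrative assumption. Published tools often rely on standardized variants such as D2S and D2*, which additionally account for background word composition.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers (words of length k) in seq, case-insensitively."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_from_counts(ca: Counter, cb: Counter) -> int:
    """D2 statistic: sum over shared words w of count_a(w) * count_b(w)."""
    if len(ca) > len(cb):  # iterate over the smaller k-mer table
        ca, cb = cb, ca
    return sum(n * cb[w] for w, n in ca.items() if w in cb)


def d2(a: str, b: str, k: int) -> int:
    """D2 statistic computed directly from two sequences."""
    return d2_from_counts(kmer_counts(a, k), kmer_counts(b, k))


def d2_distance_matrix(seqs: list[str], k: int) -> list[list[float]]:
    """Pairwise distances 1 - D2(a, b) / sqrt(D2(a, a) * D2(b, b)).

    Assumption: this cosine-style normalization is chosen for illustration
    only; it is not the output format of any tool in the table.
    """
    counts = [kmer_counts(s, k) for s in seqs]  # count k-mers once per sequence
    self_d2 = [d2_from_counts(c, c) for c in counts]
    if any(v == 0 for v in self_d2):
        raise ValueError("every sequence must be at least k characters long")
    n = len(seqs)
    return [
        [1.0 - d2_from_counts(counts[i], counts[j]) / sqrt(self_d2[i] * self_d2[j])
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    print(d2("ACGTACGT", "ACGTTT", 2))  # 6 (shared 2-mers AC, CG, GT: 2*1 each)
    for row in d2_distance_matrix(["ACGTACGT", "ACGTTT", "GGGCCC"], k=2):
        print([round(x, 3) for x in row])
```

The k-mer tables are computed once per sequence so that building the n × n matrix needs only n counting passes. Sequences that share no k-mers receive the maximum distance of 1.0, which is one reason word length k has to be chosen with sequence length and divergence in mind.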