Figure 2 | Genome Biology

Figure 2

From: Having a BLAST with bioinformatics (and avoiding BLASTphemy)

(a) The PAM250 matrix, with the amino acids grouped according to the chemistry of their side chains. The numbers indicate how to score the alignment of any given amino acid (taken from one axis) with any other amino acid (taken from the other axis). Each value in the matrix is calculated by dividing the frequency with which one amino acid is observed to be replaced by another in related proteins separated by one evolutionary step (based on phylogenetic trees) by the probability that the same two amino acids would align by chance; this ratio is called the relatedness odds score. The more common the two amino acids in an aligned pair, the higher the probability that they align by chance, and so the less significant their alignment. The ratio is then converted to a logarithm (so that the individual pair scores in an alignment can be added rather than multiplied) and expressed as a log odds score. PAM matrices are usually scaled in units of 10 log10, which are roughly equivalent to third-bit units.

(b) The BLOSUM62 matrix, with the amino acids grouped according to the chemistry of their side chains, as in (a). Each value in the matrix is calculated by dividing the frequency of occurrence of the amino acid pair in the BLOCKS database, with sequences clustered at the 62% identity level, by the probability that the same two amino acids would align by chance. The ratio is then converted to a logarithm and expressed as a log odds score, as for PAM. BLOSUM matrices are usually scaled in half-bit units. A score of zero indicates that two given amino acids were found aligned in the database as often as expected by chance; a positive score indicates that the pair was found aligned more often than expected by chance, and a negative score that it was found aligned less often than expected by chance.
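The odds-to-log-odds conversion and the two unit scales described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used to build PAM250 or BLOSUM62: the frequencies are made-up numbers, the names `log_odds_score`, `HALF_BIT`, `TENTH_LOG10`, `TOY_SCORES`, `pair_score` and `ungapped_score` are hypothetical, the toy score table is not taken from either matrix, and the pair frequency `q_ab` is assumed to refer to an ordered pair so that the chance probability is simply `p_a * p_b`.

```python
import math

# Multipliers that turn a natural-log odds ratio into matrix units.
HALF_BIT = 2 / math.log(2)       # score = 2 * log2(odds), the BLOSUM convention
TENTH_LOG10 = 10 / math.log(10)  # score = 10 * log10(odds), the PAM convention


def log_odds_score(q_ab: float, p_a: float, p_b: float, units: float) -> int:
    """Return round(units * ln(q_ab / (p_a * p_b))) as an integer matrix entry.

    q_ab: observed frequency of the aligned pair (assumed: ordered pair).
    p_a, p_b: background frequencies; p_a * p_b is the chance probability.
    """
    odds = q_ab / (p_a * p_b)  # relatedness odds: observed / expected by chance
    return round(units * math.log(odds))


# Hypothetical toy scores, NOT real PAM250 or BLOSUM62 values.
# Each unordered pair is stored once because substitution matrices are symmetric.
TOY_SCORES = {("L", "L"): 4, ("I", "L"): 2, ("K", "R"): 2, ("W", "W"): 11}


def pair_score(a: str, b: str) -> int:
    """Look a pair up in either order; raises KeyError if it is not in the toy table."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def ungapped_score(s1: str, s2: str) -> int:
    """Score an ungapped alignment by adding the per-position log odds scores."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))


if __name__ == "__main__":
    # A pair seen 4x more often than chance: 2*log2(4) = 4; 10*log10(4) ~ 6.02 -> 6.
    print(log_odds_score(0.01, 0.05, 0.05, HALF_BIT))     # 4
    print(log_odds_score(0.01, 0.05, 0.05, TENTH_LOG10))  # 6
    # Additivity: the sum of the logs equals the log of the product of the odds.
    odds = [4.0, 2.0, 0.5]
    assert math.isclose(sum(math.log2(r) for r in odds), math.log2(math.prod(odds)))
    print(ungapped_score("LKW", "IRW"))  # 2 + 2 + 11 = 15
```

With these made-up frequencies, a pair seen four times more often than chance scores 4 in half-bit units and 6 in units of 10 log10; third-bit units (3 × log2 4) also give 6, which illustrates why the two scales are roughly interchangeable (one 10 log10 unit is about 0.33 bits). A pair seen exactly as often as chance scores 0, and one seen half as often scores negative on both scales.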

Back to article page