U. Mückstein, I.L. Hofacker, P.F. Stadler
Bioinformatics 18: S153-S160 (2002)
Motivation: The level of sequence conservation between related nucleic acids or proteins often varies considerably along the sequence. Both regions with high variability (mutational hot-spots) and regions of almost perfect sequence identity may occur in the same pair of molecules. The reliability of an alignment therefore strongly depends on the level of local sequence similarity.
Results: The probability Pij of a match between position i in the first sequence and position j in the second sequence is computed using the partition function over all canonical pairwise alignments. A probabilistic backtracking procedure can then be used to generate ensembles of suboptimal alignments with correct statistical weights.
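The two steps named above, a partition function over canonical alignments followed by stochastic backtracking, can be illustrated with a minimal Python sketch. It is not the probA implementation: the function names, the linear gap cost, the score values and the temperature T are assumptions chosen for illustration. "Canonical" is realized here by one common convention, namely that a gap in the second sequence may not directly follow a gap in the first, so that each set of matches is counted exactly once.

```python
import math
import random

def partition_tables(a, b, match=1.0, mismatch=-1.0, gap=-2.0, T=1.0):
    """Forward partition functions for global alignment (illustrative scoring).

    ZM[i][j]: alignments of a[:i], b[:j] ending with a[i-1] matched to b[j-1]
    ZX[i][j]: ... ending with a[i-1] opposite a gap
    ZY[i][j]: ... ending with b[j-1] opposite a gap
    Assumed canonical rule: an X column never directly follows a Y column.
    """
    n, m = len(a), len(b)
    ZM = [[0.0] * (m + 1) for _ in range(n + 1)]
    ZX = [[0.0] * (m + 1) for _ in range(n + 1)]
    ZY = [[0.0] * (m + 1) for _ in range(n + 1)]
    ZM[0][0] = 1.0  # the empty alignment acts as the start state
    wg = math.exp(gap / T)  # Boltzmann weight of one gap column
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                ZM[i][j] = math.exp(s / T) * (
                    ZM[i - 1][j - 1] + ZX[i - 1][j - 1] + ZY[i - 1][j - 1])
            if i > 0:
                ZX[i][j] = wg * (ZM[i - 1][j] + ZX[i - 1][j])  # Y -> X forbidden
            if j > 0:
                ZY[i][j] = wg * (ZM[i][j - 1] + ZX[i][j - 1] + ZY[i][j - 1])
    return ZM, ZX, ZY

def sample_alignment(a, b, tables, rng):
    """Draw one alignment with probability proportional to its Boltzmann weight."""
    Z = dict(zip("MXY", tables))
    allowed = {"M": "MXY", "X": "MX", "Y": "MXY"}  # permitted predecessor states
    i, j = len(a), len(b)
    state = rng.choices("MXY", weights=[Z[s][i][j] for s in "MXY"])[0]
    top, bottom = [], []
    while (i, j) != (0, 0):
        if state == "M":
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif state == "X":
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
        if (i, j) != (0, 0):
            # The column weight is common to all predecessors, so each one is
            # chosen in proportion to its own partition function.
            preds = allowed[state]
            state = rng.choices(preds, weights=[Z[s][i][j] for s in preds])[0]
    return "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    seq_a, seq_b = "ACGT", "AGT"
    tables = partition_tables(seq_a, seq_b)
    rng = random.Random(0)
    for _ in range(3):
        top, bottom = sample_alignment(seq_a, seq_b, tables, rng)
        print(top)
        print(bottom)
        print()
```

Under these assumptions, the frequency with which positions i and j share a column across many sampled alignments approximates the match probability Pij; an exact value would additionally require the corresponding backward tables.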
A comparison between structure-based alignments and large samples of stochastic alignments shows that the ensemble contains correct alignments with significant probabilities even when the optimal alignment deviates significantly from the structural alignment. Ensembles of suboptimal alignments obtained by stochastic backtracking, or the match probability matrices themselves, are therefore promising starting points for improved iterative multiple alignment procedures. In particular, it should be possible to overcome the problem of fixing an incorrect pairwise alignment in an early iteration.
Availability: The software described in this contribution is available for download at http://www.tbi.univie.ac.at/~ulim/probA.
Keywords: Alignments, Partition Function, Stochastic