Alignment of low information sequences

Also see: M-align, CompJ99 and Bioinformatics.

D. R. Powell, L. Allison, T. I. Dix and D. L. Dowe,
Australian Computer Science Communications, Computing Theory '98,
Proceedings of the Fourth Australasian Theory Symposium (CATS '98),
Perth, Australia, 2-3 February 1998,
Springer-Verlag, Singapore, ISBN:98103083-92-1, 20(3), pp. 215-229

Abstract: The alignment of two random sequences over a fixed alphabet can be shown to be done optimally by a Dynamic Programming Algorithm (DPA). It is normally assumed that the sequences are random and incompressible and that one sequence is a mutation of the other. However, DNA and many other sequences are not always random and unstructured, and the question arises of how best to align compressible sequences.
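For readers unfamiliar with the standard DPA, the following is a minimal sketch of the usual dynamic programming recurrence for edit distance. It assumes unit costs for mismatches, insertions and deletions; the function name `edit_distance` and the cost choices are illustrative assumptions, not the cost model used in the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b (illustrative costs only)."""
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] with the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # match (cost 0) or mismatch (cost 1)
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
            ))
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept, so the sketch uses space linear in the length of `b`; recovering the alignment itself, rather than just its cost, would need the full table or a traceback scheme.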

We assume that our sequences are non-random and emanate from mutations of a first order Markov model. We note that the alignment of high information regions is more important than the alignment of low information regions, and arrive at a new alignment method for low information sequences. This method performs better than the standard DPA on data generated from mutations of a first order Markov model.
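To make the notion of high and low information regions concrete, the sketch below computes the information content, in bits, of each character under a first order Markov model. It is only an illustration: the function name `markov_information`, the estimation of transition probabilities from the sequence itself, the add-one smoothing and the uniform cost for the first character are all assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from typing import List


def markov_information(seq: str, alphabet: str = "ACGT") -> List[float]:
    """Bits per character of seq under a first order Markov model.

    Transition probabilities are estimated from seq itself with add-one
    smoothing (an assumption made for this sketch).
    """
    if not seq:
        return []
    k = len(alphabet)
    pair_counts = Counter(zip(seq, seq[1:]))  # counts of (previous, current)
    context_counts = Counter(seq[:-1])        # counts of each previous char
    info = [math.log2(k)]                     # first char: assumed uniform
    for prev, cur in zip(seq, seq[1:]):
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + k)
        info.append(-math.log2(p))
    return info
```

On a repetitive sequence such as `"ATATATATATATGC"`, the characters inside the `AT` repeat receive fewer bits than the `G` that breaks the pattern, which is the sense in which a repeat is a low information region.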

Keywords: Sequence Alignment, DNA, Biology, Information Theory.

© L. Allison, www.allisons.org/ll/ (or as otherwise indicated).