low information sequences

D. R. Powell, L. Allison, T. I. Dix and D. L. Dowe, Australian Computer Science Communications, Computing Theory '98, Proceedings of the Fourth Australasian Theory Symposium (CATS '98), Perth, Australia, Springer-Verlag, ISBN: 9813083921, 20(3): pp. 215-229, February 1998

Abstract: The alignment of two random sequences over a fixed alphabet can be shown to be done optimally by a Dynamic Programming Algorithm (DPA). It is normally assumed that the sequences are random and incompressible and that one sequence is a mutation of the other. However, DNA and many other sequences are not always random and unstructured, and the issue arises of how best to align compressible sequences.
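
For readers unfamiliar with the DPA, the following is a minimal sketch of the standard dynamic programming recurrence, here computing a unit-cost edit distance between two sequences. The function name `edit_distance` and the unit costs for mismatch, insertion and deletion are assumptions made for illustration; the abstract does not state the cost scheme used in the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the standard DPA (illustrative sketch).

    Costs (assumed, not taken from the paper): match 0, mismatch 1,
    insertion 1, deletion 1.
    """
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] with the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]
```

Every column is charged the same way wherever it falls in the sequences, which matches the assumption of random, incompressible sequences described above.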

Assuming that our sequences are non-random and arise from mutations of a first-order Markov model, we note that aligning high-information regions is more important than aligning low-information regions. This leads to a new alignment method for low-information sequences, which performs better than the standard DPA on data generated from mutations of a first-order Markov model.
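
To make the notion of high and low information regions concrete, the sketch below estimates the information content (in bits) of each character of a sequence under a first-order Markov model. It is not the paper's alignment method. The function name `markov_information`, fitting the model to the sequence itself, the add-one smoothing and the uniform probability for the first character are all assumptions made for illustration.

```python
import math
from collections import Counter


def markov_information(seq: str, alphabet: str = "ACGT") -> list[float]:
    """Per-character information in bits under a first-order Markov model.

    Illustrative assumptions: transition probabilities are estimated from
    seq itself with add-one smoothing, and the first character is given a
    uniform probability over the alphabet.
    """
    k = len(alphabet)
    pair_counts = Counter(zip(seq, seq[1:]))  # counts of (previous, current)
    context_counts = Counter(seq[:-1])        # counts of each previous character
    info = []
    for i, ch in enumerate(seq):
        if i == 0:
            p = 1.0 / k
        else:
            prev = seq[i - 1]
            p = (pair_counts[(prev, ch)] + 1) / (context_counts[prev] + k)
        info.append(-math.log2(p))
    return info
```

In a repetitive sequence such as `"AAAAAAAAAC"`, the repeated `A` characters carry little information, while the final `C` is surprising and carries more. Positions like that `C` are the ones the abstract argues matter more for alignment.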

Keywords: Sequence Alignment, DNA, Biology, Information Theory.