A Biological Compression Model and its Applications
Minh Duc Cao, Trevor I. Dix, and Lloyd Allison
'Software Tools and Algorithms for Biological Systems', Springer-Verlag, Advances in Experimental Medicine and Biology (AEMB), vol. 696, pp. 657-666, April 2011, [doi:10.1007/978-1-4419-7046-6_67].
Abstract: A biological compression model, the expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment, and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences, where conventional knowledge discovery tools often fail.