Molecular and Biochemical Parasitology, 118(2), pp.175-186, 2001

Discovering patterns in Plasmodium falciparum genomic DNA

Linda Stern (a), Lloyd Allison (b), Ross L. Coppel (c) and Trevor I. Dix (b)

(a) Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Victoria 3010, Australia.
(b) School of Computer Science and Software Engineering, Monash University, Clayton, Victoria 3800, Australia.
(c) Department of Microbiology, Monash University, Clayton, Victoria 3800, Australia.

Link: [doi:10.1016/S0166-6851(01)00388-7] [5/'03] with full text in pdf.

Abstract: A method has been developed for discovering patterns in DNA sequences. Loosely based on the well-known Lempel-Ziv model for text compression, the model detects repeated sequences in DNA. The repeats can be forward or inverted, and they need not be exact. The method is particularly useful for detecting distantly related sequences, and for finding patterns in sequences of biased nucleotide composition, where spurious patterns are often observed because the bias leads to coincidental nucleotide matches. We show here the utility of the method by applying it to genomic sequences of Plasmodium falciparum. A single scan of chromosomes 2 and 3 of P. falciparum, using our method and no other a priori information about the sequences, reveals regions of low complexity in both telomeric and central regions, long repeats in the subtelomeric regions, and shorter repeat areas in dense coding regions. Application of the method to a recently sequenced contig of chromosome 10 that has a particularly biased base composition detects a long internal repeat more readily than does the conventional dot matrix plot. Space requirements are linear, so the method can be used on large sequences. The observed repeat patterns may be related to large-scale chromosomal organization and control of gene expression. The method has general application in detecting patterns of potential interest in newly sequenced genomic material.

Keywords: Compression; information theory; repeated sequences; pattern discovery.
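
Illustration: the abstract distinguishes forward repeats from inverted repeats. The short Python sketch below shows only that distinction, in a Lempel-Ziv-like setting where a repeat must copy from text already seen. It is not the paper's model: it finds exact matches only (the paper's repeats need not be exact), it assigns no probabilities or code lengths, and it takes quadratic time. It also assumes that "inverted" means reverse complement. The names reverse_complement, longest_earlier_repeat and min_len are hypothetical, chosen for this example.

```python
from typing import Optional, Tuple

# Watson-Crick base pairing, used to build the reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_earlier_repeat(
    seq: str, pos: int, min_len: int = 4
) -> Optional[Tuple[str, int, int]]:
    """Find the longest exact repeat that starts at `pos` and copies from seq[:pos].

    Returns (kind, start, length), where kind is "forward" or "inverted" and
    `start` is the leftmost position of the source region in `seq`.
    Returns None if no repeat of at least `min_len` bases exists.
    Assumption: exact matching only, unlike the approximate repeats in the paper.
    """
    prefix = seq[:pos]
    candidates = (("forward", prefix), ("inverted", reverse_complement(prefix)))
    best: Optional[Tuple[str, int, int]] = None
    for kind, text in candidates:
        length = min_len
        # Grow the match one base at a time while it still occurs in the prefix.
        while pos + length <= len(seq):
            idx = text.find(seq[pos:pos + length])
            if idx < 0:
                break
            if best is None or length > best[2]:
                # Map an index in the reverse-complemented prefix back to
                # coordinates in the original sequence.
                start = idx if kind == "forward" else pos - idx - length
                best = (kind, start, length)
            length += 1
    return best


if __name__ == "__main__":
    unit = "GATTACA"
    print(longest_earlier_repeat(unit + "CC" + unit, 9))
    print(longest_earlier_repeat(unit + "CC" + reverse_complement(unit), 9))
```

In the two usage lines, the first sequence repeats GATTACA directly and the second repeats it as its reverse complement, so the function reports a forward and an inverted repeat respectively, each copying from position 0.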

© L. Allison, www.allisons.org/ll/ (or as otherwise indicated).