Design of an Efficient Out-of-Core Read Alignment Algorithm
Arun Konagurthu, Lloyd Allison, Thomas Conway, Bryan Beresford-Smith and Justin Zobel
Springer-Verlag, LNCS/LNBI 6293, pp. 189-201, 2010, doi:10.1007/978-3-642-15294-8_16
Abstract: New genome sequencing technologies are poised to enter the sequencing landscape with significantly higher throughput of read data produced at unprecedented speeds and lower costs per run. However, current in-memory methods to align a set of reads to one or more reference genomes are ill-equipped to handle the expected growth of read-throughput from newer technologies. ... reports the design of a new out-of-core read mapping algorithm, Syzygy, which can scale to large volumes of read and genome data. The algorithm is designed to run in a constant, user-stipulated amount of main memory, small enough to fit on standard desktops, irrespective of the sizes of read and genome data. Syzygy achieves a superior spatial locality-of-reference that allows all large data structures used in the algorithm to be maintained on disk. We compare our prototype implementation with several popular read alignment programs. Our results demonstrate clearly that Syzygy can scale to very large read volumes while using only a fraction of memory in comparison, without sacrificing performance.