Statistical inference of protein structural alignments using information and compression
James H. Collier, Lloyd Allison, Arthur M. Lesk, Peter J. Stuckey, Maria Garcia de la Banda, Arun S. Konagurthu
Bioinformatics (Oxford U.P.), 33(7), doi:10.1093/bioinformatics/btw757, pp.1005-1012, (online 4 January) 1 April 2017.
Abstract:
Motivation:
Structural molecular biology depends crucially on
computational techniques that compare protein three-dimensional structures and
generate structural alignments (the assignment of one-to-one correspondences
between subsets of amino acids based on atomic coordinates).
Despite its importance, the structural alignment problem has not
been formulated, much less solved, in a consistent and reliable way.
To overcome these difficulties, we present here a statistical framework for
the precise inference of structural alignments, built on the
Bayesian and information-theoretic principle of Minimum Message Length (MML).
The quality of any alignment is measured by its explanatory power:
the amount of lossless compression achieved when that alignment
is used to explain the protein coordinates.
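
As a rough illustration of compression as a quality measure, the toy sketch below compares two ways of encoding a list of numbers: a null encoding that ignores any relationship, and an encoding that first pays to state a hypothesis and then encodes only the residuals against a reference. It is a one-dimensional stand-in under assumed Gaussian models, not MMLigner's actual encoding of protein coordinates. The function names, the parameters (`null_sigma`, `fit_sigma`, `hypothesis_bits`, `precision`) and the numbers are all hypothetical.

```python
import math

def gaussian_bits(values, sigma, precision=0.001):
    """Bits needed to state each value to `precision` under a zero-mean Gaussian.

    Computed in the log domain to avoid underflow for large values.
    """
    const = math.log2(sigma * math.sqrt(2 * math.pi) / precision)
    return sum(const + (v * v) / (2 * sigma * sigma * math.log(2)) for v in values)

def compression_bits(target, reference, hypothesis_bits, null_sigma, fit_sigma):
    """Null message length minus hypothesis message length (toy model, assumed).

    Positive: the hypothesis explains the data more concisely than the null.
    Negative: stating the hypothesis costs more than it saves.
    """
    null_length = gaussian_bits(target, null_sigma)
    residuals = [t - r for t, r in zip(target, reference)]
    hypothesis_length = hypothesis_bits + gaussian_bits(residuals, fit_sigma)
    return null_length - hypothesis_length

# Hypothetical data: one reference tracks the target closely, the other does not.
target = [10.0, -8.0, 12.0, -9.0]
close_reference = [10.2, -7.9, 11.8, -9.1]
unrelated_reference = [-10.0, 8.0, -12.0, 9.0]

good = compression_bits(target, close_reference, 5.0, null_sigma=10.0, fit_sigma=0.5)
bad = compression_bits(target, unrelated_reference, 5.0, null_sigma=10.0, fit_sigma=0.5)
print(f"close reference: {good:.1f} bits, unrelated reference: {bad:.1f} bits")
```

In this toy setting a hypothesis counts as worthwhile only when the compression is positive, which mirrors the idea that an alignment must compress the coordinates better than the null encoding.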
Results:
We have implemented this approach in MMLigner,
the first program able to infer statistically significant
structural alignments. We also demonstrate the reliability of MMLigner's
alignment results when compared with the state of the art.
Importantly, MMLigner can also discover different structural alignments of
comparable quality, a challenging problem for oligomers and
protein complexes.
Availability and Implementation:
Source code, binaries and an interactive web version are available at
[lcb.infotech.monash.edu.au/mmligner].
Contact: arun.konagurthu of monash dot edu