Minimum Message Length Inference of Secondary Structure from Protein Coordinate Data
Arun S. Konagurthu, Arthur M. Lesk, Lloyd Allison
J. Bioinformatics, 28(12), pp.i97-i105, June 2012 (ISMB, July 2012) [doi:10.1093/bioinformatics/bts223]
Abstract:
Motivation:
Secondary structure underpins the folding pattern and
architecture of most proteins. Accurate assignment of the secondary
structure elements is therefore an important problem. Although
many approximate solutions of the secondary structure assignment
problem exist, the statement of the problem has resisted a consistent
and mathematically rigorous definition. A variety of comparative
studies have highlighted major disagreements in the way the available
methods define and assign secondary structure to coordinate data.
Results:
We report a new method to infer secondary structure based on the
Bayesian method of Minimum Message Length (MML) inference. It treats
assignments of secondary structure as hypotheses
that explain the given coordinate data. The method seeks to maximise
the joint probability of a hypothesis and the data. There is a
natural null hypothesis and any assignment that cannot better it is
unacceptable. We developed a program, SST, based on this approach
and compared it with popular programs including DSSP and STRIDE.
Our evaluation suggests that SST gives reliable
assignments even on low-resolution structures.
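
The selection principle described under Results can be illustrated with a small sketch. Maximising the joint probability P(H, D) = P(H) P(D|H) is equivalent to minimising the two-part message length I(H) + I(D|H) = -log2 P(H) - log2 P(D|H), and a hypothesis is accepted only if its message is shorter than that of the null hypothesis. The code below is not SST's encoding of coordinate data: the function names, the candidate labels, and all probabilities are hypothetical toy values chosen only to show the comparison.

```python
import math


def message_length_bits(prior_h, likelihood_d_given_h):
    """Two-part message length in bits: -log2 P(H) - log2 P(D|H)."""
    return -math.log2(prior_h) - math.log2(likelihood_d_given_h)


def best_hypothesis(candidates, null_length_bits):
    """Return (name, length) of the shortest-message hypothesis.

    `candidates` maps a label to a (P(H), P(D|H)) pair.
    If no candidate is strictly shorter than the null message,
    the name is None and the null length is returned.
    """
    best_name, best_len = None, null_length_bits
    for name, (prior, likelihood) in candidates.items():
        length = message_length_bits(prior, likelihood)
        if length < best_len:
            best_name, best_len = name, length
    return best_name, best_len


# Hypothetical toy values (assumptions, not taken from the paper).
candidates = {
    "helix": (0.25, 0.5),     # 2 bits + 1 bit  = 3 bits
    "strand": (0.25, 0.125),  # 2 bits + 3 bits = 5 bits
}

# Null hypothesis costs 4 bits here: "helix" (3 bits) beats it.
accepted = best_hypothesis(candidates, null_length_bits=4.0)

# A cheaper null (2 bits) cannot be bettered, so nothing is accepted.
rejected = best_hypothesis(candidates, null_length_bits=2.0)
```

In this toy setting `accepted` names the "helix" hypothesis because its total message is shorter than the null message, while `rejected` carries no hypothesis name because every candidate costs more bits than the null.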
Availability:
SST@[www]['12].