## Circular Clustering of Protein Dihedral Angles by Minimum Message Length

#### D. L. Dowe, L. Allison, T. Dix, L. Hunter, C. S. Wallace, T. Edgoose, Pacific Symp. on Biocomputing, 1996 (PSB96)

**Abstract**:
Early work on proteins identified the existence of helices and
extended sheets in protein secondary structures, a high-level classification
which remains popular today. Using the Snob program for information-theoretic
Minimum Message Length (MML) classification, we cluster sets of protein
dihedral angles, as determined by X-ray crystallography, into groups.
Previous work by Hunter and States has applied a similar Bayesian
classification method, AutoClass, to protein data in which the position
of each site is represented by the 3 Cartesian coordinates of each of its
α-carbon, β-carbon and nitrogen atoms, totalling 9 coordinates. By using the von Mises
circular distribution in the Snob program, we are instead able to
represent local site properties by the two dihedral angles,
φ and ψ. Since each site can be modelled as having 2 degrees
of freedom, this orientation-invariant dihedral angle representation
of the data is more compact than one using nine highly correlated
Cartesian coordinates. According to the information-theoretic message
length concepts discussed in the paper, the more concise model is more
likely to represent the underlying process that generated the data.
We report on the results of our classification, plotting the classes
in (φ,ψ) space and introducing a symmetric information-theoretic distance
measure, which we use to build a minimum
spanning tree between the classes. We also give a transition matrix
between the classes and note three classes, in the region
φ ≈ -1.09 rad and ψ ≈ -0.75 rad, that are close on
the spanning tree and have high probabilities of transition among themselves.
This gives rise to a tight, abundant and self-perpetuating structure.
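
The compactness argument rests on the fact that φ and ψ are themselves functions of backbone atom positions. As a minimal pure-Python sketch, assuming the standard backbone definitions (φ is the torsion of C(i-1), N, Cα, C and ψ is the torsion of N, Cα, C, N(i+1), both with the IUPAC sign convention), the angles can be computed as follows. Note that these definitions use the backbone carbonyl carbon C rather than the β-carbon. The function names `dihedral` and `phi_psi` are hypothetical illustrations, not part of Snob or of the paper.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle in radians, in [-pi, pi], about the p1-p2 bond.

    Positive when, viewed along p1 -> p2, the p1-p0 bond must turn
    clockwise to eclipse the p2-p3 bond (IUPAC convention)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def phi_psi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3,
            n_next: Vec3) -> tuple[float, float]:
    """(phi, psi) of residue i from five backbone atom positions:
    phi = C(i-1), N, CA, C and psi = N, CA, C, N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry: bond p1-p2 along z, p3 rotated a quarter turn from p0.
print(math.degrees(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))))  # about 90
```

Two angles per residue replace the nine Cartesian coordinates, and because a torsion angle is unchanged when the whole molecule is rotated or translated, the representation is orientation-invariant, as the abstract notes.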

[paper.ps], or [www] for the PDF of the paper [2/'01] in [proc.].
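
The abstract does not spell out its symmetric distance measure, so the sketch below substitutes one plausible choice: the symmetrised Kullback-Leibler divergence between two classes, each assumed (as a simplification) to be a product of independent von Mises distributions on φ and ψ. For von Mises densities this divergence has a closed form in A(κ) = I₁(κ)/I₀(κ). The minimum spanning tree is built here with Prim's algorithm, and the transition matrix is estimated by counting the class labels of consecutive sites along one chain. All names (`DihedralClass`, `class_distance`, and so on) and any class parameters fed to them are hypothetical; this is not the paper's code and produces none of its results.

```python
import math
from dataclasses import dataclass


def vm_mean_resultant(kappa: float) -> float:
    """A(kappa) = I1(kappa) / I0(kappa) for kappa >= 0, summed from the power
    series of the modified Bessel functions I0 and I1."""
    if kappa == 0.0:
        return 0.0
    q = (kappa / 2.0) ** 2
    t0, t1 = 1.0, kappa / 2.0          # k = 0 terms of the I0 and I1 series
    s0, s1 = t0, t1
    k = 0
    # Run past the peak term (k ~ kappa / 2) until the I0 terms are negligible.
    while k <= kappa or t0 > 1e-17 * s0:
        k += 1
        t0 *= q / (k * k)
        t1 *= q / (k * (k + 1))
        s0 += t0
        s1 += t1
        if s0 > 1e250:                 # rescale all four together; the ratio is unchanged
            s0, s1, t0, t1 = (v / 1e250 for v in (s0, s1, t0, t1))
    return s1 / s0


def sym_kl_von_mises(mu1: float, k1: float, mu2: float, k2: float) -> float:
    # KL(VM(mu1,k1) || VM(mu2,k2)) = log(I0(k2)/I0(k1)) + A(k1) * (k1 - k2*cos(mu1 - mu2)),
    # so in the symmetric sum the log-Bessel terms cancel.
    c = math.cos(mu1 - mu2)
    return (vm_mean_resultant(k1) * (k1 - k2 * c)
            + vm_mean_resultant(k2) * (k2 - k1 * c))


@dataclass
class DihedralClass:
    """Hypothetical class summary, ASSUMING independent von Mises on phi and psi."""
    mu_phi: float
    kappa_phi: float
    mu_psi: float
    kappa_psi: float


def class_distance(a: DihedralClass, b: DihedralClass) -> float:
    # Under the independence assumption the joint KL is the sum over the two angles.
    return (sym_kl_von_mises(a.mu_phi, a.kappa_phi, b.mu_phi, b.kappa_phi)
            + sym_kl_von_mises(a.mu_psi, a.kappa_psi, b.mu_psi, b.kappa_psi))


def minimum_spanning_tree(classes: list[DihedralClass]) -> list[tuple[int, int, float]]:
    """Prim's algorithm on the complete graph of classes; returns (i, j, distance) edges."""
    n = len(classes)
    in_tree = [False] * n
    best = [math.inf] * n              # cheapest known edge into the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = class_distance(classes[u], classes[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def transition_matrix(labels: list[int], n_classes: int) -> list[list[float]]:
    """Row-normalised counts of class(site i) -> class(site i+1) along one chain."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total if total else 0.0 for c in row])
    return rows
```

A dense O(n²) form of Prim's algorithm suits this setting because every pair of classes has a distance, so the graph is complete and no edge list needs to be built.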