An MML Classification of Protein Sequences that knows about Angles and Sequences.

T. Edgoose, L. Allison, D. L. Dowe, Pacific Symp. on Biocomputing 98, pp.585-596, January 1998

Abstract: The MML classification program, Snob, deals with mixture modelling (or clustering) of circular data. It has recently been extended to do Markov modelling of the serial correlation between clusters, such as modelling the fact that a Helix cluster favours being followed by another Helix cluster. Such a model is better known as a Hidden Markov Model. The search for the most appropriate secondary structure classification of protein data is of significant importance and was addressed by Hunter and States (1992) using the Bayesian classifier, AutoClass, on Cartesian co-ordinate data of protein residues. Dowe et al. (1996) improved upon this earlier work by using Snob to cluster dihedral angle data, with the advantage that 3 x 3 = 9 Cartesian co-ordinates can be represented by the 2 orientation-invariant angles, phi and psi. The Hidden Markov Model used here is shown to be a still more appropriate way of modelling protein data and results in the selection of a simpler class model with 17 structure classes. We report on this classification, including the class transition matrix, and relate it back to the amino-acid sequence and the simple Helix, Beta, Turn classification. We find 3 types of Helix, 2 types of Beta and many types of Turn. The most numerous Turn class defines a continuous flexible structure that is negatively correlated to all the other classes.
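
As a rough illustration of what a class transition matrix expresses, the sketch below counts first-order transitions in a sequence of class labels and normalises each row. It is not Snob's MML inference: the function name `transition_matrix`, the three-letter label alphabet and the toy label sequence are all assumptions made up for this example, and the estimate is a plain relative frequency, not a message-length calculation.

```python
from collections import Counter


def transition_matrix(labels, classes):
    """Row-normalised first-order transition frequencies between classes.

    Hypothetical helper for illustration only; rows with no observed
    outgoing transitions are left as all zeros.
    """
    # Count each adjacent (current, next) pair of labels.
    counts = Counter(zip(labels, labels[1:]))
    matrix = {}
    for a in classes:
        row_total = sum(counts[(a, b)] for b in classes)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in classes
        }
    return matrix


# Toy labels invented for this sketch (H = Helix, B = Beta, T = Turn);
# they are not taken from the paper's data.
toy_labels = list("HHHHTTBBBTHHHHHTBBBB")
classes = ["H", "B", "T"]
P = transition_matrix(toy_labels, classes)

for a in classes:
    print(a, {b: round(P[a][b], 3) for b in classes})
```

In this toy sequence the H row puts most of its mass on H, which is the kind of serial correlation ("Helix favours being followed by another Helix") that the Markov extension is meant to capture. In the paper the classes are hidden and inferred jointly with the angle distributions, so a direct count like this only applies once a labelling is given.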

clustering protein dihedral angles
