Strict MML (SMML)

Strict Minimum Message Length (SMML) inference (Wallace & Boulton 1975) constructs a mapping from the data space to the set of models (parameter values) that minimises the expected length of a two-part message: ‘model; (data|model)’. The mapping defines a partition of the data space, each part consisting of the data values that map to one particular model (parameter value). SMML estimators are, in an important sense, as good as it is possible to be (Wallace 1989, 1996).
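
As a rough illustration of the quantity being minimised, the sketch below evaluates the expected two-part message length for one given partition of the data space. It is not taken from the cited papers. It assumes a Binomial model with n trials, a uniform prior on the success probability (so every success count x has marginal probability 1/(n+1)), and lengths measured in nits (natural logarithms). The function names binomial_pmf and expected_message_length are made up for this example.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of x successes in n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def expected_message_length(n, groups):
    """Expected two-part message length, in nits, for a partition of {0..n}.

    `groups` is a list of lists of success counts that together cover 0..n.
    Assumption: uniform prior on p, so the marginal r(x) is 1/(n+1).
    Returns (expected length, list of per-group estimates of p).
    """
    r = 1.0 / (n + 1)
    total = 0.0
    estimates = []
    for group in groups:
        # First part: the estimate is coded with the total marginal
        # probability of the data values that map to it.
        q = r * len(group)
        # The estimate that maximises the expected log-likelihood over
        # the group is the group's mean success rate.
        p_hat = sum(group) / (n * len(group))
        estimates.append(p_hat)
        for x in group:
            # Second part: the data coded under the chosen estimate.
            total += r * (-math.log(q) - math.log(binomial_pmf(x, n, p_hat)))
    return total, estimates

if __name__ == "__main__":
    n = 10
    coarse = [list(range(n + 1))]            # every x maps to one model
    fine = [[x] for x in range(n + 1)]       # every x has its own model
    middle = [[0], [1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
    for name, partition in [("coarse", coarse), ("fine", fine), ("middle", middle)]:
        length, _ = expected_message_length(n, partition)
        print(name, round(length, 4))
```

Under these assumptions no partition can give an expected length below log(n+1) nits, the entropy of the marginal distribution of x; the excess over that bound is the price of stating a model.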

SMML is invariant under one-to-one reparameterisation of the model and is consistent, and it handles model selection, parameter estimation and hypothesis testing. Unfortunately, SMML inference is NP-hard for most problems, although a polynomial-time algorithm exists for the Binomial distribution (Farr & Wallace 1997, 2002). Fortunately, MML (Wallace & Boulton 1968, Wallace & Freeman 1987) is a feasible approximation to SMML.
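
To give a feel for why the Binomial case is tractable, here is a minimal sketch of a polynomial-time search. It is not claimed to be the algorithm of Farr & Wallace. It assumes a uniform prior on the success probability and, for illustration, considers only partitions whose groups are contiguous ranges of the success count, which lets a simple dynamic programme find the best such partition. The names group_cost and smml_binomial are hypothetical.

```python
import math

def group_cost(n, a, b):
    """Contribution of the group {a, ..., b} to the expected message length.

    Assumption: uniform prior on p, so each x in 0..n has marginal 1/(n+1).
    Returns (cost in nits, estimate of p for the group).
    """
    r = 1.0 / (n + 1)
    q = r * (b - a + 1)          # coding probability of the group's estimate
    p = (a + b) / (2 * n)        # mean success rate over a..b
    cost = 0.0
    for x in range(a, b + 1):
        log_f = math.log(math.comb(n, x))
        if x > 0:                # here p > 0 because b >= x > 0
            log_f += x * math.log(p)
        if x < n:                # here p < 1 because a <= x < n
            log_f += (n - x) * math.log(1 - p)
        cost -= r * (math.log(q) + log_f)
    return cost, p

def smml_binomial(n):
    """Best partition of {0..n} into contiguous groups, by dynamic programming.

    best[k] is the minimum cost of partitioning the counts 0..k-1.
    Returns (expected length in nits, list of (first, last, estimate)).
    """
    best = [0.0] + [math.inf] * (n + 1)
    back = [None] * (n + 2)
    for end in range(1, n + 2):
        for start in range(end):
            cost, p = group_cost(n, start, end - 1)
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = (start, end - 1, p)
    # Walk the back-pointers to recover the groups in increasing order.
    groups = []
    end = n + 1
    while end > 0:
        start, last, p = back[end]
        groups.append((start, last, p))
        end = start
    groups.reverse()
    return best[n + 1], groups

if __name__ == "__main__":
    length, groups = smml_binomial(10)
    print("expected length (nits):", round(length, 4))
    for first, last, p in groups:
        print(f"x in {first}..{last} -> p = {p:.3f}")
```

The sketch evaluates O(n^2) candidate groups at O(n) each, so it runs in O(n^3) time. The restriction to contiguous groups is what makes the search space small; for data spaces without such a natural ordering there is no analogous shortcut.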

G. E. Farr & C. S. Wallace. The Complexity of Strict Minimum Message Length Inference. The Computer Journal 45(3), pp.285-292, 2002. Also TR 97/321, Department of Computer Science, Monash University (Clayton), Victoria 3168, Australia. 11 August 1997.
(Also see [Binomial].)
C. S. Wallace. False Oracles and SMML Estimators. Proc. Int. Conf. on Information, Statistics and Induction in Science (ISIS), pp.304-316, 1996 (based on TR 128, Dept. Comp. Sci., Monash Univ., 1989).
C. S. Wallace & D. M. Boulton. An information measure for classification. The Computer Journal 11(2), pp.185-194, 1968.
C. S. Wallace & D. M. Boulton. An invariant Bayes method for point estimation. Classification Soc. Bulletin 3, pp.11-34, 1975.
C. S. Wallace & P. R. Freeman. Estimation and inference by compact coding. J. Royal Stat. Soc. B 49(3), pp.230-265, 1987.