Change-Point Estimation Using New Minimum Message Length Approximations

Leigh J. Fitzgibbon, David L. Dowe, and Lloyd Allison, Proc. Seventh Pacific Rim International Conference on Artificial Intelligence (PRICAI 2002), National Center of Sciences, Tokyo, Japan, Springer, LNCS (LNAI vol. 2417), doi:10.1007/3-540-45683-X_28, August 18-22, 2002

Abstract. This paper investigates the coding of change-points in the information-theoretic Minimum Message Length (MML) framework. The choice of change-point coding region affects model selection and parameter estimation in problems such as time series segmentation and decision trees. MML and Minimum Description Length (MDL78) approaches to change-point problems have been shown by several authors to perform well. In this paper we compare some published MML and MDL78 methods and introduce new MML approximations called 'MMLDc' and 'MMLDF'. These new approximations are empirically compared with Strict MML (SMML), Fairly Strict MML (FSMML), MML68, the Minimum Expected Kullback-Leibler Distance (MEKLD) loss function and MDL78 on a tractable binomial change-point problem.
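
To make the flavour of the problem concrete, the sketch below is an illustration only, not the paper's MMLDc or MMLDF approximations: it scores every single change-point hypothesis for a binary sequence with a crude two-part message length, where the cut position costs log(N-1) nats and each segment is coded as Bernoulli data under an assumed uniform prior on its success probability, costing log((n+1)*C(n,k)) nats for n symbols containing k ones. The function names, the uniform prior and the use of nats are assumptions made for this sketch.

# Minimal illustrative sketch of a two-part message-length criterion for a
# single change-point in a binary sequence (not the MMLDc/MMLDF methods).
from math import comb, log

def segment_cost(bits):
    """Code length (nats) of a binary segment under a uniform Bernoulli prior."""
    n, k = len(bits), sum(bits)
    return log(n + 1) + log(comb(n, k))

def estimate_changepoint(bits):
    """Return (best_cut, message_length) over all single-cut hypotheses."""
    n = len(bits)
    best = None
    for cut in range(1, n):                      # cut positions 1 .. n-1
        length = (log(n - 1)                     # state the cut position
                  + segment_cost(bits[:cut])     # code the left segment
                  + segment_cost(bits[cut:]))    # code the right segment
        if best is None or length < best[1]:
            best = (cut, length)
    return best

if __name__ == "__main__":
    data = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
    cut, msglen = estimate_changepoint(data)
    print(f"estimated change-point at index {cut}, message length {msglen:.2f} nats")

The minimising cut is the estimated change-point; richer criteria such as those compared in the paper differ in how the parameter and change-point coding regions are chosen, not in this basic two-part structure.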

