On the identification of outliers in a simple model

C. S. Wallace

We suppose that we are given a set of N observations {y_i, i = 1, ..., N} that are thought to arise independently from some process of known form and unknown (vector) parameter θ. However, we may have reason to suspect that some small fraction of the N observations are in some sense contaminated or erroneous, i.e., that they arise from a process different from the main process. Any such observation is called an "outlier". We will then be interested in methods for identifying, or at least estimating the number of, the outliers, and for estimating θ in a way which is minimally upset by the outliers. . . .
. . . paper: [Outlier.pdf].
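
The setting described above can be illustrated with a small sketch. It is not the method of the paper, which is not reproduced on this page. It assumes, purely for illustration, a Normal main process with θ = (mu, sigma), a wider Normal as the contaminating process, and a conventional median/MAD rule for flagging suspect observations. The function names, the cutoff of 3.5 and the contamination fraction are all hypothetical choices.

```python
import random
import statistics


def simulate(n=200, frac=0.05, mu=10.0, sigma=1.0, out_sigma=10.0, seed=0):
    """Draw n observations; each is contaminated with probability frac.

    Assumption: main process Normal(mu, sigma), contaminating process
    Normal(mu, out_sigma). Neither is taken from the paper.
    """
    rng = random.Random(seed)
    ys, is_outlier = [], []
    for _ in range(n):
        bad = rng.random() < frac
        ys.append(rng.gauss(mu, out_sigma if bad else sigma))
        is_outlier.append(bad)
    return ys, is_outlier


def flag_outliers(ys, cutoff=3.5):
    """Flag observations far from the median, measured in robust scale units.

    This is a generic median/MAD rule used as a stand-in, not the
    paper's inference procedure.
    """
    med = statistics.median(ys)
    mad = statistics.median([abs(y - med) for y in ys])
    scale = 1.4826 * mad  # makes the MAD comparable to sigma for Normal data
    return [abs(y - med) > cutoff * scale for y in ys]


def robust_mean(ys, cutoff=3.5):
    """Estimate the location from the observations that were not flagged."""
    flags = flag_outliers(ys, cutoff)
    kept = [y for y, f in zip(ys, flags) if not f]
    return statistics.fmean(kept)


if __name__ == "__main__":
    ys, truth = simulate()
    flags = flag_outliers(ys)
    print("contaminated (simulated):", sum(truth))
    print("flagged by the rule:     ", sum(flags))
    print("plain mean:              ", statistics.fmean(ys))
    print("mean of unflagged points:", robust_mean(ys))
```

A hard cutoff of this kind only approximates the problem posed above: it gives a count of flagged points and a location estimate that is less disturbed by them, but it does not weigh the evidence for each observation belonging to one process or the other.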



BTW
(i) "The method described in this paper was used in a more general setting in:
M. Byrne, 'A data mining investigation into pavement roughness using minimum message length inference', PhD thesis, Monash U., 2007."
(ii) "I believe that Chris wrote this in 1982. With the original manuscript I found some computer output dated August 1982, that contained the results of the different models he was analysing in the paper. Also, as you are aware some of the material in this paper later appeared in a technical report "Inference and Estimation by Compact Coding", which is dated August 1984."
— D. Albrecht, 2011.
(DA added the last ref to the 1984 T.R. — L.A., 5/2011)