On the identification of outliers in a simple model
We suppose that we are given a set of N observations
{y_i, i = 1, ..., N}
which are thought to arise independently from some
process of known form and unknown (vector)
parameter θ.
However, we may have reason to suspect that some small
fraction of the N observations are in some sense contaminated or
erroneous, i.e., that they arise from a process different from the
main process. Any such observation is called an "outlier".
We will then be interested in methods for identifying or at least estimating
the number of the outliers, and for estimating θ in a way which
is minimally upset by the outliers. . . .
. . .
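As a rough illustration of the setting described above, the following Python sketch simulates a main process contaminated by a small fraction of outliers, then flags suspect points and estimates θ from the rest. It is not the method of the paper. The Normal forms of both processes, the function names (`simulate`, `flag_outliers`, `estimate_theta`) and the median/MAD flagging rule with cutoff `k` are all assumptions chosen only to make the problem statement concrete.

```python
import random
import statistics


def simulate(n, outlier_fraction, seed=0):
    """Draw n observations; each is contaminated with probability outlier_fraction."""
    rng = random.Random(seed)
    ys, is_outlier = [], []
    for _ in range(n):
        contaminated = rng.random() < outlier_fraction
        # Assumed processes: main N(0, 1), contaminating N(0, 10^2).
        sigma = 10.0 if contaminated else 1.0
        ys.append(rng.gauss(0.0, sigma))
        is_outlier.append(contaminated)
    return ys, is_outlier


def flag_outliers(ys, k=3.0):
    """Return indices of points more than k robust standard deviations from the median."""
    med = statistics.median(ys)
    mad = statistics.median([abs(y - med) for y in ys])
    scale = 1.4826 * mad  # MAD rescaled to match sigma under a Normal model
    return [i for i, y in enumerate(ys) if abs(y - med) > k * scale]


def estimate_theta(ys, flagged):
    """Estimate (mean, standard deviation) from the unflagged observations only."""
    drop = set(flagged)
    kept = [y for i, y in enumerate(ys) if i not in drop]
    return statistics.mean(kept), statistics.stdev(kept)


if __name__ == "__main__":
    ys, truth = simulate(n=200, outlier_fraction=0.05)
    flagged = flag_outliers(ys)
    mu, sigma = estimate_theta(ys, flagged)
    print("true outliers:", sum(truth), "flagged:", len(flagged))
    print("estimated mean and sd of main process:", mu, sigma)
```

The number of flagged points serves as a crude estimate of the number of outliers, and the estimate of θ = (mean, standard deviation) uses only the unflagged observations. A fixed cutoff of this kind is a simple stand-in for the inference developed in the paper.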
paper: [Outlier.pdf].
BTW
- (i) "The method described in this paper
was used in a more general setting in:
- M. Byrne, 'A data mining investigation into pavement roughness using minimum message length inference', PhD thesis, Monash U., 2007."
- (ii) "I believe that Chris wrote this in 1982. With the original manuscript I found some computer output dated August 1982, that contained the results of the different models he was analysing in the paper. Also, as you are aware some of the material in this paper later appeared in a technical report "Inference and Estimation by Compact Coding", which is dated August 1984."
- — D. Albrecht, 2011.
- (DA added the last ref to the 1984 T.R. — L.A., 5/2011)