Normal Distribution (2)
In this page: maximum likelihood (ML), ML-estimators, MML, Fisher information, MML-estimators, measurement accuracy.
Maximum Likelihood
The negative log likelihood, L, for `n' observations assumed to come from a normal distribution, $N_{\mu,\sigma}$, is:
$$ L = -\log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) $$
$$ = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 $$
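As a concrete check, this quantity can be computed directly. A minimal Python sketch of the simplified form above (the function name `neg_log_likelihood` and the sample data are illustrative, not from the page):

```python
import math

def neg_log_likelihood(xs, mu, sigma):
    """Negative log likelihood, L, of the sample xs under N(mu, sigma).
    (Function and variable names are illustrative.)"""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)   # SUM (xi - mu)^2
    return (n / 2) * math.log(2 * math.pi) \
         + (n / 2) * math.log(sigma ** 2) \
         + ss / (2 * sigma ** 2)

print(neg_log_likelihood([4.2, 5.1, 3.8, 5.5, 4.9], mu=4.7, sigma=0.7))
```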
Maximum Likelihood estimator for mu
Differentiating with respect to mu
$$ \frac{\partial L}{\partial\mu} = \frac{1}{2\sigma^2}\,\frac{\partial}{\partial\mu}\sum_{i=1}^{n}(x_i-\mu)^2 = \frac{n\mu-(x_1+\cdots+x_n)}{\sigma^2} $$
Setting this to zero gives the maximum likelihood estimator for mu
$$ \hat{\mu}_{ML} = \frac{x_1+\cdots+x_n}{n} $$
i.e. the (sample-) mean.
Maximum Likelihood estimator for the variance (& sigma)
Differentiating L w.r.t. the variance, $v = \sigma^2$:
$$ \frac{\partial L}{\partial v} = \frac{n}{2v} - \frac{1}{2v^2}\sum_{i=1}^{n}(x_i-\mu)^2 $$
setting this to zero:
$$ \hat{v}_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu}_{ML})^2 $$
the maximum likelihood estimate for the variance.
Note that if n=1 the estimate is zero, and that if n=2 the estimate effectively assumes that the mean lies midway between $x_1$ and $x_2$, which is clearly not necessarily the case.
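To make the small-sample behaviour concrete, a short Python sketch of the ML estimators (the function name `ml_estimates` and the data are illustrative):

```python
def ml_estimates(xs):
    """Maximum likelihood estimates (muML, vML) for a normal sample."""
    n = len(xs)
    mu = sum(xs) / n                           # sample mean
    v = sum((x - mu) ** 2 for x in xs) / n     # divisor n, not n - 1
    return mu, v

print(ml_estimates([7.0]))       # (7.0, 0.0) -- a single datum gives vML = 0
print(ml_estimates([4.0, 6.0]))  # mean taken midway between x1 and x2
```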
Minimum Message Length (MML)
Wallace and Boulton (1968) derived the uncertainty region for the normal distribution from first principles. Later it was seen to be a special case of a general form using the Fisher information.
Fisher Information
The off-diagonal term of the
Fisher information is given by the expectation of:
$$ \frac{\partial^2 L}{\partial\mu\,\partial v} = -\frac{n\mu-(x_1+\cdots+x_n)}{v^2} $$
and in expectation (i.e. on average), this is zero.
The second derivative of L w.r.t. mu is:
$$ \frac{\partial^2 L}{\partial\mu^2} = \frac{n}{v} = \frac{n}{\sigma^2} $$
The second derivative of L w.r.t. v is:
$$ \frac{\partial^2 L}{\partial v^2} = -\frac{n}{2v^2} + \frac{1}{v^3}\sum_{i=1}^{n}(x_i-\mu)^2 $$
and in expectation this is
$$ -\frac{n}{2v^2} + \frac{n\,v}{v^3} = \frac{n}{2v^2} = \frac{n}{2\sigma^4} $$
The Fisher information is therefore the product of the two diagonal terms:
$$ F(\mu,v) = \frac{n}{v}\cdot\frac{n}{2v^2} = \frac{n^2}{2v^3} = \frac{n^2}{2\sigma^6} $$
(Note: the above is with respect to $\mu$ and $v$. Now $v = \sigma^2$, so $dv/d\sigma = 2\sigma$. To calculate the Fisher information with respect to $\mu$ and $\sigma$, the above must be multiplied by $(dv/d\sigma)^2$, which gives $2n^2/\sigma^4$, as can also be confirmed by forming $\partial L/\partial\sigma$ and $\partial^2 L/\partial\sigma^2$ directly. [--L.A. 1/12/2003])
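These expectations are easy to check by simulation. A minimal Monte Carlo sketch, assuming illustrative values of n, mu, and v (the trial count and seed are arbitrary choices, not from the original page):

```python
import random

random.seed(0)
n, mu, v = 10, 2.0, 1.5
sigma = v ** 0.5
trials = 100_000

sum_mu_v = 0.0  # accumulates d2L/(dmu dv) over trials
sum_v_v = 0.0   # accumulates d2L/dv^2 over trials
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    ss = sum((x - mu) ** 2 for x in xs)
    sum_mu_v += -(n * mu - sum(xs)) / v ** 2
    sum_v_v += -n / (2 * v ** 2) + ss / v ** 3

print(sum_mu_v / trials)   # ~ 0:         the off-diagonal term in expectation
print(sum_v_v / trials)    # ~ n/(2 v^2): E[d2L/dv^2]
print(n / (2 * v ** 2))    # exact value for comparison
```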
Minimum Message Length Estimators
$$ \mathrm{msgLen} = -\log h(\mu,v) + L + \tfrac{1}{2}\log F + \text{constant} $$
$$ = \underbrace{-\log h(\mu,v)}_{h} + \underbrace{\tfrac{n}{2}\log(2\pi) + \tfrac{n}{2}\log v + \tfrac{1}{2v}\sum_{i=1}^{n}(x_i-\mu)^2}_{L} + \underbrace{\tfrac{1}{2}\log\tfrac{n^2}{2} - \tfrac{3}{2}\log v}_{F} + \text{constant} $$
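Before differentiating, the criterion itself can be written as a short Python sketch, using the prior $h(\mu,v) \propto 1/v$ adopted below (the function name `msg_len` is illustrative, and the additive constant is dropped, so only differences in message length are meaningful):

```python
import math

def msg_len(xs, mu, v):
    """Two-part message length (up to an additive constant) for data xs
    under N(mu, sqrt(v)), with the improper prior h(mu, v) ~ 1/v."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    neg_log_h = math.log(v)                                      # -log h(mu,v)
    L = (n / 2) * math.log(2 * math.pi) + (n / 2) * math.log(v) + ss / (2 * v)
    half_log_F = 0.5 * math.log(n * n / 2) - 1.5 * math.log(v)   # (1/2) log F
    return neg_log_h + L + half_log_F

print(msg_len([4.2, 5.1, 3.8, 5.5, 4.9], mu=4.7, v=0.5))
```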
differentiate w.r.t. mu:
$$ \frac{\partial\,\mathrm{msgLen}}{\partial\mu} = -\frac{\partial}{\partial\mu}\log h(\mu,v) + \frac{n}{v}\left(\mu - \frac{x_1+\cdots+x_n}{n}\right) $$
and w.r.t. v:
$$ \frac{\partial\,\mathrm{msgLen}}{\partial v} = -\frac{\partial}{\partial v}\log h(\mu,v) + \frac{n-3}{2v} - \frac{1}{2v^2}\sum_{i=1}^{n}(x_i-\mu)^2 $$
If the prior is $h(\mu,v) \propto 1/v$, then $\partial h/\partial\mu = 0$, and
$$ \hat{\mu}_{MML} = \frac{x_1+\cdots+x_n}{n} = \hat{\mu}_{ML} $$
With such a prior, $\partial h/\partial v \propto -1/v^2$, so $-\partial(\log h)/\partial v = 1/v$ and
$$ \frac{\partial\,\mathrm{msgLen}}{\partial v} = \frac{1}{v} + \frac{n-3}{2v} - \frac{1}{2v^2}\sum_{i=1}^{n}(x_i-\mu)^2 = \frac{n-1}{2v} - \frac{1}{2v^2}\sum_{i=1}^{n}(x_i-\mu)^2 $$
set to zero:
$$ \hat{v}_{MML} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu}_{MML})^2 $$
This use of a divisor of (n-1), rather than n, is also a well-known but ad hoc correction for the bias in $\hat{v}_{ML}$; here, however, it is derived in a justified way from MML.
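A quick simulation illustrates the bias that the (n-1) divisor corrects. This is a hedged sketch; the parameter values, trial count, and seed are arbitrary choices:

```python
import random

random.seed(1)
n, mu, v = 5, 0.0, 1.0
trials = 100_000

sum_ml = sum_mml = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, v ** 0.5) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    sum_ml += ss / n          # ML: divisor n, biased low
    sum_mml += ss / (n - 1)   # MML: divisor n-1, unbiased

print(sum_ml / trials)    # ~ v*(n-1)/n = 0.8
print(sum_mml / trials)   # ~ v = 1.0
```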
Measurement Accuracy
In the case of continuous distributions, such as $N_{\mu,\sigma}$,
the likelihood function is a probability density function.
To turn it into a genuine probability, it must be
multiplied by the measurement accuracy.
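For instance, if each $x_i$ is recorded to an accuracy of $\pm\epsilon/2$, where $\epsilon$ is an assumed measurement quantum (introduced here for illustration), then
$$ \Pr(x_i) \approx f(x_i)\,\epsilon, \qquad \text{so} \qquad -\log \Pr(x_i) \approx -\log f(x_i) - \log\epsilon, $$
adding a term of $-n\log\epsilon$ to the total message length.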
MML v. SMML
MML is an approximation
to strict minimum message length (SMML) inference.
As cautioned elsewhere, if MML's simplifying assumptions do not hold, the approximation to SMML can break down.
Notes
- C. S. Wallace & D. M. Boulton.
An Information Measure for Classification.
The Computer Journal, 11(2), pp.185-194, August 1968.
- See also the Special Issue on Clustering and Classification, F. Murtagh (ed), The Computer Journal, 41(8), 1998.