Linear Regression 1

The simplest linear regression as a case study:

y = a.x + b + N(0,σ),  or equivalently N(a.x + b, σ).
 
The probability density of y given x:
f(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y - a x - b)^2}{2 \sigma^2} \right)
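
As a quick illustration, here is a minimal Python sketch, assuming NumPy, that draws data from this model and evaluates the density above; the "true" parameter values a=2, b=1, σ=0.5 and the sample size n=100 are my own choices, not part of the note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "true" parameters (assumed for this sketch only).
a_true, b_true, sigma_true = 2.0, 1.0, 0.5
n = 100

# The x_i are "common knowledge"; here simply taken uniformly on [0, 1].
x = rng.uniform(0.0, 1.0, size=n)

# y = a.x + b + N(0, sigma)
y = a_true * x + b_true + rng.normal(0.0, sigma_true, size=n)

def density(y, x, a, b, sigma):
    """f(y | x) = exp(-(y - a x - b)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return np.exp(-(y - a * x - b) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
```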
 
Given {(x_i, y_i)}, i = 1..n, where the x_i are "common knowledge", the negative log-likelihood is
 
L = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_i (y_i - a x_i - b)^2

  = \frac{n}{2}\log(2\pi) + n \log\sigma + \frac{1}{2\sigma^2}\sum_i (y_i - a x_i - b)^2
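
A direct transcription of L as a Python function, as a minimal sketch (the function name and the NumPy dependency are my own choices):

```python
import numpy as np

def neg_log_likelihood(a, b, sigma, x, y):
    """L = (n/2) log(2 pi) + n log(sigma) + (1/(2 sigma^2)) sum_i (y_i - a x_i - b)^2."""
    n = len(x)
    resid = y - a * x - b
    return n / 2 * np.log(2 * np.pi) + n * np.log(sigma) + np.sum(resid ** 2) / (2 * sigma ** 2)
```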
 
First partial derivatives...
 
\frac{\partial L}{\partial a} = -\frac{1}{\sigma^2}\sum_i x_i (y_i - a x_i - b)
 = -\frac{1}{\sigma^2}\sum_i (x_i y_i - a x_i^2 - b x_i)

\frac{\partial L}{\partial b} = -\frac{1}{\sigma^2}\sum_i (y_i - a x_i - b)
 = -\frac{1}{\sigma^2}\Big[ \sum_i y_i - a \sum_i x_i - n b \Big]

\frac{\partial L}{\partial \sigma} = \frac{n}{\sigma} - \frac{1}{\sigma^3}\sum_i (y_i - a x_i - b)^2
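
These analytic derivatives can be checked against finite differences of the neg_log_likelihood sketch above; a minimal Python check (the helper names and step size h are my own choices):

```python
import numpy as np

def gradient(a, b, sigma, x, y):
    """Analytic first partials (dL/da, dL/db, dL/dsigma) from the formulas above."""
    resid = y - a * x - b
    dL_da = -np.sum(x * resid) / sigma ** 2
    dL_db = -np.sum(resid) / sigma ** 2
    dL_dsigma = len(x) / sigma - np.sum(resid ** 2) / sigma ** 3
    return np.array([dL_da, dL_db, dL_dsigma])

def numeric_gradient(a, b, sigma, x, y, h=1e-6):
    """Central finite differences of neg_log_likelihood (defined earlier)."""
    grads = []
    for i in range(3):
        plus = [a, b, sigma]
        minus = [a, b, sigma]
        plus[i] += h
        minus[i] -= h
        grads.append((neg_log_likelihood(*plus, x, y) - neg_log_likelihood(*minus, x, y)) / (2 * h))
    return np.array(grads)
```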
 
  

L is minimized when the line passes through the C of G (centre of gravity) of the points, i.e. b = mean{y_i} − a·mean{x_i}, which leaves the slope: a = Σ x_i·(y_i − b) / Σ x_i², and σ is the square root of the residual variance.
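
Putting this closed-form solution into code, a small Python sketch (fit_ml is my own name; these are the maximum-likelihood estimates, so σ uses divisor n rather than n − 1):

```python
import numpy as np

def fit_ml(x, y):
    """ML fit of y = a*x + b + N(0, sigma): slope from centred data, line through the C of G."""
    x_bar, y_bar = np.mean(x), np.mean(y)
    a = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - a * x_bar
    resid = y - a * x - b
    sigma = np.sqrt(np.mean(resid ** 2))   # sqrt of the residual variance (divisor n)
    return a, b, sigma
```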
 
Second partial derivatives...
 
\frac{\partial^2 L}{\partial a^2} = \frac{1}{\sigma^2}\sum_i x_i^2
(and remember, the x_i are common knowledge)
 
\frac{\partial^2 L}{\partial b^2} = \frac{n}{\sigma^2}
 
\frac{\partial^2 L}{\partial \sigma^2} = -\frac{n}{\sigma^2} + \frac{3}{\sigma^4}\sum_i (y_i - a x_i - b)^2
expectation = 2 n / σ² (since E[(y_i − a·x_i − b)²] = σ²)
 
Off-diagonal second partial derivatives...
 
\frac{\partial^2 L}{\partial a\,\partial b} = \frac{1}{\sigma^2}\sum_i x_i = \frac{n \cdot \mathrm{mean}\{x_i\}}{\sigma^2}
 
\frac{\partial^2 L}{\partial a\,\partial \sigma} = \frac{2}{\sigma^3}\sum_i x_i (y_i - a x_i - b)
expectation = 0 (since E[y_i − a·x_i − b] = 0)
 
\frac{\partial^2 L}{\partial b\,\partial \sigma} = \frac{2}{\sigma^3}\sum_i (y_i - a x_i - b)
expectation = 0
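
The six second partials can be collected into the observed Hessian of L, which can then be checked against finite differences of the gradient sketch above; a minimal Python version (the name hessian and the (a, b, σ) ordering are my own choices):

```python
import numpy as np

def hessian(a, b, sigma, x, y):
    """Observed second partials of L, ordered (a, b, sigma), from the formulas above."""
    n = len(x)
    resid = y - a * x - b
    H = np.empty((3, 3))
    H[0, 0] = np.sum(x ** 2) / sigma ** 2                             # d2L/da2
    H[1, 1] = n / sigma ** 2                                          # d2L/db2
    H[2, 2] = -n / sigma ** 2 + 3 * np.sum(resid ** 2) / sigma ** 4   # d2L/dsigma2
    H[0, 1] = H[1, 0] = np.sum(x) / sigma ** 2                        # d2L/da db
    H[0, 2] = H[2, 0] = 2 * np.sum(x * resid) / sigma ** 3            # d2L/da dsigma
    H[1, 2] = H[2, 1] = 2 * np.sum(resid) / sigma ** 3                # d2L/db dsigma
    return H
```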
 
The Fisher information matrix (E_y denotes expectation over the y_i), with rows and columns ordered (a, b, σ):

F = \begin{pmatrix}
\mathrm{E}_y[\partial^2 L/\partial a^2] & \mathrm{E}_y[\partial^2 L/\partial a\,\partial b] & 0 \\
\mathrm{E}_y[\partial^2 L/\partial a\,\partial b] & \mathrm{E}_y[\partial^2 L/\partial b^2] & 0 \\
0 & 0 & \mathrm{E}_y[\partial^2 L/\partial \sigma^2]
\end{pmatrix}
 
Since the matrix is block diagonal in (a, b) and σ, its determinant is the determinant of the (a, b) block times E_y[∂²L/∂σ²]:

F = \frac{2 n \left[ n \sum_i x_i^2 - \big(n \cdot \mathrm{mean}\{x_i\}\big)^2 \right]}{\sigma^6}
  = \frac{2 n^3 \left[ \tfrac{1}{n}\sum_i x_i^2 - \mathrm{mean}\{x_i\}^2 \right]}{\sigma^6}
  = \frac{2 n^3 \, \mathrm{variance}\{x_i\}}{\sigma^6}
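
A small numerical check of this determinant, as a hedged Python sketch: the expected Fisher matrix is filled in entry by entry from the expressions above, and its determinant compared with 2 n³ variance{x_i} / σ⁶ (the sample x and σ are arbitrary illustrative values):

```python
import numpy as np

def fisher_matrix(x, sigma):
    """Expected Fisher information for (a, b, sigma); the x_i are common knowledge."""
    n = len(x)
    return np.array([
        [np.sum(x ** 2) / sigma ** 2, np.sum(x) / sigma ** 2, 0.0],
        [np.sum(x) / sigma ** 2,      n / sigma ** 2,         0.0],
        [0.0,                         0.0,                    2 * n / sigma ** 2],
    ])

x = np.random.default_rng(0).uniform(0.0, 1.0, size=50)
sigma = 0.5
det_direct = np.linalg.det(fisher_matrix(x, sigma))
det_formula = 2 * len(x) ** 3 * np.var(x) / sigma ** 6   # 2 n^3 variance{x_i} / sigma^6
print(det_direct, det_formula)   # the two agree up to floating-point error
```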

Priors

a = tan θ, where θ is the angular slope.
da/dθ = 1/cos²θ = 1 + a², so dθ/da = 1/(1 + a²).
The uniform prior, 1/π, on θ therefore corresponds to
the prior pr(a) = 1/(π (1 + a²)) on 'a'.
b can be untangled from 'a' by making the C of G of the data the origin.
Then b plays the role of μ (the mean of the {y_i}) in the normal distribution.
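
A quick Monte Carlo check of this change of variables, as a small Python sketch (the sample size and the particular probability being tested are my own choices): with θ uniform on (−π/2, π/2), a = tan θ should follow pr(a) = 1/(π (1 + a²)), for which P(|a| ≤ 1) = 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# theta uniform on (-pi/2, pi/2), an interval of length pi, so pr(theta) = 1/pi.
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=100_000)
a = np.tan(theta)

# Under pr(a) = 1 / (pi (1 + a^2)), P(|a| <= 1) = (2/pi) arctan(1) = 1/2.
print(np.mean(np.abs(a) <= 1.0))   # approximately 0.5
```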
— L.A. @ Dept. Comp. Sci., U. York, 12/2004