Linear Regression 1
- The simplest linear regression as a case study:
- y = a.x + b + N(0, σ), or equivalently y ~ N(a.x + b, σ).
- The probability density of y given x:
- f(y) = (1 / (√(2 π) σ)) exp( -(y - a.x - b)^2 / (2 σ^2) )
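- For concreteness, the density can be evaluated directly; a minimal Python sketch (the function name density and the numeric values are illustrative only, not from the notes):

    import math

    def density(y, x, a, b, sigma):
        # f(y) = (1 / (sqrt(2 pi) sigma)) * exp( -(y - a.x - b)^2 / (2 sigma^2) )
        z = y - a * x - b
        return math.exp(-z * z / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

    print(density(y=2.1, x=1.0, a=1.0, b=1.0, sigma=0.5))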
- Given {(xi, yi)}, i = 1..n, where the xi are "common knowledge", the negative log likelihood is
- L = (n/2) log(2 π) + (n/2) log(σ^2) + (1/(2 σ^2)) Σ{yi - a.xi - b}^2
-   = (n/2) log(2 π) + n.log(σ) + (1/(2 σ^2)) Σ{yi - a.xi - b}^2
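- A direct transcription of L into Python, as a sketch (the helper name neg_log_likelihood and the example data are illustrative only):

    import math

    def neg_log_likelihood(xs, ys, a, b, sigma):
        # L = (n/2) log(2 pi) + n log(sigma) + (1/(2 sigma^2)) * sum of (yi - a.xi - b)^2
        n = len(xs)
        rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
        return (n / 2.0) * math.log(2.0 * math.pi) + n * math.log(sigma) + rss / (2.0 * sigma * sigma)

    print(neg_log_likelihood([0.0, 1.0, 2.0], [0.1, 1.1, 2.2], a=1.0, b=0.0, sigma=0.5))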
- First partial derivatives...
- d L / d a = (-1/σ^2) Σ{ xi.(yi - a.xi - b) }
-           = (-1/σ^2) Σ{ xi.yi - a.xi^2 - xi.b }
- d L / d b = (-1/σ^2) Σ{ yi - a.xi - b }
-           = (-1/σ^2) { (Σ yi) - a.(Σ xi) - n.b }
- d L / d σ = n/σ - (1/σ^3) Σ{yi - a.xi - b}^2
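- The three derivatives can be checked against central finite differences of L; a self-contained Python sketch (the data and the point (a, b, σ) are arbitrary illustrations):

    import math

    def L(xs, ys, a, b, s):
        n = len(xs)
        rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
        return (n / 2.0) * math.log(2.0 * math.pi) + n * math.log(s) + rss / (2.0 * s * s)

    def grad(xs, ys, a, b, s):
        # analytic d L / d a, d L / d b, d L / d sigma from the formulas above
        dLda = (-1.0 / s ** 2) * sum(x * (y - a * x - b) for x, y in zip(xs, ys))
        dLdb = (-1.0 / s ** 2) * sum(y - a * x - b for x, y in zip(xs, ys))
        dLds = len(xs) / s - sum((y - a * x - b) ** 2 for x, y in zip(xs, ys)) / s ** 3
        return dLda, dLdb, dLds

    xs, ys = [0.0, 1.0, 2.0, 3.0], [0.1, 1.2, 1.9, 3.2]
    a, b, s, h = 0.9, 0.1, 0.4, 1e-6
    numeric = ((L(xs, ys, a + h, b, s) - L(xs, ys, a - h, b, s)) / (2 * h),
               (L(xs, ys, a, b + h, s) - L(xs, ys, a, b - h, s)) / (2 * h),
               (L(xs, ys, a, b, s + h) - L(xs, ys, a, b, s - h)) / (2 * h))
    print(grad(xs, ys, a, b, s))   # the two printed triples should agree closely
    print(numeric)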
- L is minimized when the line passes through the C of G of the points (setting d L / d b = 0), which leaves the slope a = Σ xi.(yi - b) / Σ xi^2 (from d L / d a = 0), and σ is the square root of the residual variance (from d L / d σ = 0).
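- Putting the three stationarity conditions together gives the usual closed-form fit; a sketch (the function name fit and the example data are illustrative only):

    def fit(xs, ys):
        # shift the C of G to the origin, estimate the slope, then recover b
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        dx = [x - mx for x in xs]
        dy = [y - my for y in ys]
        a = sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)
        b = my - a * mx                         # the fitted line passes through the C of G
        rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
        sigma = (rss / n) ** 0.5                # square root of the residual variance
        return a, b, sigma

    print(fit([0.0, 1.0, 2.0, 3.0], [0.1, 1.2, 1.9, 3.2]))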
- Second partial derivatives...
- d^2 L / d a^2 = (+1/σ^2) Σ{ xi^2 }
- (and remember, the xi are common knowledge)
- d^2 L / d b^2 = n/σ^2
- d^2 L / d σ^2 = - n/σ^2 + (3/σ^4) Σ{yi - a.xi - b}^2
-   expectation = 2 n / σ^2, since E[ Σ{yi - a.xi - b}^2 ] = n.σ^2
- Off-diagonal second partial derivatives...
- d^2 L / d a.d b = (+1/σ^2) Σ xi
-                 = n . mean{xi} / σ^2
- d^2 L / d a.d σ = (+2/σ^3) Σ{ xi.(yi - a.xi - b) }
-   expectation = 0
- d^2 L / d b.d σ = (+2/σ^3) Σ{yi - a.xi - b}
-   expectation = 0
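- The six second partials assemble into the matrix of second derivatives of L; a sketch transcribing the formulas above (the function name hessian is illustrative only):

    def hessian(xs, ys, a, b, s):
        # observed second partial derivatives of L, rows/columns ordered (a, b, sigma)
        n = len(xs)
        r = [y - a * x - b for x, y in zip(xs, ys)]     # residuals
        d2_aa = sum(x * x for x in xs) / s ** 2
        d2_bb = n / s ** 2
        d2_ss = -n / s ** 2 + 3.0 * sum(ri * ri for ri in r) / s ** 4
        d2_ab = sum(xs) / s ** 2
        d2_as = 2.0 * sum(x * ri for x, ri in zip(xs, r)) / s ** 3
        d2_bs = 2.0 * sum(r) / s ** 3
        return [[d2_aa, d2_ab, d2_as],
                [d2_ab, d2_bb, d2_bs],
                [d2_as, d2_bs, d2_ss]]

- At the maximum-likelihood estimates the a-σ and b-σ entries are exactly zero (the first derivatives with respect to a and b vanish there), matching their zero expectations.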
Fisher Information
- The expected second derivatives (Ey denotes expectation over the y) form the Fisher information matrix:

          a                     b                     σ
    a     Ey d^2 L / d a^2      Ey d^2 L / d a.d b    0
    b     Ey d^2 L / d a.d b    Ey d^2 L / d b^2      0
    σ     0                     0                     Ey d^2 L / d σ^2

- Its determinant is F = 2 n { n.(Σ xi^2) - (n.mean{xi})^2 } / σ^6
-                      = 2 n^3 { (Σ xi^2)/n - (mean{xi})^2 } / σ^6
-                      = 2 n^3 variance{xi} / σ^6
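- F can be checked numerically by taking the determinant of the 3x3 expected matrix directly; a sketch (the data and σ below are arbitrary illustrations):

    def fisher_det(xs, sigma):
        # determinant of the expected Fisher matrix, computed two ways
        n = len(xs)
        sx = sum(xs)
        sxx = sum(x * x for x in xs)
        var = sxx / n - (sx / n) ** 2
        # the expected matrix is [[sxx, sx, 0], [sx, n, 0], [0, 0, 2n]] / sigma^2;
        # expand its determinant along the last row/column
        det_direct = (2.0 * n / sigma ** 2) * ((sxx / sigma ** 2) * (n / sigma ** 2) - (sx / sigma ** 2) ** 2)
        det_formula = 2.0 * n ** 3 * var / sigma ** 6
        return det_direct, det_formula          # the two values agree

    print(fisher_det([0.0, 1.0, 2.0, 3.0], 0.4))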
Priors
- a = tan θ where θ is the angular slope.
- d a / d θ = 1 / cos^2 θ = 1 + a^2, and so d θ / d a = 1 / (1 + a^2).
- The uniform prior, 1/π, on θ corresponds to the prior pr(a) = pr(θ).(d θ / d a) = 1 / (π (1 + a^2)) on 'a', i.e. a standard Cauchy distribution.
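- The change of variables can be checked by Monte Carlo: with θ drawn uniformly from (-π/2, π/2), the slopes a = tan θ should follow the Cauchy distribution above. A small sketch (sample size and test points are arbitrary):

    import math, random

    random.seed(0)
    slopes = [math.tan(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(100000)]

    # compare the empirical CDF of a = tan(theta) with the Cauchy CDF, 1/2 + arctan(a)/pi
    for a0 in (-2.0, 0.0, 1.0):
        empirical = sum(1 for s in slopes if s <= a0) / len(slopes)
        exact = 0.5 + math.atan(a0) / math.pi
        print(a0, round(empirical, 3), round(exact, 3))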
- b can be untangled from 'a' by making the C of G of the data the origin; then Σ xi = 0 and the a-b entry of the Fisher matrix vanishes.
- Then b plays the role of μ (the mean of the {yi}) in the [normal distribution].
— L.A. @ Dept. Comp. Sci., U. York, 12/2004