Linear Regression 1
- The simplest linear regression as a case study:
- y = a.x + b + N(0, σ), or equivalently y ~ N(a.x + b, σ).
- The probability density of y given x:
- f(y) = (1 / (√(2 π) σ)) exp( -(y - a.x - b)^2 / (2 σ^2) )
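- For concreteness, the density can be evaluated directly; a minimal Python sketch (the function name density and the numeric values are illustrative only, not from the notes):

    import math

    def density(y, x, a, b, sigma):
        # f(y) = (1 / (sqrt(2 pi) sigma)) * exp( -(y - a.x - b)^2 / (2 sigma^2) )
        z = y - a * x - b
        return math.exp(-z * z / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

    print(density(y=2.1, x=1.0, a=1.0, b=1.0, sigma=0.5))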
- Given {(xi, yi)}, i = 1..n, where the xi are "common knowledge", the negative log likelihood is
- L = (n/2) log(2 π) + (n/2) log(σ^2) + (1/(2 σ^2)) Σ{yi - a.xi - b}^2
-   = (n/2) log(2 π) + n.log(σ) + (1/(2 σ^2)) Σ{yi - a.xi - b}^2
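- A direct transcription of L into Python, as a sketch (the helper name neg_log_likelihood and the example data are illustrative only):

    import math

    def neg_log_likelihood(xs, ys, a, b, sigma):
        # L = (n/2) log(2 pi) + n log(sigma) + (1/(2 sigma^2)) * sum of (yi - a.xi - b)^2
        n = len(xs)
        rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
        return (n / 2.0) * math.log(2.0 * math.pi) + n * math.log(sigma) + rss / (2.0 * sigma * sigma)

    print(neg_log_likelihood([0.0, 1.0, 2.0], [0.1, 1.1, 2.2], a=1.0, b=0.0, sigma=0.5))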
- First partial derivatives...
- d L / d a = (-1/σ^2) Σ{ xi.(yi - a.xi - b) }
-           = (-1/σ^2) Σ{ xi.yi - a.xi^2 - xi.b }
- d L / d b = (-1/σ^2) Σ{ yi - a.xi - b }
-           = (-1/σ^2) { (Σ yi) - a.(Σ xi) - n.b }
- d L / d σ = n/σ - (1/σ^3) Σ{yi - a.xi - b}^2
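- The three derivatives can be checked against central finite differences of L; a self-contained Python sketch (the data and the point (a, b, σ) are arbitrary illustrations):

    import math

    def L(xs, ys, a, b, s):
        n = len(xs)
        rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
        return (n / 2.0) * math.log(2.0 * math.pi) + n * math.log(s) + rss / (2.0 * s * s)

    def grad(xs, ys, a, b, s):
        # analytic d L / d a, d L / d b, d L / d sigma from the formulas above
        dLda = (-1.0 / s ** 2) * sum(x * (y - a * x - b) for x, y in zip(xs, ys))
        dLdb = (-1.0 / s ** 2) * sum(y - a * x - b for x, y in zip(xs, ys))
        dLds = len(xs) / s - sum((y - a * x - b) ** 2 for x, y in zip(xs, ys)) / s ** 3
        return dLda, dLdb, dLds

    xs, ys = [0.0, 1.0, 2.0, 3.0], [0.1, 1.2, 1.9, 3.2]
    a, b, s, h = 0.9, 0.1, 0.4, 1e-6
    numeric = ((L(xs, ys, a + h, b, s) - L(xs, ys, a - h, b, s)) / (2 * h),
               (L(xs, ys, a, b + h, s) - L(xs, ys, a, b - h, s)) / (2 * h),
               (L(xs, ys, a, b, s + h) - L(xs, ys, a, b, s - h)) / (2 * h))
    print(grad(xs, ys, a, b, s))   # the two printed triples should agree closely
    print(numeric)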
- L is minimized when the line passes through the C of G of the points (setting d L / d b = 0), which leaves the slope a = Σ xi.(yi - b) / Σ xi^2 (from d L / d a = 0), and σ is the square root of the residual variance (from d L / d σ = 0).
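- Putting the three stationarity conditions together gives the usual closed-form fit; a sketch (the function name fit and the example data are illustrative only):

    def fit(xs, ys):
        # shift the C of G to the origin, estimate the slope, then recover b
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        dx = [x - mx for x in xs]
        dy = [y - my for y in ys]
        a = sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)
        b = my - a * mx                         # the fitted line passes through the C of G
        rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
        sigma = (rss / n) ** 0.5                # square root of the residual variance
        return a, b, sigma

    print(fit([0.0, 1.0, 2.0, 3.0], [0.1, 1.2, 1.9, 3.2]))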
- Second partial derivatives...
- d^2 L / d a^2 = (+1/σ^2) Σ{ xi^2 }
- (and remember, the xi are common knowledge)
- d^2 L / d b^2 = n/σ^2
- d^2 L / d σ^2 = - n/σ^2 + (3/σ^4) Σ{yi - a.xi - b}^2
-   expectation = 2 n / σ^2, since E[ Σ{yi - a.xi - b}^2 ] = n.σ^2
- Off-diagonal second partial derivatives...
- d^2 L / d a.d b = (+1/σ^2) Σ xi
-                 = n . mean{xi} / σ^2
- d^2 L / d a.d σ = (+2/σ^3) Σ{ xi.(yi - a.xi - b) }
-   expectation = 0
- d^2 L / d b.d σ = (+2/σ^3) Σ{yi - a.xi - b}
-   expectation = 0
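- The six second partials assemble into the matrix of second derivatives of L; a sketch transcribing the formulas above (the function name hessian is illustrative only):

    def hessian(xs, ys, a, b, s):
        # observed second partial derivatives of L, rows/columns ordered (a, b, sigma)
        n = len(xs)
        r = [y - a * x - b for x, y in zip(xs, ys)]     # residuals
        d2_aa = sum(x * x for x in xs) / s ** 2
        d2_bb = n / s ** 2
        d2_ss = -n / s ** 2 + 3.0 * sum(ri * ri for ri in r) / s ** 4
        d2_ab = sum(xs) / s ** 2
        d2_as = 2.0 * sum(x * ri for x, ri in zip(xs, r)) / s ** 3
        d2_bs = 2.0 * sum(r) / s ** 3
        return [[d2_aa, d2_ab, d2_as],
                [d2_ab, d2_bb, d2_bs],
                [d2_as, d2_bs, d2_ss]]

- At the maximum-likelihood estimates the a-σ and b-σ entries are exactly zero (the first derivatives with respect to a and b vanish there), matching their zero expectations.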
Fisher Information
- The expected second derivatives (Ey denotes expectation over the y) form the Fisher information matrix:

          a                     b                     σ
    a     Ey d^2 L / d a^2      Ey d^2 L / d a.d b    0
    b     Ey d^2 L / d a.d b    Ey d^2 L / d b^2      0
    σ     0                     0                     Ey d^2 L / d σ^2

- Its determinant is F = 2 n { n.(Σ xi^2) - (n.mean{xi})^2 } / σ^6
-                      = 2 n^3 { (Σ xi^2)/n - (mean{xi})^2 } / σ^6
-                      = 2 n^3 variance{xi} / σ^6
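- F can be checked numerically by taking the determinant of the 3x3 expected matrix directly; a sketch (the data and σ below are arbitrary illustrations):

    def fisher_det(xs, sigma):
        # determinant of the expected Fisher matrix, computed two ways
        n = len(xs)
        sx = sum(xs)
        sxx = sum(x * x for x in xs)
        var = sxx / n - (sx / n) ** 2
        # the expected matrix is [[sxx, sx, 0], [sx, n, 0], [0, 0, 2n]] / sigma^2;
        # expand its determinant along the last row/column
        det_direct = (2.0 * n / sigma ** 2) * ((sxx / sigma ** 2) * (n / sigma ** 2) - (sx / sigma ** 2) ** 2)
        det_formula = 2.0 * n ** 3 * var / sigma ** 6
        return det_direct, det_formula          # the two values agree

    print(fisher_det([0.0, 1.0, 2.0, 3.0], 0.4))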
Priors
- a = tan θ where θ is the angular slope.
- d a / d θ = 1 / cos^2 θ = 1 + a^2, and so d θ / d a = 1 / (1 + a^2).
- The uniform prior, 1/π, on θ corresponds to the prior pr(a) = pr(θ).(d θ / d a) = 1 / (π (1 + a^2)) on 'a', i.e. a standard Cauchy distribution.
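- The change of variables can be checked by Monte Carlo: with θ drawn uniformly from (-π/2, π/2), the slopes a = tan θ should follow the Cauchy distribution above. A small sketch (sample size and test points are arbitrary):

    import math, random

    random.seed(0)
    slopes = [math.tan(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(100000)]

    # compare the empirical CDF of a = tan(theta) with the Cauchy CDF, 1/2 + arctan(a)/pi
    for a0 in (-2.0, 0.0, 1.0):
        empirical = sum(1 for s in slopes if s <= a0) / len(slopes)
        exact = 0.5 + math.atan(a0) / math.pi
        print(a0, round(empirical, 3), round(exact, 3))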
- b can be untangled from 'a' by making the C of G of the data the origin; then Σ xi = 0 and the a-b entry of the Fisher matrix vanishes.
- Then b plays the role of μ (the mean of the {yi}) in the [normal distribution].
— L.A. @ Dept. Comp. Sci., U. York, 12/2004