Least Squares
- Least squares, e.g., for linear regression.
- Model
- X W ≈ Y,
- X W + E = Y,
\[
\begin{pmatrix} x_{1,1} & \cdots & x_{1,K} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,K} \end{pmatrix}
\begin{pmatrix} w_1 \\ \vdots \\ w_K \end{pmatrix}
+
\begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}
\]
- N > K, we hope.
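To make the shapes concrete, here is a minimal NumPy sketch of this model; the sizes N and K, the noise scale, and the variable names are illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 3                      # N observations, K weights; N > K, as hoped
X = rng.normal(size=(N, K))       # design matrix, shape N x K
W_true = rng.normal(size=(K, 1))  # weights to recover, shape K x 1
E = 0.1 * rng.normal(size=(N, 1)) # errors, shape N x 1
Y = X @ W_true + E                # observations, shape N x 1: X W + E = Y
```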
- Problem: Given X and Y, find weights, W, so as to minimise the sum of the squared errors.
- Errors
\[
\begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix}
=
\begin{pmatrix} y_1 - \sum_k x_{1,k} w_k \\ \vdots \\ y_N - \sum_k x_{N,k} w_k \end{pmatrix}
\]
- Squared errors
\[
\begin{pmatrix} e_1^2 \\ \vdots \\ e_N^2 \end{pmatrix}
=
\begin{pmatrix} y_1^2 - 2 y_1 \sum_k x_{1,k} w_k + \bigl(\sum_k x_{1,k} w_k\bigr)^2 \\ \vdots \\ y_N^2 - 2 y_N \sum_k x_{N,k} w_k + \bigl(\sum_k x_{N,k} w_k\bigr)^2 \end{pmatrix}
\]
- The sum of the squared errors (a scalar) is S = ∑n en².
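Continuing the sketch above, the residuals and the scalar S for a candidate W can be computed directly; the function name is mine, not from the notes.

```python
def sum_squared_errors(X, Y, W):
    """S = sum_n e_n^2, where e_n = y_n - sum_k x_{n,k} w_k."""
    E = Y - X @ W          # residual vector, shape N x 1
    return (E ** 2).sum()  # the scalar S

# For example, the all-zero weight vector leaves S = sum_n y_n^2.
S_zero = sum_squared_errors(X, Y, np.zeros((K, 1)))
```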
- Differentiate S with respect to wm, 1 ≤ m ≤ K, and set to zero:
\[
\frac{\partial S}{\partial w_m}
= -2 \sum_n y_n x_{n,m} + 2 \sum_n \Bigl(\sum_k x_{n,k} w_k\Bigr) x_{n,m}
= -2 \sum_n x^{T}_{m,n} y_n + 2 \sum_n x^{T}_{m,n} \Bigl(\sum_k x_{n,k} w_k\Bigr)
= 0, \qquad \forall\, m = 1, \dots, K,
\]
- i.e.,
\[
X^{T} Y = (X^{T} X) W,
\]
where T denotes the transpose,
\[
W = (X^{T} X)^{-1} X^{T} Y, \qquad \text{if } X^{T} X \text{ is invertible.}
\]
- (Note that X is not square in general, so do not be tempted to write W = X^{-1} Y; X^{T} X, however, is square, with shape K × K.)
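As a sanity check on this closed form, the sketch below (continuing from above) solves the normal equations with np.linalg.solve rather than forming the inverse explicitly, which is generally the numerically safer route, verifies that the gradient -2 X^T (Y - X W) from the derivation vanishes at the solution, and cross-checks against NumPy's np.linalg.lstsq.

```python
# Solve (X^T X) W = X^T Y for W without forming (X^T X)^{-1} explicitly.
W_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# dS/dW = -2 X^T (Y - X W) should vanish at the minimiser.
grad = -2 * X.T @ (Y - X @ W_hat)
assert np.allclose(grad, 0)

# Cross-check against NumPy's built-in least-squares solver.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(W_hat, W_lstsq)
```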