Least Squares

Least squares, e.g., for linear regression.
 
Model
X W ~ Y,
X W + E = Y,
 
\[
\begin{pmatrix}
x_{1,1} & \cdots & x_{1,K} \\
\vdots  &        & \vdots  \\
x_{N,1} & \cdots & x_{N,K}
\end{pmatrix}
\begin{pmatrix} w_1 \\ \vdots \\ w_K \end{pmatrix}
+
\begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}.
\]
 
N > K, we hope.
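To make the shapes concrete, here is a minimal NumPy sketch of the model above; N, K, the "true" weights and the noise scale are all made-up illustrative values.

    import numpy as np

    rng = np.random.default_rng(0)       # fixed seed, for repeatability
    N, K = 100, 3                        # N > K: more equations than unknowns
    X = rng.normal(size=(N, K))          # design matrix X, shape N×K
    W_true = np.array([2.0, -1.0, 0.5])  # illustrative "true" weights, shape K
    E = 0.1 * rng.normal(size=N)         # errors, shape N
    Y = X @ W_true + E                   # the model: X W + E = Y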
 
Problem: Given X and Y, find weights, W, so as to minimise the sum of the squared errors.
 
Errors
\[
e_n = y_n - \sum_k x_{n,k} w_k, \qquad n = 1, \dots, N.
\]
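Continuing the sketch above, the whole error vector for a candidate W is one matrix-vector product; W_guess is an arbitrary illustrative starting point.

    W_guess = np.zeros(K)      # any candidate weights (illustrative)
    E_guess = Y - X @ W_guess  # e_n = y_n - ∑_k x_{n,k} w_k, for all n at once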
 
Squared errors
\[
e_n^2 = y_n^2 - 2 y_n \sum_k x_{n,k} w_k + \Big( \sum_k x_{n,k} w_k \Big)^2, \qquad n = 1, \dots, N.
\]
The sum of the squared errors (a scalar) is $S = \sum_n e_n^2$.
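Continuing the sketch, S is then a single line, and summing the expanded form above gives the same value, up to floating-point rounding:

    S = (E_guess ** 2).sum()  # S = ∑_n e_n²

    XW = X @ W_guess          # the fitted values, ∑_k x_{n,k} w_k
    S_expanded = (Y ** 2 - 2 * Y * XW + XW ** 2).sum()
    assert np.isclose(S, S_expanded)  # the expansion above, summed over n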
 
Differentiate S wrt $w_m$, $1 \le m \le K$, and set to zero:
\[
\frac{\partial S}{\partial w_m}
= -2 \sum_n y_n x_{n,m} + 2 \sum_n \Big( \sum_k x_{n,k} w_k \Big) x_{n,m}
= -2 \sum_n (X^T)_{m,n} y_n + 2 \sum_n (X^T)_{m,n} \sum_k x_{n,k} w_k
= 0, \qquad \forall\, m = 1, \dots, K,
\]
i.e.,
\[
X^T Y = (X^T X)\, W,
\]
where $^T$ denotes transpose, and hence
\[
W = (X^T X)^{-1} X^T Y, \qquad \text{if } X^T X \text{ is invertible}.
\]
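In the sketch, the closed form is one call; solving the normal equations directly is preferable to forming the inverse of $X^T X$ explicitly.

    # solve (X^T X) W = X^T Y, rather than inverting X^T X
    W_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    # W_hat should land near W_true, up to the noise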
 
(Note that X is not square in general, so do not be tempted to write $W = X^{-1} Y$; $X^T X$, on the other hand, is square, with shape $K \times K$.)
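If $X^T X$ is singular, or nearly so, the inverse route fails; the usual fallback is a library least-squares routine such as NumPy's lstsq, which minimises $\sum_n e_n^2$ via a factorisation of X itself and copes with rank deficiency.

    # least squares without forming X^T X at all
    W_hat, residues, rank, svals = np.linalg.lstsq(X, Y, rcond=None)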