-
- Probability density function
-
-   f(x | μ, σ, ν) = Γ((ν+1)/2) / {√(πν)·Γ(ν/2)·σ}
-                    × [1 + (x-μ)²/(νσ²)]^(-(ν+1)/2)
-
- (Γ(x) is the gamma function, defined for x > 0;
- for integer n, Γ(n) = (n-1)!, and in general Γ(x) = (x-1)·Γ(x-1).)
- -∞ < x < ∞,  -∞ < μ < ∞,  σ > 0,  ν > 0.
- Mean = μ, but it is undefined if ν ≤ 1.
- Variance = σ²ν / (ν-2) if ν > 2, and is otherwise undefined.
- The variance tends to σ² as ν tends to ∞.
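- As a sanity check, the density and the variance formula can be verified numerically. This Python sketch uses a crude trapezoidal quadrature and arbitrary parameter values of my own choosing (not part of the original derivation):

```python
import math

def t_pdf(x, mu, sigma, nu):
    # Student's t density f(x | mu, sigma, nu) as defined above
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2) * sigma)
    return c * (1 + (x - mu) ** 2 / (nu * sigma ** 2)) ** (-(nu + 1) / 2)

def integrate(g, lo=-200.0, hi=200.0, n=200001):
    # plain trapezoidal rule on a wide, truncated interval
    h = (hi - lo) / (n - 1)
    s = 0.5 * (g(lo) + g(hi))
    for i in range(1, n - 1):
        s += g(lo + i * h)
    return s * h

mu, sigma, nu = 1.0, 2.0, 5.0
total = integrate(lambda x: t_pdf(x, mu, sigma, nu))
var = integrate(lambda x: (x - mu) ** 2 * t_pdf(x, mu, sigma, nu))
print(total)                             # close to 1
print(var, sigma ** 2 * nu / (nu - 2))   # both close to 20/3
```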
-
- ν is the `degrees of freedom' or `shape parameter'.
- If ν = 1, the t-distribution is a Cauchy distribution.
- As ν → ∞, the t-distribution tends to
  the normal distribution N(μ,σ);
  for ν ≥ 30 it is already very close to the normal.
-
- If ν data are drawn from a normal distribution N(0,σ)
  of unknown σ, the posterior distribution of
  the next datum is an infinite weighted mixture of normal distributions,
  which is equivalent to a t-distribution with μ = 0 and
  variance scaled by σ².
  (There is a small "problem" until you have drawn
  at least three values (to get the shape),
  so choosing them amounts to setting the prior.)
  The distribution was discovered by W. S. Gosset c1908,
  writing under the name `Student'.
-
-
- Note that we can slightly rearrange f( ) to
-
-   f(x | μ, σ, ν) = Γ((ν+1)/2)·ν^(ν/2)·σ^ν
-                    / { √π·Γ(ν/2)·{νσ² + (x-μ)²}^((ν+1)/2) }
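- The two forms of f can be checked against each other pointwise; a Python sketch (the test points and parameter values are arbitrary):

```python
import math

def f_original(x, mu, sigma, nu):
    # f as first defined: normaliser times (1 + (x-mu)^2/(nu sigma^2))^(-(nu+1)/2)
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2) * sigma)
    return c * (1 + (x - mu) ** 2 / (nu * sigma ** 2)) ** (-(nu + 1) / 2)

def f_rearranged(x, mu, sigma, nu):
    # the rearranged form, with nu^(nu/2) sigma^nu in the numerator
    num = math.gamma((nu + 1) / 2) * nu ** (nu / 2) * sigma ** nu
    den = (math.sqrt(math.pi) * math.gamma(nu / 2)
           * (nu * sigma ** 2 + (x - mu) ** 2) ** ((nu + 1) / 2))
    return num / den

vals = [(f_original(x, 1.0, 2.0, 3.5), f_rearranged(x, 1.0, 2.0, 3.5))
        for x in (-3.0, 0.0, 0.5, 4.2)]
```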
-
- Three expectations are useful later:
-   e1 = Ex{ 1 / (νσ² + (x-μ)²) }
-   e2 = Ex{ 1 / (νσ² + (x-μ)²)² }
-   e3 = Ex{ (x-μ)² / (νσ² + (x-μ)²)² }
-
- Now,
-   ∫-∞..+∞ 1 / (a+x²)^k dx = √π·Γ(k - 1/2) / {a^(k-1/2)·Γ(k)}
  (thanks DS)
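- This closed form can be checked by quadrature; a Python sketch (a and k are arbitrary test values, and the truncation of the integral is my own choice):

```python
import math

def closed_form(a, k):
    # sqrt(pi) * Gamma(k - 1/2) / (a^(k - 1/2) * Gamma(k))
    return math.sqrt(math.pi) * math.gamma(k - 0.5) / (a ** (k - 0.5) * math.gamma(k))

def numeric(a, k, lo=-500.0, hi=500.0, n=200001):
    # trapezoidal rule for the integral of 1/(a + x^2)^k
    h = (hi - lo) / (n - 1)
    s = 0.5 * ((a + lo * lo) ** -k + (a + hi * hi) ** -k)
    for i in range(1, n - 1):
        x = lo + i * h
        s += (a + x * x) ** -k
    return s * h

a, k = 3.0, 2.5
print(closed_form(a, k), numeric(a, k))  # both close to 0.148
```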
- so
-   ∫-∞..+∞ 1 / (νσ² + (x-μ)²)^((ν+3)/2) dx,   (use a = νσ², k = (ν+3)/2)
-     = √π·Γ(ν/2+1) / {(νσ²)^(ν/2+1)·Γ((ν+1)/2+1)}
- so
-   e1 = {Γ((ν+1)/2)·ν^(ν/2)·σ^ν / (√π·Γ(ν/2))}
         · {√π·Γ(ν/2+1) / (ν^(ν/2+1)·σ^(ν+2)·Γ((ν+1)/2+1))}
-      = {√π·ν·ν^(ν/2)·σ^ν} / {√π·(ν+1)·ν^(ν/2+1)·σ^(ν+2)}
-      = 1 / ((ν+1)·σ²)
- Similarly
-   ∫-∞..+∞ 1 / (νσ² + (x-μ)²)^((ν+5)/2) dx,   (use a = νσ², k = (ν+5)/2)
-     = √π·Γ(ν/2+2) / {(νσ²)^(ν/2+2)·Γ((ν+1)/2+2)}
- so
-   e2 = {ν·(ν+2)·ν^(ν/2)·σ^ν} / {(ν+1)·(ν+3)·ν^(ν/2+2)·σ^(ν+4)}
-      = (ν+2) / {ν·(ν+1)·(ν+3)·σ⁴}
-
- Now,
-   ∫-∞..+∞ x² / (a+x²)^k dx = √π·Γ(k - 3/2) / {2·a^(k-3/2)·Γ(k)}
- so
-   ∫-∞..+∞ (x-μ)² / {νσ² + (x-μ)²}^((ν+5)/2) dx,   (use a = νσ², k = (ν+5)/2)
-     = √π·Γ(ν/2+1) / {2·(νσ²)^(ν/2+1)·Γ((ν+1)/2+2)}
- so
-   e3 = 1 / {(ν+1)·(ν+3)·σ²}
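- The three expectations can be verified by numerical integration against the density; a Python sketch (the parameter values and the quadrature grid are arbitrary choices):

```python
import math

mu, sigma, nu = 0.5, 1.5, 4.0
c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2) * sigma)

def f(x):
    # the t density with the normaliser precomputed
    return c * (1 + (x - mu) ** 2 / (nu * sigma ** 2)) ** (-(nu + 1) / 2)

def expect(g, lo=-300.0, hi=300.0, n=200001):
    # E[g(x)] under f, by the trapezoidal rule
    h = (hi - lo) / (n - 1)
    s = 0.5 * (g(lo) * f(lo) + g(hi) * f(hi))
    for i in range(1, n - 1):
        x = lo + i * h
        s += g(x) * f(x)
    return s * h

den = lambda x: nu * sigma ** 2 + (x - mu) ** 2
e1 = expect(lambda x: 1 / den(x))
e2 = expect(lambda x: 1 / den(x) ** 2)
e3 = expect(lambda x: (x - mu) ** 2 / den(x) ** 2)
print(e1, 1 / ((nu + 1) * sigma ** 2))
print(e2, (nu + 2) / (nu * (nu + 1) * (nu + 3) * sigma ** 4))
print(e3, 1 / ((nu + 1) * (nu + 3) * sigma ** 2))
```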
-
- Given n continuous-valued data x1, x2, ..., xn,
  the negative log likelihood is
-   L = n·{ (1/2)·log(πν) + log(Γ(ν/2)) - log(Γ((ν+1)/2)) + log σ }
        + ((ν+1)/2) ∑i log(1 + (xi-μ)²/(νσ²))
-     = n·{ (1/2)·log π + log(Γ(ν/2)) - log(Γ((ν+1)/2)) - (ν/2)·log ν - ν·log σ }
        + ((ν+1)/2) ∑i log(νσ² + (xi-μ)²)
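- The two forms of L are identical; a Python sketch comparing them on arbitrary made-up data:

```python
import math

def nll_form1(xs, mu, sigma, nu):
    # n{(1/2)log(pi nu) + log Gamma(nu/2) - log Gamma((nu+1)/2) + log sigma}
    #   + ((nu+1)/2) sum_i log(1 + (xi-mu)^2/(nu sigma^2))
    n = len(xs)
    v = n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
             - math.lgamma((nu + 1) / 2) + math.log(sigma))
    return v + ((nu + 1) / 2) * sum(
        math.log(1 + (x - mu) ** 2 / (nu * sigma ** 2)) for x in xs)

def nll_form2(xs, mu, sigma, nu):
    # the second form, with the sum over log(nu sigma^2 + (xi-mu)^2)
    n = len(xs)
    v = n * (0.5 * math.log(math.pi) + math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2)
             - (nu / 2) * math.log(nu) - nu * math.log(sigma))
    return v + ((nu + 1) / 2) * sum(
        math.log(nu * sigma ** 2 + (x - mu) ** 2) for x in xs)

xs = [0.3, -1.2, 2.5, 0.0, 4.1]   # arbitrary data
nll1 = nll_form1(xs, 0.5, 1.5, 4.0)
nll2 = nll_form2(xs, 0.5, 1.5, 4.0)
print(nll1, nll2)   # equal up to rounding
```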
-
- 1st derivatives of L
-
-   dL/dμ = - (ν+1) ∑i{ (xi-μ) / (νσ² + (xi-μ)²) }
-
-   dL/dσ = - nν/σ + ν(ν+1)σ ∑i{ 1 / (νσ² + (xi-μ)²) }
-
- (The digamma function is ψ(x) = d/dx log(Γ(x)) = Γ'(x)/Γ(x),
- and ψ1(x) = d/dx ψ(x) is the trigamma function.)
-   dL/dν = n·{ (1/2)·ψ(ν/2) - (1/2)·ψ((ν+1)/2) - 1/2 - (1/2)·log ν - log σ }
            + (1/2) ∑i log(νσ² + (xi-μ)²)
            + ((ν+1)σ²/2) ∑i{ 1 / (νσ² + (xi-μ)²) }
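- The three first derivatives can be checked against central finite differences of L; a Python sketch (arbitrary data and parameter values; the digamma ψ is approximated crudely by differencing lgamma, which is accurate enough here):

```python
import math

xs = [0.3, -1.2, 2.5, 0.0, 4.1]   # arbitrary data
mu, sigma, nu = 0.5, 1.5, 4.0
n = len(xs)

def L(mu_, sigma_, nu_):
    # negative log likelihood, first form above
    v = n * (0.5 * math.log(math.pi * nu_) + math.lgamma(nu_ / 2)
             - math.lgamma((nu_ + 1) / 2) + math.log(sigma_))
    return v + ((nu_ + 1) / 2) * sum(
        math.log(1 + (x - mu_) ** 2 / (nu_ * sigma_ ** 2)) for x in xs)

d = lambda x: nu * sigma ** 2 + (x - mu) ** 2
psi = lambda z, h=1e-6: (math.lgamma(z + h) - math.lgamma(z - h)) / (2 * h)  # digamma, crudely

dmu = -(nu + 1) * sum((x - mu) / d(x) for x in xs)
dsigma = -n * nu / sigma + nu * (nu + 1) * sigma * sum(1 / d(x) for x in xs)
dnu = (n * (0.5 * psi(nu / 2) - 0.5 * psi((nu + 1) / 2) - 0.5
            - 0.5 * math.log(nu) - math.log(sigma))
       + 0.5 * sum(math.log(d(x)) for x in xs)
       + ((nu + 1) * sigma ** 2 / 2) * sum(1 / d(x) for x in xs))

h = 1e-6
num_dmu = (L(mu + h, sigma, nu) - L(mu - h, sigma, nu)) / (2 * h)
num_dsigma = (L(mu, sigma + h, nu) - L(mu, sigma - h, nu)) / (2 * h)
num_dnu = (L(mu, sigma, nu + h) - L(mu, sigma, nu - h)) / (2 * h)
print(dmu, num_dmu)
print(dsigma, num_dsigma)
print(dnu, num_dnu)
```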
-
- 2nd derivatives
-
-   d²L/dμ² = (ν+1) ∑i{ 1 / (νσ² + (xi-μ)²)
                        - 2(xi-μ)² / (νσ² + (xi-μ)²)² }
- Using the results for e1, e2 and e3 above, the expectation is
-   = n(ν+1)·{e1 - 2·e3}
-   = n(ν+1)·{ 1/((ν+1)σ²) - 2/((ν+1)(ν+3)σ²) }
-   = n·{ 1 - 2/(ν+3) } / σ²
-   = n(ν+1) / {(ν+3)σ²}
-
-   d²L/dσ² = nν/σ² + ν(ν+1) ∑i{ 1 / (νσ² + (xi-μ)²)
                                 - 2νσ² / (νσ² + (xi-μ)²)² }
- expectation
-   = nν/σ² + nν(ν+1)·{e1 - 2νσ²·e2}
-   = nν/σ² + nν(ν+1)·{ 1/((ν+1)σ²) - 2νσ²(ν+2)/{ν(ν+1)(ν+3)σ⁴} }
-   = nν/σ² + nν·(ν+3 - 2(ν+2)) / ((ν+3)σ²)
-   = nν/σ² - nν(ν+1) / ((ν+3)σ²)
-   = 2nν / ((ν+3)σ²)
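- The algebra for both expected diagonal terms can be checked exactly with rational arithmetic; a Python sketch (the rational test values are arbitrary, and s2 stands for σ²):

```python
from fractions import Fraction as Fr

nu, s2, n = Fr(7, 2), Fr(9, 4), 1   # arbitrary rational test values
e1 = 1 / ((nu + 1) * s2)
e2 = (nu + 2) / (nu * (nu + 1) * (nu + 3) * s2 ** 2)
e3 = 1 / ((nu + 1) * (nu + 3) * s2)

# expected d2L/dmu2 and d2L/dsigma2, as derived above
exp_d2mu = n * (nu + 1) * (e1 - 2 * e3)
exp_d2sigma = n * nu / s2 + n * nu * (nu + 1) * (e1 - 2 * nu * s2 * e2)
print(exp_d2mu == n * (nu + 1) / ((nu + 3) * s2))    # True
print(exp_d2sigma == 2 * n * nu / ((nu + 3) * s2))   # True
```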
-
- Sanity check:
  when ν is large (30+), the t-distribution tends to N(μ,σ),
  and the product of the expected 2nd derivatives w.r.t. μ and σ
  tends to 2n²/σ⁴, which is the normal distribution's Fisher
  information, when that is computed w.r.t. μ and σ.
-
-   d²L/dν² = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) }
              + (σ²/2) ∑i{ 1 / (νσ²+(xi-μ)²) }
              + (σ²/2) ∑i{ 1 / (νσ²+(xi-μ)²) }
              - ((ν+1)σ⁴/2) ∑i{ 1 / (νσ²+(xi-μ)²)² }
-           = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) }
              + σ² ∑i{ 1 / (νσ²+(xi-μ)²) }
              - ((ν+1)σ⁴/2) ∑i{ 1 / (νσ²+(xi-μ)²)² }
- expectation
-   = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) + σ²·e1 - ((ν+1)σ⁴/2)·e2 }
- The non-trigamma part simplifies:
-   - 1/(2ν) + σ²·e1 - ((ν+1)σ⁴/2)·e2
-   = { - 1 + 2ν/(ν+1) - (ν+2)/(ν+3) } / (2ν)
-   = - (ν+5) / (2ν(ν+1)(ν+3))
- so the expectation is
-   = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - (ν+5) / (2ν(ν+1)(ν+3)) }
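- The simplification of the non-trigamma part can be checked exactly with rational arithmetic; a Python sketch (arbitrary rational test values, s2 standing for σ²):

```python
from fractions import Fraction as Fr

nu, s2 = Fr(7, 2), Fr(9, 4)   # arbitrary rational test values
e1 = 1 / ((nu + 1) * s2)
e2 = (nu + 2) / (nu * (nu + 1) * (nu + 3) * s2 ** 2)

# the non-trigamma part of the expected d2L/dnu2, per datum
part = -1 / (2 * nu) + s2 * e1 - ((nu + 1) * s2 ** 2 / 2) * e2
print(part == -(nu + 5) / (2 * nu * (nu + 1) * (nu + 3)))   # True
```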
-
- Off-diagonal 2nd derivatives
-
-   d²L/dμdσ = d²L/dσdμ
-     = 2ν(ν+1)σ ∑i{ (xi-μ) / (νσ² + (xi-μ)²)² }
- expectation = 0 (which is what you would hope),
  because the summand is an "odd" function about μ
  (i.e. g(μ+z) = - g(μ-z)).
-
-   d²L/dμdν = d²L/dνdμ
-     = (ν+1)σ² ∑i{ (xi-μ) / (νσ² + (xi-μ)²)² }
       - ∑i{ (xi-μ) / (νσ² + (xi-μ)²) }
- expectation = 0, because both sums are "odd" functions about μ.
-
-   d²L/dνdσ = d²L/dσdν
-     = - n/σ + (2ν+1)σ ∑i{ 1 / (νσ² + (xi-μ)²) }
       - ν(ν+1)σ³ ∑i{ 1 / (νσ² + (xi-μ)²)² }
- expectation
-   = - n/σ + n(2ν+1)σ·e1 - nν(ν+1)σ³·e2
-   = n·{ - 1/σ + (2ν+1)σ / ((ν+1)σ²) - ν(ν+1)(ν+2)σ³ / {ν(ν+1)(ν+3)σ⁴} }
-   = (n/σ)·{ - 1 + (2ν+1)/(ν+1) - (ν+2)/(ν+3) }
-   = (n/σ)·{ - 2 / ((ν+1)(ν+3)) }
-   = - 2n / {σ(ν+1)(ν+3)}
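- Again the algebra can be checked exactly with rational arithmetic; a Python sketch (arbitrary rational test values, s standing for σ):

```python
from fractions import Fraction as Fr

nu, s = Fr(7, 2), Fr(3, 2)   # arbitrary rational test values
e1 = 1 / ((nu + 1) * s ** 2)
e2 = (nu + 2) / (nu * (nu + 1) * (nu + 3) * s ** 4)

# expected d2L/dnu dsigma, per datum
cross = -1 / s + (2 * nu + 1) * s * e1 - nu * (nu + 1) * s ** 3 * e2
print(cross == -2 / (s * (nu + 1) * (nu + 3)))   # True
```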
-
- Fisher
-
- The expected 2nd derivatives give the matrix
-
-        |  μ            σ             ν
-    ----+------------------------------------------
-     μ  |  E d²L/dμ²    0             0
-     σ  |  0            E d²L/dσ²     E d²L/dσdν
-     ν  |  0            E d²L/dσdν    E d²L/dν²
-
-   = n ×
-        |  (ν+1)/{(ν+3)σ²}   0                    0
-        |  0                 2ν/{(ν+3)σ²}         -2/{(ν+1)(ν+3)σ}
-        |  0                 -2/{(ν+1)(ν+3)σ}     (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2)
-        |                                           - (ν+5)/{2ν(ν+1)(ν+3)}
-
- The Fisher information F is the determinant of this matrix
- (hence the overall factor n³):
-
-   F = F11·(F22·F33 - F23²)
-
-     = {n³(ν+1)/((ν+3)σ²)} · [ {2ν/((ν+3)σ²)}
         · { (1/4)·(ψ1(ν/2) - ψ1((ν+1)/2)) - (ν+5)/(2ν(ν+1)(ν+3)) }
         - 4/{(ν+1)(ν+3)σ}² ]
-
- ... including the step (ν+5)(ν+1) + 4 = ν² + 6ν + 9 = (ν+3)² ...
-
-   = (n³/σ⁴) · { ν(ν+1)/(2(ν+3)²) · (ψ1(ν/2) - ψ1((ν+1)/2))
                 - 1/((ν+1)(ν+3)) }
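- The determinant is linear in Δψ = ψ1(ν/2) - ψ1((ν+1)/2), so the identity can be verified exactly by checking several rational Δψ values with rational arithmetic (the n³ factor is omitted since it cancels); a Python sketch with arbitrary test values:

```python
from fractions import Fraction as Fr

nu, s2 = Fr(7, 2), Fr(9, 4)   # arbitrary rational test values; s2 stands for sigma^2
for dpsi in (Fr(0), Fr(1), Fr(5, 3)):
    # the three distinct entries of the (per-datum) expectation matrix
    F11 = (nu + 1) / ((nu + 3) * s2)
    F22 = 2 * nu / ((nu + 3) * s2)
    F33 = dpsi / 4 - (nu + 5) / (2 * nu * (nu + 1) * (nu + 3))
    F23sq = 4 / (((nu + 1) * (nu + 3)) ** 2 * s2)   # F23 = -2/((nu+1)(nu+3)sigma), squared
    det = F11 * (F22 * F33 - F23sq)
    closed = (nu * (nu + 1) / (2 * (nu + 3) ** 2) * dpsi
              - 1 / ((nu + 1) * (nu + 3))) / s2 ** 2
    assert det == closed
print("determinant identity verified")
```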
-- LA, July 2007
- This confirms the equation presented (without working) by Yudi [Agu02].
-
- Note, log(Fisher) = 3·log n - 4·log σ + log(expression(ν)).
-
- Message length
-   m = - log(h(μ, σ, ν)) + L + (1/2)·log F + (d/2)·(1 + log κd),   (d = 3 parameters)
- See [IP 1.2] for
  an implementation of Student's t-distribution.
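- A minimal sketch of the message-length calculation in Python, distinct from the [IP 1.2] implementation. The prior enters only through a log_h argument that the caller must supply (the value used below is a placeholder, not a recommended prior); the trigamma function is approximated crudely by differencing lgamma; and κ3 is taken as approximately 0.0785, the 3-D optimal quantising-lattice constant:

```python
import math

def trigamma(z, h=1e-4):
    # crude psi1 via a second central difference of lgamma;
    # accurate to roughly 1e-7 here, enough for a sketch
    return (math.lgamma(z + h) - 2 * math.lgamma(z) + math.lgamma(z - h)) / h ** 2

def neg_log_lik(xs, mu, sigma, nu):
    # the negative log likelihood L derived above
    n = len(xs)
    v = n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
             - math.lgamma((nu + 1) / 2) + math.log(sigma))
    return v + ((nu + 1) / 2) * sum(
        math.log(1 + (x - mu) ** 2 / (nu * sigma ** 2)) for x in xs)

def log_fisher(n, sigma, nu):
    # log F = 3 log n - 4 log sigma + log(expression(nu)), as derived above
    expr = (nu * (nu + 1) / (2 * (nu + 3) ** 2)
            * (trigamma(nu / 2) - trigamma((nu + 1) / 2))
            - 1 / ((nu + 1) * (nu + 3)))
    return 3 * math.log(n) - 4 * math.log(sigma) + math.log(expr)

KAPPA_3 = 0.0785   # 3-D quantisation constant, approximate

def message_length(xs, mu, sigma, nu, log_h):
    # log_h: log prior density at (mu, sigma, nu); must be supplied by the caller
    d = 3
    return (-log_h + neg_log_lik(xs, mu, sigma, nu)
            + 0.5 * log_fisher(len(xs), sigma, nu)
            + (d / 2) * (1 + math.log(KAPPA_3)))

xs = [0.3, -1.2, 2.5, 0.0, 4.1]   # arbitrary data
m = message_length(xs, mu=0.5, sigma=1.5, nu=4.0, log_h=-5.0)  # log_h is a placeholder
print(m)
```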