-
- Probability density function
-
-   f(x | μ, σ, ν) = Γ((ν+1)/2) / {√(πν)·Γ(ν/2)·σ}
-                    × [1 + (x-μ)²/(νσ²)]^(-(ν+1)/2)
-
- (Γ(x) is the gamma function, defined for x > 0;
- for integer n, Γ(n) = (n-1)!, and in general Γ(x) = (x-1)·Γ(x-1).)
- -∞ < x < ∞,  -∞ < μ < ∞,  σ > 0,  ν > 0.
- Mean = μ, but it is undefined if ν ≤ 1.
- Variance = σ²ν / (ν-2) if ν > 2, and is otherwise undefined.
- The variance tends to σ² as ν tends to ∞.
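- As a sanity check, the density and the variance formula can be verified numerically. This Python sketch uses a crude trapezoidal quadrature and arbitrary parameter values of my own choosing (not part of the original derivation):

```python
import math

def t_pdf(x, mu, sigma, nu):
    # Student's t density f(x | mu, sigma, nu) as defined above
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2) * sigma)
    return c * (1 + (x - mu) ** 2 / (nu * sigma ** 2)) ** (-(nu + 1) / 2)

def integrate(g, lo=-200.0, hi=200.0, n=200001):
    # plain trapezoidal rule on a wide, truncated interval
    h = (hi - lo) / (n - 1)
    s = 0.5 * (g(lo) + g(hi))
    for i in range(1, n - 1):
        s += g(lo + i * h)
    return s * h

mu, sigma, nu = 1.0, 2.0, 5.0
total = integrate(lambda x: t_pdf(x, mu, sigma, nu))
var = integrate(lambda x: (x - mu) ** 2 * t_pdf(x, mu, sigma, nu))
print(total)                             # close to 1
print(var, sigma ** 2 * nu / (nu - 2))   # both close to 20/3
```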
-
- ν is the `degrees of freedom' or `shape parameter'.
- If ν = 1, the t-distribution is a Cauchy distribution.
- As ν → ∞, the t-distribution tends to
  the normal distribution N(μ,σ);
  for ν ≥ 30 it is already very close to the normal.
-
- If ν data are drawn from a normal distribution N(0,σ)
  of unknown σ, the posterior distribution of
  the next datum is an infinite weighted mixture of normal distributions,
  which is equivalent to a t-distribution with μ = 0 and
  variance scaled by σ².
  (There is a small "problem" until you have drawn
  at least three values (to get the shape),
  so choosing them amounts to setting the prior.)
  The distribution was discovered by W. S. Gosset c1908,
  writing under the name `Student'.
-
-
- Note that we can slightly rearrange f( ) to
-
-   f(x | μ, σ, ν) = Γ((ν+1)/2)·ν^(ν/2)·σ^ν
-                    / { √π·Γ(ν/2)·{νσ² + (x-μ)²}^((ν+1)/2) }
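- The two forms of f can be checked against each other pointwise; a Python sketch (the test points and parameter values are arbitrary):

```python
import math

def f_original(x, mu, sigma, nu):
    # f as first defined: normaliser times (1 + (x-mu)^2/(nu sigma^2))^(-(nu+1)/2)
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2) * sigma)
    return c * (1 + (x - mu) ** 2 / (nu * sigma ** 2)) ** (-(nu + 1) / 2)

def f_rearranged(x, mu, sigma, nu):
    # the rearranged form, with nu^(nu/2) sigma^nu in the numerator
    num = math.gamma((nu + 1) / 2) * nu ** (nu / 2) * sigma ** nu
    den = (math.sqrt(math.pi) * math.gamma(nu / 2)
           * (nu * sigma ** 2 + (x - mu) ** 2) ** ((nu + 1) / 2))
    return num / den

vals = [(f_original(x, 1.0, 2.0, 3.5), f_rearranged(x, 1.0, 2.0, 3.5))
        for x in (-3.0, 0.0, 0.5, 4.2)]
```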
-
- Three expectations are useful later:
-   e1 = Ex{ 1 / (νσ² + (x-μ)²) }
-   e2 = Ex{ 1 / (νσ² + (x-μ)²)² }
-   e3 = Ex{ (x-μ)² / (νσ² + (x-μ)²)² }
-
- Now,
-   ∫-∞..+∞ 1 / (a+x²)^k dx = √π·Γ(k - 1/2) / {a^(k-1/2)·Γ(k)}
  (thanks DS)
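- This closed form can be checked by quadrature; a Python sketch (a and k are arbitrary test values, and the truncation of the integral is my own choice):

```python
import math

def closed_form(a, k):
    # sqrt(pi) * Gamma(k - 1/2) / (a^(k - 1/2) * Gamma(k))
    return math.sqrt(math.pi) * math.gamma(k - 0.5) / (a ** (k - 0.5) * math.gamma(k))

def numeric(a, k, lo=-500.0, hi=500.0, n=200001):
    # trapezoidal rule for the integral of 1/(a + x^2)^k
    h = (hi - lo) / (n - 1)
    s = 0.5 * ((a + lo * lo) ** -k + (a + hi * hi) ** -k)
    for i in range(1, n - 1):
        x = lo + i * h
        s += (a + x * x) ** -k
    return s * h

a, k = 3.0, 2.5
print(closed_form(a, k), numeric(a, k))  # both close to 0.148
```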
- so
-   ∫-∞..+∞ 1 / (νσ² + (x-μ)²)^((ν+3)/2) dx,   (use a = νσ², k = (ν+3)/2)
-     = √π·Γ(ν/2+1) / {(νσ²)^(ν/2+1)·Γ((ν+1)/2+1)}
- so
-   e1 = {Γ((ν+1)/2)·ν^(ν/2)·σ^ν / (√π·Γ(ν/2))}
         · {√π·Γ(ν/2+1) / (ν^(ν/2+1)·σ^(ν+2)·Γ((ν+1)/2+1))}
-      = {√π·ν·ν^(ν/2)·σ^ν} / {√π·(ν+1)·ν^(ν/2+1)·σ^(ν+2)}
-      = 1 / ((ν+1)·σ²)
- Similarly
-   ∫-∞..+∞ 1 / (νσ² + (x-μ)²)^((ν+5)/2) dx,   (use a = νσ², k = (ν+5)/2)
-     = √π·Γ(ν/2+2) / {(νσ²)^(ν/2+2)·Γ((ν+1)/2+2)}
- so
-   e2 = {ν·(ν+2)·ν^(ν/2)·σ^ν} / {(ν+1)·(ν+3)·ν^(ν/2+2)·σ^(ν+4)}
-      = (ν+2) / {ν·(ν+1)·(ν+3)·σ⁴}
-
- Now,
-   ∫-∞..+∞ x² / (a+x²)^k dx = √π·Γ(k - 3/2) / {2·a^(k-3/2)·Γ(k)}
- so
-   ∫-∞..+∞ (x-μ)² / {νσ² + (x-μ)²}^((ν+5)/2) dx,   (use a = νσ², k = (ν+5)/2)
-     = √π·Γ(ν/2+1) / {2·(νσ²)^(ν/2+1)·Γ((ν+1)/2+2)}
- so
-   e3 = 1 / {(ν+1)·(ν+3)·σ²}
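- The three expectations can be verified by numerical integration against the density; a Python sketch (the parameter values and the quadrature grid are arbitrary choices):

```python
import math

mu, sigma, nu = 0.5, 1.5, 4.0
c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2) * sigma)

def f(x):
    # the t density with the normaliser precomputed
    return c * (1 + (x - mu) ** 2 / (nu * sigma ** 2)) ** (-(nu + 1) / 2)

def expect(g, lo=-300.0, hi=300.0, n=200001):
    # E[g(x)] under f, by the trapezoidal rule
    h = (hi - lo) / (n - 1)
    s = 0.5 * (g(lo) * f(lo) + g(hi) * f(hi))
    for i in range(1, n - 1):
        x = lo + i * h
        s += g(x) * f(x)
    return s * h

den = lambda x: nu * sigma ** 2 + (x - mu) ** 2
e1 = expect(lambda x: 1 / den(x))
e2 = expect(lambda x: 1 / den(x) ** 2)
e3 = expect(lambda x: (x - mu) ** 2 / den(x) ** 2)
print(e1, 1 / ((nu + 1) * sigma ** 2))
print(e2, (nu + 2) / (nu * (nu + 1) * (nu + 3) * sigma ** 4))
print(e3, 1 / ((nu + 1) * (nu + 3) * sigma ** 2))
```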
-
- Given n continuous-valued data x1, x2, ..., xn,
  the negative log likelihood is
-   L = n·{ (1/2)·log(πν) + log(Γ(ν/2)) - log(Γ((ν+1)/2)) + log σ }
        + ((ν+1)/2) ∑i log(1 + (xi-μ)²/(νσ²))
-     = n·{ (1/2)·log π + log(Γ(ν/2)) - log(Γ((ν+1)/2)) - (ν/2)·log ν - ν·log σ }
        + ((ν+1)/2) ∑i log(νσ² + (xi-μ)²)
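- The two forms of L are identical; a Python sketch comparing them on arbitrary made-up data:

```python
import math

def nll_form1(xs, mu, sigma, nu):
    # n{(1/2)log(pi nu) + log Gamma(nu/2) - log Gamma((nu+1)/2) + log sigma}
    #   + ((nu+1)/2) sum_i log(1 + (xi-mu)^2/(nu sigma^2))
    n = len(xs)
    v = n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
             - math.lgamma((nu + 1) / 2) + math.log(sigma))
    return v + ((nu + 1) / 2) * sum(
        math.log(1 + (x - mu) ** 2 / (nu * sigma ** 2)) for x in xs)

def nll_form2(xs, mu, sigma, nu):
    # the second form, with the sum over log(nu sigma^2 + (xi-mu)^2)
    n = len(xs)
    v = n * (0.5 * math.log(math.pi) + math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2)
             - (nu / 2) * math.log(nu) - nu * math.log(sigma))
    return v + ((nu + 1) / 2) * sum(
        math.log(nu * sigma ** 2 + (x - mu) ** 2) for x in xs)

xs = [0.3, -1.2, 2.5, 0.0, 4.1]   # arbitrary data
nll1 = nll_form1(xs, 0.5, 1.5, 4.0)
nll2 = nll_form2(xs, 0.5, 1.5, 4.0)
print(nll1, nll2)   # equal up to rounding
```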
-
- 1st derivatives of L
-
-   dL/dμ = - (ν+1) ∑i{ (xi-μ) / (νσ² + (xi-μ)²) }
-
-   dL/dσ = - nν/σ + ν(ν+1)σ ∑i{ 1 / (νσ² + (xi-μ)²) }
-
- (The digamma function is ψ(x) = d/dx log(Γ(x)) = Γ'(x)/Γ(x),
- and ψ1(x) = d/dx ψ(x) is the trigamma function.)
-   dL/dν = n·{ (1/2)·ψ(ν/2) - (1/2)·ψ((ν+1)/2) - 1/2 - (1/2)·log ν - log σ }
            + (1/2) ∑i log(νσ² + (xi-μ)²)
            + ((ν+1)σ²/2) ∑i{ 1 / (νσ² + (xi-μ)²) }
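- The three first derivatives can be checked against central finite differences of L; a Python sketch (arbitrary data and parameter values; the digamma ψ is approximated crudely by differencing lgamma, which is accurate enough here):

```python
import math

xs = [0.3, -1.2, 2.5, 0.0, 4.1]   # arbitrary data
mu, sigma, nu = 0.5, 1.5, 4.0
n = len(xs)

def L(mu_, sigma_, nu_):
    # negative log likelihood, first form above
    v = n * (0.5 * math.log(math.pi * nu_) + math.lgamma(nu_ / 2)
             - math.lgamma((nu_ + 1) / 2) + math.log(sigma_))
    return v + ((nu_ + 1) / 2) * sum(
        math.log(1 + (x - mu_) ** 2 / (nu_ * sigma_ ** 2)) for x in xs)

d = lambda x: nu * sigma ** 2 + (x - mu) ** 2
psi = lambda z, h=1e-6: (math.lgamma(z + h) - math.lgamma(z - h)) / (2 * h)  # digamma, crudely

dmu = -(nu + 1) * sum((x - mu) / d(x) for x in xs)
dsigma = -n * nu / sigma + nu * (nu + 1) * sigma * sum(1 / d(x) for x in xs)
dnu = (n * (0.5 * psi(nu / 2) - 0.5 * psi((nu + 1) / 2) - 0.5
            - 0.5 * math.log(nu) - math.log(sigma))
       + 0.5 * sum(math.log(d(x)) for x in xs)
       + ((nu + 1) * sigma ** 2 / 2) * sum(1 / d(x) for x in xs))

h = 1e-6
num_dmu = (L(mu + h, sigma, nu) - L(mu - h, sigma, nu)) / (2 * h)
num_dsigma = (L(mu, sigma + h, nu) - L(mu, sigma - h, nu)) / (2 * h)
num_dnu = (L(mu, sigma, nu + h) - L(mu, sigma, nu - h)) / (2 * h)
print(dmu, num_dmu)
print(dsigma, num_dsigma)
print(dnu, num_dnu)
```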
-
- 2nd derivatives
-
-   d²L/dμ² = (ν+1) ∑i{ 1 / (νσ² + (xi-μ)²)
                        - 2(xi-μ)² / (νσ² + (xi-μ)²)² }
- Using the results for e1, e2 and e3 above, the expectation is
-   = n(ν+1)·{e1 - 2·e3}
-   = n(ν+1)·{ 1/((ν+1)σ²) - 2/((ν+1)(ν+3)σ²) }
-   = n·{ 1 - 2/(ν+3) } / σ²
-   = n(ν+1) / {(ν+3)σ²}
-
-   d²L/dσ² = nν/σ² + ν(ν+1) ∑i{ 1 / (νσ² + (xi-μ)²)
                                 - 2νσ² / (νσ² + (xi-μ)²)² }
- expectation
-   = nν/σ² + nν(ν+1)·{e1 - 2νσ²·e2}
-   = nν/σ² + nν(ν+1)·{ 1/((ν+1)σ²) - 2νσ²(ν+2)/{ν(ν+1)(ν+3)σ⁴} }
-   = nν/σ² + nν·(ν+3 - 2(ν+2)) / ((ν+3)σ²)
-   = nν/σ² - nν(ν+1) / ((ν+3)σ²)
-   = 2nν / ((ν+3)σ²)
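- The algebra for both expected diagonal terms can be checked exactly with rational arithmetic; a Python sketch (the rational test values are arbitrary, and s2 stands for σ²):

```python
from fractions import Fraction as Fr

nu, s2, n = Fr(7, 2), Fr(9, 4), 1   # arbitrary rational test values
e1 = 1 / ((nu + 1) * s2)
e2 = (nu + 2) / (nu * (nu + 1) * (nu + 3) * s2 ** 2)
e3 = 1 / ((nu + 1) * (nu + 3) * s2)

# expected d2L/dmu2 and d2L/dsigma2, as derived above
exp_d2mu = n * (nu + 1) * (e1 - 2 * e3)
exp_d2sigma = n * nu / s2 + n * nu * (nu + 1) * (e1 - 2 * nu * s2 * e2)
print(exp_d2mu == n * (nu + 1) / ((nu + 3) * s2))    # True
print(exp_d2sigma == 2 * n * nu / ((nu + 3) * s2))   # True
```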
-
- Sanity check:
  when ν is large (30+), the t-distribution tends to N(μ,σ),
  and the product of the expected 2nd derivatives w.r.t. μ and σ
  tends to 2n²/σ⁴, which is the normal distribution's Fisher
  information, when that is computed w.r.t. μ and σ.
-
-   d²L/dν² = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) }
              + (σ²/2) ∑i{ 1 / (νσ²+(xi-μ)²) }
              + (σ²/2) ∑i{ 1 / (νσ²+(xi-μ)²) }
              - ((ν+1)σ⁴/2) ∑i{ 1 / (νσ²+(xi-μ)²)² }
-           = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) }
              + σ² ∑i{ 1 / (νσ²+(xi-μ)²) }
              - ((ν+1)σ⁴/2) ∑i{ 1 / (νσ²+(xi-μ)²)² }
- expectation
-   = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) + σ²·e1 - ((ν+1)σ⁴/2)·e2 }
- The non-trigamma part simplifies:
-   - 1/(2ν) + σ²·e1 - ((ν+1)σ⁴/2)·e2
-   = { - 1 + 2ν/(ν+1) - (ν+2)/(ν+3) } / (2ν)
-   = - (ν+5) / (2ν(ν+1)(ν+3))
- so the expectation is
-   = n·{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - (ν+5) / (2ν(ν+1)(ν+3)) }
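- The simplification of the non-trigamma part can be checked exactly with rational arithmetic; a Python sketch (arbitrary rational test values, s2 standing for σ²):

```python
from fractions import Fraction as Fr

nu, s2 = Fr(7, 2), Fr(9, 4)   # arbitrary rational test values
e1 = 1 / ((nu + 1) * s2)
e2 = (nu + 2) / (nu * (nu + 1) * (nu + 3) * s2 ** 2)

# the non-trigamma part of the expected d2L/dnu2, per datum
part = -1 / (2 * nu) + s2 * e1 - ((nu + 1) * s2 ** 2 / 2) * e2
print(part == -(nu + 5) / (2 * nu * (nu + 1) * (nu + 3)))   # True
```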
-
- Off-diagonal 2nd derivatives
-
-   d²L/dμdσ = d²L/dσdμ
-     = 2ν(ν+1)σ ∑i{ (xi-μ) / (νσ² + (xi-μ)²)² }
- expectation = 0 (which is what you would hope),
  because the summand is an "odd" function about μ
  (i.e. g(μ+z) = - g(μ-z)).
-
-   d²L/dμdν = d²L/dνdμ
-     = (ν+1)σ² ∑i{ (xi-μ) / (νσ² + (xi-μ)²)² }
       - ∑i{ (xi-μ) / (νσ² + (xi-μ)²) }
- expectation = 0, because both sums are "odd" functions about μ.
-
-   d²L/dνdσ = d²L/dσdν
-     = - n/σ + (2ν+1)σ ∑i{ 1 / (νσ² + (xi-μ)²) }
       - ν(ν+1)σ³ ∑i{ 1 / (νσ² + (xi-μ)²)² }
- expectation
-   = - n/σ + n(2ν+1)σ·e1 - nν(ν+1)σ³·e2
-   = n·{ - 1/σ + (2ν+1)σ / ((ν+1)σ²) - ν(ν+1)(ν+2)σ³ / {ν(ν+1)(ν+3)σ⁴} }
-   = (n/σ)·{ - 1 + (2ν+1)/(ν+1) - (ν+2)/(ν+3) }
-   = (n/σ)·{ - 2 / ((ν+1)(ν+3)) }
-   = - 2n / {σ(ν+1)(ν+3)}
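- Again the algebra can be checked exactly with rational arithmetic; a Python sketch (arbitrary rational test values, s standing for σ):

```python
from fractions import Fraction as Fr

nu, s = Fr(7, 2), Fr(3, 2)   # arbitrary rational test values
e1 = 1 / ((nu + 1) * s ** 2)
e2 = (nu + 2) / (nu * (nu + 1) * (nu + 3) * s ** 4)

# expected d2L/dnu dsigma, per datum
cross = -1 / s + (2 * nu + 1) * s * e1 - nu * (nu + 1) * s ** 3 * e2
print(cross == -2 / (s * (nu + 1) * (nu + 3)))   # True
```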
-
- Fisher
-
- The expected 2nd derivatives give the matrix
-
-        |  μ            σ             ν
-    ----+------------------------------------------
-     μ  |  E d²L/dμ²    0             0
-     σ  |  0            E d²L/dσ²     E d²L/dσdν
-     ν  |  0            E d²L/dσdν    E d²L/dν²
-
-   = n ×
-        |  (ν+1)/{(ν+3)σ²}   0                    0
-        |  0                 2ν/{(ν+3)σ²}         -2/{(ν+1)(ν+3)σ}
-        |  0                 -2/{(ν+1)(ν+3)σ}     (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2)
-        |                                           - (ν+5)/{2ν(ν+1)(ν+3)}
-
- The Fisher information F is the determinant of this matrix
- (hence the overall factor n³):
-
-   F = F11·(F22·F33 - F23²)
-
-     = {n³(ν+1)/((ν+3)σ²)} · [ {2ν/((ν+3)σ²)}
         · { (1/4)·(ψ1(ν/2) - ψ1((ν+1)/2)) - (ν+5)/(2ν(ν+1)(ν+3)) }
         - 4/{(ν+1)(ν+3)σ}² ]
-
- ... including the step (ν+5)(ν+1) + 4 = ν² + 6ν + 9 = (ν+3)² ...
-
-   = (n³/σ⁴) · { ν(ν+1)/(2(ν+3)²) · (ψ1(ν/2) - ψ1((ν+1)/2))
                 - 1/((ν+1)(ν+3)) }
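- The determinant is linear in Δψ = ψ1(ν/2) - ψ1((ν+1)/2), so the identity can be verified exactly by checking several rational Δψ values with rational arithmetic (the n³ factor is omitted since it cancels); a Python sketch with arbitrary test values:

```python
from fractions import Fraction as Fr

nu, s2 = Fr(7, 2), Fr(9, 4)   # arbitrary rational test values; s2 stands for sigma^2
for dpsi in (Fr(0), Fr(1), Fr(5, 3)):
    # the three distinct entries of the (per-datum) expectation matrix
    F11 = (nu + 1) / ((nu + 3) * s2)
    F22 = 2 * nu / ((nu + 3) * s2)
    F33 = dpsi / 4 - (nu + 5) / (2 * nu * (nu + 1) * (nu + 3))
    F23sq = 4 / (((nu + 1) * (nu + 3)) ** 2 * s2)   # F23 = -2/((nu+1)(nu+3)sigma), squared
    det = F11 * (F22 * F33 - F23sq)
    closed = (nu * (nu + 1) / (2 * (nu + 3) ** 2) * dpsi
              - 1 / ((nu + 1) * (nu + 3))) / s2 ** 2
    assert det == closed
print("determinant identity verified")
```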
-- LA, July 2007
- This confirms the equation presented (without working) by Yudi [Agu02].
-
- Note, log(Fisher) = 3·log n - 4·log σ + log(expression(ν)).
-
- Message length
-   m = - log(h(μ, σ, ν)) + L + (1/2)·log F + (d/2)·(1 + log κd),   (d = 3 parameters)
- See [IP 1.2] for
  an implementation of Student's t-distribution.
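- A minimal sketch of the message-length calculation in Python, distinct from the [IP 1.2] implementation. The prior enters only through a log_h argument that the caller must supply (the value used below is a placeholder, not a recommended prior); the trigamma function is approximated crudely by differencing lgamma; and κ3 is taken as approximately 0.0785, the 3-D optimal quantising-lattice constant:

```python
import math

def trigamma(z, h=1e-4):
    # crude psi1 via a second central difference of lgamma;
    # accurate to roughly 1e-7 here, enough for a sketch
    return (math.lgamma(z + h) - 2 * math.lgamma(z) + math.lgamma(z - h)) / h ** 2

def neg_log_lik(xs, mu, sigma, nu):
    # the negative log likelihood L derived above
    n = len(xs)
    v = n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
             - math.lgamma((nu + 1) / 2) + math.log(sigma))
    return v + ((nu + 1) / 2) * sum(
        math.log(1 + (x - mu) ** 2 / (nu * sigma ** 2)) for x in xs)

def log_fisher(n, sigma, nu):
    # log F = 3 log n - 4 log sigma + log(expression(nu)), as derived above
    expr = (nu * (nu + 1) / (2 * (nu + 3) ** 2)
            * (trigamma(nu / 2) - trigamma((nu + 1) / 2))
            - 1 / ((nu + 1) * (nu + 3)))
    return 3 * math.log(n) - 4 * math.log(sigma) + math.log(expr)

KAPPA_3 = 0.0785   # 3-D quantisation constant, approximate

def message_length(xs, mu, sigma, nu, log_h):
    # log_h: log prior density at (mu, sigma, nu); must be supplied by the caller
    d = 3
    return (-log_h + neg_log_lik(xs, mu, sigma, nu)
            + 0.5 * log_fisher(len(xs), sigma, nu)
            + (d / 2) * (1 + math.log(KAPPA_3)))

xs = [0.3, -1.2, 2.5, 0.0, 4.1]   # arbitrary data
m = message_length(xs, mu=0.5, sigma=1.5, nu=4.0, log_h=-5.0)  # log_h is a placeholder
print(m)
```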