Normal, Gaussian
KL-distance from N(μ1,σ1) to N(μ2,σ2)
(Also known as KL-divergence.)
- The general form is
- ∫x pdf1(x).{ log(pdf1(x)) - log(pdf2(x)) } dx
-
- we have two normals, so pdf1(x) is N(x; μ1,σ1), etc.
- = ∫x N(x; μ1,σ1).{ log(N(x; μ1,σ1)) - log(N(x; μ2,σ2)) } dx
- = ∫x N(x; μ1,σ1).{ (1/2)( - ((x-μ1)/σ1)² + ((x-μ2)/σ2)² ) + ln(σ2/σ1) } dx
-
- can replace x with x+μ1, i.e., centre the integral on μ1; the expected value of x² is then σ1². Terms that are odd in x, and otherwise symmetric about zero, cancel out over [-∞,∞], leaving only the x² and constant terms.
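- (Aside: this expectation step can be checked mechanically. A minimal symbolic sketch, assuming sympy is available; the symbol names are illustrative only, and the closed form it is checked against appears just below.)

```python
import sympy as sp

x = sp.symbols('x', real=True)
mu1, mu2 = sp.symbols('mu1 mu2', real=True)
sig1, sig2 = sp.symbols('sigma1 sigma2', positive=True)

# Integrand from above after replacing x with x + mu1:
# an N(0, sigma1) density times the bracketed log-ratio term.
pdf = sp.exp(-x**2 / (2 * sig1**2)) / (sig1 * sp.sqrt(2 * sp.pi))
bracket = (sp.Rational(1, 2) * (-(x / sig1)**2
                                + ((x + mu1 - mu2) / sig2)**2)
           + sp.log(sig2 / sig1))

kl = sp.integrate(pdf * bracket, (x, -sp.oo, sp.oo))

# Closed form derived below; the difference should simplify to 0.
closed = ((mu1 - mu2)**2 + sig1**2 - sig2**2) / (2 * sig2**2) \
         + sp.log(sig2 / sig1)
print(sp.simplify(kl - closed))
```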
- = (1/2){ - (σ1/σ1)² + (σ1/σ2)² + ((μ1-μ2)/σ2)² } + ln(σ2/σ1)
- = { (μ1 - μ2)² + σ1² - σ2² } / (2.σ2²) + ln(σ2/σ1)
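- The closed form is easy to sanity-check numerically. A minimal sketch, assuming numpy and scipy are available (kl_normal and kl_numeric are illustrative names), comparing the formula above against direct quadrature of the defining integral:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_normal(mu1, s1, mu2, s2):
    # Closed form derived above.
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + np.log(s2 / s1)

def kl_numeric(mu1, s1, mu2, s2):
    # Direct quadrature of  pdf1(x).{ log pdf1(x) - log pdf2(x) }.
    integrand = lambda x: norm.pdf(x, mu1, s1) * (
        norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))
    return quad(integrand, -np.inf, np.inf)[0]

print(kl_normal(0.0, 1.0, 1.5, 2.0))   # closed form
print(kl_numeric(0.0, 1.0, 1.5, 2.0))  # should agree to quadrature accuracy
```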
- This is zero if μ1=μ2 and σ1=σ2. It obviously increases with |μ1-μ2|, and has rather more complex behaviour in σ1 and σ2 (and it is consistent with P&R, and with J&S where σ1=σ2).
- KL(N(μq,σq) || N(μp,σp)), p.18 of Penny & Roberts, PARG-00-12, 2000.
- KL(N(μ1,σ), N(μ2,σ)) = (μ1-μ2)²/(2σ²), Johnson & Sinanovic; NB. a common σ [...] . (Setting σ1=σ2=σ in the form above makes σ1²-σ2²=0 and ln(σ2/σ1)=0, leaving exactly this.)
- Note that the distance is convenient to integrate over, say, a range of μ1 & σ1:

    ∫[μ1min, μ1max] ∫[σ1min, σ1max]
        { (μ1 - μ2)²/(2.σ2²) + ln σ2 - 1/2 }    NB. no σ1 here ...
      + { σ1²/(2.σ2²) - ln σ1 }                 ... & no μ1
    dσ1 dμ1
-
- let f(μ1) = (μ1 - μ2)³/(6.σ2²) + μ1.(ln σ2 - 1/2), and g(σ1) = σ1³/(6.σ2²) - σ1.(ln σ1 - 1)
- = (f(μ1max) - f(μ1min)).(σ1max - σ1min) + (μ1max - μ1min).(g(σ1max) - g(σ1min))
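- A minimal numerical sketch of this range integral, again assuming numpy and scipy (kl_range is an illustrative name); scipy's dblquad gives an independent check:

```python
import numpy as np
from scipy.integrate import dblquad

def kl(mu1, s1, mu2, s2):
    # Closed-form KL(N(mu1,s1) || N(mu2,s2)) from above.
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + np.log(s2 / s1)

def kl_range(mu1_min, mu1_max, s1_min, s1_max, mu2, s2):
    # f and g are the antiderivatives defined in the text.
    f = lambda m: (m - mu2)**3 / (6 * s2**2) + m * (np.log(s2) - 0.5)
    g = lambda s: s**3 / (6 * s2**2) - s * (np.log(s) - 1.0)
    return ((f(mu1_max) - f(mu1_min)) * (s1_max - s1_min)
            + (mu1_max - mu1_min) * (g(s1_max) - g(s1_min)))

# Independent check: integrate kl over mu1 in [0,1], sigma1 in [0.5,2],
# with mu2=2, sigma2=1.5; dblquad's inner variable is sigma1.
num, _ = dblquad(lambda s1, mu1: kl(mu1, s1, 2.0, 1.5),
                 0.0, 1.0,
                 lambda _: 0.5, lambda _: 2.0)
print(kl_range(0.0, 1.0, 0.5, 2.0, 2.0, 1.5))
print(num)
```

The two printed values should agree to quadrature accuracy.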