Normal, Gaussian
KL-distance from N(μ1,σ1) to N(μ2,σ2)
(Also known as KL-divergence.)
- The general form is
- ∫x pdf1(x).{ log(pdf1(x)) - log(pdf2(x)) } dx
-
- we have two normals, so pdf1(x) is N(x; μ1,σ1), etc.
- = ∫x N(x; μ1,σ1).{ log(N(x; μ1,σ1)) - log(N(x; μ2,σ2)) } dx
- = ∫x N(x; μ1,σ1).{ (1/2)( - ((x-μ1)/σ1)² + ((x-μ2)/σ2)² ) + ln(σ2/σ1) } dx
-
- can replace x with x+μ1, i.e., centre the integral on μ1; the expected value of x² is then σ1². Terms that are odd in x, and otherwise symmetric about zero, cancel out over [-∞,∞], leaving only the x² and constant terms.
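- (Aside: this expectation step can be checked mechanically. A minimal symbolic sketch, assuming sympy is available; the symbol names are illustrative only, and the closed form it is checked against appears just below.)

```python
import sympy as sp

x = sp.symbols('x', real=True)
mu1, mu2 = sp.symbols('mu1 mu2', real=True)
sig1, sig2 = sp.symbols('sigma1 sigma2', positive=True)

# Integrand from above after replacing x with x + mu1:
# an N(0, sigma1) density times the bracketed log-ratio term.
pdf = sp.exp(-x**2 / (2 * sig1**2)) / (sig1 * sp.sqrt(2 * sp.pi))
bracket = (sp.Rational(1, 2) * (-(x / sig1)**2
                                + ((x + mu1 - mu2) / sig2)**2)
           + sp.log(sig2 / sig1))

kl = sp.integrate(pdf * bracket, (x, -sp.oo, sp.oo))

# Closed form derived below; the difference should simplify to 0.
closed = ((mu1 - mu2)**2 + sig1**2 - sig2**2) / (2 * sig2**2) \
         + sp.log(sig2 / sig1)
print(sp.simplify(kl - closed))
```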
- = (1/2){ - (σ1/σ1)² + (σ1/σ2)² + ((μ1-μ2)/σ2)² } + ln(σ2/σ1)
- = { (μ1 - μ2)² + σ1² - σ2² } / (2.σ2²) + ln(σ2/σ1)
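- The closed form is easy to sanity-check numerically. A minimal sketch, assuming numpy and scipy are available (kl_normal and kl_numeric are illustrative names), comparing the formula above against direct quadrature of the defining integral:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_normal(mu1, s1, mu2, s2):
    # Closed form derived above.
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + np.log(s2 / s1)

def kl_numeric(mu1, s1, mu2, s2):
    # Direct quadrature of  pdf1(x).{ log pdf1(x) - log pdf2(x) }.
    integrand = lambda x: norm.pdf(x, mu1, s1) * (
        norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))
    return quad(integrand, -np.inf, np.inf)[0]

print(kl_normal(0.0, 1.0, 1.5, 2.0))   # closed form
print(kl_numeric(0.0, 1.0, 1.5, 2.0))  # should agree to quadrature accuracy
```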
- This is zero if μ1=μ2 and σ1=σ2. It obviously increases with |μ1-μ2|, and has rather more complex behaviour in σ1 and σ2 (and it is consistent with P&R, and with J&S where σ1=σ2).
- KL(N(μq,σq) || N(μp,σp)), p.18 of Penny & Roberts, PARG-00-12, 2000.
- KL(N(μ1,σ), N(μ2,σ)) = (μ1-μ2)²/(2σ²), Johnson & Sinanovic; NB. a common σ [...] . (Setting σ1=σ2=σ in the form above makes σ1²-σ2²=0 and ln(σ2/σ1)=0, leaving exactly this.)
- Note that the distance is convenient to integrate over, say, a range of μ1 & σ1:

    ∫[μ1min, μ1max] ∫[σ1min, σ1max]
        { (μ1 - μ2)²/(2.σ2²) + ln σ2 - 1/2 }    NB. no σ1 here ...
      + { σ1²/(2.σ2²) - ln σ1 }                 ... & no μ1
    dσ1 dμ1
-
- let f(μ1) = (μ1 - μ2)³/(6.σ2²) + μ1.(ln σ2 - 1/2), and g(σ1) = σ1³/(6.σ2²) - σ1.(ln σ1 - 1)
- = (f(μ1max) - f(μ1min)).(σ1max - σ1min) + (μ1max - μ1min).(g(σ1max) - g(σ1min))
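- A minimal numerical sketch of this range integral, again assuming numpy and scipy (kl_range is an illustrative name); scipy's dblquad gives an independent check:

```python
import numpy as np
from scipy.integrate import dblquad

def kl(mu1, s1, mu2, s2):
    # Closed-form KL(N(mu1,s1) || N(mu2,s2)) from above.
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + np.log(s2 / s1)

def kl_range(mu1_min, mu1_max, s1_min, s1_max, mu2, s2):
    # f and g are the antiderivatives defined in the text.
    f = lambda m: (m - mu2)**3 / (6 * s2**2) + m * (np.log(s2) - 0.5)
    g = lambda s: s**3 / (6 * s2**2) - s * (np.log(s) - 1.0)
    return ((f(mu1_max) - f(mu1_min)) * (s1_max - s1_min)
            + (mu1_max - mu1_min) * (g(s1_max) - g(s1_min)))

# Independent check: integrate kl over mu1 in [0,1], sigma1 in [0.5,2],
# with mu2=2, sigma2=1.5; dblquad's inner variable is sigma1.
num, _ = dblquad(lambda s1, mu1: kl(mu1, s1, 2.0, 1.5),
                 0.0, 1.0,
                 lambda _: 0.5, lambda _: 2.0)
print(kl_range(0.0, 1.0, 0.5, 2.0, 2.0, 1.5))
print(num)
```

The two printed values should agree to quadrature accuracy.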