Normal, Gaussian

KL-distance from N(μ1,σ1) to N(μ2,σ2)

(Also known as KL-divergence.)
The general form is
 
\[
\int_{-\infty}^{\infty} \mathrm{pdf}_1(x)\,\bigl(\log \mathrm{pdf}_1(x) - \log \mathrm{pdf}_2(x)\bigr)\, dx
\]
 
we have two normals, so pdf1(x) is the density of N(μ1,σ1), and pdf2(x) that of N(μ2,σ2):
 
\[
= \int_{-\infty}^{\infty} N_{\mu_1,\sigma_1}(x)\,\bigl(\log N_{\mu_1,\sigma_1}(x) - \log N_{\mu_2,\sigma_2}(x)\bigr)\, dx
\]
 
\[
= \int_{-\infty}^{\infty} N_{\mu_1,\sigma_1}(x)\,\left\{ \frac{1}{2}\left( -\left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x-\mu_2}{\sigma_2}\right)^2 \right) + \ln\frac{\sigma_2}{\sigma_1} \right\} dx
\]
 
We can replace x with x + μ1, shifting the variable of integration; the expectation is then over N(0,σ1), and the expected value of x² is σ1². Terms that are odd in x, and otherwise symmetric about zero, integrate to zero over (-∞,∞), leaving the x² terms and the constant terms.
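In detail, after the shift the curly bracket becomes
\[
\frac{1}{2}\left( -\frac{x^2}{\sigma_1^2} + \frac{(x + \mu_1 - \mu_2)^2}{\sigma_2^2} \right) + \ln\frac{\sigma_2}{\sigma_1}
\]
and, taking expectations over N(0,σ1) with E[x] = 0 and E[x²] = σ1²,
\[
E\bigl[(x + \mu_1 - \mu_2)^2\bigr] = \sigma_1^2 + (\mu_1 - \mu_2)^2 .
\]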
 
\[
= \frac{1}{2}\left\{ -\left(\frac{\sigma_1}{\sigma_1}\right)^2 + \left(\frac{\sigma_1}{\sigma_2}\right)^2 + \left(\frac{\mu_1-\mu_2}{\sigma_2}\right)^2 \right\} + \ln\frac{\sigma_2}{\sigma_1}
\]
 
\[
= \frac{(\mu_1 - \mu_2)^2 + \sigma_1^2 - \sigma_2^2}{2\sigma_2^2} + \ln\frac{\sigma_2}{\sigma_1}
\]
 
This is zero if μ1 = μ2 and σ1 = σ2. It obviously increases with |μ1 - μ2| and has rather more complex behaviour in σ1 and σ2 (and it is consistent with P&R, and with J&S where σ1 = σ2).
KL(N(μq,σq) || N(μp,σp)), p.18 of Penny & Roberts, PARG-00-12, 2000.
KL(N(μ1,σ), N(μ2,σ)) = (μ1 - μ2)²/(2σ²), Johnson & Sinanovic, NB. a common σ [...].
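As a quick numerical check, here is a minimal Python sketch (using scipy; the function names and test values are illustrative only) comparing the closed form above with direct quadrature of the defining integral:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_normal(mu1, s1, mu2, s2):
    """Closed-form KL-distance from N(mu1, s1) to N(mu2, s2)."""
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + np.log(s2 / s1)

def kl_numeric(mu1, s1, mu2, s2):
    """KL by direct quadrature of pdf1(x).(log pdf1(x) - log pdf2(x))."""
    f = lambda x: norm.pdf(x, mu1, s1) * (norm.logpdf(x, mu1, s1)
                                          - norm.logpdf(x, mu2, s2))
    val, _ = quad(f, -np.inf, np.inf)
    return val

print(kl_normal(0.0, 1.0, 1.0, 2.0))   # closed form
print(kl_numeric(0.0, 1.0, 1.0, 2.0))  # should agree to quadrature accuracy
print(kl_normal(1.0, 2.0, 1.0, 2.0))   # zero when mu1 = mu2 and s1 = s2
```

Note also that kl_normal(mu1, s1, mu2, s2) and kl_normal(mu2, s2, mu1, s1) differ in general: the KL-distance is not symmetric.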

 
Note that the distance is convenient to integrate over, say, a range of μ1 & σ1:
\[
\int_{\mu_{1\min}}^{\mu_{1\max}} \int_{\sigma_{1\min}}^{\sigma_{1\max}}
\left\{ \left( \frac{(\mu_1 - \mu_2)^2}{2\sigma_2^2} + \ln\sigma_2 - \frac{1}{2} \right)
      + \left( \frac{\sigma_1^2}{2\sigma_2^2} - \ln\sigma_1 \right) \right\}
d\sigma_1 \, d\mu_1
\]

NB. no σ1 in the first bracket, and no μ1 in the second, so the two parts can be integrated separately:
 
let
\[
f(\mu_1) = \frac{(\mu_1 - \mu_2)^3}{6\sigma_2^2} + \mu_1\,\bigl(\ln\sigma_2 - \tfrac{1}{2}\bigr)
\qquad\text{and}\qquad
g(\sigma_1) = \frac{\sigma_1^3}{6\sigma_2^2} - \sigma_1\,(\ln\sigma_1 - 1)
\]
 
\[
= \bigl(f(\mu_{1\max}) - f(\mu_{1\min})\bigr)\,(\sigma_{1\max} - \sigma_{1\min})
+ (\mu_{1\max} - \mu_{1\min})\,\bigl(g(\sigma_{1\max}) - g(\sigma_{1\min})\bigr)
\]
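Similarly, a small Python sketch (the ranges and the values of μ2, σ2 are arbitrary examples) checking this closed form against numerical double integration:

```python
import numpy as np
from scipy.integrate import dblquad

mu2, s2 = 0.5, 1.5          # fixed second normal (example values)
mu1a, mu1b = -1.0, 2.0      # range of mu1
s1a, s1b = 0.5, 3.0         # range of sigma1

def kl(mu1, s1):
    """KL-distance from N(mu1, s1) to N(mu2, s2), from the closed form above."""
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + np.log(s2 / s1)

def f(mu1):   # antiderivative of the mu1-only part of the integrand
    return (mu1 - mu2)**3 / (6 * s2**2) + mu1 * (np.log(s2) - 0.5)

def g(s1):    # antiderivative of the sigma1-only part of the integrand
    return s1**3 / (6 * s2**2) - s1 * (np.log(s1) - 1)

closed = ((f(mu1b) - f(mu1a)) * (s1b - s1a)
          + (mu1b - mu1a) * (g(s1b) - g(s1a)))

# dblquad integrates kl(mu1, s1) over s1 in [s1a, s1b], mu1 in [mu1a, mu1b];
# its integrand takes the inner variable first.
numeric, _ = dblquad(lambda s1, mu1: kl(mu1, s1), mu1a, mu1b, s1a, s1b)

print(closed, numeric)      # the two should agree
```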