Approximations to MML
The general form for MML ("MML87") depends on the determinant of the Fisher information matrix.
 MML87: For model parameter(s) θ, prior h(θ), dataspace X, data x, and likelihood function f(x|θ),
 θ = <θ_{1}, ..., θ_{n}>,
 F(x, θ)_{ij} = -d^{2}/dθ_{i} dθ_{j} { ln f(x|θ) },
 F(θ) = ∑_{x:X}{ f(x|θ).F(x,θ) }  i.e., the expectation,
 then
 msgLen = m_{model} + m_{data} where
 m_{model} = -ln h(θ) + (1/2) ln F(θ) + (n/2) ln k_{n} nits,
 m_{data} = -ln f(x|θ) + n/2 nits.
 Note, k_{1} = 1/12 = 0.083333, k_{2} = 0.080188, k_{3} = 0.078543, k_{4} = 0.076603, k_{5} = 0.075625, k_{6} = 0.074244, k_{7} = 0.073116, k_{8} = 0.071682, and k_{n} → 1/(2πe) = 0.0585498 as n → ∞ [Conway & Sloane '88].
 (MML87 requires that f(x|θ) varies little over the data measurement accuracy region and that h(θ) varies little over the parameter uncertainty region.)
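As a concrete instance of the formulae above, a small sketch computing the MML87 message length for the mean of a Gaussian with known σ; the uniform prior, its width, the data, and the use of the sample mean as the estimate are illustrative assumptions, not from the text.

```python
import math

def mml87_gaussian_mean(data, sigma, prior_width):
    """MML87 message length (nits) for estimating the mean of a
    Gaussian with known sigma, under a uniform prior of width
    prior_width (illustrative choices; n = 1 parameter)."""
    N = len(data)
    k1 = 1.0 / 12.0                          # lattice constant k_1
    mu = sum(data) / N                       # estimate: the sample mean
    fisher = N / sigma**2                    # F(mu) = N / sigma^2
    # m_model = -ln h(theta) + (1/2) ln F(theta) + (n/2) ln k_n
    m_model = math.log(prior_width) + 0.5 * math.log(fisher) + 0.5 * math.log(k1)
    # m_data = -ln f(x|theta) + n/2
    neg_log_lik = sum(0.5 * math.log(2 * math.pi * sigma**2)
                      + (x - mu)**2 / (2 * sigma**2) for x in data)
    m_data = neg_log_lik + 0.5
    return m_model + m_data

msg = mml87_gaussian_mean([4.1, 3.8, 5.2, 4.7, 4.4], sigma=1.0, prior_width=10.0)
```

Note that the Fisher information for the mean grows linearly in N, so the (1/2) ln F term charges more for stating the parameter more precisely as data accumulates.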
Sometimes the maths for the Fisher information is not tractable. It may be possible to transform the problem so that it becomes easier (e.g., as in the use of orthonormal basis functions for polynomial fitting), which is acceptable because MML is invariant under such transformations. Failing that, the remaining options include:
 simplifying assumptions,
 numerical approximations,
 empirical Fisher.
Gradient^{2}
CSW, csse tea room, 22/5/'01: We take the gradient (a vector), G, of the loglikelihood function and form the matrix GG', i.e. the outer product. This will transform like the square of a density. Assuming that the data are i.i.d., we can then sum over all the observed data to get γ = ∑_{k=1..N} (GG'). This, again, will transform like the square of a density. So, it can be used as an approximation to the expected Fisher information, as γ will be invariant.
The downside is that we need the amount of data, N, to be at least as large as the number of parameters to be estimated. If not, then the matrix γ will be singular.
This approximation has been used in some versions of SNOB.
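The γ = ∑ GG' construction can be sketched numerically; the Gaussian model, its (μ, σ) parameterisation, and the synthetic data below are illustrative assumptions. For this model the expected Fisher information of N i.i.d. observations is diag(N/σ², 2N/σ²), which γ should roughly reproduce.

```python
import math, random

random.seed(0)
N = 500
data = [random.gauss(2.0, 1.5) for _ in range(N)]

mu = sum(data) / N
sigma = math.sqrt(sum((x - mu)**2 for x in data) / N)

def score(x):
    """Gradient G of the per-datum negative log-likelihood of a
    Gaussian w.r.t. (mu, sigma) -- an illustrative model choice."""
    return [-(x - mu) / sigma**2,
            1.0 / sigma - (x - mu)**2 / sigma**3]

# gamma = sum_k G G': sum of outer products over the observed data
gamma = [[0.0, 0.0], [0.0, 0.0]]
for x in data:
    g = score(x)
    for i in range(2):
        for j in range(2):
            gamma[i][j] += g[i] * g[j]

# For comparison: expected Fisher diag entries for this model
expected_00 = N / sigma**2
expected_11 = 2 * N / sigma**2
```

With only the two parameters and N = 500 data, γ is comfortably non-singular here; with N smaller than the number of parameters it would not be.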
 Probability, pr(x|θ), nlpr(x|θ) = -log pr(x|θ).
 Given data x_{1}, x_{2}, ..., x_{N},
 negative log-likelihood, L,
 L = ∑_{i} nlpr(x_{i}|θ) = ∑_{i}{ -log pr(x_{i}|θ) }
 1st derivative of L wrt θ:
 dL/dθ = ∑_{i} nlpr'(x_{i}|θ) = ∑_{i}{ -(d/dθ pr(x_{i}|θ)) / pr(x_{i}|θ) }
 2nd derivative of L wrt θ:
 d^{2}L/dθ^{2} = ∑_{i}{ -(d^{2}/dθ^{2} pr(x_{i}|θ)) / pr(x_{i}|θ) + {(d/dθ pr(x_{i}|θ)) / pr(x_{i}|θ)}^{2} }
 ~ ∑_{i}{ (d/dθ pr(x_{i}|θ)) / pr(x_{i}|θ) }^{2}
 = ∑_{i}{ nlpr'(x_{i}|θ) }^{2}
 assuming that ∑_{i}{ -(d^{2}/dθ^{2} pr(x_{i}|θ)) / pr(x_{i}|θ) } is small; note that its expected value is
 E_{x} -(d^{2}/dθ^{2} pr(x|θ)) / pr(x|θ)
 = -∫ { (d^{2}/dθ^{2} pr(x|θ)) / pr(x|θ) } . pr(x|θ) dx
 = -∫ d^{2}/dθ^{2} pr(x|θ) dx
 = -d^{2}/dθ^{2} ∫ pr(x|θ) dx, unless pr is pathological,
 = -d^{2}/dθ^{2} (1)  !
 = 0
 If θ = <θ_{1}, ..., θ_{k}>, the 2nd derivative becomes the matrix of 2nd derivatives d^{2}L/dθ_{i}dθ_{j}, nlpr' becomes the gradient (vector) of nlpr (one may also see the Jacobian, J), and the { }^{2} becomes the outer product.
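The zero-expectation step can be checked by Monte Carlo. For a Gaussian (an assumed example model), (d^{2}/dμ^{2} pr(x|μ)) / pr(x|μ) = (x-μ)^{2}/σ^{4} - 1/σ^{2}, and its expectation under pr is 0, which is why the dropped term is small on average.

```python
import random

random.seed(1)
mu, sigma = 0.0, 1.0

def d2pr_over_pr(x):
    """(d^2/dmu^2 pr(x|mu)) / pr(x|mu) for a Gaussian: the term the
    gradient^2 approximation drops because its expectation is zero."""
    return (x - mu)**2 / sigma**4 - 1.0 / sigma**2

# Monte Carlo estimate of E_x[(d^2/dmu^2 pr)/pr]: should be near 0
samples = [random.gauss(mu, sigma) for _ in range(100_000)]
mean_term = sum(d2pr_over_pr(x) for x in samples) / len(samples)
```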
Empirical Fisher
The Fisher information matrix contains expected 2nd derivatives of the log likelihood function with respect to the model parameters. It is possible to estimate these 2nd derivatives, given the data, by perturbing the parameters, individually and in pairs, by small amounts and calculating the changes in the likelihood. This computation is feasible for quite large numbers of parameters.
Unfortunately the resulting matrix is not guaranteed to be positive definite. The gradient^{2} method described above does not have this (possible) problem.
The empirical Fisher is also not invariant.
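A minimal sketch of such a perturbation scheme, assuming a Gaussian negative log-likelihood as a stand-in for an intractable one; the data and the evaluation point are made up for illustration.

```python
import math

def neg_log_lik(theta, data):
    """Negative log-likelihood of a Gaussian, theta = (mu, sigma);
    an illustrative stand-in for an intractable likelihood."""
    mu, sigma = theta
    return sum(0.5 * math.log(2 * math.pi * sigma**2)
               + (x - mu)**2 / (2 * sigma**2) for x in data)

def empirical_fisher(L, theta, data, h=1e-3):
    """Estimate the matrix of 2nd derivatives of L at theta by
    perturbing parameters individually (diagonal) and in pairs
    (off-diagonal), using central differences."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    L0 = L(theta, data)
    for i in range(n):
        for j in range(n):
            def L_at(di, dj):
                t = list(theta)
                t[i] += di
                t[j] += dj
                return L(t, data)
            if i == j:
                H[i][i] = (L_at(h, 0.0) - 2.0 * L0 + L_at(-h, 0.0)) / h**2
            else:
                H[i][j] = (L_at(h, h) - L_at(h, -h)
                           - L_at(-h, h) + L_at(-h, -h)) / (4.0 * h**2)
    return H

data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3]
H = empirical_fisher(neg_log_lik, (1.0, 0.5), data)
```

At this (deliberately non-optimal) point the estimated matrix has a negative diagonal entry for σ, so it is indefinite, illustrating the caveat above.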
MMLD
MsgLen ~ -log{ ∫_{R} h(θ) dθ } - { ∫_{R} h(θ) log f(x|θ) dθ } / { ∫_{R} h(θ) dθ }
for an uncertainty region R of the parameter space.
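The MMLD expression can be evaluated by simple quadrature; the unit-variance Gaussian likelihood, the uniform prior, and the particular region R below are illustrative assumptions.

```python
import math

def mmld_msglen(data, lo, hi, prior_width=10.0, steps=1000):
    """MMLD message length (nits) by midpoint-rule quadrature over the
    region R = [lo, hi] for the mean of a unit-variance Gaussian
    under a uniform prior of width prior_width (illustrative choices)."""
    h = 1.0 / prior_width                    # flat prior density
    def log_f(mu):                           # log-likelihood of the data
        return sum(-0.5 * math.log(2 * math.pi) - (x - mu)**2 / 2.0
                   for x in data)
    d_theta = (hi - lo) / steps
    prior_mass = 0.0                         # integral_R h(theta) d theta
    weighted_loglik = 0.0                    # integral_R h(theta) log f d theta
    for i in range(steps):
        mu = lo + (i + 0.5) * d_theta        # midpoint of sub-interval
        prior_mass += h * d_theta
        weighted_loglik += h * log_f(mu) * d_theta
    return -math.log(prior_mass) - weighted_loglik / prior_mass

data = [0.1, -0.3, 0.4, 0.2, -0.1]
msg = mmld_msglen(data, lo=-0.5, hi=0.6)
```

Widening R lowers the first term (cheaper to state the region) but worsens the prior-weighted average log-likelihood in the second; in practice R is chosen to minimise the total.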
 J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups, Springer-Verlag, 1988.
 E. Lam. Improved Approximations to MML, Honours thesis, CSSE, Monash University, Australia, 2000.
 C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length, Springer-Verlag, 2005.