On MML

The reasons why minimum message length (MML) inference works are quite elementary and were long hidden in plain sight(a), so it is surprising that it was not in use before 1968, and more surprising still that there is any debate about it.

However, that is not to say that making MML work in useful applications is easy; in fact it can be quite difficult. After the self-evident observations above, a lot of hard work remained to be done on efficient encodings, search algorithms, code books, invariance, Fisher information, fast approximations, robust heuristics, adaptations to specific problems, and all the rest. Fortunately, MML has been made to work in many general and useful applications including, but not limited to, these lists, [1], [2], & [3], and in other areas such as bioinformatics [4].

BTW, given Bayes(c),
pr(H&D) = pr(H).pr(D|H) = pr(D).pr(H|D),
pr(H|D) ∝ pr(H).pr(D|H),
it is sometimes claimed that MML inference is just MAP (maximum a posteriori) inference but, in general, this is not the case. MML requires one not only to set the "height", pdf(H), but also to choose the set of distinguishable(d) hypotheses {H1, H2, ...} and the optimal precision ("width")(b) of each one. (It could be argued that MML is MAP done properly.)
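To make the "width" point concrete, here is a minimal numerical sketch (mine, not part of this note; a toy setting with data from a Gaussian of known sigma and a uniform prior on the mean, all names and numbers illustrative) of the extra step MML takes and MAP does not: choosing the precision delta at which the estimate is stated. Minimising over delta recovers the familiar sqrt(12/F) spacing of the Wallace & Freeman quadratic approximation.

import numpy as np

rng = np.random.default_rng(0)
a, b = -10.0, 10.0                    # uniform prior on mu over [a, b]
sigma, n = 1.0, 25                    # known sigma, sample size
x = rng.normal(2.0, sigma, n)
mu_hat = x.mean()                     # with a flat prior, MAP picks this point too
F = n / sigma**2                      # Fisher information for mu

def expected_msglen(delta):
    # first part: state mu_hat to precision delta under the uniform prior
    first = np.log((b - a) / delta)
    # second part: -log likelihood at mu_hat ...
    second = 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum((x - mu_hat)**2) / (2 * sigma**2)
    # ... plus the expected extra cost of rounding mu_hat to the delta grid
    rounding = F * delta**2 / 24.0
    return first + second + rounding

deltas = np.linspace(0.001, 2.0, 2000)
best = deltas[np.argmin([expected_msglen(d) for d in deltas])]
print("numerically optimal delta:", round(best, 3))
print("analytic sqrt(12/F)      :", round(np.sqrt(12.0 / F), 3))
# MAP maximises pr(H).pr(D|H) at a point and never faces this trade-off; MML must
# balance a shorter first part (large delta) against a longer second part (small delta).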
 
-- L.A., 9/2011.
 
MML Reading List:
[W&B], [W&F], [book], [history], & see [CSW].

(a) If you must invent something, the best kind of thing to invent is something that can be "got" easily, but only once it has been described -- the "doh, I could have done that" moment! There were other, somewhat related, theoretical ideas around in the 1960s, but MML arrived with a practical computer program to solve an important inference problem.
(b) (i) ε comes with the data, but working out the optimal value of δ may not be easy. (ii) Given multiple continuous parameters, an optimal region of precision is not rectangular in general, but its area (volume) is in any case > 0. (iii) Even a discrete parameter may have an optimal precision that is coarser than its discreteness alone would suggest (a small numerical sketch of this follows these notes).
(c) If you accept priors.
statisticians
  frequentists
  Bayesians
    loss function-ists
    estimate-ers
      MML-ists
      . . .
(d) Distinguishable given some amount of data.
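
A small sketch of note (b)(iii), mine rather than from the note above: a toy binomial setting in which the success probability p is assumed to lie on a 0.01 grid with a uniform prior, and there are only 5 observations. The shortest total message then states p more coarsely than the grid itself allows.

import math

def second_part(k, n, p):
    # -log2 likelihood of k successes in n trials at success probability p
    p = min(max(p, 1e-9), 1 - 1e-9)
    return -(math.log2(math.comb(n, k)) + k * math.log2(p) + (n - k) * math.log2(1 - p))

k, n = 3, 5                                     # tiny data set: 3 successes in 5 trials
grid = [i / 100 for i in range(101)]            # p assumed to lie on a 0.01 grid, uniform prior

# Option A: state the best grid value at the grid's own precision (0.01)
len_a = math.log2(len(grid)) + min(second_part(k, n, p) for p in grid)

# Option B: state only one of 10 coarse bins, then encode the data using the bin centre
centres = [(i + 0.5) / 10 for i in range(10)]
len_b = math.log2(len(centres)) + min(second_part(k, n, c) for c in centres)

print("full 0.01 precision:", round(len_a, 2), "bits")
print("coarse 0.1 bins    :", round(len_b, 2), "bits")
# The ~3.3 bits saved in the first part dwarf the ~0.04 bits lost in the second,
# so here the optimal precision is coarser than the parameter's own discreteness.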