Adaptive Code

The adaptive code for a multinomial: Given a sequence of N data, [x₁, ..., x_N], each datum having one of k possible values, {v₁, ..., v_k}, keep a running counter for each of the k possible values, e.g., two counters for {H,T}, four counters for DNA, etc.. Initialise each counter to 1, not 0. Process the sequence from left to right and at each step, estimate the probability of the current datum's value as the datum's value's counter divided by the sum of all the counters; only then update the datum's value's counter, not before^[a]. A binomial (k=2) example:

data→		H		T		T		H		T		T		(N=6)
counters	1		2		2		2		3		3
counters	1		1		2		3		3		4
sum	2		3		4		5		6		7
pr		1 2	×	1 3	×	2 4	×	2 5	×	3 6	×	4 7	=	#H!.#T! (N+1)!

In general

pr([x₁, ..., x_N]) = { ∏_i=1..k #v_i! } / { (N+k-1)! / (k-1)! },

where #v_i is the number of v_i in the sequence.

It can be seen that every permutation of the sequence has the same probability, which is good.

It can be shown that the 1-part adaptive and "combinatorial" codes give exactly the same message length, and that the 2-part MML code's message is longer, but only by a fraction of a bit for each of the multinomial's k-1 parameters; MML is also the only one of the three codes to state a model, and there ain't no free lunch.

A convenient programming trick for calculating the length of the first part (model complexity) of the 2-part MML message is to work from the adaptive code's message length, plus MML's small overall length increment, less the length of the MML message's second (data|model) part.

^[a]Note that both the transmitter and the receiver can keep identical counters in synchronisation and thus the receiver can decode the message.