Selection

The median of an array of N values is the middle value in order of magnitude, for example, the median of 4, 2, 5, 3, 1 is 3. The median is a special case of the more general selection problem which is to return the value having a given rank in order of magnitude. (The smallest value has rank 1, the largest has rank N, and the median has rank ceiling(N / 2).)

One way to solve the selection problem is to sort the given values (in O(N log(N))-time) and return the rankth member of the sorted array. However this method does more work then is necessary and there are faster algorithms.

A method that usually runs in O(N)-time makes use of the partition algorithm from quick-sort: Partition the array into "smaller" and "larger" values. If the desired rank is somewhere within the larger values repeat the process on that section only, otherwise repeat it on the smaller values only. The problem is solved when we arrive at a sub-...-sub-section containing a single value. Note that in general the array has been changed but is not necessarily sorted. Recall that when working on a section of an array, partition first makes a guess at the median of that section. If the guess is good the section is divided into two roughly equal parts. If most guesses are good this version of select runs in O(N)-time: N + f.N + f2.N + ..., where 0 ≤ f < 1, equals N / (1 - f). However if most guesses are bad it runs in O(N2)-time.

Blum et al [2] produced a selection algorithm that is guaranteed to run in linear-time. It divides the numbers into small "columns" (blocks) of size 'c' and finds the median of each column. It then finds the median of these medians, recursively. This yields a guaranteed good guess at the actual median which enables at least one quarter of the values to be eliminated from consideration (f ≤ 0.75). Their final algorithm makes at most 5.4305 N comparisons of elements.

The javascript form runs three selection algorithms, (i) a (very) dumb algorithm as a check, (ii) the partition-based quickselect [1], and (iii) a simple version of the first linear-time algorithm of Blum et al[2].

input: N:
rank:
output:
— Lloyd Allison 29/7/2018

References

[1] C. A. R. Hoare, 'Algorithm 65: FIND,' CACM, 4, pp.321-322, 1961.
Use of partition, usually O(n)-time, worst case O(n2).
[2] Blum, R. Floyd, V. Pratt, R. Rivest, R. Tarjan, 'Time bounds for selection,' J. Computer and System Science, 7(4), pp.448-461, 1973.
First worst-case O(N)-time algorithm.
[3] M. Paterson, 'Progress in selection,' Scandinavian Workshop on Algorithm Theory (LNCS vol.1097), pp.368-379, 1996.
A survey at that date.

And also search for 'median selection problem' in the [bibliography].