% -*- mode: Noweb; noweb-code-mode: c++-mode -*- \title{An efficient implementation of Blum, Floyd, Pratt, Rivest, and Tarjan's worst-case linear selection algorithm} \author{Derrick Coetzee, webmaster@moonflare.com} \maketitle \tableofcontents \section{Introduction} In 1973, Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan wrote a paper entitled \emph{Time bounds for selection} which explored the problem of selecting the $k$th smallest element in an array, and demonstrated an explicit algorithm for solving it in worst-case $O(n)$ time, using only comparisons. This algorithm has been republished in many other works, and this implementation is based on the description given in Cormen, Leiserson, Rivest, and Stein's important textbook \emph{Introduction to Algorithms}. In practice, the algorithm is typically not used, nor should be in its na\"{i}ve form, because there is a much simpler algorithm, a close relative of quicksort, which solves the algorithm in very fast average-case time, although, like quicksort, it can potentially take a very long time. This algorithm is also demonstrated here, for comparison. I have made several small improvements to the worst-case linear algorithm, however, that close the gap. This code is written in C++, but in a procedural fashion; it is essentially C with some sugar on top. The source file in outline is: <>= <
> <> <> <> <> int pivotIdx = partition(a, size, rand() % size); if (k != pivotIdx) { if (k < pivotIdx) { selectRandom(a, pivotIdx, k); } else /* if (k > pivotIdx) */ { selectRandom(a + pivotIdx + 1, size - pivotIdx - 1, k - pivotIdx - 1); } } } @ Like a typical randomized quicksort, we perform a partition around a randomly chosen pivot element. However, instead of recursing on both partitions, we already know which one will contain the $k$th smallest element, because only the \texttt{pivotIdx} smallest elements are found below the pivot after the partition, and the others above. If $k = $ pivotIdx, then we are done, since, as noted above, \texttt{partition} places the pivot into its correct sorted position. Like most functions in this program, the function is templated over the type of the elements in the array. As in quicksort, this algorithm becomes needlessly inefficient for the subproblems involving very small lists, and so we simply use insertion sort for such cases, moving the $k$th smallest element to position $k$ for all $k$: <>= template void select(T a[], int size, int k) { <