| Class | Sorting algorithm |
|---|---|
| Data structure | Array |
| Worst-caseperformance | O(n logn) |
| Best-caseperformance | O(n logn) |
| Averageperformance | O(n logn) |
| Worst-casespace complexity | O(logn) auxiliary |
| Optimal | Yes |
Proportion extend sort (abbreviated asPESort) is an in-place, comparison-basedsorting algorithm which attempts to improve on the performance, particularly theworst-case performance, ofquicksort.
The basic partitioning operation in quicksort has a linear access pattern which is extremely efficient on modernmemory hierarchies, but the performance of the algorithm is critically dependent on the choice of a pivot value. A good pivot will divide the data to be sorted into nearly equal halves. A poor choice will result in a grossly lopsided division, leaving one part almost as large as the original problem and causingO(n2) performance.
Proportion extend sort begins with a sorted prefix ofk elements, then uses the median of that sample to partition the followingpk elements. By bounding the size ratiop between the sample and the data being partitioned (i.e. the proportion by which the sorted prefix is extended), the imbalance is limited. In this, it has some similarities tosamplesort.[1]
Proportion extend sort was published by Jing-Chao Chen in 2001[2] as an improvement on his earlier proportion split sort design.[3] Its average-case performance, which was only experimentally measured in the original paper, was analyzed by Richard Cole and David C. Kandathil in 2004[4] and by Chen in 2006,[5] and shown to requirelog2n +O(n) comparisons on average. A slightly refined variant, symmetry partition sort, was published in 2007.[6]
The algorithm begins with an array divided into a sorted partS adjacent to an unsorted partU. (The original proportion extend sort always had the sorted part precede the unsorted part; the symmetric variant allows either order.) It is possible to begin with the first element as the sorted part (a single element is always sorted), or to sort a small number of elements using a simplerinsertion sort. The initially sorted elements may also be taken from across the array to improve performance in the case of pre-sorted data.
Next, and most critically, the length of the unsorted part|U| is bounded to a multiplep of the length of the sorted part|S|. Specifically, if|U| >p2|S|, thenrecursively sortS and the adjacentp|S| elements ofU, make the result (p+1 times longer than the original[a]) the newS, and repeat until the condition is satisfied.
If there is no limit on the unsorted part (p=∞), then the algorithm is equivalent to quicksort. If the unsorted part is of length 1 (p=0, almost), then the algorithm is equivalent to binary insertion sort. Values aroundp≈16 give the best average-case performance, competitive with quicksort,[6]: 764 while smaller values improve worst-case performance.[b]
Eliezer Albacea published a similar algorithm in 1995 calledLeapfrogging samplesort where the size is limited so|U| ≤ |S|+1,[7][1] later generalized to(2k−1)(|S|+1).[8]
The sorted part of the array is divided in half (at the median), and one half is moved (by exchanging it with unsorted elements) to the far end of the array, so we have an initial partially partitioned array of the formLUR, whereL is the left half of the sorted part,U is the bounded-length unsorted part, andR is the right half of the sorted part.
Then the standard quicksort partitioning step is performed onU, dividing it (in place) intoUL andUR.UL andUR are not sorted, but every element ofUL is less than or equal to the median, and every element ofUR is greater or equal.[c] The final resultLULURR consists of two arrays of the necessary form (a sorted part adjacent to an unsorted part) and are sortedrecursively.
Leapfrogging samplesort and the original proportion extend sort have the sorted part always precede the unsorted part, achieved by partitioningUbefore movingR, resulting inLRULUR, and then exchangingR with the end ofUL, resulting inLULRUR. While the symmetric version is a bit trickier, it has the advantage that theL andR parts act assentinel values for the partitioning loops, eliminating the need to test in the loop if the bounds ofU have been reached.[1]
Most of the implementation refinements used for quicksort can be applied, including techniques for detecting and efficiently handling mostly sorted inputs.[9][6] In particular, sub-sorts below a certain size threshold are usually implemented using a simple insertion sort.
As with quicksort, the number of recursive levels can be limited tolog2n if the smaller sub-sort is done first and the larger is implemented as atail call. Unlike quicksort, the number of levels is bounded byO(logn) even if this isnot done.[9]: 781