# pdqsort

Pattern-defeating quicksort.
Pattern-defeating quicksort (pdqsort) is a novel sorting algorithm that combines the fast average case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on inputs with certain patterns. pdqsort is an extension and improvement of David Musser's introsort. All code is available for free under the zlib license.
| Best | Average | Worst | Memory | Stable | Deterministic |
|------|---------|-------|--------|--------|---------------|
| n | n log n | n log n | log n | No | Yes |
`pdqsort` is a drop-in replacement for `std::sort`. Just replace a call to `std::sort` with `pdqsort` to start using pattern-defeating quicksort. If your comparison function is branchless, you can call `pdqsort_branchless` for a potentially big speedup. If you are using C++11, the type you're sorting is arithmetic, and your comparison function is not given or is `std::less`/`std::greater`, `pdqsort` automatically delegates to `pdqsort_branchless`.
A comparison of pdqsort and GCC's `std::sort` and `std::stable_sort` with various input distributions (benchmark graph not reproduced here):

Compiled with `-std=c++11 -O2 -m64 -march=native`.
A visualization of pattern-defeating quicksort sorting a ~200 element array with some duplicates. Generated using Timo Bingmann's The Sound of Sorting program, a tool that has been invaluable during the development of pdqsort. For the purposes of this visualization the cutoff point for insertion sort was lowered to 8 elements.
pdqsort is designed to run in linear time for a couple of best-case patterns. Linear time is achieved for inputs that are in strictly ascending or descending order, only contain equal elements, or are strictly in ascending order followed by one out-of-place element. There are two separate mechanisms at play to achieve this.
For equal elements a smart partitioning scheme is used that always puts equal elements in the partition containing elements greater than the pivot. When a new pivot is chosen it's compared to the greatest element in the partition before it. If they compare equal we can derive that there are no elements smaller than the chosen pivot. When this happens we switch strategy for this partition, and filter out all elements equal to the pivot.
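A self-contained toy version of this duplicate-handling idea (my own simplification using `std::partition`, not pdqsort's actual partitioning code); `pred` points at the greatest element of the partition to the left, if any:

```cpp
#include <algorithm>
#include <vector>

using It = std::vector<int>::iterator;

// Toy quicksort illustrating the duplicate-filtering strategy.
void toy_sort(It begin, It end, const int* pred) {
    if (end - begin < 2) return;
    int pivot = *(begin + (end - begin) / 2);  // middle element as pivot

    if (pred && *pred == pivot) {
        // The greatest element left of this partition equals the pivot, so
        // nothing here can be smaller than the pivot. Filter out everything
        // equal to it -- those elements are already in their final position.
        It mid = std::partition(begin, end, [&](int x) { return x == pivot; });
        toy_sort(mid, end, pred);
        return;
    }

    // Equal elements always go into the right, ">= pivot" partition.
    It mid = std::partition(begin, end, [&](int x) { return x < pivot; });
    toy_sort(begin, mid, pred);
    toy_sort(mid, end, &pivot);  // the pivot now bounds the right partition
}
```

On input with many duplicates, every repeated pivot value triggers the filter branch, so each distinct value is partitioned around only once.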
To get linear time for the other patterns we check after every partition if any swaps were made. If no swaps were made and the partition was decently balanced we will optimistically attempt to use insertion sort. This insertion sort aborts if more than a constant number of moves are required to sort.
On average case data where no patterns are detected pdqsort is effectively a quicksort that uses median-of-3 pivot selection, switching to insertion sort if the number of elements to be (recursively) sorted is small. The overhead associated with detecting the patterns for the best case is so small it lies within the error of measurement.
pdqsort gets a great speedup over the traditional way of implementing quicksort when sorting large arrays (1000+ elements). This is due to a new technique described in "BlockQuicksort: How Branch Mispredictions don't affect Quicksort" by Stefan Edelkamp and Armin Weiss. In short, we bypass the branch predictor by using small buffers (entirely in L1 cache) of the indices of elements that need to be swapped. We fill these buffers in a branch-free way that's quite elegant (in pseudocode):
```cpp
buffer_num = 0; buffer_max_size = 64;
for (int i = 0; i < buffer_max_size; ++i) {
    // With branch:
    if (elements[i] < pivot) { buffer[buffer_num] = i; buffer_num++; }
    // Without:
    buffer[buffer_num] = i;
    buffer_num += (elements[i] < pivot);
}
```
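Fleshed out into a self-contained partition function in the BlockQuicksort style (simplified to a single one-directional buffer; pdqsort's real implementation uses two buffers scanning from both ends):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Partition v so that elements < pivot come first; returns the boundary.
// Offsets of elements belonging on the other side are collected branch-free,
// then the swaps are performed in a second loop.
std::size_t block_partition(std::vector<int>& v, int pivot) {
    std::size_t first = 0;  // end of the "< pivot" prefix built so far
    unsigned char offsets[64];
    for (std::size_t block = 0; block < v.size(); block += 64) {
        std::size_t len = std::min<std::size_t>(64, v.size() - block);
        // Branch-free buffer fill: record offsets of elements < pivot.
        std::size_t num = 0;
        for (std::size_t i = 0; i < len; ++i) {
            offsets[num] = (unsigned char)i;
            num += (v[block + i] < pivot);
        }
        // Swap the recorded elements into the prefix.
        for (std::size_t i = 0; i < num; ++i)
            std::swap(v[first++], v[block + offsets[i]]);
    }
    return first;  // v[0..first) < pivot, v[first..) >= pivot
}
```

The inner loop always writes `offsets[num]` but only advances `num` when the comparison holds, so the hot path contains no data-dependent branch for the predictor to miss.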
This is only a speedup if the comparison function itself is branchless, however. By default pdqsort will detect this if you're using C++11 or higher, the type you're sorting is arithmetic (e.g. `int`), and you're using either `std::less` or `std::greater`. You can explicitly request branchless partitioning by calling `pdqsort_branchless` instead of `pdqsort`.
Quicksort naturally performs badly on inputs that form patterns, due to it being a partition-based sort. Choosing a bad pivot will result in many comparisons that give little to no progress in the sorting process. If the pattern does not get broken up, this can happen many times in a row. Worse, real world data is filled with these patterns.
Traditionally the solution to this is to randomize the pivot selection of quicksort. While this technically still allows for a quadratic worst case, the chances of it happening are astronomically small. Later, in introsort, pivot selection is kept deterministic, instead switching to the guaranteed O(n log n) heapsort if the recursion depth becomes too big. In pdqsort we adopt a hybrid approach, (deterministically) shuffling some elements to break up patterns when we encounter a "bad" partition. If we encounter too many "bad" partitions we switch to heapsort.
A bad partition occurs when the position of the pivot after partitioning is below the 12.5th percentile (1/8th) or above the 87.5th percentile - the partition is highly unbalanced. When this happens we will shuffle four elements at fixed locations for both partitions. This effectively breaks up many patterns. If we encounter more than log(n) bad partitions we will switch to heapsort.
The 1/8th percentile is not chosen arbitrarily. An upper bound of quicksort's worst case runtime can be approximated within a constant factor by the following recurrence:
T(n, p) = n + T(p(n-1), p) + T((1-p)(n-1), p)
where n is the number of elements, and p is the percentile of the pivot after partitioning. `T(n, 1/2)` is the best case for quicksort. On modern systems heapsort is profiled to be approximately 1.8 to 2 times as slow as quicksort. Choosing p such that `T(n, p) / T(n, 1/2) ~= 1.9` as n gets big will ensure that we will only switch to heapsort if it would actually speed up the sorting. p = 1/8 is a reasonably close value and is cheap to compute on every platform using a bitshift.