| Class | Sorting algorithm |
|---|---|
| Data structure | Array |
| Worst-case performance | O(n²) |
| Average performance | O(n + n²/k + k), where k is the number of buckets; O(n) when k ≈ n |
| Worst-case space complexity | O(n + k) |


Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the bucket sorting algorithm. It is a distribution sort, a generalization of pigeonhole sort that allows multiple keys per bucket, and is a cousin of radix sort in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.
Bucket sort works as follows:
```
function bucketSort(array, k) is
    buckets ← new array of k empty lists
    M ← 1 + the maximum key value in the array
    for i = 0 to length(array) do
        insert array[i] into buckets[floor(k × array[i] / M)]
    for i = 0 to k do
        nextSort(buckets[i])
    return the concatenation of buckets[0], ..., buckets[k]
```
Let array denote the array to be sorted and k denote the number of buckets to use. One can compute the maximum key value in linear time by iterating over all the keys once. The floor function must be used to convert a floating-point number to an integer (and possibly a cast between datatypes as well). The function nextSort is a sorting function used to sort each bucket. Conventionally, insertion sort is used due to its relatively high performance with small numbers of elements, but other algorithms could be used as well, such as selection sort or merge sort. Using bucketSort itself as nextSort produces a relative of radix sort; in particular, the case k = 2 corresponds to quicksort (although potentially with poor pivot choices).
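Assuming non-negative numeric keys, the pseudocode above translates into a short Python sketch (the function name is illustrative, and Python's built-in `list.sort` stands in for nextSort):

```python
from math import floor

def bucket_sort(array, k):
    """Sort a list of non-negative numbers using k buckets.

    Scatter each element into a bucket chosen from its value,
    sort each bucket, then concatenate the buckets in order.
    """
    if not array:
        return []
    buckets = [[] for _ in range(k)]
    m = 1 + max(array)  # strictly greater than every key
    for x in array:
        buckets[floor(k * x / m)].append(x)
    for b in buckets:
        b.sort()  # stands in for nextSort; insertion sort is conventional
    return [x for b in buckets for x in b]
```

For example, `bucket_sort([29, 25, 3, 49, 9, 37, 21, 43], 4)` scatters the keys into the buckets [3, 9], [21], [29, 25, 37], [49, 43] before sorting and concatenating them.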
When the input contains several keys that are close to each other (clustering), those elements are likely to be placed in the same bucket, which results in some buckets containing more elements than average. The worst-case scenario occurs when all the elements are placed in a single bucket. The overall performance would then be dominated by the algorithm used to sort each bucket, for example insertion sort or comparison sort algorithms, such as merge sort.
Consider the case that the input is uniformly distributed. The first step, which is initializing the buckets and finding the maximum key value in the array, can be done in $O(n)$ time. If division and multiplication can be done in constant time, then scattering each element to its bucket also costs $O(n)$. Assume insertion sort is used to sort each bucket; then the third step costs $O(\textstyle\sum_i n_i^2)$, where $n_i$ is the length of the bucket indexed $i$. Since we are concerned with the average time, the expectation $E(n_i^2)$ has to be evaluated instead. Let $X_{ij}$ be the random variable that is $1$ if element $j$ is placed in bucket $i$, and $0$ otherwise. We have $n_i = \sum_{j=1}^n X_{ij}$. Therefore,

$$E(n_i^2) = E\!\left(\Big(\sum_{j=1}^n X_{ij}\Big)^{\!2}\right) = E\!\left(\sum_{j=1}^n \sum_{l=1}^n X_{ij} X_{il}\right) = \sum_{j=1}^n E(X_{ij}^2) + \sum_{1 \le j, l \le n,\ j \ne l} E(X_{ij} X_{il})$$

The last line separates the summation into the case $j = l$ and the case $j \ne l$. Since the chance of an object being distributed to bucket $i$ is $1/k$, $X_{ij}$ is $1$ with probability $1/k$ and $0$ otherwise, so

$$E(X_{ij}^2) = \frac{1}{k}, \qquad E(X_{ij} X_{il}) = \frac{1}{k} \cdot \frac{1}{k} = \frac{1}{k^2} \quad (j \ne l).$$

With the summation, it would be

$$\sum_{j=1}^n E(X_{ij}^2) + \sum_{1 \le j, l \le n,\ j \ne l} E(X_{ij} X_{il}) = n \cdot \frac{1}{k} + n(n-1) \cdot \frac{1}{k^2} = \frac{n^2 + nk - n}{k^2}$$

Finally, the complexity would be

$$O\!\left(\sum_{i=1}^k E(n_i^2)\right) = O\!\left(\sum_{i=1}^k \frac{n^2 + nk - n}{k^2}\right) = O\!\left(\frac{n^2}{k} + n\right).$$

The last step of bucket sort, which is concatenating all the sorted objects in each bucket, requires $O(k)$ time. Therefore, the total complexity is $O\!\left(n + \frac{n^2}{k} + k\right)$. Note that if k is chosen to be $k = \Theta(n)$, then bucket sort runs in $O(n)$ average time, given a uniformly distributed input.[1]
A common optimization is to put the unsorted elements of the buckets back in the original array first, then run insertion sort over the complete array; because insertion sort's runtime is based on how far each element is from its final position, the number of comparisons remains relatively small, and the memory hierarchy is better exploited by storing the list contiguously in memory.[2]
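A minimal sketch of this optimization, assuming non-negative numeric keys (the function name is illustrative):

```python
def bucket_insertion_sort(arr, k):
    """Bucket scatter followed by one insertion sort over the whole array.

    The scatter pass leaves every element near its final position,
    so the final insertion sort does little work on average.
    """
    if not arr:
        return arr
    buckets = [[] for _ in range(k)]
    m = 1 + max(arr)
    for x in arr:
        buckets[int(k * x / m)].append(x)
    # Write the (still unsorted) buckets back into the original array...
    i = 0
    for b in buckets:
        for x in b:
            arr[i] = x
            i += 1
    # ...then run a single insertion sort over the complete array.
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr
```

Because the array is stored contiguously, the final pass streams through memory in order, which is the cache-friendliness the optimization aims for.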
If the input distribution is known or can be estimated, buckets can often be chosen which contain constant expected density (rather than merely having constant size). This allows O(n) average time complexity even without uniformly distributed input.
The most common variant of bucket sort operates on a list of n numeric inputs between zero and some maximum value M and divides the value range into b buckets, each of size M/b. If each bucket is sorted using insertion sort, the sort can be shown to run in expected linear time (where the average is taken over all possible inputs).[3] However, the performance of this sort degrades with clustering; if many values occur close together, they will all fall into a single bucket and be sorted slowly. This performance degradation is avoided in the original bucket sort algorithm by assuming that the input is generated by a random process that distributes elements uniformly over the interval [0,1).[1]
Similar to generic bucket sort as described above, ProxmapSort works by dividing an array of keys into subarrays via the use of a "map key" function that preserves a partial ordering on the keys; as each key is added to its subarray, insertion sort is used to keep that subarray sorted, resulting in the entire array being in sorted order when ProxmapSort completes. ProxmapSort differs from bucket sorts in its use of the map key to place the data approximately where it belongs in sorted order, producing a "proxmap" — a proximity mapping — of the keys.
Another variant of bucket sort known as histogram sort or counting sort adds an initial pass that counts the number of elements that will fall into each bucket using a count array.[4] Using this information, the array values can be arranged into a sequence of buckets in-place by a sequence of exchanges, leaving no space overhead for bucket storage.
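The counting pass can be sketched as follows (a hypothetical helper; the subsequent in-place exchange phase is omitted). Exclusive prefix sums over the counts give, for each bucket, the index at which that bucket's region begins in the array:

```python
def histogram_bucket_offsets(arr, k, m):
    """Counting pass: size each bucket before any element is moved.

    arr holds non-negative keys strictly less than m; k is the bucket
    count. Returns the starting index of each bucket's region; the
    in-place variant then permutes elements into these regions by a
    sequence of exchanges.
    """
    counts = [0] * k
    for x in arr:
        counts[(k * x) // m] += 1
    offsets = [0] * k
    total = 0
    for i in range(k):  # exclusive prefix sums give region starts
        offsets[i] = total
        total += counts[i]
    return offsets
```

For example, with keys [1, 5, 9, 2, 6], two buckets, and m = 10, the counts are [2, 3], so the bucket regions begin at indices 0 and 2.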
The Postman's sort is a variant of bucket sort that takes advantage of a hierarchical structure of elements, typically described by a set of attributes. This is the algorithm used by letter-sorting machines in post offices: mail is sorted first between domestic and international; then by state, province or territory; then by destination post office; then by routes, etc. Since keys are not compared against each other, sorting time is O(cn), where c depends on the size of the key and the number of buckets. This is similar to a radix sort that works "top down," or "most significant digit first."[5]
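The hierarchical distribution can be sketched as follows (a hypothetical helper; the attribute accessors are illustrative, and sorting the small set of bucket labels stands in for a fixed, known label order such as a list of states):

```python
def postman_sort(items, key_funcs):
    """Distribute items by successive attributes, most significant first.

    key_funcs is a list of functions, one per level of the hierarchy
    (e.g. country, then state, then post office); each returns a
    small-range value usable as a bucket label.
    """
    if not key_funcs or len(items) <= 1:
        return list(items)
    head, rest = key_funcs[0], key_funcs[1:]
    buckets = {}
    for item in items:
        buckets.setdefault(head(item), []).append(item)
    result = []
    for label in sorted(buckets):  # visit bucket labels in order
        result.extend(postman_sort(buckets[label], rest))
    return result
```

For example, sorting (country, region) address pairs first distributes by country, then recursively by region within each country's bucket.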
The shuffle sort[6] is a variant of bucket sort that begins by removing the first 1/8 of the n items to be sorted, sorts them recursively, and puts them in an array. This creates n/8 "buckets" to which the remaining 7/8 of the items are distributed. Each "bucket" is then sorted, and the "buckets" are concatenated into a sorted array.
Bucket sort can be seen as a generalization of counting sort; in fact, if each bucket has size 1 then bucket sort degenerates to counting sort. The variable bucket size of bucket sort allows it to use O(n) memory instead of O(M) memory, where M is the number of distinct values; in exchange, it gives up counting sort's O(n + M) worst-case behavior.
Bucket sort with two buckets is effectively a version of quicksort where the pivot value is always selected to be the middle value of the value range. While this choice is effective for uniformly distributed inputs, other means of choosing the pivot in quicksort such as randomly selected pivots make it more resistant to clustering in the input distribution.
The n-way mergesort algorithm also begins by distributing the list into n sublists and sorting each one; however, the sublists created by mergesort have overlapping value ranges and so cannot be recombined by simple concatenation as in bucket sort. Instead, they must be interleaved by a merge algorithm. However, this added expense is counterbalanced by the simpler scatter phase and the ability to ensure that each sublist is the same size, providing a good worst-case time bound.
Top-downradix sort can be seen as a special case of bucket sort where both the range of values and the number of buckets is constrained to be a power of two. Consequently, each bucket's size is also a power of two, and the procedure can be applied recursively. This approach can accelerate the scatter phase, since we only need to examine a prefix of the bit representation of each element to determine its bucket.
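With power-of-two ranges, the bucket index is just a bit shift. For instance, with 8-bit keys and four buckets, the index is the top two bits of the key (a minimal sketch, assuming unsigned fixed-width keys; the function name is illustrative):

```python
def top_bits_bucket(value, value_bits, bucket_bits):
    """Bucket index from the top bits of a fixed-width binary key.

    With 2**bucket_bits buckets over the range [0, 2**value_bits),
    the scatter phase needs only a shift -- no division.
    """
    return value >> (value_bits - bucket_bits)
```

So the 8-bit key 0b10110000 lands in bucket 0b10 = 2 of four, and the same shift can be applied recursively to the remaining low bits within each bucket.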
Bucket sort runs in linear time on the average. Like counting sort, bucket sort is fast because it assumes something about the input. Whereas counting sort assumes that the input consists of integers in a small range, bucket sort assumes that the input is generated by a random process that distributes elements uniformly over the interval [0,1). The idea of bucket sort is to divide the interval [0, 1) into n equal-sized subintervals, or buckets, and then distribute the n input numbers into the buckets. Since the inputs are uniformly distributed over [0, 1), we don't expect many numbers to fall into each bucket. To produce the output, we simply sort the numbers in each bucket and then go through the buckets in order, listing the elements in each.
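This unit-interval version can be sketched in Python as follows (a minimal sketch; `list.sort` stands in for the per-bucket insertion sort, which is cheap because each bucket holds O(1) elements in expectation):

```python
def bucket_sort_unit_interval(values):
    """Bucket sort for floats uniformly distributed over [0, 1).

    Uses n equal-width buckets, sorts each one, and concatenates
    the buckets in order.
    """
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for v in values:
        buckets[int(n * v)].append(v)  # bucket i covers [i/n, (i+1)/n)
    for b in buckets:
        b.sort()
    return [v for b in buckets for v in b]
```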