In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case complexity, which considers the maximal complexity of the algorithm over all possible inputs.
There are three primary motivations for studying average-case complexity.[1] First, although some problems may be intractable in the worst case, the inputs which elicit this behavior may rarely occur in practice, so the average-case complexity may be a more accurate measure of an algorithm's performance. Second, average-case complexity analysis provides tools and techniques to generate hard instances of problems which can be utilized in areas such as cryptography and derandomization. Third, average-case complexity allows discriminating the most efficient algorithm in practice among algorithms of equivalent best-case complexity (for instance Quicksort).
Average-case analysis requires a notion of an "average" input to an algorithm, which leads to the problem of devising a probability distribution over inputs. Alternatively, a randomized algorithm can be used. The analysis of such algorithms leads to the related notion of an expected complexity.[2]: 28
The average-case performance of algorithms has been studied since modern notions of computational efficiency were developed in the 1950s. Much of this initial work focused on problems for which worst-case polynomial time algorithms were already known.[3] In 1973, Donald Knuth[4] published Volume 3 of The Art of Computer Programming, which extensively surveys average-case performance of algorithms for problems solvable in worst-case polynomial time, such as sorting and median-finding.
An efficient algorithm for NP-complete problems is generally characterized as one which runs in polynomial time for all inputs; this is equivalent to requiring efficient worst-case complexity. However, an algorithm which is inefficient on a "small" number of inputs may still be efficient for "most" inputs that occur in practice. Thus, it is desirable to study the properties of these algorithms where the average-case complexity may differ from the worst-case complexity and find methods to relate the two.
The fundamental notions of average-case complexity were developed by Leonid Levin in 1986 when he published a one-page paper[5] defining average-case complexity and completeness while giving an example of a complete problem for distNP, the average-case analogue of NP.
The first task is to precisely define what is meant by an algorithm which is efficient "on average". An initial attempt might define an efficient average-case algorithm as one which runs in expected polynomial time over all possible inputs. Such a definition has various shortcomings; in particular, it is not robust to changes in the computational model. For example, suppose algorithm A runs in time t_A(x) on input x and algorithm B runs in time t_A(x)^2 on input x; that is, B is quadratically slower than A. Intuitively, any definition of average-case efficiency should capture the idea that A is efficient on average if and only if B is efficient on average. Suppose, however, that the inputs are drawn uniformly at random from the strings of length n, and that A runs in time n^2 on all inputs except the string 1^n, for which A takes time 2^n. Then it can be easily checked that the expected running time of A is polynomial but the expected running time of B is exponential.[3]
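Under the uniform distribution on inputs of length n, the expected running times in this example work out as follows (a brief calculation using the parameters stated above):

$$\mathbb{E}[t_A] = (1 - 2^{-n})\, n^2 + 2^{-n} \cdot 2^n \le n^2 + 1, \qquad \mathbb{E}[t_B] = (1 - 2^{-n})\, n^4 + 2^{-n} \cdot (2^n)^2 \ge 2^n,$$

so A runs in expected polynomial time while B, which is only quadratically slower on every input, runs in expected exponential time.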
To create a more robust definition of average-case efficiency, it makes sense to allow an algorithm A to run longer than polynomial time on some inputs, provided that the fraction of inputs on which A requires larger and larger running time becomes smaller and smaller. This intuition is captured in the following formula for average polynomial running time, which balances the polynomial trade-off between running time and fraction of inputs:

$$\Pr_{x \in_R D_n}\left[t_A(x) \geq t\right] \leq \frac{p(n)}{t^{\varepsilon}}$$

for every n, t > 0 and polynomial p, where t_A(x) denotes the running time of algorithm A on input x, and ε is a positive constant value.[6] Alternatively, this can be written as

$$\mathbb{E}_{x \in_R D_n}\left[\frac{t_A(x)^{\varepsilon}}{n}\right] \leq C$$

for some constants C and ε, where n = |x|.[7] In other words, an algorithm A has good average-case complexity if, after running for t_A(n) steps, A can solve all but an n^c/(t_A(n))^ε fraction of inputs of length n, for some ε, c > 0.[3]
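As a check, the algorithm A from the example above satisfies this criterion with, say, ε = 1 and p(n) = n^2: under the uniform distribution on inputs of length n,

$$\Pr_{x \in_R \{0,1\}^n}\left[t_A(x) \geq t\right] \;\leq\; \begin{cases} 1 \leq n^2/t, & t \leq n^2,\\ 2^{-n} \leq n^2/t, & n^2 < t \leq 2^n,\\ 0, & t > 2^n,\end{cases}$$

so A is polynomial on average even though its worst-case running time is exponential.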
The next step is to define the "average" input to a particular problem. This is achieved by associating the inputs of each problem with a particular probability distribution. That is, an "average-case" problem consists of a language L and an associated probability distribution D which forms the pair (L, D).[7] The two most common classes of distributions which are allowed are:

1. P-computable distributions: those for which the cumulative density of any given input x can be computed in polynomial time; that is, the total probability of all strings y ≤ x (in the standard ordering) is computable in polynomial time, which implies that the probability of x itself is also computable in polynomial time.
2. P-samplable distributions: those for which random samples can be drawn in polynomial time; that is, there is a probabilistic algorithm that runs in time polynomial in n and outputs a string distributed according to the distribution on inputs of length n.
These two formulations, while similar, are not equivalent. If a distribution is P-computable it is also P-samplable, but the converse is not true if P ≠ P^{#P}.[7]
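As a concrete illustration (not from the sources cited above), the uniform distribution on n-bit strings is both P-computable and P-samplable; the following Python sketch shows the two access modes, with function names chosen here purely for illustration:

import random

def uniform_cumulative(x: str) -> float:
    # P-computable access: cumulative probability of all n-bit strings y <= x
    # (lexicographic order) under the uniform distribution on {0,1}^n.
    n = len(x)
    rank = int(x, 2)              # number of n-bit strings strictly smaller than x
    return (rank + 1) / (2 ** n)  # include x itself

def uniform_sample(n: int) -> str:
    # P-samplable access: draw an n-bit string uniformly at random.
    return "".join(random.choice("01") for _ in range(n))

print(uniform_cumulative("101"))  # 0.75, i.e. 6 of the 8 strings of length 3
print(uniform_sample(5))          # e.g. '01101'

For less well-behaved distributions the sampler may be easy to write down while the cumulative probability is hard to compute, which reflects the separation between P-computable and P-samplable distributions noted above (assuming P ≠ P^{#P}).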
A distributional problem (L, D) is in the complexity class AvgP if there is an efficient average-case algorithm for L, as defined above. The class AvgP is occasionally called distP in the literature.[7]
A distributional problem (L, D) is in the complexity class distNP if L is in NP and D is P-computable. When L is in NP and D is P-samplable, (L, D) belongs to sampNP.[7]
Together, AvgP and distNP define the average-case analogues of P and NP, respectively.[7]
Let (L, D) and (L′, D′) be two distributional problems. (L, D) average-case reduces to (L′, D′) (written (L, D) ≤_{AvgP} (L′, D′)) if there is a function f that for every n, on input x can be computed in time polynomial in n and

(Correctness) x ∈ L if and only if f(x) ∈ L′.
(Domination) There are polynomials p and m such that, for every n and y,

$$\sum_{x : f(x) = y} D_n(x) \leq p(n)\, D'_{m(n)}(y).$$
The domination condition enforces the notion that if problem (L, D) is hard on average, then (L′, D′) is also hard on average. Intuitively, a reduction should provide a way to solve an instance x of problem L by computing f(x) and feeding the output to the algorithm which solves L′. Without the domination condition, this may not be possible since the algorithm which solves L in polynomial time on average may take super-polynomial time on a small number of inputs, but f may map these inputs into a much larger set of D′ so that algorithm A′ no longer runs in polynomial time on average. The domination condition only allows such strings to occur polynomially as often in D′.[6]
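To see why the condition suffices (a sketch of the standard argument, not spelled out above): if A′ solves (L′, D′) with a tail bound Pr_{y ∈_R D′_m}[t_{A′}(y) ≥ t] ≤ q(m)/t^ε, then for the composed algorithm that runs A′ on f(x), the domination condition gives

$$\Pr_{x \in_R D_n}\left[t_{A'}(f(x)) \geq t\right] \;=\; \sum_{y\,:\,t_{A'}(y) \geq t}\;\sum_{x\,:\,f(x) = y} D_n(x) \;\leq\; p(n) \sum_{y\,:\,t_{A'}(y) \geq t} D'_{m(n)}(y) \;\leq\; \frac{p(n)\, q(m(n))}{t^{\varepsilon}},$$

which is again a polynomially bounded tail, so solving L via f preserves efficiency on average (up to the polynomial cost of computing f).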
The average-case analogue to NP-completeness is distNP-completeness. A distributional problem (L′, D′) is distNP-complete if (L′, D′) is in distNP and for every (L, D) in distNP, (L, D) is average-case reducible to (L′, D′).[7]
An example of a distNP-complete problem is the Bounded Halting Problem, (BH, D) (for any P-computable D) defined as follows:

BH = {(M, x, 1^t) : M is a non-deterministic Turing machine that accepts x in at most t steps}
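To illustrate the worst-case part of the reduction (a sketch; the identifiers are illustrative, and the full distNP-completeness proof additionally re-encodes x so that the domination condition holds for an arbitrary P-computable D), any NP language L decided by a non-deterministic machine M within a polynomial time bound q maps into BH by padding the time bound in unary:

def to_bounded_halting(M_description: str, x: str, q) -> tuple:
    # Worst-case reduction: x is in L  <=>  (M, x, 1^{q(|x|)}) is in BH.
    t = q(len(x))
    return (M_description, x, "1" * t)

# Example usage with a hypothetical machine description and bound q(n) = n**2
instance = to_bounded_halting("<encoding of M>", "100110", lambda n: n ** 2)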
In his original paper, Levin showed an example of a distributional tiling problem that is average-case NP-complete.[5] A survey of known distNP-complete problems is available online.[6]
One area of active research involves finding new distNP-complete problems. However, finding such problems can be complicated due to a result of Gurevich which shows that any distributional problem with a flat distribution cannot be distNP-complete unless EXP = NEXP.[8] (A flat distribution μ is one for which there exists an ε > 0 such that for any x, μ(x) ≤ 2^{−|x|^ε}.) A result by Livne shows that all natural NP-complete problems have distNP-complete versions.[9] However, the goal of finding a natural distributional problem that is distNP-complete has not yet been achieved.[10]
As mentioned above, much early work relating to average-case complexity focused on problems for which polynomial-time algorithms already existed, such as sorting. For example, many sorting algorithms which utilize randomness, such as Quicksort, have a worst-case running time of O(n^2), but an average-case running time of O(n log n), where n is the length of the input to be sorted.[2]
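For reference, a minimal randomized quicksort sketch in Python (not taken from the sources above); choosing the pivot uniformly at random yields the O(n log n) expected running time, while the O(n^2) bound is attained only on unlucky pivot choices:

import random

def quicksort(a):
    # Expected time O(n log n), worst case O(n^2).
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)  # random pivot gives the good expected behavior
    less = [v for v in a if v < pivot]
    equal = [v for v in a if v == pivot]
    greater = [v for v in a if v > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]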
For most problems, average-case complexity analysis is undertaken to find efficient algorithms for a problem that is considered difficult in the worst case. In cryptographic applications, however, the opposite is true: the worst-case complexity is irrelevant; we instead want a guarantee that every algorithm which "breaks" the cryptographic scheme is inefficient on average.[11][page needed]
Thus, all secure cryptographic schemes rely on the existence of one-way functions.[3] Although the existence of one-way functions is still an open problem, many candidate one-way functions are based on hard problems such as integer factorization or computing the discrete log. Note that it is not desirable for the candidate function to be NP-complete since this would only guarantee that there is likely no efficient algorithm for solving the problem in the worst case; what we actually want is a guarantee that no efficient algorithm can solve the problem over random inputs (i.e. the average case). In fact, both the integer factorization and discrete log problems are in NP ∩ coNP, and are therefore not believed to be NP-complete.[7] The fact that all of cryptography is predicated on the existence of average-case intractable problems in NP is one of the primary motivations for studying average-case complexity.
Yao's principle, from a 1978 paper by Andrew Yao, shows that for broad classes of computational problems, average-case complexity for a hard input distribution and a deterministic algorithm adapted to that distribution is the same thing as expected complexity for a fast randomized algorithm and its worst-case input.[12]
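In its standard minimax form (notation chosen here for illustration, with C(A, x) the cost of deterministic algorithm A on input x): for every input distribution D and every randomized algorithm R,

$$\min_{A} \mathbb{E}_{x \sim D}\left[C(A, x)\right] \;\leq\; \max_{x} \mathbb{E}\left[C(R, x)\right],$$

and over finite sets of inputs and algorithms the two optima coincide by the minimax theorem, so exhibiting a hard input distribution for deterministic algorithms certifies a lower bound on the worst-case expected cost of every randomized algorithm.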
In 1990, Impagliazzo and Levin showed that if there is an efficient average-case algorithm for a distNP-complete problem under the uniform distribution, then there is an average-case algorithm for every problem in NP under any polynomial-time samplable distribution.[13] Applying this theory to natural distributional problems remains an outstanding open question.[3]
In 1992, Ben-David et al. showed that if all languages in distNP have good-on-average decision algorithms, they also have good-on-average search algorithms. Further, they show that this conclusion holds under a weaker assumption: if every language in NP is easy on average for decision algorithms with respect to the uniform distribution, then it is also easy on average for search algorithms with respect to the uniform distribution.[14] Thus, cryptographic one-way functions can exist only if there are distNP problems over the uniform distribution that are hard on average for decision algorithms.
In 1993, Feigenbaum and Fortnow showed that it is not possible to prove, under non-adaptive random reductions, that the existence of a good-on-average algorithm for a distNP-complete problem under the uniform distribution implies the existence of worst-case efficient algorithms for all problems in NP.[15] In 2003, Bogdanov and Trevisan generalized this result to arbitrary non-adaptive reductions.[16] These results show that it is unlikely that any association can be made between average-case complexity and worst-case complexity via reductions.[3]