This articlemay be too technical for most readers to understand. Pleasehelp improve it tomake it understandable to non-experts, without removing the technical details.(February 2018) (Learn how and when to remove this message) |
Incomputer science,cycle detection orcycle finding is thealgorithmic problem of finding a cycle in asequence ofiterated function values.
For anyfunctionf that maps afinite setS to itself, and any initial valuex0 inS, the sequence of iterated function values
must eventually use the same value twice: there must be some pair of distinct indicesi andj such thatxi =xj. Once this happens, the sequence must continueperiodically, by repeating the same sequence of values fromxi toxj − 1. Cycle detection is the problem of findingi andj, givenf andx0.
Several algorithms are known for finding cycles quickly and with little memory.Robert W. Floyd'stortoise and hare algorithm moves two pointers at different speeds through the sequence of values until they both point to equal values. Alternatively, Brent's algorithm is based on the idea ofexponential search. Both Floyd's and Brent's algorithms use only a constant number of memory cells, and take a number of function evaluations that is proportional to the distance from the start of the sequence to the first repetition. Several other algorithms trade off larger amounts of memory for fewer function evaluations.
The applications of cycle detection include testing the quality ofpseudorandom number generators andcryptographic hash functions,computational number theory algorithms, detection ofinfinite loops in computer programs and periodic configurations incellular automata, automatedshape analysis oflinked list data structures, and detection ofdeadlocks fortransactions management inDBMS.

The figure shows a functionf that maps the setS = {0,1,2,3,4,5,6,7,8} to itself. If one starts fromx0 = 2 and repeatedly appliesf, one sees the sequence of values
The cycle in this value sequence is6, 3, 1.
LetS be any finite set,f be anyendofunction fromS to itself, andx0 be any element ofS. For anyi > 0, letxi =f(xi − 1). Letμ be the smallest index such that the valuexμ reappears infinitely often within the sequence of valuesxi, and letλ (the loop length) be the smallest positive integer such thatxμ =xλ +μ. The cycle detection problem is the task of findingλ andμ.[1]
One can view the same problemgraph-theoretically, by constructing afunctional graph (that is, adirected graph in which each vertex has a single outgoing edge) the vertices of which are the elements ofS and the edges of which map an element to the corresponding function value, as shown in the figure. The set of verticesreachable from starting vertexx0 form a subgraph with a shape resembling theGreek letter rho (ρ): a path of lengthμ fromx0 to acycle ofλ vertices.[2]
Practical cycle-detection algorithms do not findλ andμ exactly.[1] They usually find lower and upper boundsμl ≤μ ≤μh for the start of the cycle, and a more detailed search of the range must be performed if the exact value ofμ is needed. Also, most algorithms do not guarantee to findλ directly, but may find some multiplekλ <μ +λ. (Continuing the search for an additionalkλ/q steps, whereq is the smallest prime divisor ofkλ, will either find the trueλ or prove thatk = 1.)
Except in toy examples like the above,f will not be specified as a table of values. Such a table impliesO(|S|)space complexity, and if that is permissible, building a predecessor array (associative array mappingxi toi) while iteratingf will detect the first repeated value when it is visited the second time, at which point the value in the predecessor array isμ and the current index isμ+λ. Rather, a cycle detection algorithm is given ablack box for generating the sequencexi, and the task is to findλ andμ using very little memory.
The black box might consist of an implementation of the recurrence functionf, but it might also store additional internal state to make the computation more efficient. Althoughxi =f(xi−1) must be truein principle, this might be expensive to compute directly; the function could be defined in terms of thediscrete logarithm ofxi−1 or some otherdifficult-to-compute property which can only be practically computed in terms of additional information. In such cases, the number of black boxes required becomes a figure of merit distinguishing the algorithms.
A second reason to use one of these algorithms is that they arepointer algorithms which do no operations on elements ofS other than testing for equality. An associative array implementation requires computing ahash function on the elements ofS, orordering them. But cycle detection can be applied in cases where neither of these are possible.
The classic example isPollard's rho algorithm forinteger factorization, which searches for a factorp of a given numbern by looking for valuesxi andxi+λ which are equal modulopwithout knowingp in advance. This is done by computing thegreatest common divisor of the differencexi −xi+λ with a known multiple ofp, namelyn. If the gcd is non-trivial (neither 1 norn), then the value is a proper factor ofn, as desired.[2] Ifn is not prime, it must have at least one factorp ≤√n, and by thebirthday paradox, a random functionf has an expected cycle length (modulo p) of√p ≤4√n.
If the input is given as a subroutine for calculatingf, the cycle detection problem may be trivially solved using onlyλ +μ function applications, simply by computing the sequence of valuesxi and using adata structure such as ahash table to store these values and test whether each subsequent value has already been stored. However, the space complexity of this algorithm is proportional toλ +μ, unnecessarily large. Additionally, to implement this method as apointer algorithm would require applying the equality test to each pair of values, resulting in quadratic time overall. Thus, research in this area has concentrated on two goals: using less space than this naive algorithm, and finding pointer algorithms that use fewer equality tests.

Floyd's cycle-finding algorithm is a pointer algorithm that uses only two pointers, which move through the sequence at different speeds. It is also called the "tortoise and the hare algorithm", alluding to Aesop's fable ofThe Tortoise and the Hare.
The algorithm is named afterRobert W. Floyd, who was credited with its invention byDonald Knuth.[3][4] However, the algorithm does not appear in Floyd's published work, and this may be a misattribution: Floyd describes algorithms for listing all simple cycles in adirected graph in a 1967 paper,[5] but this paper does not describe the cycle-finding problem in functional graphs that is the subject of this article. In fact, Knuth's statement (in 1969), attributing it to Floyd, without citation, is the first known appearance in print, and it thus may be afolk theorem, not attributable to a single individual.[6]
The key insight in the algorithm is as follows. If there is a cycle, then, for any integersi ≥μ andk ≥ 0,xi =xi +kλ, whereλ is the length of the loop to be found,μ is the index of the first element of the cycle, andk is a whole integer representing the number of loops. Based on this, it can then be shown thati =kλ ≥μ for somek if and only ifxi =x2i (ifxi =x2i in the cycle, then there exists somek such that2i =i +kλ, which implies thati =kλ; and if there are somei andk such thati =kλ, then2i =i +kλ andx2i =xi +kλ). Thus, the algorithm only needs to check for repeated values of this special form, one twice as far from the start of the sequence as the other, to find a periodν of a repetition that is a multiple ofλ. Onceν is found, the algorithm retraces the sequence from its start to find the first repeated valuexμ in the sequence, using the fact thatλ dividesν and therefore thatxμ =xμ +v. Finally, once the value ofμ is known it is trivial to find the lengthλ of the shortest repeating cycle, by searching for the first positionμ +λ for whichxμ +λ =xμ.
The algorithm thus maintains two pointers into the given sequence, one (the tortoise) atxi, and the other (the hare) atx2i. At each step of the algorithm, it increasesi by one, moving the tortoise one step forward and the hare two steps forward in the sequence, and then compares the sequence values at these two pointers. The smallest value ofi > 0 for which the tortoise and hare point to equal values is the desired valueν.
The followingPython code shows how this idea may be implemented as an algorithm.
deffloyd(f,x0)->(int,int):"""Floyd's cycle detection algorithm."""# Main phase of algorithm: finding a repetition x_i = x_2i.# The hare moves twice as quickly as the tortoise and# the distance between them increases by 1 at each step.# Eventually they will both be inside the cycle and then,# at some point, the distance between them will be# divisible by the period λ.tortoise=f(x0)# f(x0) is the element/node next to x0.hare=f(f(x0))whiletortoise!=hare:tortoise=f(tortoise)hare=f(f(hare))# At this point the tortoise position, ν, which is also equal# to the distance between hare and tortoise, is divisible by# the period λ. So hare moving in cycle one step at a time,# and tortoise (reset to x0) moving towards the cycle, will# intersect at the beginning of the cycle. Because the# distance between them is constant at 2ν, a multiple of λ,# they will agree as soon as the tortoise reaches index μ.# Find the position μ of first repetition.mu=0tortoise=x0whiletortoise!=hare:tortoise=f(tortoise)hare=f(hare)# Hare and tortoise move at same speedmu+=1# Find the length of the shortest cycle starting from x_μ# The hare moves one step at a time while tortoise is still.# lam is incremented until λ is found.lam=1hare=f(tortoise)whiletortoise!=hare:hare=f(hare)lam+=1returnlam,mu
This code only accesses the sequence by storing and copying pointers, function evaluations, and equality tests; therefore, it qualifies as a pointer algorithm. The algorithm usesO(λ +μ) operations of these types, andO(1) storage space.[7]
Richard P. Brent described an alternative cycle detection algorithm that, like the tortoise and hare algorithm, requires only two pointers into the sequence.[8] However, it is based on a different principle: searching for the smallestpower of two2i that is larger than bothλ andμ. Fori = 0, 1, 2, ..., the algorithm comparesx2i−1 with each subsequent sequence value up to the next power of two, stopping when it finds a match. It has two advantages compared to the tortoise and hare algorithm: it finds the correct lengthλ of the cycle directly, rather than needing to search for it in a subsequent stage, and its steps involve only one evaluation of the functionf rather than three.[9]
The following Python code shows how this technique works in more detail.
defbrent(f,x0)->(int,int):"""Brent's cycle detection algorithm."""# main phase: search successive powers of twopower=lam=1tortoise=x0hare=f(x0)# f(x0) is the element/node next to x0.# this assumes there is a cycle; otherwise this loop won't terminatewhiletortoise!=hare:ifpower==lam:# time to start a new power of two?tortoise=harepower*=2lam=0hare=f(hare)lam+=1# Find the position of the first repetition of length λtortoise=hare=x0foriinrange(lam):hare=f(hare)# The distance between the hare and tortoise is now λ.# Next, the hare and tortoise move at same speed until they agreemu=0whiletortoise!=hare:tortoise=f(tortoise)hare=f(hare)mu+=1returnlam,mu
Like the tortoise and hare algorithm, this is a pointer algorithm that usesO(λ +μ) tests and function evaluations andO(1) storage space. It is not difficult to show that the number of function evaluations can never be higher than for Floyd's algorithm. Brent claims that, on average, his cycle finding algorithm runs around 36% more quickly than Floyd's and that it speeds up thePollard rho algorithm by around 24%. He also performs anaverage case analysis for a randomized version of the algorithm in which the sequence of indices traced by the slower of the two pointers is not the powers of two themselves, but rather a randomized multiple of the powers of two. Although his main intended application was in integer factorization algorithms, Brent also discusses applications in testing pseudorandom number generators.[8]
R. W. Gosper's algorithm[10][11] finds the period, and the lower and upper bound of the starting point, and, of the first cycle. The difference between the lower and upper bound is of the same order as the period, i.e..
The algorithm maintains an array of tortoises. For each:
If it is inconvenient to vary the number of comparisons as increases, you may initialize all of the, but must then return if while.
The main features of Gosper's algorithm are that it is economical in space, very economical in evaluations of the generator function, and always finds the exact cycle length (never a multiple). The cost is a large number of equality comparisons. It could be roughly described as a concurrent version of Brent's algorithm. While Brent's algorithm uses a single tortoise, repositioned every time the hare passes a power of two, Gosper's algorithm uses several tortoises (several previous values are saved), which are roughly exponentially spaced. According to the note inHAKMEM item 132,[11] this algorithm will detect repetition before the third occurrence of any value, i.e. the cycle will be iterated at most twice. HAKMEM also states that it is sufficient to store previous values; however, this only offers a saving if we knowa priori that is significantly smaller than. The standard implementations[10] store values. For example, assume the function values are 32-bit integers, so and Then Gosper's algorithm will find the cycle after less than function evaluations (in fact, the most possible is), while consuming the space of 33 values (each value being a 32-bit integer).
Upon the-th evaluation of the generator function, the algorithm compares the generated value with previous values; observe that goes up to at least and at most. Therefore, the time complexity of this algorithm is. Since it stores values, its space complexity is. This is under the usualtransdichotomous model, assumed throughout this article, in which the size of the function values is constant. Without this assumption, we know it requires space to store distinct values, so the overall space complexity is
A number of authors have studied techniques for cycle detection that use more memory than Floyd's and Brent's methods, but detect cycles more quickly. In general these methods store several previously-computed sequence values, and test whether each new value equals one of the previously-computed values. In order to do so quickly, they typically use ahash table or similar data structure for storing the previously-computed values, and therefore are not pointer algorithms: in particular, they usually cannot be applied to Pollard's rho algorithm. Where these methods differ is in how they determine which values to store. Following Nivasch,[12] we survey these techniques briefly.
Any cycle detection algorithm that stores at mostM values from the input sequence must perform at least function evaluations.[18][19]
Cycle detection has been used in many applications.
*print-circle* variable, detects circular list structure and prints it compactly.