Movatterモバイル変換

[0]ホーム

Jump to content

Topological sorting

Edit links

From Wikipedia, the free encyclopedia

Node ordering for directed acyclic graphs

"Dependency resolution" redirects here. For other uses, seeDependency (disambiguation).

Incomputer science, atopological sort ortopological ordering of adirected graph is alinear ordering of itsvertices such that for every directed edge(u,v) from vertexu to vertexv,u comes beforev in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks.

Precisely, a topological sort is a graph traversal in which each nodev is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has nodirected cycles, that is, if it is adirected acyclic graph (DAG). Any DAG has at least one topological ordering, and there arelinear time algorithms for constructing it. Topological sorting has many applications, especially in ranking problems such asfeedback arc set. Topological sorting is also possible when the DAG hasdisconnected components.

Examples

[edit]

This graph has many valid topological sorts, including:
5, 7, 3, 11, 8, 2, 9, 10 (visual left-to-right, top-to-bottom)
3, 5, 7, 8, 11, 2, 9, 10 (smallest-numbered available vertex first)
3, 5, 7, 8, 11, 2, 10, 9 (lexicographic by incoming neighbors)
5, 7, 3, 8, 11, 2, 10, 9 (fewest edges first)
7, 5, 11, 3, 10, 8, 9, 2 (largest-numbered available vertex first)
5, 7, 11, 2, 3, 8, 9, 10 (attempting top-to-bottom, left-to-right)
3, 7, 8, 5, 11, 10, 2, 9 (arbitrary)

The canonical application of topological sorting is inscheduling a sequence of jobs or tasks based on theirdependencies. The jobs are represented by vertices, and there is an edge fromx toy if jobx must be completed before joby can be started (for example, when washing clothes, the washing machine must finish before we put the clothes in the dryer). Then, a topological sort gives an order in which to perform the jobs. A closely related application of topological sorting algorithms was first studied in the early 1960s in the context of thePERT technique for scheduling inproject management.^[1] In this application, the vertices of a graph represent the milestones of a project, and the edges represent tasks that must be performed between one milestone and another. Topological sorting forms the basis of linear-time algorithms for finding thecritical path of the project, a sequence of milestones and tasks that controls the length of the overall project schedule.

In computer science, applications of this type arise ininstruction scheduling, ordering of formula cell evaluation when recomputing formula values inspreadsheets,logic synthesis, determining the order of compilation tasks to perform inmakefiles, dataserialization, and resolving symbol dependencies inlinkers. It is also used to decide in which order to load tables with foreign keys in databases.

Algorithms

[edit]

The usual algorithms for topological sorting have running time linear in the number of nodes plus the number of edges, asymptotically, $O(\left|{V}\right|+\left|{E}\right|).$

Kahn's algorithm

[edit]

Not to be confused withKuhn's algorithm.

One of these algorithms, first described byKahn (1962), works by choosing vertices in the same order as the eventual topological sort.^[2] First, find a list of "start nodes" that have no incoming edges and insert them into a set S; at least one such node must exist in a non-empty (finite) acyclic graph. Then:

L ← Empty list that will contain the sorted elementsS ← Set of all nodes with no incoming edgewhileSis not emptydo    remove a noden fromS    addn toLfor each nodem with an edgee fromn tomdo        remove edgee from the graphifm has no other incoming edgesthen            insertm intoSifgraph has edgesthenreturn error(graph has at least one cycle)elsereturnL(a topologically sorted order)

If the graph is aDAG, a solution will be contained in the list L (although the solution is not necessarily unique). Otherwise, the graph must have at least one cycle and therefore a topological sort is impossible.

Reflecting the non-uniqueness of the resulting sort, the structure S can be simply a set or a queue or a stack. Depending on the order that nodes n are removed from set S, a different solution is created. A variation of Kahn's algorithm that breaks tieslexicographically forms a key component of theCoffman–Graham algorithm for parallel scheduling andlayered graph drawing.

Depth-first search

[edit]

An alternative algorithm for topological sorting is based ondepth-first search. The algorithm loops through each node of the graph, in an arbitrary order, initiating a depth-first search that terminates when it hits any node that has already been visited since the beginning of the topological sort or the node has no outgoing edges (i.e., a leaf node):

L ← Empty list that will contain the sorted nodeswhile exists nodes without a permanent markdo    select an unmarked noden    visit(n)function visit(noden)ifn has a permanent markthenreturnifn has a temporary markthenstop(graph has at least one cycle)    markn with a temporary markfor each nodem with an edge fromn tomdo        visit(m)    markn with a permanent mark    addn to head ofL

Each noden getsprepended to the output list L only after considering all other nodes that depend onn (all descendants ofn in the graph). Specifically, when the algorithm adds noden, we are guaranteed that all nodes that depend onn are already in the output list L: they were added to L either by the recursive call to visit() that ended before the call to visitn, or by a call to visit() that started even before the call to visitn. Since each edge and node is visited once, the algorithm runs in linear time. This depth-first-search-based algorithm is the one described byCormen et al. (2001);^[3] it seems to have been first described in print by Tarjan in 1976.^[4]

Parallel algorithms

[edit]

On aparallel random-access machine, a topological ordering can be constructed inO((logn)²) time using a polynomial number of processors, putting the problem into the complexity classNC².^[5]One method for doing this is to repeatedly square theadjacency matrix of the given graph, logarithmically many times, usingmin-plus matrix multiplication with maximization in place of minimization. The resulting matrix describes thelongest path distances in the graph. Sorting the vertices by the lengths of their longest incoming paths produces a topological ordering.^[6]

An algorithm for parallel topological sorting ondistributed memory machines parallelizes the algorithm of Kahn for aDAG $G=(V,E)$ .^[7] On a high level, the algorithm of Kahn repeatedly removes the vertices of indegree 0 and adds them to the topological sorting in the order in which they were removed. Since the outgoing edges of the removed vertices are also removed, there will be a new set of vertices of indegree 0, where the procedure is repeated until no vertices are left. This algorithm performs $D+1$ iterations, whereD is the longest path inG. Each iteration can be parallelized, which is the idea of the following algorithm.

In the following, it is assumed that the graph partition is stored onp processing elements (PE), which are labeled $0,\dots ,p-1$ . Each PEi initializes a set of local vertices $Q_{i}^{1}$ withindegree 0, where the upper index represents the current iteration. Since all vertices in the local sets $Q_{0}^{1},\dots ,Q_{p-1}^{1}$ have indegree 0, i.e., they are not adjacent, they can be given in an arbitrary order for a valid topological sorting. To assign a global index to each vertex, aprefix sum is calculated over the sizes of $Q_{0}^{1},\dots ,Q_{p-1}^{1}$ . So, each step, there are ${\textstyle \sum _{i=0}^{p-1}|Q_{i}|}$ vertices added to the topological sorting.

Execution of the parallel topological sorting algorithm on a DAG with two processing elements.

In the first step, PEj assigns the indices ${\textstyle \sum _{i=0}^{j-1}|Q_{i}^{1}|,\dots ,\left(\sum _{i=0}^{j}|Q_{i}^{1}|\right)-1}$ to the local vertices in $Q_{j}^{1}$ . These vertices in $Q_{j}^{1}$ are removed, together with their corresponding outgoing edges. For each outgoing edge $(u,v)$ with endpointv in another PE $l,j\neq l$ , the message $(u,v)$ is posted to PEl. After all vertices in $Q_{j}^{1}$ are removed, the posted messages are sent to their corresponding PE. Each message $(u,v)$ received updates the indegree of the local vertexv. If the indegree drops to zero,v is added to $Q_{j}^{2}$ . Then the next iteration starts.

In stepk, PEj assigns the indices ${\textstyle a_{k-1}+\sum _{i=0}^{j-1}|Q_{i}^{k}|,\dots ,a_{k-1}+\left(\sum _{i=0}^{j}|Q_{i}^{k}|\right)-1}$ , where $a_{k-1}$ is the total number of processed vertices after step⁠ $k-1$ ⁠. This procedure repeats until there are no vertices left to process, hence ${\textstyle \sum _{i=0}^{p-1}|Q_{i}^{D+1}|=0}$ . Below is a high level,single program, multiple data pseudo-code overview of this algorithm.

Note that theprefix sum for the local offsets ${\textstyle a_{k-1}+\sum _{i=0}^{j-1}|Q_{i}^{k}|,\dots ,a_{k-1}+\left(\sum _{i=0}^{j}|Q_{i}^{k}|\right)-1}$ can be efficiently calculated in parallel.

p processing elements with IDs from 0 top-1Input: G = (V, E) DAG, distributed to PEs, PE index j = 0, ..., p - 1Output: topological sorting of Gfunction traverseDAGDistributed    δ incoming degree of local verticesVQ = {v ∈V | δ[v] = 0}                     // All vertices with indegree 0    nrOfVerticesProcessed = 0doglobal build prefix sum over size ofQ     // get offsets and total number of vertices in this step        offset = nrOfVerticesProcessed + sum(Q_i, i = 0 to j - 1)          //j is the processor indexforeach uin Q            localOrder[u] = index++;foreach (u,v) in Edo post message (u, v) to PE owning vertexv        nrOfVerticesProcessed += sum(|Q_i|, i = 0 to p - 1)        deliver all messages to neighbors of vertices in Q        receive messages for local vertices V        remove all vertices in Qforeach message (u, v) received:if --δ[v] = 0                addv toQwhile global size ofQ > 0return localOrder

The communication cost depends heavily on the given graph partition. As for runtime, on aCRCW-PRAM model that allows fetch-and-decrement in constant time, this algorithm runs in ${\textstyle {\mathcal {O}}\left({\frac {m+n}{p}}+D(\Delta +\log n)\right)}$ , whereD is again the longest path inG andΔ the maximum degree.^[7]

Application to shortest path finding

[edit]

The topological ordering can also be used to quickly computeshortest paths through aweighted directed acyclic graph. LetV be the list of vertices in such a graph, in topological order. Then the following algorithm computes the shortest path from some source vertexs to all other vertices:^[3]

Letd be an array of the same length asV; this will hold the shortest-path distances froms. Setd[s] = 0, all otherd[u] = ∞.
Letp be an array of the same length asV, with all elements initialized tonil. Eachp[u] will hold the predecessor ofu in the shortest path froms tou.
Loop over the verticesu as ordered inV, starting froms:
- For each vertexv directly followingu (i.e., there exists an edge fromu tov):
  - Letw be the weight of the edge fromu tov.
  - Relax the edge: ifd[v] >d[u] +w, set
    - d[v] ←d[u] +w,
    - p[v] ←u.

Equivalently:

Letd be an array of the same length asV; this will hold the shortest-path distances froms. Setd[s] = 0, all otherd[u] = ∞.
Letp be an array of the same length asV, with all elements initialized tonil. Eachp[u] will hold the predecessor ofu in the shortest path froms tou.
Loop over the verticesu as ordered inV, starting froms:
- For each vertexv intou (i.e., there exists an edge fromv tou):
  - Letw be the weight of the edge fromv tou.
  - Relax the edge: ifd[u] >d[v] +w, set
    - d[u] ←d[v] +w,
    - p[u] ←v.

On a graph ofn vertices andm edges, this algorithm takesΘ(n +m), i.e.,linear, time.^[3]

Uniqueness

[edit]

If a topological sort has the property that all pairs of consecutive vertices in the sorted order are connected by edges, then these edges form a directedHamiltonian path in theDAG. If a Hamiltonian path exists, the topological sort order is unique; no other order respects the edges of the path. Conversely, if a topological sort does not form a Hamiltonian path, the DAG will have two or more valid topological orderings, for in this case it is always possible to form a second valid ordering by swapping two consecutive vertices that are not connected by an edge to each other. Therefore, it is possible to test in linear time whether a unique ordering exists, and whether a Hamiltonian path exists, despite theNP-hardness of the Hamiltonian path problem for more general directed graphs (i.e., cyclic directed graphs).^[8]

Relation to partial orders

[edit]

Topological orderings are also closely related to the concept of alinear extension of apartial order in mathematics. A partially ordered set is just a set of objects together with a definition of the "≤" inequality relation, satisfying the axioms of reflexivity (x ≤ x), antisymmetry (ifx ≤ y andy ≤ x thenx = y) andtransitivity (ifx ≤ y andy ≤ z, thenx ≤ z). Atotal order is a partial order in which, for every two objectsx andy in the set, eitherx ≤ y ory ≤ x. Total orders are familiar in computer science as the comparison operators needed to performcomparison sorting algorithms. For finite sets, total orders may be identified with linear sequences of objects, where the "≤" relation is true whenever the first object precedes the second object in the order; a comparison sorting algorithm may be used to convert a total order into a sequence in this way. A linear extension of a partial order is a total order that is compatible with it, in the sense that, ifx ≤ y in the partial order, thenx ≤ y in the total order as well.

One can define a partial ordering from any DAG by letting the set of objects be the vertices of the DAG, and definingx ≤ y to be true, for any two verticesx andy, whenever there exists adirected path fromx toy; that is, whenevery isreachable fromx. With these definitions, a topological ordering of the DAG is the same thing as a linear extension of this partial order. Conversely, any partial ordering may be defined as the reachability relation in a DAG. One way of doing this is to define a DAG that has a vertex for every object in the partially ordered set, and an edgexy for every pair of objects for whichx ≤ y. An alternative way of doing this is to use thetransitive reduction of the partial ordering; in general, this produces DAGs with fewer edges, but the reachability relation in these DAGs is still the same partial order. By using these constructions, one can use topological ordering algorithms to find linear extensions of partial orders.

Relation to scheduling optimisation

[edit]

By definition, the solution of a scheduling problem that includes a precedence graph is a valid solution to topological sort (irrespective of the number of machines), however, topological sort in itself isnot enough to optimally solve a scheduling optimisation problem. Hu's algorithm is a popular method used to solve scheduling problems that require a precedence graph and involve processing times (where the goal is to minimise the largest completion time amongst all the jobs). Like topological sort, Hu's algorithm is not unique and can be solved using DFS (by finding the largest path length and then assigning the jobs).

References

[edit]

^Jarnagin, M. P. (1960),Automatic machine methods of testing PERT networks for consistency, Technical Memorandum No. K-24/60, Dahlgren, Virginia: U. S. Naval Weapons Laboratory
^Kahn, Arthur B. (1962), "Topological sorting of large networks",Communications of the ACM,5 (11):558–562,doi:10.1145/368996.369025,S2CID 16728233
^^a ^b ^cCormen, Thomas H.;Leiserson, Charles E.;Rivest, Ronald L.;Stein, Clifford (2001), "Section 22.4: Topological sort",Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 549–552,ISBN 0-262-03293-7
^Tarjan, Robert E. (1976), "Edge-disjoint spanning trees and depth-first search",Acta Informatica,6 (2):171–185,doi:10.1007/BF00268499,S2CID 12044793
^Cook, Stephen A. (1985), "A Taxonomy of Problems with Fast Parallel Algorithms",Information and Control,64 (1–3):2–22,doi:10.1016/S0019-9958(85)80041-3
^Dekel, Eliezer; Nassimi, David; Sahni, Sartaj (1981), "Parallel matrix and graph algorithms",SIAM Journal on Computing,10 (4):657–675,doi:10.1137/0210049,MR 0635424
^^a ^bSanders, Peter; Mehlhorn, Kurt; Dietzfelbinger, Martin; Dementiev, Roman (2019),Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox, Springer International Publishing,ISBN 978-3-030-25208-3
^Vernet, Oswaldo; Markenzon, Lilian (1997),"Hamiltonian problems for reducible flowgraphs"(PDF),Proceedings: 17th International Conference of the Chilean Computer Science Society, pp. 264–267,doi:10.1109/SCCC.1997.637099,hdl:11422/2585,ISBN 0-8186-8052-0,S2CID 206554481

External links

[edit]

v t e Data structures andalgorithms
Data structures	Array Associative array Binary search tree Fenwick tree Graph Hash table Heap Linked list Queue Segment tree Stack String Tree Trie
Algorithms andalgorithmic paradigms	Backtracking Binary search Breadth-first search Brute-force search Depth-first search Divide and conquer Dynamic programming Graph traversal Fold Greedy Hash function Minimax Online Randomized Recursion Root-finding Sorting Streaming Sweep line String-searching Topological sorting
List of data structures List of algorithms

v t e Sorting algorithms
Theory	Computational complexity theory Big O notation Total order Lists Inplacement Stability Comparison sort Adaptive sort Sorting network Integer sorting X + Y sorting Transdichotomous model Quantum sort
Exchange sorts	Bubble sort Cocktail shaker sort Odd–even sort Comb sort Gnome sort Proportion extend sort Quicksort
Selection sorts	Selection sort Heapsort Smoothsort Cartesian tree sort Tournament sort Cycle sort Weak-heap sort
Insertion sorts	Insertion sort Shellsort Splaysort Tree sort Library sort Patience sorting
Merge sorts	Merge sort Cascade merge sort Oscillating merge sort Polyphase merge sort
Distribution sorts	American flag sort Bead sort Bucket sort Burstsort Counting sort Interpolation sort Pigeonhole sort Proxmap sort Radix sort Flashsort
Concurrent sorts	Bitonic sorter Batcher odd–even mergesort Pairwise sorting network Samplesort
Hybrid sorts	Block merge sort Introsort Kirkpatrick–Reisch sort Merge-insertion sort Powersort Timsort Spreadsort
Other	Topological sorting Pre-topological order Pancake sorting Spaghetti sort
Impractical sorts	Stooge sort Slowsort Bogosort