Analysis of parallel algorithms


In computer science, analysis of parallel algorithms is the process of finding the computational complexity of algorithms executed in parallel – the amount of time, storage, or other resources needed to execute them. In many respects, analysis of parallel algorithms is similar to the analysis of sequential algorithms, but is generally more involved because one must reason about the behavior of multiple cooperating threads of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources (speed, space, etc.) changes as the number of processors is changed.

Background


A so-called work-time (WT) (sometimes called work-depth, or work-span) framework was originally introduced by Shiloach and Vishkin[1] for conceptualizing and describing parallel algorithms. In the WT framework, a parallel algorithm is first described in terms of parallel rounds. For each round, the operations to be performed are characterized, but several issues can be suppressed. For example, the number of operations at each round need not be clear, processors need not be mentioned, and any information that may help with the assignment of processors to jobs need not be accounted for. Second, the suppressed information is provided. The inclusion of the suppressed information is guided by the proof of a scheduling theorem due to Brent,[2] which is explained later in this article. The WT framework is useful because, while it can greatly simplify the initial description of a parallel algorithm, inserting the details suppressed by that initial description is often not very difficult. For example, the WT framework was adopted as the basic presentation framework in the parallel algorithms books (for the parallel random-access machine PRAM model)[3][4] as well as in the class notes.[5] The overview below explains how the WT framework can be used for analyzing more general parallel algorithms, even when their description is not available within the WT framework.
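
As a hypothetical illustration of the first, round-oriented stage of a WT description (not taken from the cited sources), the following Python sketch describes a parallel summation as a sequence of rounds: each round lists the operations to be performed, but no processors are mentioned and no assignment of operations to processors is given. The function name and the sequential loop standing in for "in parallel" are assumptions made only for illustration.

    # Sketch of the first (work-time) stage of a WT description: a parallel
    # summation is given as a sequence of rounds; each round lists the
    # operations to perform "in parallel" but says nothing about which
    # processor executes which operation.
    def wt_parallel_sum(a):
        """Sum a list whose length is a power of two, round by round."""
        a = list(a)
        n = len(a)
        while n > 1:                   # one parallel round per iteration
            half = n // 2
            # Round operations: a[i] += a[i + half] for all i < half,
            # conceptually executed simultaneously.
            for i in range(half):      # sequential stand-in for a parallel loop
                a[i] = a[i] + a[i + half]
            n = half
        return a[0]

    print(wt_parallel_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # prints 36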

Definitions


Suppose computations are executed on a machine that has p processors. Let Tp denote the time that elapses between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions:

  • The work of a computation executed by p processors is the total number of primitive operations that the processors perform.[6] Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted T1.
  • The depth or span is the length of the longest series of operations that have to be performed sequentially due to data dependencies (the critical path). The depth may also be called the critical path length of the computation.[7] Minimizing the depth/span is important in designing parallel algorithms, because the depth/span determines the shortest possible execution time.[8] Alternatively, the span can be defined as the time T∞ spent computing using an idealized machine with an infinite number of processors.[9]
  • The cost of the computation is the quantity pTp. This expresses the total time spent, by all processors, in both computing and waiting.[6]

Several useful results follow from the definitions of work, span and cost:

  • Work law. The cost is always at least the work: pTp ≥ T1. This follows from the fact that p processors can perform at most p operations in parallel.[6][9]
  • Span law. A finite number p of processors cannot outperform an infinite number, so that Tp ≥ T∞.[9]
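
The definitions and the two laws can be checked on a small example. The following sketch uses an assumed cost model (one time unit per addition, not taken from the cited sources) to compute the work T1 and the span T∞ of a balanced divide-and-conquer summation, and checks both laws against one hypothetical schedule.

    # Assumed cost model: one time unit per addition. Counts the work T1 and
    # span T_inf of summing n = 2**k numbers by a balanced binary tree.
    def work_and_span(n):
        """Return (T1, T_inf) for the tree summation of n = 2**k numbers."""
        if n == 1:
            return 0, 0              # a single element needs no additions
        w_half, s_half = work_and_span(n // 2)
        work = 2 * w_half + 1        # both halves are summed, plus one combining add
        span = s_half + 1            # halves proceed in parallel, then one combining add
        return work, span

    t1, t_inf = work_and_span(8)
    print(t1, t_inf)                 # 7 additions of work, span 3 = log2(8)

    # Work law p*Tp >= T1 and span law Tp >= T_inf, e.g. for a hypothetical
    # schedule that takes Tp = 3 steps on p = 4 processors.
    p, tp = 4, 3
    assert p * tp >= t1 and tp >= t_inf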

Using these definitions and laws, the following measures of performance can be given:

  • Speedup is the gain in speed made by parallel execution compared to sequential execution: Sp = T1 / Tp. When the speedup is Ω(p) for p processors (using big O notation), the speedup is linear, which is optimal in simple models of computation because the work law implies that T1 / Tp ≤ p (super-linear speedup can occur in practice due to memory hierarchy effects). The situation T1 / Tp = p is called perfect linear speedup.[9] An algorithm that exhibits linear speedup is said to be scalable.[6] Analytical expressions for the speedup of many important parallel algorithms are presented in this book.[10]
  • Efficiency is the speedup per processor, Sp / p.[6]
  • Parallelism is the ratio T1 / T∞. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if p > T1 / T∞, then T1 / Tp ≤ T1 / T∞ < p.[9]
  • The slackness is T1 / (pT∞). A slackness less than one implies (by the span law) that perfect linear speedup is impossible on p processors.[9]
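
As a numerical illustration (the figures below are arbitrary assumptions, not taken from the cited sources), these measures can be computed directly from their definitions:

    # Arbitrary illustrative values: work T1 = 1000, span T_inf = 10,
    # observed running time Tp = 60 on p = 20 processors.
    T1, T_inf, p, Tp = 1000, 10, 20, 60

    speedup = T1 / Tp              # Sp = T1 / Tp             -> about 16.7
    efficiency = speedup / p       # Sp / p                   -> about 0.83
    parallelism = T1 / T_inf       # maximum possible speedup -> 100
    slackness = T1 / (p * T_inf)   # 5 > 1, so perfect linear speedup is not ruled out

    # The work and span laws bound the speedup by p and by the parallelism.
    assert speedup <= p and speedup <= parallelism
    print(speedup, efficiency, parallelism, slackness)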

Execution on a limited number of processors


Analysis of parallel algorithms is usually carried out under the assumption that an unbounded number of processors is available. This is unrealistic, but not a problem, since any computation that can run in parallel on N processors can be executed on p < N processors by letting each processor execute multiple units of work. A result called Brent's law states that one can perform such a "simulation" in time Tp, bounded by[11]

{\displaystyle T_{p}\leq T_{N}+{\frac {T_{1}-T_{N}}{p}},}

or, less precisely,[6]

{\displaystyle T_{p}=O\left(T_{N}+{\frac {T_{1}}{p}}\right).}

An alternative statement of the law bounds Tp above and below by

{\displaystyle {\frac {T_{1}}{p}}\leq T_{p}\leq {\frac {T_{1}}{p}}+T_{\infty }},

showing that the span (depth) T∞ and the work T1 together provide reasonable bounds on the computation time.[2]
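
A small sketch of how these bounds can be used (the work and span values below are assumptions chosen only for illustration): given T1 and T∞, the upper bound T1/p + T∞ and the work-law lower bound T1/p can be tabulated for various processor counts.

    # Tabulate the work-law lower bound T1/p and the upper bound T1/p + T_inf
    # for assumed values of work and span (chosen only for illustration).
    def brent_upper_bound(T1, T_inf, p):
        """Upper bound on Tp from Brent's law: T1/p + T_inf."""
        return T1 / p + T_inf

    T1, T_inf = 1000, 10
    for p in (1, 4, 16, 64, 256):
        lower = T1 / p
        upper = brent_upper_bound(T1, T_inf, p)
        print(f"p={p:4d}: {lower:8.1f} <= Tp <= {upper:8.1f}")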

References

  1. Shiloach, Yossi; Vishkin, Uzi (1982). "An O(n² log n) parallel max-flow algorithm". Journal of Algorithms. 3 (2): 128–146. doi:10.1016/0196-6774(82)90013-X.
  2. Brent, Richard P. (1974-04-01). "The Parallel Evaluation of General Arithmetic Expressions". Journal of the ACM. 21 (2): 201–206. CiteSeerX 10.1.1.100.9361. doi:10.1145/321812.321815. ISSN 0004-5411. S2CID 16416106.
  3. JaJa, Joseph (1992). An Introduction to Parallel Algorithms. Addison-Wesley. ISBN 978-0-201-54856-3.
  4. Keller, Jörg; Kessler, Christoph W.; Traeff, Jesper L. (2001). Practical PRAM Programming. Wiley-Interscience. ISBN 978-0-471-35351-5.
  5. Vishkin, Uzi (2009). Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages (PDF). Class notes of courses on parallel algorithms taught since 1992 at the University of Maryland, College Park, Tel Aviv University and the Technion.
  6. Casanova, Henri; Legrand, Arnaud; Robert, Yves (2008). Parallel Algorithms. CRC Press. p. 10. CiteSeerX 10.1.1.466.8142.
  7. Blelloch, Guy (1996). "Programming Parallel Algorithms" (PDF). Communications of the ACM. 39 (3): 85–97. CiteSeerX 10.1.1.141.5884. doi:10.1145/227234.227246. S2CID 12118850.
  8. McCool, Michael; Reinders, James; Robison, Arch (2013). Structured Parallel Programming: Patterns for Efficient Computation. Elsevier. pp. 4–5.
  9. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. pp. 779–784. ISBN 0-262-03384-4.
  10. Kurgalin, Sergei; Borzunov, Sergei (2020). The Discrete Math Workbook: A Companion Manual Using Python. Texts in Computer Science (2nd ed.). Cham, Switzerland: Springer Nature. ISBN 978-3-030-42220-2.
  11. Gustafson, John L. (2011). "Brent's Theorem". Encyclopedia of Parallel Computing. pp. 182–185. doi:10.1007/978-0-387-09766-4_80. ISBN 978-0-387-09765-7.