General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).[1][2][3][4] The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.[5]
Essentially, a GPGPU pipeline is a kind of parallel processing between one or more GPUs and CPUs, with special accelerated instructions for processing images or other graphical forms of data. While GPUs operate at lower frequencies, they typically have many times the number of processing elements. Thus, GPUs can process far more pictures and other graphical data per second than a traditional CPU. Migrating data into parallel form and then using the GPU to process it can (theoretically) create a large speedup.
GPGPU pipelines were developed at the beginning of the 21st century for graphics processing (e.g., for better shaders). From the history of supercomputing it is well known that scientific computing drives the largest concentrations of computing power in history, as listed in the TOP500: the majority of those systems today utilize GPUs.
The best-known GPGPUs are Nvidia Tesla GPUs, which are used in Nvidia DGX systems, alongside AMD Instinct and Intel Gaudi.
In principle, any arbitrary Boolean function, including addition, multiplication, and other mathematical functions, can be built up from a functionally complete set of logic operators. In 1987, Conway's Game of Life became one of the first examples of general-purpose computing using an early stream processor called a blitter to invoke a special sequence of logical operations on bit vectors.[6]
General-purpose computing on GPUs became more practical and popular after about 2001, with the advent of both programmable shaders and floating point support on graphics processors. Notably, problems involving matrices and/or vectors – especially two-, three-, or four-dimensional vectors – were easy to translate to a GPU, which acts with native speed and support on those types. A significant milestone for GPGPU was the year 2003, when two research groups independently discovered GPU-based approaches for the solution of general linear algebra problems on GPUs that ran faster than on CPUs.[7][8] These early efforts to use GPUs as general-purpose processors required reformulating computational problems in terms of graphics primitives, as supported by the two major APIs for graphics processors, OpenGL and DirectX. This cumbersome translation was obviated by the advent of general-purpose programming languages and APIs such as Sh/RapidMind, Brook and Accelerator.[9][10][11]
These were followed by Nvidia's CUDA, which allowed programmers to ignore the underlying graphical concepts in favor of more common high-performance computing concepts.[12] Newer, hardware-vendor-independent offerings include Microsoft's DirectCompute and Apple/Khronos Group's OpenCL.[12] This means that modern GPGPU pipelines can leverage the speed of a GPU without requiring full and explicit conversion of the data to a graphical form.
Mark Harris, the founder of GPGPU.org, claims he coined the term GPGPU.[13]
Any language that allows the code running on the CPU to poll a GPU shader for return values can create a GPGPU framework. Programming standards for parallel computing include OpenCL (vendor-independent), OpenACC, OpenMP and OpenHMPP.
As of 2016[update], OpenCL is the dominant open general-purpose GPU computing language, and is an open standard defined by the Khronos Group.[citation needed] OpenCL provides a cross-platform GPGPU platform that additionally supports data parallel compute on CPUs. OpenCL is actively supported on Intel, AMD, Nvidia, and ARM platforms. The Khronos Group has also standardised and implemented SYCL, a higher-level programming model for OpenCL as a single-source domain specific embedded language based on pure C++11.
The dominant proprietary framework is Nvidia CUDA.[14] Nvidia launched CUDA in 2006, a software development kit (SDK) and application programming interface (API) that allows using the programming language C to code algorithms for execution on GeForce 8 series and later GPUs.
ROCm, launched in 2016, is AMD's open-source response to CUDA. As of 2022, it is on par with CUDA with regard to features,[citation needed] but still lags in consumer support.[citation needed]
OpenVIDIA was developed at the University of Toronto between 2003 and 2005,[15] in collaboration with Nvidia.
Altimesh Hybridizer, created by Altimesh, compiles Common Intermediate Language to CUDA binaries.[16][17] It supports generics and virtual functions.[18] Debugging and profiling are integrated with Visual Studio and Nsight.[19] It is available as a Visual Studio extension on the Visual Studio Marketplace.
Microsoft introduced the DirectCompute GPU computing API, released with the DirectX 11 API.
Alea GPU,[20] created by QuantAlea,[21] introduces native GPU computing capabilities for the Microsoft .NET languages F#[22] and C#. Alea GPU also provides a simplified GPU programming model based on GPU parallel-for and parallel aggregate using delegates and automatic memory management.[23]
MATLAB supports GPGPU acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server,[24] and third-party packages like Jacket.
GPGPU processing is also used to simulate Newtonian physics by physics engines,[25] and commercial implementations include Havok Physics, FX and PhysX, which are typically used for computer and video games.
C++ Accelerated Massive Parallelism (C++ AMP) is a library that accelerates execution of C++ code by exploiting the data-parallel hardware on GPUs.
Due to the increasing power of mobile GPUs, general-purpose programming has also become available on mobile devices running major mobile operating systems.
Google Android 4.2 enabled running RenderScript code on the mobile device's GPU.[26] RenderScript has since been deprecated in favour of first OpenGL compute shaders[27] and later Vulkan Compute.[28] OpenCL is available on many Android devices, but is not officially supported by Android.[29] Apple introduced the proprietary Metal API for iOS applications, able to execute arbitrary code through Apple's GPU compute shaders.[citation needed]
Originally, data was simply passed one way from a central processing unit (CPU) to a graphics processing unit (GPU), then to a display device. As time progressed, however, it became valuable for GPUs to store at first simple, then complex structures of data to be passed back to the CPU that analyzed an image, or a set of scientific data represented as a 2D or 3D format that a video card can understand. Because the GPU has access to every draw operation, it can analyze data in these forms quickly, whereas a CPU must poll every pixel or data element much more slowly, as the speed of access between a CPU and its larger pool of random-access memory (or, in an even worse case, a hard drive) is slower than that of GPUs and video cards, which typically contain smaller amounts of more expensive memory that is much faster to access. Transferring the portion of the data set to be actively analyzed to GPU memory in the form of textures or other easily readable GPU forms results in a speed increase. The distinguishing feature of a GPGPU design is the ability to transfer information bidirectionally back from the GPU to the CPU; generally, the data throughput in both directions is ideally high, resulting in a multiplier effect on the speed of a specific high-use algorithm.
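As a minimal, hedged sketch of this bidirectional flow using the CUDA runtime API (buffer names and sizes here are arbitrary), a program copies a data set to GPU memory, processes it there, and reads the result back:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t n = 1 << 20;                 // one million floats (arbitrary size)
    const size_t bytes = n * sizeof(float);

    float *host_in  = (float *)malloc(bytes);
    float *host_out = (float *)malloc(bytes);
    for (size_t i = 0; i < n; i++) host_in[i] = (float)i;

    float *dev_buf;
    cudaMalloc(&dev_buf, bytes);                                   // allocate GPU memory
    cudaMemcpy(dev_buf, host_in, bytes, cudaMemcpyHostToDevice);   // CPU -> GPU

    // ... a kernel would process dev_buf here ...

    cudaMemcpy(host_out, dev_buf, bytes, cudaMemcpyDeviceToHost);  // GPU -> CPU
    printf("first element after round trip: %f\n", host_out[0]);

    cudaFree(dev_buf);
    free(host_in);
    free(host_out);
    return 0;
}
```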
GPGPU pipelines may improve efficiency on especially large data sets and/or data containing 2D or 3D imagery. They are used in complex graphics pipelines as well as scientific computing; more so in fields with large data sets like genome mapping, or where two- or three-dimensional analysis is useful – especially at present biomolecule analysis, protein study, and other complex organic chemistry. An example of such applications is the NVIDIA software suite for genome analysis.
Such pipelines can also vastly improve efficiency in image processing and computer vision, among other fields, as well as parallel processing generally. Some very heavily optimized pipelines have yielded speed increases of several hundred times the original CPU-based pipeline on one high-use task.
A simple example would be a GPU program that collects data about average lighting values as it renders some view from either a camera or a computer graphics program back to the main program on the CPU, so that the CPU can then make adjustments to the overall screen view. A more advanced example might use edge detection to return both numerical information and a processed image representing outlines to a computer vision program controlling, say, a mobile robot. Because the GPU has fast and local hardware access to every pixel or other picture element in an image, it can analyze and average it (for the first example) or apply a Sobel edge filter or other convolution filter (for the second) with much greater speed than a CPU, which typically must access slower random-access memory copies of the graphic in question.
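A hedged sketch of the first example in CUDA (the RGBA image layout and the Rec. 601 luminance weights are assumptions for illustration): each thread accumulates one pixel's luminance, and the host divides the total by the pixel count to obtain the average.

```cuda
// Each thread reads one RGBA pixel and accumulates its luminance into *sum.
// atomicAdd on float requires compute capability 2.0 or later.
__global__ void accumulate_luminance(const unsigned char *rgba,
                                     int width, int height,
                                     float *sum)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    const unsigned char *p = &rgba[4 * (y * width + x)];
    // Rec. 601 luma weights (an assumption in this sketch).
    float luma = 0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2];
    atomicAdd(sum, luma);   // host later divides *sum by width * height
}
```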
GPGPU as a software concept is a type of algorithm, not a piece of equipment. Specialized equipment designs may, however, further enhance the efficiency of GPGPU pipelines, which traditionally perform relatively few algorithms on very large amounts of data. Massively parallelized, gigantic-data-level tasks may thus be parallelized even further via specialized setups such as rack computing (many similar, highly tailored machines built into a rack), which adds a third layer – many computing units each using many CPUs to correspond to many GPUs. Some Bitcoin "miners" used such setups for high-quantity processing. Insights into the largest such systems in the world have been maintained in the TOP500 supercomputer list.
Historically, CPUs have used hardware-managed caches, but the earlier GPUs only provided software-managed local memories. However, as GPUs are being increasingly used for general-purpose applications, state-of-the-art GPUs are being designed with hardware-managed multi-level caches which have helped the GPUs to move towards mainstream computing. For example, GeForce 200 series GT200 architecture GPUs did not feature an L2 cache, the Fermi GPU has 768 KiB last-level cache, the Kepler GPU has 1.5 MiB last-level cache,[30] the Maxwell GPU has 2 MiB last-level cache, and the Pascal GPU has 4 MiB last-level cache.
GPUs have very large register files, which allow them to reduce context-switching latency. Register file size is also increasing over different GPU generations, e.g., the total register file size on Maxwell (GM200), Pascal and Volta GPUs is 6 MiB, 14 MiB and 20 MiB, respectively.[31][32] By comparison, the size of a register file on CPUs is small, typically tens or hundreds of kilobytes.
In essence, almost all GPU workloads are inherently massively parallel LOAD-COMPUTE-STORE in nature, such as tiled rendering. Even storing one temporary vector for further recall (LOAD-COMPUTE-STORE-COMPUTE-LOAD-COMPUTE-STORE) is so expensive, due to the memory wall problem, that it is to be avoided at all costs.[33] The result is that register file size has to increase. In standard CPUs it is possible to introduce caches (a D-cache) to solve this problem; however, such caches are relatively large, making them impractical to introduce in GPUs, which would need one per processing element. ILLIAC IV innovatively solved the problem around 1967 by introducing a local memory per processing element (a PEM), a strategy later copied by the Aspex ASP.
GPGPUs differ greatly from each other in how many execution resources are assigned to each group of "cores" that performs a stream of operations, variously called a "streaming multiprocessor" (SM) by Nvidia, a compute unit (CU) or workgroup processor (WGP) by AMD depending on the microarchitecture, or an "Xe Core" by Intel, all designed to execute what OpenCL calls a "work-group".[34] Much like how CPUs may elect to implement wider vector instructions in smaller pieces (e.g., AMD Bulldozer supported the 256-bit AVX instructions by splitting them into two 128-bit operations) to save power and/or chip area,[35] GPU designers also vary the number of execution units to fit their expected workloads.
On a GPGPU, each of the following resources can vary freely in ratio to the others: FP64 (FMA), FP32 (FMA), FP16 (FMA), Int32 add, Int32 multiply, and RCP/RSQRT. (An example can be seen in Nvidia's documentation of the execution resources found in each SM of different generations (compute capabilities) of GPUs. Non-matrix FP16 is handled by the FP32 cores.[36]) GPGPUs intended for scientific computing often invest more heavily in FP64, while those designed for deep learning tend to invest more in FP16, lower-bitwidth "packed" integer operations, and additional dedicated matrix-multiplication units ("matrix units", "tensor cores").[37][38]
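The generation a device belongs to, and hence its per-SM resource mix, can be queried at run time. A hedged sketch using the CUDA runtime API (only a few of the reported fields are shown):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    // The compute capability identifies the SM generation, which determines
    // the per-SM mix of FP64/FP32/FP16/Int32 units in Nvidia's documentation.
    printf("device:             %s\n", prop.name);
    printf("compute capability: %d.%d\n", prop.major, prop.minor);
    printf("multiprocessors:    %d\n", prop.multiProcessorCount);
    return 0;
}
```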
It is therefore insufficient to qualify a GPU's computational capabilities simply in terms of FLOPS: FLOPS values should be presented separately for matrix and non-matrix modes, and (T)OPS figures should also be presented for integer operations.
The high performance of GPUs comes at the cost of high power consumption, which under full load can be as much as that of the rest of the PC system combined.[39] The maximum power consumption of the Pascal series GPU (Tesla P100) was specified to be 250 W.[40]
In terms of raw computing power (FLOPS, TOPS, etc.), GPUs tend to have more performance-per-watt than a typical CPU. However, it takes a well-written program and a fitting workload to extract most of this power, as most of the time (and power) would otherwise be wasted on local and host memory access.
Before CUDA was published in 2007, GPGPU was "classical" and involved repurposing graphics primitives. A standard structure of such a computation was to store arrays as textures, express the kernel as a pixel/fragment shader, drive the computation by drawing geometry (typically a screen-sized quad), and read the results back from the framebuffer or a render target.
More examples are available in part 4 of GPU Gems 2.[41]
The use of GPUs for numerical linear algebra began at least in 2001.[42] They have been used for Gauss–Seidel solvers, conjugate gradients, etc.[43]
Computer video cards are produced by various vendors, such as Nvidia and AMD. Cards from such vendors differ in implementing data-format support, such as integer and floating-point formats (32-bit and 64-bit). Microsoft introduced a Shader Model standard to help rank the various features of graphics cards into a simple Shader Model version number (1.0, 2.0, 3.0, etc.).
Pre-DirectX 9 video cards only supported paletted or integer color types. Sometimes another alpha value is added, to be used for transparency. Common formats are:
For early fixed-function or limited-programmability graphics (i.e., up to and including DirectX 8.1-compliant GPUs) this was sufficient because this is also the representation used in displays. This representation does have certain limitations. Given sufficient graphics processing power, even graphics programmers would like to use better formats, such as floating point data formats, to obtain effects such as high-dynamic-range imaging. Many GPGPU applications require floating point accuracy, which came with video cards conforming to the DirectX 9 specification.
DirectX 9 Shader Model 2.x suggested the support of two precision types: full and partial precision. Full precision support could either be FP32 or FP24 (floating point 32- or 24-bit per component) or greater, while partial precision was FP16. ATI's Radeon R300 series of GPUs supported FP24 precision only in the programmable fragment pipeline (although FP32 was supported in the vertex processors) while Nvidia's NV30 series supported both FP16 and FP32; other vendors such as S3 Graphics and XGI supported a mixture of formats up to FP24.
The implementations of floating point on Nvidia GPUs are mostly IEEE compliant; however, this is not true across all vendors.[44] This has implications for correctness which are considered important to some scientific applications. While 64-bit floating point values (double precision float) are commonly available on CPUs, these are not universally supported on GPUs. Some GPU architectures sacrifice IEEE compliance, while others lack double-precision support. Efforts have occurred to emulate double-precision floating point values on GPUs; however, the speed tradeoff negates any benefit of offloading the computation onto the GPU in the first place.[45]
Most operations on the GPU operate in a vectorized fashion: one operation can be performed on up to four values at once.[disputed –discuss] For example, if one color ⟨R1, G1, B1⟩ is to be modulated by another color ⟨R2, G2, B2⟩, the GPU can produce the resulting color ⟨R1*R2, G1*G2, B1*B2⟩ in one operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional).[citation needed] Examples include vertices, colors, normal vectors, and texture coordinates.
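A hedged sketch of such a per-element vector operation in CUDA (float4 is CUDA's built-in four-component type; the kernel and buffer names are illustrative):

```cuda
// Modulate each pixel's color by another color, all four components per thread.
__global__ void modulate_colors(const float4 *a, const float4 *b,
                                float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 c1 = a[i];
        float4 c2 = b[i];
        out[i] = make_float4(c1.x * c2.x,   // R1 * R2
                             c1.y * c2.y,   // G1 * G2
                             c1.z * c2.z,   // B1 * B2
                             c1.w * c2.w);  // A1 * A2
    }
}
```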
GPUs were originally designed specifically for graphics and thus are very restrictive in operations and programming. Due to their design, GPUs are only effective for problems that can be solved using stream processing, and the hardware can only be used in certain ways.
In the early GPGPU age, GPUs could only process independent vertices and fragments, but could process many of them in parallel. This was especially effective when the programmer wanted to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running one kernel on many records in a stream at once. Programmers would use graphics APIs (OpenGL or DirectX) to perform general-purpose computation.
With the introduction of the CUDA (Nvidia, 2007) and OpenCL (vendor-independent, 2008) general-purpose computing APIs, new GPGPU codes no longer need to map the computation to graphics primitives. The stream processing nature of GPUs remains valid regardless of the APIs used. (See e.g.,[46])
A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. In the GPUs, vertices and fragments are the elements in streams and vertex and fragment shaders are the kernels to be run on them.[dubious –discuss] For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.[vague]
Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity, or else the memory access latency will limit computational speedup.[47]
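As a hedged worked example: a SAXPY-style update y[i] = a*x[i] + y[i] performs two floating-point operations per element (one multiply, one add) while transferring three words per element (read x[i], read y[i], write y[i]), so

$$\text{arithmetic intensity} \approx \frac{2 \text{ operations}}{3 \text{ words}} \approx 0.67 \text{ operations per word},$$

which is low enough that such a kernel is typically limited by memory bandwidth rather than arithmetic throughput.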
Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.
There are a variety of computational resources available on the GPU:
In fact, a program can substitute a write-only texture for output instead of the framebuffer. This is done either through Render to Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.
The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.
Since textures are used as memory, texture lookups are then used as memory reads. Certain operations can be done automatically by the GPU because of this.
Compute kernels can be thought of as the body of loops. For example, a programmer operating on a grid on the CPU might have code that looks like this:
```c
// Input and output grids have 10000 x 10000 or 100 million elements.
void transform_10k_by_10k_grid(float in[10000][10000], float out[10000][10000])
{
    for (int x = 0; x < 10000; x++) {
        for (int y = 0; y < 10000; y++) {
            // The next line is executed 100 million times
            out[x][y] = do_some_hard_work(in[x][y]);
        }
    }
}
```
On the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over by invoking geometry processing.
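In a modern compute API the same computation might look like the following hedged CUDA sketch (the 2D launch configuration and the placeholder do_some_hard_work device function are assumptions carried over from the CPU example above):

```cuda
// A __device__ version of the per-element work from the CPU example
// (placeholder body; the real computation is whatever the loop body did).
__device__ float do_some_hard_work(float v) { return v * v; }

// Each thread handles exactly one grid element; the explicit loops disappear.
__global__ void transform_grid_kernel(const float *in, float *out,
                                      int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int idx = y * width + x;
        out[idx] = do_some_hard_work(in[idx]);
    }
}

// Host-side launch covering the 10000 x 10000 grid:
//   dim3 block(16, 16);
//   dim3 grid((10000 + block.x - 1) / block.x, (10000 + block.y - 1) / block.y);
//   transform_grid_kernel<<<grid, block>>>(dev_in, dev_out, 10000, 10000);
```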
For accurate technical information on this topic, see Predication (computer architecture) § SIMD, SIMT and vector predication, and ILLIAC IV "branching" (the term "predicate mask" did not exist in 1967).
In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs.[48] Conditional writes could be performed using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible.
Recent[when?] GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various methods, such as static branch resolution, pre-computation, predication, loop splitting,[49] and Z-cull[50] can be used to achieve branching when hardware support does not exist.
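As a hedged illustration of predication, a data-dependent branch can be replaced by arithmetic selection so that every thread executes the same instruction sequence (the threshold and function names are illustrative):

```cuda
// Branching version: threads in a warp may diverge on the condition.
__device__ float with_branch(float v, float threshold)
{
    if (v > threshold)
        return v * 2.0f;
    else
        return v * 0.5f;
}

// Predicated version: both candidate results are computed and one is
// selected arithmetically, so all threads follow the same path.
__device__ float with_predication(float v, float threshold)
{
    float mask = (v > threshold) ? 1.0f : 0.0f;   // 0/1 "predicate"
    return mask * (v * 2.0f) + (1.0f - mask) * (v * 0.5f);
}
```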
The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU. The programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The result stream of the same size is stored in the output buffer.
Some computations require calculating a smaller stream (possibly a stream of only one element) from a larger stream. This is called a reduction of the stream. Generally, a reduction can be performed in multiple steps. The results from the prior step are used as the input for the current step and the range over which the operation is applied is reduced until only one stream element remains.
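A hedged sketch of one such multi-step reduction in CUDA shared memory (a textbook pattern, not any particular library's implementation): each block halves its active range step by step until one partial sum per block remains, and those partial sums are combined in a further pass.

```cuda
// Sum-reduce the elements assigned to one block into a single partial sum.
// Launch with shared memory size blockDim.x * sizeof(float); repeated
// launches (or a final CPU pass) combine the per-block partial sums.
__global__ void block_sum(const float *in, float *partial, int n)
{
    extern __shared__ float sdata[];                // one float per thread
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Halve the active range each step: the "multiple steps" of the reduction.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        partial[blockIdx.x] = sdata[0];
}
```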
Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criteria.
The scan operation, also termed parallel prefix sum, takes in a vector (stream) of data elements and an (arbitrary) associative binary function '+' with an identity element 'i'. If the input is [a0, a1, a2, a3, ...], an exclusive scan produces the output [i, a0, a0 + a1, a0 + a1 + a2, ...], while an inclusive scan produces the output [a0, a0 + a1, a0 + a1 + a2, a0 + a1 + a2 + a3, ...] and does not require an identity to exist. While at first glance the operation may seem inherently serial, efficient parallel scan algorithms are possible and have been implemented on graphics processing units. The scan operation has uses in e.g., quicksort and sparse matrix-vector multiplication.[46][51][52][53]
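For a concrete (hedged) illustration with '+' as ordinary addition: the input [3, 1, 7, 0, 4] has exclusive scan [0, 3, 4, 11, 11] and inclusive scan [3, 4, 11, 11, 15]. A sequential reference in C makes the two definitions explicit; parallel GPU implementations produce the same results using tree-based algorithms.

```c
// Sequential reference versions of exclusive and inclusive scan with '+'.
void exclusive_scan(const float *in, float *out, int n)
{
    float sum = 0.0f;                 // identity element for '+'
    for (int i = 0; i < n; i++) {
        out[i] = sum;                 // value *before* adding in[i]
        sum += in[i];
    }
}

void inclusive_scan(const float *in, float *out, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += in[i];
        out[i] = sum;                 // value *including* in[i]
    }
}
```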
The scatter operation is most naturally defined on the vertex processor. The vertex processor is able to adjust the position of the vertex, which allows the programmer to control where information is deposited on the grid. Other extensions are also possible, such as controlling how large an area the vertex affects.
The fragment processor cannot perform a direct scatter operation because the location of each fragment on the grid is fixed at the time of the fragment's creation and cannot be altered by the programmer. However, a logical scatter operation may sometimes be recast or implemented with another gather step. A scatter implementation would first emit both an output value and an output address. An immediately following gather operation uses address comparisons to see whether the output value maps to the current output slot.
In dedicated compute kernels, scatter can be performed by indexed writes.
Gather is the reverse of scatter. After scatter reorders elements according to a map, gather can restore the order of the elements according to the map scatter used. In dedicated compute kernels, gather may be performed by indexed reads. In other shaders, it is performed with texture-lookups.
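Hedged sketches of both patterns as dedicated compute kernels (the index map is assumed to be a caller-supplied permutation of 0..n-1):

```cuda
// Scatter: each thread writes its input element to a mapped location.
__global__ void scatter(const float *in, const int *map, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[map[i]] = in[i];          // indexed write
}

// Gather: each thread reads its input element from a mapped location.
__global__ void gather(const float *in, const int *map, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[map[i]];          // indexed read
}
```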
The sort operation transforms an unordered set of elements into an ordered set of elements. The most common implementation on GPUs is using radix sort for integer and floating point data and coarse-grained merge sort and fine-grained sorting networks for general comparable data.[54][55]
The search operation allows the programmer to find a given element within the stream, or possibly find neighbors of a specified element. Mostly the search method used is binary search on sorted elements.
A variety of data structures can be represented on the GPU:
The following are some of the areas where GPUs have been used for general purpose computing:
GPGPU usage in Bioinformatics:[71][96]
| Application | Description | Supported features | Expected speed-up† | GPU‡ | Multi-GPU support | Release status |
|---|---|---|---|---|---|---|
| BarraCUDA | DNA, including epigenetics, sequence mapping software[97] | Alignment of short sequencing reads | 6–10x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 0.7.107f |
| CUDASW++ | Open source software for Smith-Waterman protein database searches on GPUs | Parallel search of Smith-Waterman database | 10–50x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 2.0.8 |
| CUSHAW | Parallelized short read aligner | Parallel, accurate long read aligner – gapped alignments to large genomes | 10x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 1.0.40 |
| GPU-BLAST | Local search with fast k-tuple heuristic | Protein alignment according to blastp, multi CPU threads | 3–4x | T 2075, 2090, K10, K20, K20X | Single only | Available now, version 2.2.26 |
| GPU-HMMER | Parallelized local and global search with profile hidden Markov models | Parallel local and global search of hidden Markov models | 60–100x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 2.3.2 |
| mCUDA-MEME | Ultrafast scalable motif discovery algorithm based on MEME | Scalable motif discovery algorithm based on MEME | 4–10x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 3.0.12 |
| SeqNFind | A GPU accelerated sequence analysis toolset | Reference assembly, blast, Smith–Waterman, hmm, de novo assembly | 400x | T 2075, 2090, K10, K20, K20X | Yes | Available now |
| UGENE | Open-source Smith–Waterman for SSE/CUDA, suffix array based repeats finder and dotplot | Fast short read alignment | 6–8x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 1.11 |
| WideLM | Fits numerous linear models to a fixed design and response | Parallel linear regression on multiple similarly-shaped models | 150x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 0.1-1 |
| Application | Description | Supported features | Expected speed-up† | GPU‡ | Multi-GPU support | Release status |
|---|---|---|---|---|---|---|
| Abalone | Models molecular dynamics of biopolymers for simulations of proteins, DNA and ligands | Explicit and implicit solvent, hybrid Monte Carlo | 4–120x | T 2075, 2090, K10, K20, K20X | Single only | Available now, version 1.8.88 |
| ACEMD | GPU simulation of molecular mechanics force fields, implicit and explicit solvent | Written for use on GPUs | 160 ns/day GPU version only | T 2075, 2090, K10, K20, K20X | Yes | Available now |
| AMBER | Suite of programs to simulate molecular dynamics on biomolecules | PMEMD: explicit and implicit solvent | 89.44 ns/day JAC NVE | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 12 + bugfix9 |
| DL-POLY | Simulate macromolecules, polymers, ionic systems, etc. on a distributed memory parallel computer | Two-body forces, link-cell pairs, Ewald SPME forces, Shake VV | 4x | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 4.0 source only |
| CHARMM | MD package to simulate molecular dynamics on biomolecules | Implicit (5x), explicit (2x) solvent via OpenMM | TBD | T 2075, 2090, K10, K20, K20X | Yes | In development Q4/12 |
| GROMACS | Simulate biochemical molecules with complex bond interactions | Implicit (5x), explicit (2x) solvent | 165 ns/Day DHFR | T 2075, 2090, K10, K20, K20X | Single only | Available now, version 4.6 in Q4/12 |
| HOOMD-Blue | Particle dynamics package written grounds up for GPUs | Written for GPUs | 2x | T 2075, 2090, K10, K20, K20X | Yes | Available now |
| LAMMPS | Classical molecular dynamics package | Lennard-Jones, Morse, Buckingham, CHARMM, tabulated, coarse grain SDK, anisotropic Gay-Berne, RE-squared, "hybrid" combinations | 3–18x | T 2075, 2090, K10, K20, K20X | Yes | Available now |
| NAMD | Designed for high-performance simulation of large molecular systems | 100M atom capable | 6.44 ns/days STMV 585x 2050s | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 2.9 |
| OpenMM | Library and application for molecular dynamics for HPC with GPUs | Implicit and explicit solvent, custom forces | Implicit: 127–213 ns/day; Explicit: 18–55 ns/day DHFR | T 2075, 2090, K10, K20, K20X | Yes | Available now, version 4.1.1 |
† Expected speedups are highly dependent on system configuration. GPU performance compared against multi-core x86 CPU socket. GPU performance benchmarked on GPU-supported features and may be a kernel-to-kernel performance comparison. For details on the configuration used, view the application website. Speedups as per Nvidia in-house testing or ISV's documentation.
‡ Q=Quadro GPU, T=Tesla GPU. Nvidia recommended GPUs for this application. Check with developer or ISV to obtain certification information.
Lowry is reportedly using Nvidia Tesla GPUs (graphics-processing units) programmed in the company's CUDA (Compute Unified Device Architecture) to implement the algorithms. Nvidia claims that the GPUs are approximately two orders of magnitude faster than CPU computations, reducing the processing time to less than one minute per frame.
accelerates signal integrity simulations on workstations that have Nvidia Compute Unified Device Architecture (CUDA)-based Graphics Processing Units (GPU)
During internal testing, the Tesla S1070 demonstrated a 360-fold increase in the speed of the similarity-defining algorithm when compared to the popular Intel Core 2 Duo central processor running at a clock speed of 2.6 GHz.