Simultaneous and heterogeneous multithreading

From Wikipedia, the free encyclopedia

Software framework for heterogeneous computing systems

This article may rely excessively on sources too closely associated with the subject, potentially preventing the article from being verifiable and neutral. Please help improve it by replacing them with more appropriate citations to reliable, independent sources. (February 2024)

Simultaneous and heterogeneous multithreading (SHMT) is a software framework that takes advantage of heterogeneous computing systems that contain a mixture of central processing units (CPUs), graphics processing units (GPUs), and special-purpose machine learning hardware, for example Tensor Processing Units (TPUs).^[1]^[2]

Each component processes information differently. Data often has to move between processors, which can create bottlenecks in which one processor sits idle while waiting for another to finish.^[1]

Architecture

The system defines virtual processors and virtual operations (VOPs). Each VOP decomposes into one or more high-level operations (HLOPs), which the system distributes across the processors. The runtime system dynamically maps virtual processors to physical processors, assessing resource availability in order to keep all the processors busy. The scheduler employs a lightweight, quality-aware work-stealing (QAWS) policy.^[1]
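The work-stealing idea in the paragraph above can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions, not the SHMT scheduler itself: the `HLOP` and `Device` classes, the `needs_exact` quality flag, and the rule that a reduced-precision device may only take HLOPs that tolerate approximation are all hypothetical stand-ins for whatever QAWS actually measures.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HLOP:
    name: str
    needs_exact: bool  # hypothetical quality flag, not taken from the source


@dataclass
class Device:
    name: str
    exact: bool  # hypothetical: False models a reduced-precision accelerator
    queue: deque = field(default_factory=deque)
    done: list = field(default_factory=list)


def can_run(dev, op):
    """Quality check: inexact devices may only take approximation-tolerant HLOPs."""
    return dev.exact or not op.needs_exact


def steal(thief, devices):
    """Take one eligible HLOP from the tail of the longest other queue."""
    victims = sorted((d for d in devices if d is not thief),
                     key=lambda d: len(d.queue), reverse=True)
    for victim in victims:
        for i in range(len(victim.queue) - 1, -1, -1):
            if can_run(thief, victim.queue[i]):
                op = victim.queue[i]
                del victim.queue[i]
                return op
    return None


def run(devices):
    """Round-based simulation: each device runs its own work, else tries to steal."""
    while any(d.queue for d in devices):
        for dev in devices:
            op = dev.queue.popleft() if dev.queue else steal(dev, devices)
            if op is not None:
                dev.done.append(op.name)


# One VOP decomposed into six HLOPs, all initially queued on the GPU.
hlops = [HLOP(f"h{i}", needs_exact=(i % 2 == 0)) for i in range(6)]
gpu = Device("gpu", exact=True, queue=deque(hlops))
cpu = Device("cpu", exact=True)
tpu = Device("tpu", exact=False)
run([gpu, cpu, tpu])
```

In this toy run the idle CPU and TPU pull work from the GPU's queue instead of waiting, and the TPU never receives an HLOP marked `needs_exact`.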

Conventional runtimes assign one processor (or set of processors of the same type) to each subtask, leaving other types of processors idle. In other words, the CPU(s) run the first subtask (possibly in parallel); when that subtask completes, the next subtask is handed to the GPU(s); and when they finish, the following subtask is handed to the TPU(s).^[2]

Adding software pipelining allows the second subtask to run using partial results from the first subtask, which improves resource utilization.^[2]
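The difference between stage-by-stage execution and software pipelining can be made concrete with a simple timing model. The sketch below assumes the input is split into equal chunks and that each stage (for example CPU, then GPU, then TPU) takes a fixed time per chunk; the stage times are made-up numbers, and data-transfer costs are ignored.

```python
def sequential_makespan(stage_times, n_chunks):
    """Each stage processes every chunk before the next stage starts."""
    return n_chunks * sum(stage_times)


def pipelined_makespan(stage_times, n_chunks):
    """A stage starts a chunk once the previous stage has produced it
    and the stage itself has finished the preceding chunk."""
    finish = [0.0] * len(stage_times)  # when each stage finished its latest chunk
    for _ in range(n_chunks):
        ready = 0.0  # when the current chunk left the previous stage
        for s, t in enumerate(stage_times):
            start = max(ready, finish[s])
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]


stage_times = [2.0, 3.0, 1.0]  # hypothetical per-chunk times for three stages
seq = sequential_makespan(stage_times, n_chunks=4)
pipe = pipelined_makespan(stage_times, n_chunks=4)
```

With these numbers the sequential schedule takes 24 time units and the pipelined one 15, because later stages begin on partial results while earlier stages are still working.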

SHMT takes things a step further, identifying subtasks that can run independently of others and dispatching them to the appropriate processor type, which allows even better parallelism. Some subtasks can be performed on multiple processor types, and SHMT can divide a single such subtask across those processor types. The fundamental advance is thus to keep more processors working simultaneously, reducing time and energy costs.^[2]
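Dividing a single subtask across several processor types can be sketched as a proportional split. The example below is an assumption-laden illustration rather than SHMT's actual partitioning: the throughput figures (items per time unit) are invented, and the split ignores transfer overhead and quality constraints.

```python
def split_work(n_items, throughput):
    """Split n_items across devices in proportion to their throughput."""
    total = sum(throughput.values())
    shares = {dev: int(n_items * rate / total) for dev, rate in throughput.items()}
    # Hand any rounding remainder to the fastest device.
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += n_items - sum(shares.values())
    return shares


def makespan(shares, throughput):
    """The subtask finishes when the slowest device finishes its share."""
    return max(shares[dev] / throughput[dev] for dev in shares)


throughput = {"cpu": 1, "gpu": 3, "tpu": 4}  # hypothetical items per time unit
shares = split_work(1000, throughput)
shared_time = makespan(shares, throughput)
gpu_only_time = 1000 / throughput["gpu"]
```

Under these assumed rates the three devices finish together after 125 time units, compared with roughly 333 if the GPU handled the subtask alone.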

Benchmark

Researchers tested the concept using a typical smartphone configuration tweaked so that it resembled a data center server.^[1]

The hardware was Nvidia's Jetson Nano module containing a quad-core ARM Cortex-A57 processor (CPU) and 128 Maxwell architecture GPU cores. A Google Edge TPU was connected via its M.2 Key E slot. The processors communicated via an onboard PCI Express (PCIe) interface. Shared data was hosted in 4 GB of 64-bit LPDDR4 memory. The Edge TPU adds 8 MB of device memory. Ubuntu Linux 18.04 was the operating system.^[1]

Compared to a conventional system, performance increased by a factor of 1.95, while energy consumption was reduced by 51%, on a range of benchmarks including Black–Scholes, DCT8X8, DWT, FFT, Histogram, Hotspot, Laplacian, MF, Sobel, and SRAD, along with their geometric mean (GMEAN).^[1]
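The two headline figures can be combined into a rough estimate of average power draw. This is back-of-the-envelope arithmetic, assuming that the 1.95 speedup and the 51% energy reduction describe the same runs and that average power is simply energy divided by time; the variable names are illustrative.

```python
speedup = 1.95           # reported performance increase
energy_reduction = 0.51  # reported energy saving

time_ratio = 1 / speedup               # new runtime relative to baseline
energy_ratio = 1 - energy_reduction    # new energy relative to baseline
avg_power_ratio = energy_ratio / time_ratio  # power = energy / time

print(f"runtime: {time_ratio:.3f} of baseline")
print(f"energy:  {energy_ratio:.3f} of baseline")
print(f"power:   {avg_power_ratio:.3f} of baseline")
```

Under those assumptions the average power draw stays close to the baseline (about 0.96 of it), which would suggest that most of the energy saving comes from finishing sooner rather than from drawing less power.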

References

[1] McClure, Paul (February 22, 2024). "Software tweak doubles computer processing speed, halves energy use". New Atlas. Retrieved 2024-02-25.
[2] Hsu, Kuan-Chieh; Tseng, Hung-Wei (2023-12-08). "Simultaneous and Heterogenous Multithreading". 56th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO '23. New York, NY, USA: Association for Computing Machinery. pp. 137–152. doi:10.1145/3613424.3614285. ISBN 979-8-4007-0329-4.
