Systolic array

From Wikipedia, the free encyclopedia
A type of parallel computing architecture of tightly coupled nodes

In parallel computer architectures, a systolic array is a homogeneous network of tightly coupled data processing units (DPUs) called cells or nodes. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighbours, stores the result within itself and passes it downstream. Systolic arrays were first used in Colossus, which was an early computer used to break German Lorenz ciphers during World War II.[1] Due to the classified nature of Colossus, they were independently invented or rediscovered by H. T. Kung and Charles Leiserson, who described arrays for many dense linear algebra computations (matrix product, solving systems of linear equations, LU decomposition, etc.) for banded matrices. Early applications include computing greatest common divisors of integers and polynomials.[2] They are sometimes classified as multiple-instruction single-data (MISD) architectures under Flynn's taxonomy, but this classification is questionable because a strong argument can be made to distinguish systolic arrays from any of Flynn's four categories: SISD, SIMD, MISD, MIMD, as discussed later in this article.

The parallel input data flows through a network of hard-wired processor nodes, which combine, process, merge or sort the input data into a derived result. Because the wave-like propagation of data through a systolic array resembles the pulse of the human circulatory system, the name systolic was coined from medical terminology: it derives from systole, as an analogy to the regular pumping of blood by the heart.

Applications


Systolic arrays are often hard-wired for specific operations, such as "multiply and accumulate", to perform massively parallel integration, convolution, correlation, matrix multiplication or data sorting tasks. They are also used for dynamic programming algorithms, used in DNA and protein sequence analysis.

Architecture


A systolic array typically consists of a large monolithic network of primitive computing nodes which can be hardwired or software-configured for a specific application. The nodes are usually fixed and identical, while the interconnect is programmable. The more general wavefront processors, by contrast, employ sophisticated and individually programmable nodes which may or may not be monolithic, depending on the array size and design parameters. The other distinction is that systolic arrays rely on synchronous data transfers, while wavefront arrays tend to work asynchronously.

Unlike the more common Von Neumann architecture, where program execution follows a script of instructions stored in common memory, addressed and sequenced under the control of the CPU's program counter (PC), the individual nodes within a systolic array are triggered by the arrival of new data and always process the data in exactly the same way. The actual processing within each node may be hard-wired or block-microcoded, in which case the common node personality can be block-programmable.

The systolic array paradigm, with data streams driven by data counters, is the counterpart of the Von Neumann architecture, with an instruction stream driven by a program counter. Because a systolic array usually sends and receives multiple data streams, and multiple data counters are needed to generate these data streams, it supports data parallelism.

Goals and benefits


A major benefit of systolic arrays is that all operand data and partial results are stored within (passing through) the processor array. There is no need to access external buses, main memory or internal caches during each operation, as is the case with Von Neumann or Harvard sequential machines. The sequential limits on parallel performance dictated by Amdahl's law also do not apply in the same way, because data dependencies are implicitly handled by the programmable node interconnect and there are no sequential steps in managing the highly parallel data flow.

Systolic arrays are therefore extremely good at artificial intelligence, image processing, pattern recognition, computer vision and other tasks that animal brains do particularly well. Wavefront processors in general can also be very good at machine learning by implementing self-configuring neural networks in hardware.

Classification controversy


While systolic arrays are officially classified as MISD, their classification is somewhat problematic. Because the input is typically a vector of independent values, the systolic array is definitely not SISD. Since these input values are merged and combined into the result(s) and do not maintain their independence as they would in a SIMD vector processing unit, the array cannot be classified as such. Consequently, the array cannot be classified as a MIMD either, because MIMD can be viewed as a mere collection of smaller SISD and SIMD machines.

Finally, because the data swarm is transformed as it passes through the array from node to node, the multiple nodes are not operating on the same data, which makes the MISD classification a misnomer. The other reason why a systolic array should not qualify as MISD is the same as the one which disqualifies it from the SISD category: the input data is typically a vector, not a single data value, although one could argue that any given input vector is a single item of data.

In spite of all of the above, systolic arrays are often offered as a classic example of MISD architecture in textbooks on parallel computing and in engineering classes. If the array is viewed from the outside as atomic, it should perhaps be classified as SFMuDMeR = single function, multiple data, merged result(s).

Systolic arrays use a pre-defined computational flow graph that connects their nodes. Kahn process networks use a similar flow graph, but the two are distinguished by the fact that the nodes of a systolic array work in lock-step, whereas in a Kahn network there are FIFO queues between the nodes.

Detailed description


A systolic array is composed of matrix-like rows of data processing units called cells. Data processing units (DPUs) are similar to central processing units (CPUs), except for the usual lack of a program counter,[3] since operation is transport-triggered, i.e., triggered by the arrival of a data object. Each cell shares the information with its neighbors immediately after processing. The systolic array is often rectangular, with data flowing across the array between neighbour DPUs, often with different data flowing in different directions. The data streams entering and leaving the ports of the array are generated by auto-sequencing memory units (ASMs). Each ASM includes a data counter. In embedded systems a data stream may also be input from and/or output to an external source.

An example of a systolic algorithm might be designed for matrix multiplication. One matrix is fed in a row at a time from the top of the array and is passed down the array; the other matrix is fed in a column at a time from the left-hand side of the array and passes from left to right. Dummy values are then passed in until each processor has seen one whole row and one whole column. At this point, the result of the multiplication is stored in the array and can now be output a row or a column at a time, flowing down or across the array.[4]
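
The dataflow just described can be modelled in software. The sketch below assumes an output-stationary arrangement in which each cell (i, j) keeps the accumulating entry C[i][j] while the operands stream past it, skewed by one cycle per row and per column as in the feeding pattern above; the function name and the cycle bookkeeping are illustrative, not a specific hardware design.

import numpy as np

def systolic_matmul(A, B):
    # Simulate an output-stationary systolic array computing C = A @ B.
    # Operands A[i, k] and B[k, j] reach cell (i, j) at cycle t = i + j + k,
    # reproducing the skewed row/column feeds described in the text.
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))                  # each entry lives inside one cell
    for t in range(n + m + k - 2):        # enough cycles for every operand pair
        for i in range(n):
            for j in range(m):
                step = t - i - j          # which dot-product term arrives now
                if 0 <= step < k:
                    C[i, j] += A[i, step] * B[step, j]
    return C

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(systolic_matmul(A, B))              # matches A @ B: [[19. 22.] [43. 50.]]

After n + m + k - 2 cycles every cell holds its finished entry of the product, which can then be drained a row or a column at a time as described above.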

Systolic arrays are arrays of DPUs which are connected to a small number of nearest-neighbour DPUs in a mesh-like topology. DPUs perform a sequence of operations on data that flows between them. Because traditional systolic array synthesis methods are based on algebraic algorithms, only uniform arrays with linear pipelines can be obtained, so the architecture is the same in all DPUs. The consequence is that only applications with regular data dependencies can be implemented on classical systolic arrays. Like SIMD machines, clocked systolic arrays compute in "lock-step", with each processor undertaking alternating compute and communicate phases. Systolic arrays with asynchronous handshaking between DPUs are instead called wavefront arrays. One well-known systolic array is Carnegie Mellon University's iWarp processor, which was manufactured by Intel. An iWarp system has a linear array processor connected by data buses going in both directions.

History


Systolic arrays (also known as wavefront processors) were first described by H. T. Kung and Charles E. Leiserson, who published the first paper describing systolic arrays in 1979. However, the first machine known to have used a similar technique was the Colossus Mark II in 1944.

Examples


Polynomial evaluation


Horner's rule for evaluating a polynomial is:

y = (\ldots(((a_n \cdot x + a_{n-1}) \cdot x + a_{n-2}) \cdot x + a_{n-3}) \cdot x + \ldots + a_1) \cdot x + a_0.

It can be evaluated by a linear systolic array in which the processors are arranged in pairs: one multiplies its input by x and passes the result to the right; the next adds a_j and passes the result to the right.
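
A minimal software model of this cell chain (names are illustrative): each loop step folds one multiply cell and one add cell together, so the value emerging from the right end of the chain is the evaluated polynomial.

def horner_systolic(coeffs, x):
    # coeffs = [a_n, a_(n-1), ..., a_1, a_0]; the running value is scaled
    # by x in a multiply cell, then the next coefficient is added in an
    # add cell, and the result is passed to the right.
    y = 0.0
    for a in coeffs:
        y = y * x      # multiply cell
        y = y + a      # add cell
    return y

# 2x^2 + 3x + 4 at x = 5  ->  69
print(horner_systolic([2, 3, 4], 5))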

Convolution


Consider a chain of processing elements (PEs), each performing a multiply-accumulate operation. It processes input data (x_i) and weights (w_i) systolically, meaning data flows through the array in a regular, rhythmic manner. The weights remain stationary within each PE, while the input data and partial sums (y_i) move in opposite directions.

Each PE performs the following operation:

y_{out} = y_{in} + w \cdot x_{in}
x_{out} = x_{in}

where x_{in} is the arriving data sample, y_{in} is the arriving partial sum, w is the weight held stationary in the PE, and x_{out}, y_{out} are the values passed on to the neighbouring PEs.

From the left, the input stream is \ldots, x_3, 0, x_2, 0, x_1, and from the right, the output stream is y_1, y_2, y_3, \ldots. If y_1 and x_1 enter the rightmost PE simultaneously, then the leftmost PE outputs

y_1 = w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots
y_2 = w_1 x_2 + w_2 x_3 + w_3 x_4 + \cdots
\vdots

This is the 1-dimensional convolution. Similarly, n-dimensional convolution can be computed by an n-dimensional array of PEs.
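
A cycle-level software sketch of this counter-flowing scheme is given below. It assumes the rightmost PE holds w_1, that consecutive x samples are separated by a zero bubble, and that every PE applies y_out = y_in + w * x_in before the streams shift; the function name and the timing bookkeeping are illustrative rather than a reference design.

def systolic_conv1d(w, x, num_outputs):
    # Counter-flow 1-D convolution: weights stay put (rightmost PE holds w1),
    # x samples move right with a zero bubble between them, partial sums y
    # move left, so each y_m meets x_m, x_(m+1), ... exactly once.
    K = len(w)
    x_reg = [0.0] * K          # x value entering each PE this cycle
    y_reg = [0.0] * K          # partial sum entering each PE this cycle
    outputs = []
    for t in range(2 * num_outputs + 2 * K):
        # Left boundary: feed the next sample on even cycles, a bubble otherwise.
        x_reg[0] = float(x[t // 2]) if t % 2 == 0 and t // 2 < len(x) else 0.0
        # Right boundary: a fresh zero partial sum enters every cycle.
        y_reg[K - 1] = 0.0
        # All PEs fire in parallel: y_out = y_in + w * x_in (PE p holds w[K-1-p]).
        y_out = [y_reg[p] + w[K - 1 - p] * x_reg[p] for p in range(K)]
        # The leftmost PE emits a finished y_m on every second cycle from 2K-2 on.
        if t >= 2 * K - 2 and (t - (2 * K - 2)) % 2 == 0:
            outputs.append(y_out[0])
        # Shift the streams: x one PE to the right, y one PE to the left.
        x_reg = [0.0] + x_reg[:-1]
        y_reg = y_out[1:] + [0.0]
    return outputs[:num_outputs]

# y_1 = 1*1 + 2*2 + 3*3 = 14, then 20, 26
print(systolic_conv1d([1, 2, 3], [1, 2, 3, 4, 5], 3))

The zero bubbles halve the utilisation of the PEs, but they guarantee that each leftward-moving partial sum meets each rightward-moving sample exactly once, as required by the sums above.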

Many other implementations of 1D convolution are available, with different data flows.[5]

See Figure 12 of[5] for an algorithm that performs on-the-fly least-squares using one- and two-dimensional systolic arrays.

Implementations

  • Cisco PXF network processor is internally organized as a systolic array.[6]
  • Google's TPU is also designed around a systolic array.
  • Paracel FDF4T TestFinder text search system[7]
  • Paracel FDF4G GeneMatcher biological (DNA and protein) search system
  • Inferentia chip at Amazon Web Services[8]
  • MIT Eyeriss is a systolic array accelerator for convolutional neural networks.[9][10]

Notes

  1. ^ Colossus - The Greatest Secret in the History of Computing on YouTube
  2. ^ Brent, Richard P.; Kung, H. T. (August 1984). "Systolic VLSI Arrays for Polynomial GCD Computation" (PDF). www.eecs.harvard.edu.
  3. ^ The Paracel GeneMatcher series of systolic array processors do have a program counter. More complicated algorithms are implemented as a series of simple steps, with shifts specified in the instructions.
  4. ^ "Systolic Array Matrix Multiplication" (PDF).
  5. ^ a b Kung (January 1982). "Why systolic architectures?". Computer. 15 (1): 37–46. doi:10.1109/MC.1982.1653825. ISSN 0018-9162.
  6. ^ "Cisco 10000 Series Router Performance Routing Engine Installation". Retrieved 3 August 2020.
  7. ^ "About Paracel". brandprosgroup.com. Paracel. Retrieved 4 May 2018.
  8. ^ "Announcing availability of Inf1 instances in Amazon SageMaker for high performance and cost-effective machine learning inference". 14 August 2020. Retrieved 15 August 2020.
  9. ^ "Eyeriss Project". eyeriss.mit.edu. Retrieved 21 February 2021.
  10. ^ Chen, Yu-Hsin; Emer, Joel; Sze, Vivienne (12 October 2016). "Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks". ACM SIGARCH Computer Architecture News. 44 (3): 367–379. doi:10.1145/3007787.3001177. ISSN 0163-5964. S2CID 3291270.

References

  • H. T. Kung, C. E. Leiserson: Algorithms for VLSI processor arrays; in: C. Mead, L. Conway (eds.): Introduction to VLSI Systems; Addison-Wesley, 1979
  • S. Y. Kung: VLSI Array Processors; Prentice-Hall, Inc., 1988
  • N. Petkov: Systolic Parallel Processing; North Holland Publishing Co, 1992
