Dataflow architecture is a dataflow-based computer architecture that directly contrasts the traditional von Neumann architecture or control flow architecture. Dataflow architectures have no program counter, in concept: whether an instruction can execute, and when it does, is determined solely by the availability of its input arguments,[1] so the order of instruction execution may be hard to predict.
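For example, in evaluating (a + b) * (c - d) on a dataflow machine, the addition and subtraction can each fire as soon as their operands arrive, in either order or simultaneously; only the multiplication must wait for both intermediate results. The following minimal Python sketch (illustrative only, not any particular machine's design) shows this availability-driven firing rule:

```python
# Minimal sketch of availability-driven execution for (a + b) * (c - d).
# Instructions fire as soon as all of their input operands are present;
# no program counter imposes a sequential order.

instructions = {
    "t1": (lambda x, y: x + y, ["a", "b"]),    # t1 = a + b
    "t2": (lambda x, y: x - y, ["c", "d"]),    # t2 = c - d
    "t3": (lambda x, y: x * y, ["t1", "t2"]),  # t3 = t1 * t2
}

values = {"a": 2, "b": 3, "c": 10, "d": 4}     # operands available at start

# Repeatedly fire every instruction whose operands are all available.
while instructions:
    ready = [name for name, (_, deps) in instructions.items()
             if all(d in values for d in deps)]
    for name in ready:                  # ready instructions could run in parallel
        op, deps = instructions.pop(name)
        values[name] = op(*(values[d] for d in deps))

print(values["t3"])                     # 30
```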
Although no commercially successful general-purpose computer hardware has used a dataflow architecture, it has been successfully implemented in specialized hardware such as in digital signal processing, network routing, graphics processing, telemetry, and more recently in data warehousing and artificial intelligence (e.g., polymorphic dataflow,[2] Convolution Engine,[3] structure-driven,[4] and dataflow scheduling[5]). It is also very relevant in many software architectures today, including database engine designs and parallel computing frameworks.[citation needed]
Synchronous dataflow architectures are tuned to match the workload presented by real-time data-path applications such as wire-speed packet forwarding. Dataflow architectures that are deterministic in nature enable programmers to manage complex tasks such as processor load balancing, synchronization, and access to common resources.[6]
Meanwhile, there is a clash of terminology, since the term dataflow is also used for a subarea of parallel programming: dataflow programming.
Hardware architectures for dataflow were a major topic in computer architecture research in the 1970s and early 1980s. Jack Dennis of MIT pioneered the field of static dataflow architectures, while the Manchester Dataflow Machine[7] and MIT Tagged Token architecture were major projects in dynamic dataflow.
The research, however, never overcame a fundamental problem: instructions and their data dependencies proved to be too fine-grained to be effectively distributed in a large network. That is, the time for instructions and tagged results to travel through a large connection network was longer than the time to do many computations.
Maurice Wilkes wrote in 1995 that "Data flow stands apart as being the most radical of all approaches to parallelism and the one that has been least successful. ... If any practical machine based on data flow ideas and offering real power ever emerges, it will be very different from what the originators of the concept had in mind."[8]
Out-of-order execution (OOE) has become the dominant computing paradigm since the 1990s. It is a form of restricted dataflow. This paradigm introduced the idea of an execution window. The execution window follows the sequential order of the von Neumann architecture; however, within the window, instructions are allowed to complete in data-dependency order. This is accomplished in CPUs that dynamically tag the data dependencies of the code in the execution window. The logical complexity of dynamically keeping track of the data dependencies restricts OOE CPUs to a small number of execution units (2–6) and limits the execution window sizes to the range of 32 to 200 instructions, much smaller than envisioned for full dataflow machines.[citation needed]
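The restricted-dataflow idea can be pictured with a simplified sketch (hypothetical Python; real CPUs implement this in hardware with register renaming and reorder buffers): instructions enter a small window in program order but complete within it in data-dependency order.

```python
# Simplified sketch of out-of-order execution as restricted dataflow
# (illustrative only).  Instructions enter a fixed-size window in
# program order but complete in data-dependency order.

WINDOW_SIZE = 4

# (destination, sources) in program order; register names are invented.
program = [
    ("r1", ["r0"]),         # depends on r0
    ("r2", ["r1"]),         # depends on r1, so it must follow r1
    ("r3", ["r0"]),         # depends only on r0: may finish before r2
    ("r4", ["r2", "r3"]),   # joins the two chains
]

ready = {"r0"}              # values available at the start
window, pc, completed = [], 0, []

while pc < len(program) or window:
    # Fill the window in program order while there is room.
    while pc < len(program) and len(window) < WINDOW_SIZE:
        window.append(program[pc])
        pc += 1
    # One "cycle": complete every instruction whose sources were ready
    # at the start of the cycle; dependents wait for a later cycle.
    fires = [inst for inst in window if all(s in ready for s in inst[1])]
    for inst in fires:
        window.remove(inst)
        ready.add(inst[0])
        completed.append(inst[0])

print(completed)            # ['r1', 'r3', 'r2', 'r4'] — not program order
```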
Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routine to be executed simultaneously because the simple tags could not differentiate between them.
Designs that use content-addressable memory (CAM) are called dynamic dataflow machines. They use tags in memory to facilitate parallelism.
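The difference between the two tagging schemes can be sketched as follows (an illustrative model, not any specific machine's tag format): a static machine keys each operand slot by instruction address alone, while a dynamic machine extends the tag with an activation identifier, letting the matching store hold operands for several simultaneous instances of the same code.

```python
# Illustrative tagging sketch (invented tag format).
# Dynamic tags pair an instruction address with an activation identifier,
# so a CAM-style matching store can keep operands for concurrent
# activations of the same routine apart.
matching_store = {}
matching_store[("add_1", "call#7")] = 5   # operand for activation 7
matching_store[("add_1", "call#8")] = 9   # same instruction, activation 8

# A static tag is the instruction address alone; it has a single slot,
# so a second concurrent activation would clobber the first operand.
static_store = {}
static_store[("add_1",)] = 5
static_store[("add_1",)] = 9              # overwrites the earlier value

print(len(matching_store), len(static_store))  # 2 1
```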
Normally, in control flow architectures, compilers analyze program source code for data dependencies between instructions in order to better organize the instruction sequences in the binary output files. The instructions are organized sequentially, but the dependency information itself is not recorded in the binaries. Binaries compiled for a dataflow machine contain this dependency information.
A dataflow compiler records these dependencies by creating unique tags for each dependency instead of using variable names. By giving each dependency a unique tag, it allows non-dependent code segments in the binary to be executed out of order and in parallel. The compiler also detects loops, break statements, and other control syntax, and translates them into dataflow form.
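As a hypothetical illustration (the tag format and instruction names are invented for this sketch), a dataflow compiler might lower a short source fragment into tagged instructions like this:

```python
# Source pseudocode:
#     x = a + b
#     y = a - b
#     z = x * y
# Instead of variable names, each result gets a unique tag, making every
# producer/consumer edge explicit in the compiled binary.
tagged_binary = [
    ("t1", "add", ("a", "b")),    # t1 <- a + b
    ("t2", "sub", ("a", "b")),    # t2 <- a - b; independent of t1, so the
                                  # two may execute out of order or in parallel
    ("t3", "mul", ("t1", "t2")),  # t3 <- t1 * t2; waits on both tags
]
```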
Programs are loaded into the CAM of a dynamic dataflow computer. When all of the tagged operands of an instruction become available (that is, output from previous instructions and/or user input), the instruction is marked as ready for execution by an execution unit.
This is known as activating or firing the instruction. Once an instruction is completed by an execution unit, its output data is sent (with its tag) to the CAM. Any instructions that are dependent upon this particular datum (identified by its tag value) are then marked as ready for execution. In this way, subsequent instructions are executed in proper order, avoiding race conditions. This order may differ from the sequential order envisioned by the human programmer, the programmed order.
An instruction, along with its required data operands, is transmitted to an execution unit as a packet, also called an instruction token. Similarly, output data is transmitted back to the CAM as a data token. The packetization of instructions and results allows for parallel execution of ready instructions on a large scale.
Dataflow networks deliver the instruction tokens to the execution units and return the data tokens to the CAM. In contrast to the conventional von Neumann architecture, data tokens are not permanently stored in memory; rather, they are transient messages that only exist when in transit to the instruction storage.
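Putting these pieces together, the following minimal Python sketch emulates a dynamic dataflow execution loop. It is an illustration under stated assumptions, not a hardware description: a dict stands in for the CAM, a list stands in for the token-delivery network, and the program, tags, and values are invented.

```python
from collections import namedtuple

# Minimal sketch of a dynamic dataflow execution loop (illustrative only).
# A dict stands in for the CAM: it matches arriving data tokens, by tag,
# against the instructions waiting on them.  When all of an instruction's
# operands are matched, it fires; its result re-enters the network as a
# new data token, possibly enabling further instructions.
Instruction = namedtuple("Instruction", "tag op inputs")

program = {
    "t1": Instruction("t1", lambda a, b: a + b, ("in_a", "in_b")),
    "t2": Instruction("t2", lambda a, b: a - b, ("in_c", "in_d")),
    "t3": Instruction("t3", lambda a, b: a * b, ("t1", "t2")),
}

cam = {}                                   # tag -> matched operand value
tokens = [("in_a", 2), ("in_b", 3), ("in_c", 10), ("in_d", 4)]

while tokens:
    tag, value = tokens.pop(0)             # data token arrives from the network
    cam[tag] = value
    # Fire every instruction whose operands are now all matched.
    for inst in list(program.values()):
        if all(t in cam for t in inst.inputs):
            del program[inst.tag]
            args = [cam[t] for t in inst.inputs]
            result = inst.op(*args)        # execution unit consumes the packet
            tokens.append((inst.tag, result))  # result token re-enters the network

print(cam["t3"])                           # 30
```

Note how t1 and t2 fire in whatever order their operand tokens happen to arrive, while t3 is held in the matching store until both of its tagged inputs have been produced.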
In contrast to the above, analog differential analyzers were dataflow architectures realized purely in hardware: programs were not expressed as sets of instructions at all, and such machines usually made no memory-based decisions. Programming consisted solely of configuring the physical interconnections of specialized computing elements, which in effect created a form of passive dataflow architecture.
In July 2025, the startup Efficient Computer was reported to have built a dataflow chip called Electron E1.[9]