US20150268963A1 - Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware - Google Patents

Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware

Info

Publication number
US20150268963A1
Authority
US
United States
Prior art keywords
tokens
units
thread
compute
processing units
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/642,780
Inventor
Yoav Etsion
Dani Voitsechov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technion Research and Development Foundation Ltd
Original Assignee
Technion Research and Development Foundation Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-03-23
Filing date
2015-03-10
Publication date
2015-09-24
Application filed by Technion Research and Development Foundation Ltd
Priority to US14/642,780
Assigned to TECHNION RESEARCH & DEVELOPMENT FOUNDATION LTD. Assignment of assignors interest (see document for details). Assignors: ETSION, YOAV; VOITSECHOV, DANI
Publication of US20150268963A1
Priority to US15/829,924 (US10579390B2)
Priority to US16/752,750 (US11003458B2)
Current legal status: Abandoned

Abstract

A GPGPU-compatible architecture combines a coarse-grain reconfigurable fabric (CGRF) with a dynamic dataflow execution model to accelerate execution throughput of massively thread-parallel code. The CGRF distributes computation across a fabric of functional units. The compute operations are statically mapped to functional units, and an interconnect is configured to transfer values between functional units.
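
The static part of this scheme (placing operations on functional units and configuring the interconnect) can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Op` record, the `map_statically` helper, unit names such as `alu0`, and the first-free placement policy are all assumptions made for illustration. A real mapper would also weigh routing distance and fabric topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One node of a kernel's dataflow graph (hypothetical representation)."""
    name: str
    kind: str            # type of functional unit required, e.g. "alu" or "ldst"
    inputs: tuple = ()   # names of the ops that produce this op's operands


def map_statically(ops, units):
    """Place each op on a free unit of the matching kind, then derive routes.

    `units` maps a unit kind to the list of unit ids available in the fabric.
    Returns (placement, routes), where each route is a (producer_unit,
    consumer_unit) pair that the interconnect would be configured to carry.
    """
    free = {kind: list(ids) for kind, ids in units.items()}
    placement = {}
    for op in ops:
        if not free.get(op.kind):
            raise ValueError(f"no free {op.kind} unit for {op.name}")
        placement[op.name] = free[op.kind].pop(0)   # assumed first-free policy
    # One interconnect route per edge of the dataflow graph.
    routes = [(placement[src], placement[op.name])
              for op in ops for src in op.inputs]
    return placement, routes


# A toy kernel: c[i] = a[i] + b[i]
kernel = [
    Op("load_a", "ldst"),
    Op("load_b", "ldst"),
    Op("add", "alu", ("load_a", "load_b")),
    Op("store", "ldst", ("add",)),
]
units = {"alu": ["alu0", "alu1"], "ldst": ["ldst0", "ldst1", "ldst2"]}

placement, routes = map_statically(kernel, units)
print(placement)
print(routes)
```

The placement is computed once per kernel and stays fixed while threads stream through the fabric, which is what distinguishes the static mapping from the dynamic, per-thread scheduling described in the claims.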

Description

Claims (38)

1. A method of computing, comprising the steps of:
providing an arrangement of processing units having interconnects therebetween;
statically mapping compute operations to respective processing units;
configuring the interconnects to transfer values between functional units based on connectivity requirements of the compute operations; and
associating instructions for execution of the compute operations with respective control tokens and thread identifiers, the instructions belonging to a plurality of threads and including a first instruction and a second instruction, wherein the thread identifier of the first instruction identifies a first thread and the thread identifier of the second instruction identifies a second thread;
pipelining the instructions through the arrangement responsively to the control tokens to the mapped processing units;
dynamically rescheduling an order of execution of the first instruction with respect to the second instruction in one of the mapped processing units; and
performing the compute operations responsively to the first instruction and the second instruction in the processing units in the rescheduled order of execution.
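
The dynamic rescheduling step of claim 1 can be pictured with a small Python sketch of tagged-token firing. This is a hypothetical model, not the patented hardware: the `Token` and `ProcessingUnit` classes, the `pending` dictionary standing in for a reservation buffer, and the unbounded buffer capacity are all assumptions. The unit executes its statically mapped operation for whichever thread has all of its operands present, so a later thread can overtake an earlier one that is still waiting.

```python
import operator
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Token:
    thread_id: int   # identifies the thread this value belongs to
    port: int        # operand slot of the destination unit
    value: int


@dataclass
class ProcessingUnit:
    opcode: Callable[..., int]   # the operation statically mapped to this unit
    arity: int = 2
    # Stand-in for a reservation buffer: partial operand sets keyed by thread id.
    pending: dict = field(default_factory=lambda: defaultdict(dict))
    fired: list = field(default_factory=list)   # order in which threads executed

    def receive(self, tok: Token):
        """Buffer an operand; fire and return a result token once a thread is complete."""
        slots = self.pending[tok.thread_id]
        slots[tok.port] = tok.value
        if len(slots) < self.arity:
            return None                      # still waiting on operands for this thread
        del self.pending[tok.thread_id]      # free the buffer entry
        result = self.opcode(*(slots[p] for p in range(self.arity)))
        self.fired.append(tok.thread_id)
        return Token(tok.thread_id, 0, result)


adder = ProcessingUnit(operator.add)
# Thread 0 issues first, but its second operand arrives late (e.g. a slow load).
arrivals = [Token(0, 0, 1), Token(1, 0, 10), Token(1, 1, 20), Token(0, 1, 2)]
results = [r for r in map(adder.receive, arrivals) if r is not None]

print(adder.fired)                                   # [1, 0]
print([(r.thread_id, r.value) for r in results])     # [(1, 30), (0, 3)]
```

Thread 1 executes before thread 0 even though thread 0 arrived first, which is the reordering between a first and a second instruction that the claim describes.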
18. A computing apparatus, comprising:
at least one core of interconnected processing units configured to execute a plurality of threads, the core comprising compute units for executing computer instructions, load/store units, control units for executing control instructions, and special compute units for executing non-pipelined computing operations;
a switch in each of the processing units for configurably establishing connections to other processing units for transferring values from between the processing units across the connections; and
respective private memories in the processing units, the private memories comprising configuration registers that store token routing information, operands and opcodes, and further comprising reservation buffers for holding thread identifiers that are associated with in-flight data moving through the core, wherein the processing units are responsive to input tokens and control tokens, and are operative to transmit result tokens for use as the input tokens of other processing units via the connections, the processing units being operative to store state information and data pertaining to an incompletely executed compute operation while processing a new compute operation.
29. A computer program product, including a non-transitory computer-readable storage medium in which computer program instructions are stored, which instructions, when executed by a computer connected to an arrangement of processing units having interconnects therebetween, cause the computer to perform the steps of:
statically mapping compute operations to respective processing units;
configuring the interconnects according to requirements of the compute operations; and
associating process instructions that belong to a plurality of threads with respective control tokens and thread identifiers, the instructions including a first instruction and a second instruction, wherein the thread identifier of the first instruction identifies a first thread and the thread identifier of the second instruction identifies a second thread;
pipelining the process instructions through the arrangement responsively to the control tokens to the mapped processing units;
dynamically rescheduling an order of execution of the first instruction with respect to the second instruction in one of the mapped processing units; and
performing the compute operations responsively to the first instruction and the second instruction in the processing units in the rescheduled order of execution.
Application US14/642,780 | Priority date 2014-03-23 | Filing date 2015-03-10 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware | Abandoned | US20150268963A1 (en)

Priority Applications (3)

Application Number | Publication | Priority Date | Filing Date | Title
US14/642,780 | US20150268963A1 (en) | 2014-03-23 | 2015-03-10 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware
US15/829,924 | US10579390B2 (en) | 2014-03-23 | 2017-12-03 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware
US16/752,750 | US11003458B2 (en) | 2014-03-23 | 2020-01-27 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware

Applications Claiming Priority (2)

Application Number | Publication | Priority Date | Filing Date | Title
US201461969184P | - | 2014-03-23 | 2014-03-23 | -
US14/642,780 | US20150268963A1 (en) | 2014-03-23 | 2015-03-10 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware

Related Child Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US15/829,924 | Continuation | US10579390B2 (en) | 2014-03-23 | 2017-12-03 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware

Publications (1)

Publication Number | Publication Date
US20150268963A1 (en) | 2015-09-24

Family

ID=54142191

Family Applications (3)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US14/642,780 | Abandoned | US20150268963A1 (en) | 2014-03-23 | 2015-03-10 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware
US15/829,924 | Active 2035-10-24 | US10579390B2 (en) | 2014-03-23 | 2017-12-03 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware
US16/752,750 | Active | US11003458B2 (en) | 2014-03-23 | 2020-01-27 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware

Family Applications After (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US15/829,924 | Active 2035-10-24 | US10579390B2 (en) | 2014-03-23 | 2017-12-03 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware
US16/752,750 | Active | US11003458B2 (en) | 2014-03-23 | 2020-01-27 | Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware

Country Status (1)

Country | Link
US (3) | US20150268963A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP6059413B2 (en) * | 2005-04-28 | 2017-01-11 | クアルコム,インコーポレイテッド (Qualcomm, Incorporated) | Reconfigurable instruction cell array

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shen et al.; Modern Processor Design: Fundamentals of Superscalar Processors; 2002; McGraw-Hill *

Cited By (134)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US10942737B2 (en)2011-12-292021-03-09Intel CorporationMethod, device and system for control signalling in a data path module of a data stream processing engine
US10853276B2 (en)2013-09-262020-12-01Intel CorporationExecuting distributed memory operations using processing elements connected by distributed channels
US10591983B2 (en)2014-03-142020-03-17Wisconsin Alumni Research FoundationComputer accelerator system using a trigger architecture memory access processor
US9891692B2 (en)*2014-08-282018-02-13Samsung Electronics Co., Ltd.Apparatus and method of controlling power consumption of graphic processing unit (GPU) resources
US20160062445A1 (en)*2014-08-282016-03-03Samsung Electronics Co., Ltd.Apparatus and method of controlling power consumption of graphic processing unit (gpu) resources
US20160321039A1 (en)*2015-04-292016-11-03Wave Computing, Inc.Technology mapping onto code fragments
US10659396B2 (en)2015-08-022020-05-19Wave Computing, Inc.Joining data within a reconfigurable fabric
US20180300139A1 (en)*2015-10-292018-10-18Intel CorporationBoosting local memory performance in processor graphics
US10768935B2 (en)*2015-10-292020-09-08Intel CorporationBoosting local memory performance in processor graphics
US20200371804A1 (en)*2015-10-292020-11-26Intel CorporationBoosting local memory performance in processor graphics
US20180225453A1 (en)*2015-11-252018-08-09Leidos Innovations Technology, Inc.Method for detecting a threat and threat detecting apparatus
US20180341526A1 (en)*2015-12-242018-11-29Intel CorporationFacilitating efficient communication and data processing across clusters of computing machines in heterogeneous computing environment
US11550632B2 (en)*2015-12-242023-01-10Intel CorporationFacilitating efficient communication and data processing across clusters of computing machines in heterogeneous computing environment
US10558575B2 (en)2016-12-302020-02-11Intel CorporationProcessors, methods, and systems with a configurable spatial accelerator
US10572376B2 (en)2016-12-302020-02-25Intel CorporationMemory ordering in acceleration hardware
WO2018140140A1 (en)*2017-01-262018-08-02Wisconsin Alumni Research FoundationReconfigurable, application-specific computer accelerator
CN110214309A (en)*2017-01-262019-09-06威斯康星校友研究基金会Reconfigurable special purpose computer accelerator
US11853244B2 (en)2017-01-262023-12-26Wisconsin Alumni Research FoundationReconfigurable computer accelerator providing stream processor and dataflow processor
EP3596609A4 (en)*2017-03-142020-01-22Azurengine Technologies Zhuhai Inc.Reconfigurable parallel processing
CN114168526A (en)*2017-03-142022-03-11珠海市芯动力科技有限公司 Reconfigurable Parallel Processing
US10956360B2 (en)*2017-03-142021-03-23Azurengine Technologies Zhuhai Inc.Static shared memory access with one piece of input data to be reused for successive execution of one instruction in a reconfigurable parallel processor
US20180267809A1 (en)*2017-03-142018-09-20Yuan LiStatic Shared Memory Access for Reconfigurable Parallel Processor
US10776310B2 (en)*2017-03-142020-09-15Azurengine Technologies Zhuhai Inc.Reconfigurable parallel processor with a plurality of chained memory ports
US10733139B2 (en)*2017-03-142020-08-04Azurengine Technologies Zhuhai Inc.Private memory access for a reconfigurable parallel processor using a plurality of chained memory ports
CN114003547A (en)*2017-03-142022-02-01珠海市芯动力科技有限公司 Reconfigurable Parallel Processing
US10776311B2 (en)2017-03-142020-09-15Azurengine Technologies Zhuhai Inc.Circular reconfiguration for a reconfigurable parallel processor using a plurality of chained memory ports
CN114168525A (en)*2017-03-142022-03-11珠海市芯动力科技有限公司 Reconfigurable Parallel Processing
CN110494851A (en)*2017-03-142019-11-22珠海市芯动力科技有限公司Restructural parallel processing
US10776312B2 (en)*2017-03-142020-09-15Azurengine Technologies Zhuhai Inc.Shared memory access for a reconfigurable parallel processor with a plurality of chained memory ports
US20180308209A1 (en)*2017-04-092018-10-25Intel CorporationCompute cluster preemption within a general-purpose graphics processing unit
US10460417B2 (en)*2017-04-092019-10-29Intel CorporationCompute cluster preemption within a general-purpose graphics processing unit
US11715174B2 (en)2017-04-092023-08-01Intel CorporationCompute cluster preemption within a general-purpose graphics processing unit
US11244420B2 (en)2017-04-212022-02-08Intel CorporationHandling pipeline submissions across many compute units
US10896479B2 (en)2017-04-212021-01-19Intel CorporationHandling pipeline submissions across many compute units
US20190035051A1 (en)2017-04-212019-01-31Intel CorporationHandling pipeline submissions across many compute units
US10497087B2 (en)2017-04-212019-12-03Intel CorporationHandling pipeline submissions across many compute units
US10977762B2 (en)2017-04-212021-04-13Intel CorporationHandling pipeline submissions across many compute units
US11803934B2 (en)2017-04-212023-10-31Intel CorporationHandling pipeline submissions across many compute units
US11620723B2 (en)2017-04-212023-04-04Intel CorporationHandling pipeline submissions across many compute units
US12073489B2 (en)2017-04-212024-08-27Intel CorporationHandling pipeline submissions across many compute units
US12411695B2 (en)2017-04-242025-09-09Intel CorporationMulticore processor with each core having independent floating point datapath and integer datapath
CN109712064A (en)*2017-04-242019-05-03英特尔公司 Inference using a mix of low and high precision
US12175252B2 (en)2017-04-242024-12-24Intel CorporationConcurrent multi-datatype execution within a processing resource
US12217053B2 (en)2017-04-282025-02-04Intel CorporationInstructions and logic to perform floating point and integer operations for machine learning
US11720355B2 (en)2017-04-282023-08-08Intel CorporationInstructions and logic to perform floating point and integer operations for machine learning
US12141578B2 (en)2017-04-282024-11-12Intel CorporationInstructions and logic to perform floating point and integer operations for machine learning
US12039331B2 (en)2017-04-282024-07-16Intel CorporationInstructions and logic to perform floating point and integer operations for machine learning
US11360767B2 (en)*2017-04-282022-06-14Intel CorporationInstructions and logic to perform floating point and integer operations for machine learning
CN110799955A (en)*2017-06-282020-02-14威斯康星校友研究基金会High speed computer accelerator with pre-programmed function
KR102349138B1 (en)*2017-06-282022-01-10위스콘신 얼럼나이 리서어치 화운데이션 High-speed computer accelerators with pre-programmed functions
KR20200013715A (en)*2017-06-282020-02-07위스콘신 얼럼나이 리서어치 화운데이션 High speed computer accelerator with preprogrammed functions
US11151077B2 (en)2017-06-282021-10-19Wisconsin Alumni Research FoundationComputer architecture with fixed program dataflow elements and stream processor
WO2019005443A1 (en)*2017-06-282019-01-03Wisconsin Alumni Research FoundationHigh-speed computer accelerator with pre-programmed functions
US10817309B2 (en)2017-08-032020-10-27Next Silicon LtdRuntime optimization of configurable hardware
US20190279086A1 (en)*2017-08-192019-09-12Wave Computing, Inc.Data flow graph node update for machine learning
US11106976B2 (en)2017-08-192021-08-31Wave Computing, Inc.Neural network output layer for machine learning
US10949328B2 (en)2017-08-192021-03-16Wave Computing, Inc.Data flow graph computation using exceptions
US10817344B2 (en)2017-09-132020-10-27Next Silicon LtdDirected and interconnected grid dataflow architecture
US11086816B2 (en)2017-09-282021-08-10Intel CorporationProcessors, methods, and systems for debugging a configurable spatial accelerator
EP3688578A4 (en)*2017-09-292021-06-23Oracle International Corporation SYSTEMS AND PROCEDURES FOR THE DEFINITION OF THREAD SPECIFICATIONS
CN110998521A (en)*2017-09-292020-04-10甲骨文国际公司System and method for defining thread specifications
US10565134B2 (en)2017-12-302020-02-18Intel CorporationApparatus, methods, and systems for multicast in a configurable spatial accelerator
CN108255690A (en)*2018-01-162018-07-06宿州新材云计算服务有限公司 A method for measuring server performance
EP3776228A4 (en)*2018-04-032022-01-12INTEL CorporationApparatuses, methods, and systems for unstructured data flow in a configurable spatial accelerator
US11307873B2 (en)*2018-04-032022-04-19Intel CorporationApparatus, methods, and systems for unstructured data flow in a configurable spatial accelerator with predicate propagation and merging
US10564980B2 (en)2018-04-032020-02-18Intel CorporationApparatus, methods, and systems for conditional queues in a configurable spatial accelerator
US20190303153A1 (en)*2018-04-032019-10-03Intel CorporationApparatus, methods, and systems for unstructured data flow in a configurable spatial accelerator
US10891240B2 (en)2018-06-302021-01-12Intel CorporationApparatus, methods, and systems for low latency communication in a configurable spatial accelerator
US11593295B2 (en)2018-06-302023-02-28Intel CorporationApparatuses, methods, and systems for operations in a configurable spatial accelerator
US11200186B2 (en)2018-06-302021-12-14Intel CorporationApparatuses, methods, and systems for operations in a configurable spatial accelerator
US11132761B2 (en)*2018-09-272021-09-28Intel CorporationMethods and apparatus to emulate graphics processing unit instructions
US11694299B2 (en)2018-09-272023-07-04Intel CorporationMethods and apparatus to emulate graphics processing unit instructions
US10559057B2 (en)*2018-09-272020-02-11Intel CorporationMethods and apparatus to emulate graphics processing unit instructions
CN111352744A (en)*2018-12-212020-06-30图核有限公司Data exchange in a computer
US10678724B1 (en)2018-12-292020-06-09Intel CorporationApparatuses, methods, and systems for in-network storage in a configurable spatial accelerator
US12153541B2 (en)2019-03-152024-11-26Intel CorporationCache structure and utilization
US12093210B2 (en)2019-03-152024-09-17Intel CorporationCompression techniques
US12386779B2 (en)2019-03-152025-08-12Intel CorporationDynamic memory reconfiguration
US12321310B2 (en)2019-03-152025-06-03Intel CorporationImplicit fence for write messages
US12293431B2 (en)2019-03-152025-05-06Intel CorporationSparse optimizations for a matrix accelerator architecture
US12242414B2 (en)2019-03-152025-03-04Intel CorporationData initialization techniques
US12066975B2 (en)2019-03-152024-08-20Intel CorporationCache structure and utilization
US12079155B2 (en)2019-03-152024-09-03Intel CorporationGraphics processor operation scheduling for deterministic latency
US12210477B2 (en)2019-03-152025-01-28Intel CorporationSystems and methods for improving cache efficiency and utilization
US12204487B2 (en)2019-03-152025-01-21Intel CorporationGraphics processor data access and sharing
US12198222B2 (en)2019-03-152025-01-14Intel CorporationArchitecture for block sparse operations on a systolic array
US12013808B2 (en)2019-03-152024-06-18Intel CorporationMulti-tile architecture for graphics operations
US12182035B2 (en)2019-03-152024-12-31Intel CorporationSystems and methods for cache optimization
US11842423B2 (en)2019-03-152023-12-12Intel CorporationDot product operations on sparse matrix elements
US12056059B2 (en)2019-03-152024-08-06Intel CorporationSystems and methods for cache optimization
US12007935B2 (en)2019-03-152024-06-11Intel CorporationGraphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
US11899614B2 (en)2019-03-152024-02-13Intel CorporationInstruction based control of memory attributes
US11995029B2 (en)2019-03-152024-05-28Intel CorporationMulti-tile memory management for detecting cross tile access providing multi-tile inference scaling and providing page migration
US12141094B2 (en)2019-03-152024-11-12Intel CorporationSystolic disaggregation within a matrix accelerator architecture
US12124383B2 (en)2019-03-152024-10-22Intel CorporationSystems and methods for cache optimization
US12099461B2 (en) 2019-03-15 2024-09-24 Intel Corporation Multi-tile memory management
US11934342B2 (en) 2019-03-15 2024-03-19 Intel Corporation Assistance for hardware prefetch in cache access
US12182062B1 (en) 2019-03-15 2024-12-31 Intel Corporation Multi-tile memory management
US11954063B2 (en) 2019-03-15 2024-04-09 Intel Corporation Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
US11954062B2 (en) 2019-03-15 2024-04-09 Intel Corporation Dynamic memory reconfiguration
US10965536B2 (en) 2019-03-30 2021-03-30 Intel Corporation Methods and apparatus to insert buffers in a dataflow graph
US10915471B2 (en) 2019-03-30 2021-02-09 Intel Corporation Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator
US11029927B2 (en) 2019-03-30 2021-06-08 Intel Corporation Methods and apparatus to detect and annotate backedges in a dataflow graph
US11693633B2 (en) 2019-03-30 2023-07-04 Intel Corporation Methods and apparatus to detect and annotate backedges in a dataflow graph
US10817291B2 (en) 2019-03-30 2020-10-27 Intel Corporation Apparatuses, methods, and systems for swizzle operations in a configurable spatial accelerator
US10997102B2 (en) 2019-04-01 2021-05-04 Wave Computing, Inc. Multidimensional address generation for direct memory access
US11934308B2 (en) 2019-04-01 2024-03-19 Wave Computing, Inc. Processor cluster address generation
US11481472B2 (en) 2019-04-01 2022-10-25 Wave Computing, Inc. Integer matrix multiplication engine using pipelining
US11227030B2 (en) 2019-04-01 2022-01-18 Wave Computing, Inc. Matrix multiplication engine using pipelining
US11037050B2 (en) 2019-06-29 2021-06-15 Intel Corporation Apparatuses, methods, and systems for memory interface circuit arbitration in a configurable spatial accelerator
US20210089349A1 (en) * 2019-09-24 2021-03-25 Speedata Ltd. Inter-Thread Communication in Multi-Threaded Reconfigurable Coarse-Grain Arrays
US11900156B2 (en) * 2019-09-24 2024-02-13 Speedata Ltd. Inter-thread communication in multi-threaded reconfigurable coarse-grain arrays
US12361600B2 (en) 2019-11-15 2025-07-15 Intel Corporation Systolic arithmetic on sparse data
US12443449B2 (en) * 2019-11-15 2025-10-14 Nvidia Corporation Techniques for modifying an executable graph to perform a workload associated with a new task graph
US11422939B2 (en) * 2019-12-26 2022-08-23 Intel Corporation Shared read—using a request tracker as a temporary read cache
US11907713B2 (en) 2019-12-28 2024-02-20 Intel Corporation Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator
US11644990B2 (en) 2020-04-23 2023-05-09 Next Silicon Ltd Interconnected memory grid with bypassable units
US11269526B2 (en) 2020-04-23 2022-03-08 Next Silicon Ltd Interconnected memory grid with bypassable units
US11354157B2 (en) * 2020-04-28 2022-06-07 Speedata Ltd. Handling multiple graphs, contexts and programs in a coarse-grain reconfigurable array processor
US20210334134A1 (en) * 2020-04-28 2021-10-28 Speedata Ltd. Handling Multiple Graphs, Contexts and Programs in a Coarse-Grain Reconfigurable Array Processor
EP4143681A4 (en) * 2020-04-28 2024-03-20 Speedata Ltd. Coarse-grain reconfigurable array processor with concurrent handling of multiple graphs on a single grid
EP4143682A4 (en) * 2020-04-28 2024-03-13 Speedata Ltd. Handling multiple graphs, contexts and programs in a coarse-grain reconfigurable array processor
US12086080B2 (en) 2020-09-26 2024-09-10 Intel Corporation Apparatuses, methods, and systems for a configurable accelerator having dataflow execution circuits
US20220100521A1 (en) * 2020-09-29 2022-03-31 Beijing Tsingmicro Intelligent Technology Co., Ltd. Data loading and storage system and method
US12124853B2 (en) * 2020-09-29 2024-10-22 Beijing Tsingmicro Intelligent Technology Co., Ltd. Data loading and storage system and method
WO2022204450A1 (en) * 2021-03-26 2022-09-29 Ascenium, Inc. Parallel processing architecture using speculative encoding
WO2023018477A1 (en) * 2021-08-12 2023-02-16 Ascenium, Inc. Parallel processing architecture using distributed register files
US12321389B1 (en) 2021-12-10 2025-06-03 Amazon Technologies, Inc. Dynamic bounded memory allocation
US12299496B1 (en) * 2022-02-22 2025-05-13 Amazon Technologies, Inc. Bulk loader scaling
US12287756B2 (en) * 2022-03-24 2025-04-29 Google Llc General-purpose systolic array
US20240078212A1 (en) * 2022-03-24 2024-03-07 Google Llc General-Purpose Systolic Array
WO2023183139A1 (en) * 2022-03-25 2023-09-28 Micron Technology, Inc. Schedule instructions of a program of data flows for execution in tiles of a coarse grained reconfigurable array
US11815935B2 (en) 2022-03-25 2023-11-14 Micron Technology, Inc. Programming a coarse grained reconfigurable array through description of data flow graphs
US12039335B2 (en) 2022-03-25 2024-07-16 Micron Technology, Inc. Schedule instructions of a program of data flows for execution in tiles of a coarse grained reconfigurable array

Also Published As

Publication number    Publication date
US20180101387A1 (en)    2018-04-12
US11003458B2 (en)    2021-05-11
US20200159539A1 (en)    2020-05-21
US10579390B2 (en)    2020-03-03

Similar Documents

Publication    Publication Date    Title
US11003458B2 (en) Execution of data-parallel programs on coarse-grained reconfigurable architecture hardware
Voitsechov et al. Single-graph multiple flows: Energy efficient design alternative for gpgpus
Weng et al. A hybrid systolic-dataflow architecture for inductive matrix algorithms
Sankaralingam et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Sankaralingam et al. Trips: A polymorphous architecture for exploiting ilp, tlp, and dlp
US9158575B2 (en) Multithreaded processor array with heterogeneous function blocks communicating tokens via self-routing switch fabrics
US7840914B1 (en) Distributing computations in a parallel processing environment
Narasiman et al. Improving GPU performance via large warps and two-level warp scheduling
Olofsson et al. Kickstarting high-performance energy-efficient manycore architectures with epiphany
US20100122105A1 (en) Reconfigurable instruction cell array
Voitsechov et al. Inter-thread communication in multithreaded, reconfigurable coarse-grain arrays
Wang et al. MP-Tomasulo: A dependency-aware automatic parallel execution engine for sequential programs
Riedel et al. MemPool: A scalable manycore architecture with a low-latency shared L1 memory
Dai et al. Enabling adaptive loop pipelining in high-level synthesis
Rákossy et al. Design and analysis of layered coarse-grained reconfigurable architecture
Madhu et al. Compiling HPC kernels for the REDEFINE CGRA
Wang et al. A multiple SIMD, multiple data (MSMD) architecture: Parallel execution of dynamic and static SIMD fragments
Gou et al. Elastic pipeline: addressing GPU on-chip shared memory bank conflicts
Mische et al. Reduced complexity many-core: timing predictability due to message-passing
Voitsechov et al. Control flow coalescing on a hybrid dataflow/von Neumann GPGPU
Liu et al. Pattern-based dynamic compilation system for CGRAs with online configuration transformation
Braak et al. R-gpu: A reconfigurable gpu architecture
Lu et al. Minimizing pipeline stalls in distributed-controlled coarse-grained reconfigurable arrays with triggered instruction issue and execution
Jeong et al. Evaluator-executor transformation for efficient pipelining of loops with conditionals
Fung Dynamic warp formation: exploiting thread scheduling for efficient MIMD control flow on SIMD graphics hardware

Legal Events

Date    Code    Title    Description
AS    Assignment

Owner name: TECHNION RESEARCH & DEVELOPMENT FOUNDATION LTD., I

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ETSION, YOAV; VOITSECHOV, DANI; REEL/FRAME: 035166/0187

Effective date: 20150308

STCB    Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

