CN109993297A - Load-balanced sparse convolutional neural network accelerator and acceleration method - Google Patents

Load-balanced sparse convolutional neural network accelerator and acceleration method

Info

Publication number
CN109993297A
Authority
CN
China
Prior art keywords
convolution
data
load balancing
array
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910259591.7A
Other languages
Chinese (zh)
Inventor
王瑶
朱志炜
秦子迪
苏岩
王宇宣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Jixiang Sensing And Imaging Technology Research Institute Co Ltd
Original Assignee
Nanjing Jixiang Sensing And Imaging Technology Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Jixiang Sensing And Imaging Technology Research Institute Co Ltd
Priority to CN201910259591.7A
Publication of CN109993297A
Legal status: Pending

Abstract

The invention discloses a load-balanced sparse convolutional neural network accelerator and an acceleration method therefor. The accelerator comprises a master controller, a data distribution module, a computing array for convolution operations, an output result cache module, a linear activation function unit, a pooling unit, an online coding unit, and an off-chip dynamic memory. The solution of the present invention achieves high-efficiency operation of the convolution computing array under very limited storage resources, and guarantees high reuse of input excitations and weight data as well as load balancing and high utilization of the computing array. At the same time, through static configuration, the computing array supports parallel scheduling at two levels: between convolution operations of different sizes and scales across rows and columns, and between different feature maps. It therefore has good applicability and scalability.

Description

Load-balanced sparse convolutional neural network accelerator and acceleration method
Technical field
The present invention relates to a load-balanced sparse convolutional neural network accelerator and an acceleration method therefor, and belongs to the technical field of deep learning algorithms.
Background Art
In recent years, deep learning algorithms have been widely applied with excellent results in computer vision, natural language processing, and other fields, and the convolutional neural network (CNN) is one of the most important such algorithms. Higher accuracy in a convolutional neural network model usually means a deeper network, more parameters, and a larger amount of computation, about 90% of which is concentrated in the convolutional layers. To run convolutional neural networks efficiently in embedded systems, optimizing the energy efficiency of the convolution operation is therefore imperative.
The convolutional layer operation of a CNN has two main features. First, the data volume is large: the feature maps and weight data required by the convolution are large in scale, so sparsifying them and storing them in compressed form can save storage and make maximal use of data transfer bandwidth. Second, the data flow and control flow are complex: the convolution must process multiple channels of multiple kernels simultaneously according to the convolution dimension information, while keeping the operation pipelined.
Because the nonzero elements of a sparsified convolutional neural network are irregularly distributed, invalid computations arise during calculation, leaving a high proportion of the computing resources idle.
Summary of the invention
In view of the above problems of the prior art, the present invention aims to provide a highly efficient load-balanced sparse convolutional neural network accelerator that achieves high reuse of weights and excitation data, small data transfer volume, highly scalable parallelism, and low hardware storage and DSP resource requirements. A further object of the present invention is to provide an acceleration method using this accelerator.
The technical solution adopted by the accelerator of the present invention is as follows:
A load-balanced sparse convolutional neural network accelerator, comprising: a master controller, for generating the control signal stream and data stream of the convolution operation and for processing and saving data; a data distribution module, which distributes weight data to the computing array according to the block partition scheme of the convolution operation; a computing array for convolution, which completes the multiply-add operations of sparse convolution and outputs partial-sum results; an output result cache module, which accumulates and caches the partial sums of the computing array, organizes them into a unified format, and outputs the feature-map results to be activated and pooled; a linear activation function unit, which applies the bias and the activation function to the accumulated partial sums; a pooling unit, for pooling the results after the activation function; an online coding unit, which encodes on the fly the excitation values that still need to enter subsequent convolutional layers; and an off-chip dynamic memory, for storing the raw image data, the intermediate results of the computing array, and the final output feature maps.
The acceleration method of the load-balanced sparse convolutional neural network accelerator of the present invention comprises the following steps:
1) Prune the weight data of the convolutional neural network model: group the data according to the scale parameters of the weights, then apply the same pruning pattern to each group of weight data to sparsify it, while preserving the overall accuracy of the model.
2) Formulate the load-balanced sparse convolution mapping scheme, and map the sparsified convolutional neural network onto the convolution computing array of the accelerator.
3) The accelerator reconfigures the computing array and the storage array according to the configuration information of the mapping scheme, ensuring that the convolution proceeds in a pipelined manner.
4) The master controller directs the data distribution module to distribute the weight and excitation data; the computing array performs the operations and outputs convolution partial sums.
5) Accumulate the partial sums and apply the linear correction, i.e., the bias and activation function.
6) Perform the pooling operation with the kernel size and stride required by the current convolutional layer.
7) Judge whether the current convolutional layer is the last layer. If not, perform online coding and send the encoded excitation results to the next convolutional layer; if so, output the results to the off-chip dynamic memory, completing the acceleration of the convolutional neural network.
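The per-layer flow of steps 4) to 6) can be sketched in software. The following minimal NumPy model is an illustration only, not the hardware: a naive accumulation loop stands in for the computing array, and ReLU is assumed as the linear correction (the patent names only a "linear activation function unit").

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(fm, k, stride):
    """fm: (C, H, W) feature map; max pooling with kernel k and the given stride."""
    C, H, W = fm.shape
    ho, wo = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.zeros((C, ho, wo))
    for i in range(ho):
        for j in range(wo):
            out[:, i, j] = fm[:, i*stride:i*stride+k, j*stride:j*stride+k].max(axis=(1, 2))
    return out

def conv_layer(fm, weights, bias, pool_k=2, pool_s=2):
    """weights: (N, C, R, R). The triple loop stands in for the PE array's partial sums."""
    N, C, R, _ = weights.shape
    hc = fm.shape[1] - R + 1                      # step 4: convolution partial sums
    psum = np.zeros((N, hc, hc))
    for n in range(N):
        for i in range(hc):
            for j in range(hc):
                psum[n, i, j] = np.sum(fm[:, i:i+R, j:j+R] * weights[n])
    act = relu(psum + bias[:, None, None])        # step 5: bias + activation
    return max_pool2d(act, pool_k, pool_s)        # step 6: pooling
```

Step 7) would then either CSR-encode this output for the next layer or write it to external memory.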
Compared with the prior art, the present invention has the following advantages:
The load-balanced sparse convolutional neural network accelerator and acceleration method provided by the invention make maximal use of the sparsity of the convolution data. They achieve high-efficiency operation of the convolution computing array under very limited storage resources, and guarantee high reuse of input excitations and weight data as well as load balancing and high utilization of the computing array. At the same time, through static configuration, the computing array supports parallel scheduling at two levels: between convolution operations of different sizes and scales across rows and columns, and between different feature maps, giving it good applicability and scalability. The design of the invention meets well the current demand for low-power, high-energy-efficiency operation of convolutional neural networks in embedded systems.
Brief description of the drawings
Fig. 1 is a schematic diagram of the load-balanced sparse convolutional network acceleration method.
Fig. 2 is a schematic diagram of the weight pruning pattern.
Fig. 3 is a schematic diagram of the overall structure of the hardware accelerator.
Fig. 4 is a schematic diagram of the convolution operation mapping scheme.
Fig. 5 is a schematic diagram of the convolution operation within a PE group.
Fig. 6 is a schematic diagram of the implementation of PE array load balancing and shared storage.
Detailed description of embodiments
The scheme of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 shows the flow of the load-balanced sparse convolutional network operation method. First, the weight data of the convolutional neural network model is pruned: the data is grouped according to the scale parameters of the weights, and the same pruning pattern is then applied to each group to sparsify it while preserving the overall accuracy of the model. Next, a load-balanced sparse convolution mapping scheme is formulated according to the input feature map and kernel dimensions, and the sparsified convolutional neural network is mapped onto the PE (Processing Element) array of the hardware accelerator. The hardware accelerator then reconfigures the PE array and the storage array according to the configuration information of the mapping scheme, guaranteeing pipelined convolution. The master controller of the accelerator directs the distribution of weight and excitation data; the PE array performs the operations and outputs convolution partial sums. The linear correction unit accumulates the partial sums and applies the linear correction, i.e., the bias and activation function. The pooling unit performs pooling with the kernel size and stride required by the current convolutional layer, choosing max pooling or average pooling. Finally, it is judged whether the current convolutional layer is the last layer: if not, online coding is performed and the encoded excitation results are sent to the next convolutional layer; if so, the results are output to off-chip storage, completing the entire convolution acceleration.
The load-balanced sparse convolution mapping scheme comprises the convolution operation mapping mode, the PE array grouping scheme, the distribution and reuse mode of input feature maps and weight data, and the parallel scheduling mechanism of PE array operations.
Convolution operation mapping mode: the input feature map is unrolled into a matrix along the row (column) dimension, and the weight data is unrolled into a vector along the output channel dimension, so that the convolution is converted into a matrix-vector multiplication. A sparse matrix-vector multiplication unit designed on this basis can skip the zeros in both the input feature map and the weight data, guaranteeing high efficiency of the whole operation.
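As a rough software analogue of this unrolling (the traversal order and buffering here are illustrative assumptions, not the patent's exact on-chip layout), the conversion to a matrix-vector product can be written as:

```python
import numpy as np

def im2col_rows(fm, R):
    """fm: (C, H, W). Each matrix row gathers one R*R*C receptive field, so one
    output channel of the convolution becomes matrix @ unrolled-kernel-vector."""
    C, H, W = fm.shape
    ho, wo = H - R + 1, W - R + 1
    M = np.zeros((ho * wo, R * R * C))
    for i in range(ho):
        for j in range(wo):
            M[i * wo + j] = fm[:, i:i+R, j:j+R].reshape(-1)
    return M

# one output channel of the convolution as a matrix-vector multiplication
fm = np.random.rand(2, 4, 4)
kernel = np.random.rand(2, 3, 3)                  # (C, R, R) for one output channel
out = im2col_rows(fm, 3) @ kernel.reshape(-1)     # shape (ho*wo,)
```

A sparse version of this product skips every index pair where either operand is zero, which is what the dedicated multiplication unit exploits.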
PE array grouping scheme: grouping is completed by static configuration of the master controller according to the dimension parameters of each convolutional layer. When the number of PEs is greater than the total number of 3-D convolution kernels, one group can compute all output feature map channels; on this basis, the remaining PEs are grouped by the same number and are responsible for computing different rows of the output feature map. When the number of PEs is less than the total number of 3-D kernels, one group computes the largest divisor of the output channel count. The principle of grouping in this way is to keep the operation speed of every PE matched and the idle rate of the PE array low.
Distribution and reuse mode of input feature maps and weight data: the entire PE array receives, from one shared on-chip memory, the same excitation data at the same time as the matrix needed for the operation, while the data distribution module distributes to each PE, according to the control information of the block operation, the weight data it needs as the vector for the operation. Reuse of the input feature map lies mainly in its simultaneous use by different PEs; reuse of the weights lies mainly in sharing weight data between different groups, and in a single PE reusing its weight data across matrix updates without redistribution.
Parallel scheduling mechanism of PE array operations: during operation the PE array determines, according to the size information of the output feature map of the convolutional layer, whether different groups complete different rows (columns) of the same output feature map or complete the operations of different output feature maps. This guarantees that the PE array can schedule in parallel at two levels: intra-layer parallelism within a single feature map, and simultaneous parallelism across different feature maps.
The load-balanced sparse convolutional neural network acceleration solution of this embodiment comprises a software part and a hardware part. Fig. 2 shows the pruning strategy of the software part. The pruning strategy is as follows: the initially dense neural network connections are grouped according to the number of connections and neurons of the network, and the pruning pattern and positions within each group are identical; that is, the neurons of each convolution kernel group share the same connection pattern and differ only in the connected weight values. Take an input feature map of size W*W*C (W is the feature map width and height, C the number of input channels) and kernels of size R*R*C*N (R is the kernel width and height, C the number of kernel channels, N the number of kernels, i.e., the number of output channels) as an example. During pruning, the N kernels of size R*R*C are first classified into one convolution kernel group, and within the group the positions of the zero elements of every kernel are identical. If the accuracy after pruning does not reach what the model requires, the kernel group size can be adjusted, pruning in groups of R*R*C*N1 (N1 a divisor of N).
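The grouped pruning constraint (identical zero positions for every kernel in a group) can be sketched as follows. The magnitude-based ranking criterion is an assumption for illustration, since the text does not fix how the pruned positions are chosen:

```python
import numpy as np

def group_prune(kernels, sparsity):
    """kernels: (N1, C, R, R), one convolution kernel group.
    All kernels in the group receive the SAME zero mask."""
    score = np.abs(kernels).mean(axis=0)            # (C, R, R) shared importance
    k = int(score.size * sparsity)                  # number of positions to prune
    thresh = np.sort(score.reshape(-1))[k]
    mask = (score >= thresh).astype(kernels.dtype)  # shared mask, broadcast over N1
    return kernels * mask, mask
```

Because the mask is computed once per group and broadcast over all N1 kernels, the hardware can later fetch one set of excitation values for the whole group, which is the property the accelerator relies on.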
Fig. 3 shows the structure of the hardware part, the sparse convolutional neural network accelerator. The overall structure mainly comprises: the master controller, which receives instructions from the host CPU and generates the control signal stream and data stream of the convolution operation; the data distribution module, which distributes weight data to the PEs according to the block partition scheme of the convolution; the PE (Processing Element) array of the convolution operation, which is grouped according to the configuration information of the master controller, completes the multiply-add operations of sparse convolution, and outputs convolution results or partial sums; the output result cache module, which accumulates and caches the partial sums of the PEs and, after organizing them into a unified format, sends them to the subsequent units; the linear activation function unit, which applies the bias and activation function to the convolution results; the pooling unit, which performs max pooling of the results; the online coding unit, which applies online CSR (compressed sparse row) coding to intermediate results so that the output meets the data format requirements of the subsequent convolutional layer; and the off-chip dynamic memory DDR4, which stores the raw image data, the inter-layer intermediate results, and the final output of the convolutional layers.
The data distribution module comprises a fetch address calculation unit, a configurable on-chip memory storage unit, and a FIFO group for data format caching and conversion. According to the configuration information sent by the master controller, the data distribution module completes, through the fetch address calculation unit, the access pattern to the off-chip dynamic memory DDR4; the fetched data is cached via the AXI4 interface into the on-chip weight memory, further format-converted, and distributed into the corresponding FIFOs, where it waits to be sent for operation.
The PE array of the convolution operation comprises multiple matrix-vector multiplication computing units. According to the static configuration information, it completes intra-layer or inter-layer parallel convolution of feature maps and outputs the partial sums of the convolution. The multiple PE units share a common on-chip memory; thanks to the co-design of the pruning strategy and the hardware architecture, the PEs can, with very limited storage resources, achieve zero-skipping accelerated computation during sparse convolution and matched operation speed across different PEs.
The matrix-vector multiplication computing unit comprises a pipeline controller module, a weight non-zero detection module, a pointer control module, an excitation decompression module, an MLA (multiply-accumulate) operation unit module, and the common on-chip memory. The weight non-zero detection module performs non-zero detection on the weight data sent by the data distribution module and transmits to the PE unit only the nonzero values together with their position information. The pointer control module and the excitation decompression module fetch from the common on-chip memory, according to the nonzero weight values, the excitation values needed for the corresponding operation, and send them to each PE unit. The MLA operation unit is mainly responsible for the multiplications and additions in the matrix-vector product.
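Functionally, in software terms (the hardware realizes this with the detection, pointer, and MLA modules above), the zero-skipping computation amounts to:

```python
def sparse_matvec(matrix_rows, weight_vec):
    """matrix_rows: rows of excitation data; weight_vec: one weight vector.
    Only (index, value) pairs of nonzero weights are kept, mirroring the
    weight non-zero detection module; excitations are then fetched by index,
    mirroring the pointer control, before the multiply-accumulate."""
    nz = [(i, w) for i, w in enumerate(weight_vec) if w != 0]
    out = []
    for row in matrix_rows:
        acc = 0.0
        for i, w in nz:           # MLA over nonzero weights only
            acc += row[i] * w
        out.append(acc)
    return out
```

The work per output element is proportional to the number of nonzero weights rather than the vector length, which is the source of the speedup on sparsified models.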
Fig. 4 shows the convolution operation mapping scheme. Take an input feature map of W*W*C (W is the feature map width and height, C the number of input channels) and kernels of size R*R*C*N (R is the kernel width and height, C the number of kernel channels, N the number of kernels, i.e., the number of output channels); F is the output feature map size. First, the number Num_PE of PE units in each PE group is determined by N: if the total number of PEs is greater than N, Num_PE can equal N, and each group obtains in one batch of operations the results of all channels of the output feature map; otherwise Num_PE is set to a divisor M of N, and an integer number of batches outputs part of the output channels, guaranteeing that no PE is idle. The group count Group_PE is determined by the total number of PEs and Num_PE. If one group can complete all output channels, different groups are responsible for different rows of the output feature map, as shown by the division of labor of PE group 2 in the figure.
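The group-size rule described above (Num_PE equals N when enough PEs exist, otherwise the largest divisor of N that fits) reduces to a few lines; the function and parameter names here are illustrative:

```python
def pe_grouping(pe_total, n_kernels):
    """Return (Num_PE, Group_PE): PEs per group and number of groups.
    Choosing a divisor of the output channel count N keeps every PE in a
    group busy for an integer number of batches."""
    if pe_total >= n_kernels:
        num_pe = n_kernels
    else:
        num_pe = max(d for d in range(1, pe_total + 1) if n_kernels % d == 0)
    return num_pe, pe_total // num_pe
```

For example, 16 PEs with N = 8 output channels give groups of 8 PEs, two groups; 6 PEs with N = 8 give one group of 4 (the largest divisor of 8 not exceeding 6), with the remaining PEs left for other configurations.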
For one layer of convolution, a PE group consists of Num_PE PE units (i.e., matrix-vector multiplication units), and each unit is responsible for several rows of one channel of the output feature map; the first operation outputs the first of these rows, the exact number of rows being determined by the matrix size of the matrix-vector multiplication. In the matrix-vector multiplication, the matrix corresponds to the shared excitation data stored in the local on-chip memory, and the vector corresponds to the weight data sent by the data distribution module. For the other PE groups, the operation content can be the subsequent rows of the output feature map, as shown in Fig. 3, or the convolution of other input feature maps; the two parallel operation modes, intra-layer row-column parallelism and different-feature-map parallelism, can thus both be satisfied.
Fig. 5 shows the convolution operation within a PE group. Different numerical values indicate the values at different positions of the input feature map and of the different kernels. The matrix-vector multiplication scale taken in the example is a 2*12 matrix times a 12*1 vector, so each PE operation outputs a 2*1 vector. In the first operation, the vector of PE1 corresponds to the three channels (12*1) of kernel 1, and the matrix corresponds to the three channels of positions (1,2,4,5) and (2,3,5,6) in the activation image; after the multiply-add operation, the output is the first two rows of the first column of the first channel of the output feature map. The matrix is then updated first, i.e., the excitation values at positions (4,5,7,8) and (5,6,8,9) are taken, and the output is the first two rows of the second column of the first channel. After all column data of the corresponding rows has been output, the weight data corresponding to the vector is updated, after which the output of the third channel is produced. Correspondingly, after its weight update, PE2 changes from computing the second channel of the output feature map to computing the fourth output channel.
Fig. 6 shows the implementation of PE array load balancing and shared storage. The on-chip memory shared by the PE array stores the nonzero values of the input excitations in CSR (compressed sparse row) format together with their index and pointer arrays, and the corresponding excitations are fetched for the multiply-add according to the positions of the nonzero values of the weight vectors sent by the data distribution module. Since, by the software pruning strategy, all weight vectors within a PE group have their nonzero elements at identical positions, the excitation values required by each PE are also identical: very little memory is needed to hold one copy of the excitation values, which, once decoded, is broadcast to the PEs and satisfies the matrix requirements of the whole PE array. Moreover, for all PEs the nonzero positions of the matrix and the vector in the matrix-vector multiplication are the same, so the computing speeds across the PE array match, achieving the design goal of a low-storage, load-balanced computing array. At the same time, different PE groups can share the distributed weight data, realizing high reuse of both excitations and weights.
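A minimal software model of the CSR format used for the excitations follows; it uses the standard value/column-index/row-pointer layout, while the on-chip packing details (bit widths, banking) are not specified here:

```python
import numpy as np

def csr_encode(mat):
    """Return (values, col_idx, row_ptr) for a dense 2-D array."""
    values, col_idx, row_ptr = [], [], [0]
    for row in mat:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))     # running count of nonzeros per row
    return values, col_idx, row_ptr

def csr_decode(values, col_idx, row_ptr, n_cols):
    """Inverse of csr_encode, reconstructing the dense array."""
    mat = np.zeros((len(row_ptr) - 1, n_cols))
    for r in range(len(row_ptr) - 1):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            mat[r, col_idx[k]] = values[k]
    return mat
```

The online coding unit performs the encoding direction on the fly as results stream out, so that the next layer's pointer control can fetch excitations by index without decompressing whole feature maps.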
In summary, the acceleration method for sparse convolutional neural networks proposed by the embodiment of the present invention effectively saves storage hardware resources, improves the reuse of input feature maps and weights, and realizes load balancing of the PE array. Static configuration of the PE array can satisfy different parallel operation requirements, guarantee high utilization of the PE array, improve the data throughput of the whole system, and reach a very high energy efficiency ratio, making the design suitable for low-power embedded systems.

Claims (8)

2. The load-balanced sparse convolutional neural network accelerator according to claim 1, characterized in that the computing array of the convolution operation comprises matrix-vector multiplication computing units, each comprising a pipeline controller module, a weight non-zero detection module, a pointer control module, an excitation decompression module, an MLA operation unit module, and a common on-chip memory; the weight non-zero detection module is used to perform non-zero detection on the weight data sent by the data distribution module and to transmit only the nonzero values and their position information to the computing unit; the pointer control module and the excitation decompression module are used to fetch from the common on-chip memory, according to the corresponding nonzero weight values, the excitation values needed for the operation and to send them to each computing unit; the MLA operation unit is used for the multiplications and additions in the matrix-vector product.
6. The acceleration method of the load-balanced sparse convolutional neural network accelerator according to claim 4, characterized in that the computing array grouping scheme is specifically: grouping is completed by static configuration of the master controller according to the dimension parameters of each convolutional layer; when the number of computing units is greater than the total number of 3-D convolution kernels, one group of the array computes all output feature map channels, and on this basis the remaining computing units are grouped by the same number and are responsible for computing different rows of the output feature map; when the number of computing units is less than the total number of 3-D kernels, one group of the array computes the largest divisor of the output feature map channel count.
Application CN201910259591.7A, filed 2019-04-02 by Nanjing Jixiang Sensing And Imaging Technology Research Institute Co Ltd; priority date 2019-04-02; published as CN109993297A (en) on 2019-07-09; legal status: pending.

CN114092708A (en)*2021-11-122022-02-25北京百度网讯科技有限公司 Feature image processing method, device and storage medium
CN114327629A (en)*2021-12-282022-04-12北京航天自动控制研究所 A Two-Dimensional Multi-Channel Convolution Hardware Accelerator Based on FPGA
CN114595813A (en)*2022-02-142022-06-07清华大学Heterogeneous acceleration processor and data calculation method
EP4007971A1 (en)*2019-09-252022-06-08DeepMind Technologies LimitedFast sparse neural networks
WO2022134465A1 (en)*2020-12-242022-06-30北京清微智能科技有限公司Sparse data processing method for accelerating operation of re-configurable processor, and device
CN114723029A (en)*2022-05-052022-07-08中山大学DCNN accelerator based on hybrid multi-row data flow strategy
CN114742216A (en)*2022-04-192022-07-12南京大学 A Heterogeneous Training Accelerator Based on Reverse Pipeline
CN114780910A (en)*2022-06-162022-07-22千芯半导体科技(北京)有限公司Hardware system and calculation method for sparse convolution calculation
CN114912596A (en)*2022-05-132022-08-16上海交通大学Sparse convolution neural network-oriented multi-chip system and method thereof
CN115145839A (en)*2021-03-312022-10-04广东高云半导体科技股份有限公司Deep convolution accelerator and method for accelerating deep convolution by using same
CN115529475A (en)*2021-12-292022-12-27北京智美互联科技有限公司Method and system for detecting video flow content and controlling wind
CN115829000A (en)*2022-10-312023-03-21杭州嘉楠耘智信息科技有限公司 Data processing method, device, electronic device and storage medium
CN115879530A (en)*2023-03-022023-03-31湖北大学 A method for array structure optimization of RRAM in-memory computing system
CN116029332A (en)*2023-02-222023-04-28南京大学On-chip fine tuning method and device based on LSTM network
CN116261736A (en)*2020-06-122023-06-13墨芯国际有限公司 Method and system for double sparse convolution processing and parallelization
CN116432709A (en)*2023-04-192023-07-14东南大学苏州研究院 A Sparsification Method and Accelerator Design for Object Detection Network
CN116911357A (en)*2023-07-112023-10-20西安电子科技大学 Convolutional computing accelerator and acceleration method based on CSR coding
CN117290279A (en)*2023-11-242023-12-26深存科技(无锡)有限公司Shared tight coupling based general computing accelerator
CN118070855A (en)*2024-04-182024-05-24南京邮电大学Convolutional neural network accelerator based on RISC-V architecture

Citations (4)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107229967A (en) * | 2016-08-22 | 2017-10-03 | 北京深鉴智能科技有限公司 | A hardware accelerator and method for implementing sparsified GRU neural networks based on FPGA
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for implementing a sparse convolutional neural network accelerator
CN108932548A (en) * | 2018-05-22 | 2018-12-04 | 中国科学技术大学苏州研究院 | An FPGA-based sparsity-aware neural network acceleration system
CN109472350A (en) * | 2018-10-30 | 2019-03-15 | 南京大学 | A Neural Network Acceleration System Based on Block-Circulant Sparse Matrices

Cited By (103)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110516801A (en) * | 2019-08-05 | 2019-11-29 | 西安交通大学 | A High Throughput Dynamically Reconfigurable Convolutional Neural Network Accelerator Architecture
CN110516801B (en) * | 2019-08-05 | 2022-04-22 | 西安交通大学 | High-throughput-rate dynamically reconfigurable convolutional neural network accelerator
CN110543900A (en) * | 2019-08-21 | 2019-12-06 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium
EP4007971A1 (en) * | 2019-09-25 | 2022-06-08 | DeepMind Technologies Limited | Fast sparse neural networks
CN110738310A (en) * | 2019-10-08 | 2020-01-31 | 清华大学 | Sparse neural network accelerator and implementation method thereof
CN110738310B (en) * | 2019-10-08 | 2022-02-01 | 清华大学 | Sparse neural network accelerator and implementation method thereof
CN112766453A (en) * | 2019-10-21 | 2021-05-07 | 华为技术有限公司 | Data processing device and data processing method
CN110807513A (en) * | 2019-10-23 | 2020-02-18 | 中国人民解放军国防科技大学 | Convolutional neural network accelerator based on Winograd sparse algorithm
CN110852422A (en) * | 2019-11-12 | 2020-02-28 | 吉林大学 | Convolutional neural network optimization method and device based on systolic array
CN111047008A (en) * | 2019-11-12 | 2020-04-21 | 天津大学 | Convolutional neural network accelerator and acceleration method
CN111047008B (en) * | 2019-11-12 | 2023-08-01 | 天津大学 | Convolutional neural network accelerator and acceleration method
CN111079919A (en) * | 2019-11-21 | 2020-04-28 | 清华大学 | In-memory computing architecture supporting weight sparsity and data output method thereof
CN111079919B (en) * | 2019-11-21 | 2022-05-20 | 清华大学 | An in-memory computing architecture supporting weight sparsity and its data output method
CN111047010A (en) * | 2019-11-25 | 2020-04-21 | 天津大学 | Method and device for reducing first-layer convolution calculation delay of a CNN accelerator
CN110991631A (en) * | 2019-11-28 | 2020-04-10 | 福州大学 | Neural network acceleration system based on FPGA
CN111062472A (en) * | 2019-12-11 | 2020-04-24 | 浙江大学 | A sparse neural network accelerator based on structured pruning and its acceleration method
CN111178508A (en) * | 2019-12-27 | 2020-05-19 | 珠海亿智电子科技有限公司 | Operation device and method for executing the fully-connected layer in a convolutional neural network
CN111178508B (en) * | 2019-12-27 | 2024-04-05 | 珠海亿智电子科技有限公司 | Computing device and method for executing the fully-connected layer in a convolutional neural network
CN111240743B (en) * | 2020-01-03 | 2022-06-03 | 格兰菲智能科技有限公司 | Artificial intelligence integrated circuit
CN111240743A (en) * | 2020-01-03 | 2020-06-05 | 上海兆芯集成电路有限公司 | Artificial intelligence integrated circuit
CN111199277B (en) * | 2020-01-10 | 2023-05-23 | 中山大学 | Convolutional neural network accelerator
CN111199277A (en) * | 2020-01-10 | 2020-05-26 | 中山大学 | A convolutional neural network accelerator
CN111368988A (en) * | 2020-02-28 | 2020-07-03 | 北京航空航天大学 | Deep learning training hardware accelerator utilizing sparsity
CN111368988B (en) * | 2020-02-28 | 2022-12-20 | 北京航空航天大学 | A deep learning training hardware accelerator exploiting sparsity
CN111401554B (en) * | 2020-03-12 | 2023-03-24 | 交叉信息核心技术研究院(西安)有限公司 | Accelerator for convolutional neural networks supporting multi-granularity sparsity and multi-mode quantization
CN111401554A (en) * | 2020-03-12 | 2020-07-10 | 交叉信息核心技术研究院(西安)有限公司 | Accelerator for convolutional neural networks supporting multi-granularity sparsity and multi-mode quantization
CN111415004A (en) * | 2020-03-17 | 2020-07-14 | 北京百度网讯科技有限公司 | Method and apparatus for outputting information
CN111415004B (en) * | 2020-03-17 | 2023-11-03 | 阿波罗智联(北京)科技有限公司 | Method and device for outputting information
CN113496274A (en) * | 2020-03-20 | 2021-10-12 | 郑桂忠 | Quantization method and system based on in-memory computing circuit architecture
CN111445013A (en) * | 2020-04-28 | 2020-07-24 | 南京大学 | A non-zero detector for convolutional neural networks and its method
CN111445012A (en) * | 2020-04-28 | 2020-07-24 | 南京大学 | An FPGA-based grouped convolution hardware accelerator and method thereof
CN111401532A (en) * | 2020-04-28 | 2020-07-10 | 南京宁麒智能计算芯片研究院有限公司 | Convolutional neural network inference accelerator and acceleration method
CN111738433A (en) * | 2020-05-22 | 2020-10-02 | 华南理工大学 | A reconfigurable convolution hardware accelerator
CN111738433B (en) * | 2020-05-22 | 2023-09-26 | 华南理工大学 | Reconfigurable convolution hardware accelerator
CN111667052B (en) * | 2020-05-27 | 2023-04-25 | 上海赛昉科技有限公司 | Standard and non-standard convolution consistency transformation method for a dedicated neural network accelerator
CN111667052A (en) * | 2020-05-27 | 2020-09-15 | 上海赛昉科技有限公司 | Standard and non-standard convolution consistency transformation method for a dedicated neural network accelerator
CN111667051A (en) * | 2020-05-27 | 2020-09-15 | 上海赛昉科技有限公司 | Neural network accelerator suitable for edge devices and neural network acceleration calculation method
CN111782356A (en) * | 2020-06-03 | 2020-10-16 | 上海交通大学 | Data flow method and system for a weight-sparse neural network chip
CN111782356B (en) * | 2020-06-03 | 2022-04-08 | 上海交通大学 | Data flow method and system for a weight-sparse neural network chip
CN111882028A (en) * | 2020-06-08 | 2020-11-03 | 北京大学深圳研究生院 | Convolution operation device for convolutional neural networks
CN116261736A (en) * | 2020-06-12 | 2023-06-13 | 墨芯国际有限公司 | Method and system for double-sparse convolution processing and parallelization
CN113919477A (en) * | 2020-07-08 | 2022-01-11 | 嘉楠明芯(北京)科技有限公司 | An acceleration method and device for convolutional neural networks
CN111967587A (en) * | 2020-07-27 | 2020-11-20 | 复旦大学 | Arithmetic unit array structure for neural network processing
CN111967587B (en) * | 2020-07-27 | 2024-03-29 | 复旦大学 | A method of constructing an arithmetic unit array structure for neural network processing
CN111914999A (en) * | 2020-07-30 | 2020-11-10 | 云知声智能科技股份有限公司 | Method and equipment for reducing the calculation bandwidth of a neural network accelerator
CN111914999B (en) * | 2020-07-30 | 2024-04-19 | 云知声智能科技股份有限公司 | Method and equipment for reducing the calculation bandwidth of a neural network accelerator
CN112052941B (en) * | 2020-09-10 | 2024-02-20 | 南京大学 | Efficient storage and computing system applied to CNN convolutional layers and operation method thereof
CN112052941A (en) * | 2020-09-10 | 2020-12-08 | 南京大学 | An efficient storage and computing system and its computing method applied to the convolutional layer of CNN networks
CN112418417B (en) * | 2020-09-24 | 2024-02-27 | 北京计算机技术及应用研究所 | Convolutional neural network acceleration device and method based on SIMD technology
CN112418417A (en) * | 2020-09-24 | 2021-02-26 | 北京计算机技术及应用研究所 | Convolutional neural network acceleration device and method based on SIMD technology
CN112506436A (en) * | 2020-12-11 | 2021-03-16 | 西北工业大学 | High-efficiency dynamic data storage allocation method for convolutional neural network accelerators
CN112506436B (en) * | 2020-12-11 | 2023-01-31 | 西北工业大学 | High-efficiency dynamic data storage allocation method for convolutional neural network accelerators
CN113159302A (en) * | 2020-12-15 | 2021-07-23 | 浙江大学 | Routing structure for a reconfigurable neural network processor
WO2022134465A1 (en) * | 2020-12-24 | 2022-06-30 | 北京清微智能科技有限公司 | Sparse data processing method for accelerating operation of a reconfigurable processor, and device
CN112836803A (en) * | 2021-02-04 | 2021-05-25 | 珠海亿智电子科技有限公司 | Data placement method for improving convolution operation efficiency
CN115145839A (en) * | 2021-03-31 | 2022-10-04 | 广东高云半导体科技股份有限公司 | Deep convolution accelerator and method for accelerating deep convolution by using same
CN115145839B (en) * | 2021-03-31 | 2024-05-14 | 广东高云半导体科技股份有限公司 | Deep convolution accelerator and method for accelerating deep convolution
CN113077047B (en) * | 2021-04-08 | 2023-08-22 | 华南理工大学 | Convolutional neural network accelerator based on feature map sparsity
CN113077047A (en) * | 2021-04-08 | 2021-07-06 | 华南理工大学 | Convolutional neural network accelerator based on feature map sparsity
CN113128688A (en) * | 2021-04-14 | 2021-07-16 | 北京航空航天大学 | General AI parallel inference acceleration structure and inference equipment
CN113128688B (en) * | 2021-04-14 | 2022-10-21 | 北京航空航天大学 | General AI parallel inference acceleration structure and inference equipment
CN113191493B (en) * | 2021-04-27 | 2024-05-28 | 北京工业大学 | Convolutional neural network accelerator based on FPGA parallelism self-adaptation
CN112989270A (en) * | 2021-04-27 | 2021-06-18 | 南京风兴科技有限公司 | Convolution computing device based on hybrid parallelism
CN112989270B (en) * | 2021-04-27 | 2024-11-22 | 南京风兴科技有限公司 | A convolution computing device based on hybrid parallelism
CN113191493A (en) * | 2021-04-27 | 2021-07-30 | 北京工业大学 | Convolutional neural network accelerator based on FPGA parallelism self-adaptation
CN113435570B (en) * | 2021-05-07 | 2024-05-31 | 西安电子科技大学 | Programmable convolutional neural network processor, method, device, medium and terminal
CN113435570A (en) * | 2021-05-07 | 2021-09-24 | 西安电子科技大学 | Programmable convolutional neural network processor, method, device, medium, and terminal
CN113313251B (en) * | 2021-05-13 | 2023-05-23 | 中国科学院计算技术研究所 | Depthwise-separable convolution fusion method and system based on a dataflow architecture
CN113313251A (en) * | 2021-05-13 | 2021-08-27 | 中国科学院计算技术研究所 | Depthwise-separable convolution fusion method and system based on a dataflow architecture
CN113486200A (en) * | 2021-07-12 | 2021-10-08 | 北京大学深圳研究生院 | Data processing method, processor and system for sparse neural networks
CN113591025A (en) * | 2021-08-03 | 2021-11-02 | 深圳思谋信息科技有限公司 | Feature map processing method and device, convolutional neural network accelerator and medium
CN113705784A (en) * | 2021-08-20 | 2021-11-26 | 江南大学 | Neural network weight encoding method based on matrix sharing and hardware system
CN113705784B (en) * | 2021-08-20 | 2025-01-24 | 江南大学 | A neural network weight encoding method and hardware system based on matrix sharing
CN113705794A (en) * | 2021-09-08 | 2021-11-26 | 上海交通大学 | Neural network accelerator design method based on dynamic activation bit sparsity
CN113705794B (en) * | 2021-09-08 | 2023-09-01 | 上海交通大学 | A neural network accelerator design method based on dynamic activation bit sparsity
CN113791754A (en) * | 2021-09-10 | 2021-12-14 | 中科寒武纪科技股份有限公司 | Arithmetic circuit, chip and board card
CN113946538A (en) * | 2021-09-23 | 2022-01-18 | 南京大学 | Convolutional layer fusion storage device and method based on a line cache mechanism
CN113946538B (en) * | 2021-09-23 | 2024-04-12 | 南京大学 | Convolutional layer fusion storage device and method based on a line caching mechanism
CN113900803A (en) * | 2021-09-30 | 2022-01-07 | 北京航空航天大学杭州创新研究院 | MPSoC-oriented sparse network load balancing scheduling method
CN113902097A (en) * | 2021-09-30 | 2022-01-07 | 南京大学 | Run-length coding accelerator and method for sparse CNN neural network models
CN114092708A (en) * | 2021-11-12 | 2022-02-25 | 北京百度网讯科技有限公司 | Feature image processing method, device and storage medium
CN114065927B (en) * | 2021-11-22 | 2023-05-05 | 中国工程物理研究院电子工程研究所 | Excitation data block processing method for a hardware accelerator and hardware accelerator
CN114065927A (en) * | 2021-11-22 | 2022-02-18 | 中国工程物理研究院电子工程研究所 | Excitation data blocking processing method for a hardware accelerator and hardware accelerator
CN114327629A (en) * | 2021-12-28 | 2022-04-12 | 北京航天自动控制研究所 | A two-dimensional multi-channel convolution hardware accelerator based on FPGA
CN115529475A (en) * | 2021-12-29 | 2022-12-27 | 北京智美互联科技有限公司 | Method and system for video stream content detection and risk control
CN114595813B (en) * | 2022-02-14 | 2024-09-06 | 清华大学 | Heterogeneous acceleration processor and data computing method
CN114595813A (en) * | 2022-02-14 | 2022-06-07 | 清华大学 | Heterogeneous acceleration processor and data calculation method
CN114742216A (en) * | 2022-04-19 | 2022-07-12 | 南京大学 | A heterogeneous training accelerator based on reverse pipelining
CN114742216B (en) * | 2022-04-19 | 2025-06-10 | 南京大学 | A heterogeneous training accelerator based on reverse pipelining
CN114723029B (en) * | 2022-05-05 | 2025-04-25 | 中山大学 | A DCNN accelerator based on a hybrid multi-row dataflow strategy
CN114723029A (en) * | 2022-05-05 | 2022-07-08 | 中山大学 | DCNN accelerator based on a hybrid multi-row dataflow strategy
CN114912596A (en) * | 2022-05-13 | 2022-08-16 | 上海交通大学 | Multi-chip system for sparse convolutional neural networks and method thereof
CN114780910A (en) * | 2022-06-16 | 2022-07-22 | 千芯半导体科技(北京)有限公司 | Hardware system and calculation method for sparse convolution calculation
CN115829000A (en) * | 2022-10-31 | 2023-03-21 | 杭州嘉楠耘智信息科技有限公司 | Data processing method, device, electronic device and storage medium
CN116029332B (en) * | 2023-02-22 | 2023-08-22 | 南京大学 | On-chip fine-tuning method and device based on an LSTM network
CN116029332A (en) * | 2023-02-22 | 2023-04-28 | 南京大学 | On-chip fine-tuning method and device based on an LSTM network
CN115879530A (en) * | 2023-03-02 | 2023-03-31 | 湖北大学 | A method for array structure optimization of an RRAM in-memory computing system
CN116432709A (en) * | 2023-04-19 | 2023-07-14 | 东南大学苏州研究院 | A sparsification method and accelerator design for object detection networks
CN116911357A (en) * | 2023-07-11 | 2023-10-20 | 西安电子科技大学 | Convolution computing accelerator and acceleration method based on CSR coding
CN116911357B (en) * | 2023-07-11 | 2025-09-30 | 西安电子科技大学 | Convolution computing accelerator and acceleration method based on CSR coding
CN117290279B (en) * | 2023-11-24 | 2024-01-26 | 深存科技(无锡)有限公司 | Shared tight-coupling based general computing accelerator
CN117290279A (en) * | 2023-11-24 | 2023-12-26 | 深存科技(无锡)有限公司 | Shared tight-coupling based general computing accelerator
CN118070855A (en) * | 2024-04-18 | 2024-05-24 | 南京邮电大学 | Convolutional neural network accelerator based on the RISC-V architecture

Similar Documents

Publication | Publication Date | Title
CN109993297A (en) | A load-balancing sparse convolutional neural network accelerator and its acceleration method
JP7266065B2 (en) | System, computer-implemented method and computer program for deep neural networks
KR102120396B1 (en) | Accelerator for deep neural networks
CN111626414B (en) | Dynamic multi-precision neural network acceleration unit
CN106779060B (en) | A computation method for deep convolutional neural networks suited to hardware implementation
US8468109B2 (en) | Architecture, system and method for artificial neural network implementation
CN111062472B (en) | A sparse neural network accelerator based on structured pruning and its acceleration method
Moini et al. | A resource-limited hardware accelerator for convolutional neural networks in embedded vision applications
JP2021515300A (en) | Neural network accelerator
CN113240570B (en) | A GEMM computing accelerator and an image processing acceleration method based on GoogLeNet
CN109472356A (en) | An accelerator and method for reconfigurable neural network algorithms
CN109472350A (en) | A Neural Network Acceleration System Based on Block-Circulant Sparse Matrices
CN109102065A (en) | A convolutional neural network accelerator based on PSoC
CN107301456A (en) | Multi-core acceleration method for deep neural networks based on a vector processor
CN106875012A (en) | A pipelined acceleration system for deep convolutional neural networks based on FPGA
KR20130090147A (en) | Neural network computing apparatus and system, and method thereof
US11886359B2 (en) | AI accelerator apparatus using in-memory compute chiplet devices for transformer workloads
CN209231976U (en) | An accelerator for reconfigurable neural network algorithms
CN113486298A (en) | Model compression method and matrix multiplication module based on the Transformer neural network
CN112418396A (en) | A sparse activation-aware neural network accelerator based on FPGA
CN108170640A (en) | Neural network computing device and method for performing operations using the same
CN115357850A (en) | Irregular sparse matrix multiplication method and hardware architecture for the Transformer model
CN114201287 (en) | Method for cooperatively processing data based on a CPU+GPU heterogeneous platform
Yazdani et al. | LSTM-sharp: An adaptable, energy-efficient hardware accelerator for long short-term memory
CN116484909A (en) | Vector engine processing method and device for artificial intelligence chips

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-07-09

