CN106991472A - A vectorized implementation method fusing the ReLU activation function and max pooling - Google Patents

A vectorized implementation method fusing the ReLU activation function and max pooling

Info

Publication number
CN106991472A
Authority
CN
China
Prior art keywords
maximum
matrix
pooling
ReLU activation
max pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710201376.2A
Other languages
Chinese (zh)
Inventor
郭阳
张军阳
扈啸
王慧丽
胡敏慧
王子聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201710201376.2A
Publication of CN106991472A
Legal status: Pending (current)


Abstract

Translated from Chinese

The present invention discloses a vectorized implementation method that fuses the ReLU activation function with max pooling. Its steps are: S1: compute the ReLU activation values of matrix A; S2: compute the max pooling of the matrix produced by the ReLU activation processing of step S1; S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the entire matrix A. The invention has the advantages of a simple principle, convenient implementation, and the ability to fully exploit the parallel computing capability of a vector processor and the parallelism of the algorithm.

Description

A vectorized implementation method fusing the ReLU activation function and max pooling
Technical field
The present invention relates generally to the technical field of convolutional neural networks, and in particular to a vectorized implementation method that fuses the ReLU activation function with max pooling.
Background art
In the 1960s, while studying neurons used for local sensitivity and direction selection in the cat visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, and subsequently proposed the convolutional neural network (Convolutional Neural Network, CNN). Convolutional neural networks have since become a research hotspot in many fields, particularly in pattern classification: because the network avoids complex early-stage image pre-processing and can take the original image directly as input, it has been widely applied.
Typically, a convolutional neural network computation model includes convolutional layers, pooling layers, fully connected layers, and a subsequent classifier such as a support vector machine (Support Vector Machine, SVM). The main types of computation involved in a convolutional neural network model are: matrix convolution; activation-function processing, for example the linear activation function f(x) = x or a nonlinear activation function; and matrix pooling operations, including max pooling and average pooling. Finally, the output of the convolutional neural network model is predicted through matrix operations and some transcendental-function processing, completing the process of object recognition. Because a convolutional neural network model alternates and iterates between different convolutional layers and pooling layers, its computational load is very large. How to accelerate the computational efficiency of this model is therefore an important research topic in both academia and industry.
The activation function models used in current convolutional neural network models fall mainly into two broad classes, linear and nonlinear, with roughly a dozen in common use. The rectified linear unit, ReLU (Rectified Linear Units, ReLU), is one of the most common. Its mathematical expression is f(x) = max(0, x): when the input signal x is less than 0 the output is 0, and when it is greater than 0 the output equals the input. The outstanding advantages of the ReLU function are one-sided suppression and, compared with other activation functions, a relatively broad excitation boundary and sparse-activation characteristics. Neuroscientists have also found sparse activity in neurons: in 2001, based on observations of cerebral energy consumption, Attwell et al. inferred that neuron coding works in a sparse and distributed manner, and in 2003 Lennie et al. estimated that only 1-4% of the neurons in the brain are activated at the same time, further demonstrating the sparsity of neural activity. In signal terms, a neuron responds selectively to only a small part of its input signals; deliberately shielding a large number of signals can improve the precision of learning and extract sparse features better and faster. From the viewpoint of sparsity, therefore, the ReLU function is the model that best approximates human neurons.
In a convolutional neural network model, after image data has been processed by the activation function, the next stage of computation, the pooling operation, must be performed. Pooling operations mainly include max pooling and average pooling: max pooling takes the maximum element in the pooling window as that window's output, while average pooling takes the average of all elements in the pooling window as its output. In either case, the purpose is to reduce the dimensionality of the image matrix as much as possible without significantly affecting the model's recognition precision, thereby reducing the amount of computation, and also to prevent the model from overfitting.
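For reference, the two operations that the method below fuses can be written in a few lines of NumPy. This is a minimal sketch under our own naming (relu and max_pool are not the patent's terms), assuming a square non-overlapping window whose size divides the matrix dimensions:

```python
import numpy as np

def relu(x):
    # ReLU: f(x) = max(0, x), applied element-wise
    return np.maximum(x, 0)

def max_pool(a, k):
    # Non-overlapping k x k max pooling (stride equals the window size)
    m, n = a.shape
    return a.reshape(m // k, k, n // k, k).max(axis=(1, 3))

a = np.random.randn(16, 16)
out = max_pool(relu(a), 2)   # 16 x 16 -> 8 x 8
```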
Convolutional neural networks are one of the computing modules commonly used in current high-performance computing. They are typical memory-access-intensive and compute-intensive applications, placing very high demands on the processor's computing units and memory bandwidth, and their computational complexity is very large. Current mainstream acceleration platforms include convolutional neural network computing platforms based on GPUs, platforms based on FPGAs, platforms based on dedicated neural network accelerators, and acceleration of convolutional neural network computation on general-purpose CPUs or vector processors. A vector processor is a kind of multi-purpose, multi-functional processor that generally comprises a Vector Processing Unit (VPU) and a Scalar Processing Unit (SPU). The vector processing unit consists of a computing array of several vector processing elements (Vector Processing Element, VPE) and is mainly responsible for computation; each VPE contains multiple homogeneous functional units such as MAC0, MAC1, an ALU, and a bit-processing (BP) unit. The scalar processing unit is mainly responsible for scalar computation and flow control; the VPU and SPU can transmit and exchange data through data channels. A large-capacity dedicated vector memory bank supports the Load and Store of vector data through the vector data access unit.
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems existing in the prior art, the present invention provides a vectorized implementation method fusing the ReLU activation function and max pooling that is simple in principle, convenient to implement, and able to fully exploit the parallel computing capability of a vector processor and the parallelism of the algorithm. That is, by fusing the ReLU activation operation with the max pooling operation, the method reduces the amount of data memory access, thereby shortening the computation time of the convolutional neural network and improving the computational efficiency of the convolutional neural network model.
In order to solve the above technical problems, the present invention adopts the following technical solution:
A vectorized implementation method fusing the ReLU activation function and max pooling, whose steps are (a sketch of the fused flow follows this list):
S1: Compute the ReLU activation values of matrix A;
S2: Compute the max pooling of the matrix processed by the ReLU activation function in step S1;
S3: Repeat step S1 and step S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the entire matrix A.
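A minimal sketch of this three-step flow in NumPy, assuming a k × k non-overlapping pooling window and matrix dimensions that are multiples of k (the function name is ours; the patent targets vector-processor registers, which this emulation does not model):

```python
import numpy as np

def fused_relu_maxpool(a, k):
    # Fuse ReLU and max pooling: each k-row strip of A is activated and
    # pooled immediately, so the ReLU result is never written to memory.
    m, n = a.shape
    out = np.empty((m // k, n // k), dtype=a.dtype)
    for r in range(0, m, k):                          # S3: next sub-block
        strip = np.maximum(a[r:r + k, :], 0)          # S1: ReLU of k rows
        out[r // k] = strip.reshape(k, n // k, k).max(axis=(0, 2))  # S2
    return out
```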
As a further improvement of the present invention, step S1 specifically comprises the following steps:
S1.1: Let the matrix that needs activation-function processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of vector processing elements (VPEs) be p; N is taken to be an integer multiple of p and of kx, ky, where the max pooling window is kx × ky;
S1.2: Use the vector VLOAD instruction to load the first row of elements of matrix A;
S1.3: Use the vector compare instruction VFCMPGD to compare the sizes of the vector registers; the logical values of the comparison result are placed in a condition status register;
S1.4: Use the conditional vector move instruction VMOV to take out the values greater than 0 from step S1.3 and place them in a vector register;
S1.5: Obtain the result after ReLU activation processing;
S1.6: According to the max pooling window size k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of matrix A; the results remain in vector registers and serve directly as the input of the max pooling in step S2 (see the sketch after this list).
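The compare-then-conditional-move pattern of steps S1.3 and S1.4 can be emulated as below. A sketch only: VFCMPGD and VMOV are the patent's vector instructions, the boolean mask stands in for the condition status register, and the function name is ours:

```python
import numpy as np

def relu_row_via_mask(vr10):
    # vr10 models a vector register holding one row of A (one element per VPE)
    vr20 = np.zeros_like(vr10)     # VMOVI 0, VR20
    vr0 = vr10 > vr20              # S1.3: VFCMPGD VR10, VR20, VR0
    vr20[vr0] = vr10[vr0]          # S1.4: [VR0] VMOV VR10, VR20
    return vr20                    # S1.5: ReLU result of the row
```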
As a further improvement of the present invention, step S2 specifically comprises the following steps:
S2.1: Take the k rows of elements computed in step S1.6 directly as the input of this computation;
S2.2: Compare the 1st row of elements with the 2nd row of elements, placing the logical values of the comparison result in a condition status register;
S2.3: Use the conditional vector move instruction VMOV;
S2.4: Obtain the per-column maxima of the k rows of elements through k-1 comparisons;
S2.5: Configure the shuffle pattern and compare to obtain the maxima of the corresponding groups of k columns from step S2.4;
S2.6: Finally obtain, simultaneously, the max pooling results of p/k pooling windows of size kx × ky (see the sketch after this list).
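A sketch of this two-stage reduction, with the k-1 row comparisons of S2.2-S2.4 done by repeated vector maxima and the shuffle of S2.5 modeled as a reshape (assuming p is divisible by k; the function name is ours):

```python
import numpy as np

def maxpool_p_lanes(rows):
    # rows: k vectors of length p, the ReLU output kept in registers (S1.6)
    k, p = rows.shape
    col_max = rows[0]
    for i in range(1, k):                  # S2.2-S2.4: k-1 vector compares
        col_max = np.maximum(col_max, rows[i])
    # S2.5: shuffle so each group of k adjacent lanes can be reduced
    return col_max.reshape(p // k, k).max(axis=1)   # S2.6: p/k results
```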
As a further improvement of the present invention, the formula for computing one max pooling result c0,0 in step S2.5 is:

c0,0 = max{ ai,j : 0 ≤ i ≤ kx - 1, 0 ≤ j ≤ ky - 1 }

where c0,0 is the first element of the max pooling result matrix, kx and ky are the dimensions of the pooling window (in convolutional neural networks the pooling window is square, i.e. kx = ky = k), and ai,j is an element of the matrix A to be max pooled.
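For instance, with k = 2, a window whose ReLU-processed elements are a0,0 = 1, a0,1 = 0, a1,0 = 2, a1,1 = 0 yields c0,0 = max{1, 0, 2, 0} = 2.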
As a further improvement of the present invention, the pooling window size defined in the above steps is sizeX × sizeY, the horizontal or vertical displacement between two adjacent pooling windows is the stride, and the pooling windows do not overlap in the max pooling operation, i.e. sizeX = sizeY = stride.
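For example, with sizeX = sizeY = stride = 2, the window producing output element cr,c covers input rows 2r to 2r+1 and columns 2c to 2c+1, so no input element is read by more than one window.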
Compared with the prior art, the advantage of the present invention is: the vectorized implementation method fusing the ReLU activation function and max pooling of the present invention fuses the ReLU activation operation and the max pooling computation into one computation flow, avoiding the time-consuming STORE and LOAD of intermediate results, while also making full use of the fact that the multiple parallel processing elements of the vector unit in a vector processor can perform the same operation simultaneously to carry out a large number of operations of the same type. This substantially improves the computational efficiency of the convolutional neural network model, and the steps are simple and easy to implement.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the general structure of a vector processor.
Fig. 3 is a schematic diagram of 2 × 2 max pooling in a concrete application example of the present invention.
Fig. 4 is a schematic image of the ReLU activation function used by the present invention in a concrete application example.
Fig. 5 is a schematic diagram of the vectorized ReLU activation implementation flow of the present invention in a concrete application example.
Fig. 6 is a schematic diagram of the vectorized 2 × 2 max pooling implementation flow of the present invention in a concrete application example.
Fig. 7 is a schematic diagram of non-overlapping pooling windows in the max pooling operation of the present invention in a concrete application example.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1 and Fig. 4, the vectorized implementation method of the invention fusing the ReLU activation function and max pooling has the following steps:
S1: Compute the ReLU activation values of matrix A;
S1.1: Let the matrix that needs activation-function processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of VPEs be p; typically N is taken to be an integer multiple of p and of kx, ky, where the max pooling window is kx × ky;
S1.2: Use the vector VLOAD instruction to load the first row of elements of matrix A, e.g. into vector register VR10, and use a VMOVI instruction to initialize a vector register VR20 to 0, i.e. VMOVI 0, VR20;
S1.3: Use the vector compare instruction VFCMPGD to compare the sizes of vector registers VR10 and VR20, placing the logical values of the comparison result in a condition status register, e.g. VR0: VFCMPGD VR10, VR20, VR0; if VR10[i] > VR20[i], 1 ≤ i ≤ p, then VR0[i] = 1, otherwise VR0[i] = 0;
S1.4: Use the conditional vector move instruction VMOV to take out the values greater than 0 from step S1.3 and place them in a vector register. The instruction is: [VR0] VMOV VR10, VR20. Through this conditional move instruction, the ReLU activation values of p numbers are computed simultaneously: the values in VR10 greater than 0 are placed in VR20, and the values less than 0 are set to 0;
S1.5: Obtain the result of the ReLU activation processing in VR20;
S1.6: According to the max pooling window size k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of matrix A. The results remain in vector registers and need not be stored; they serve directly as the input of the max pooling in step S2.
S2: Compute the max pooling of the matrix processed by the ReLU activation in step S1;
S2.1: Take the k rows of elements computed in step S1.6. Because the results of step S1.6 remain in registers, they serve directly as the input of this computation; this avoids the data-store time of step S1.6 and the data-LOAD time of step S2.2, so the computation time is reduced accordingly.
S2.2: Compare the 1st row of elements with the 2nd row, placing the logical values of the comparison in a condition status register, e.g. VR1: VFCMPGD VR20, VR21, VR1; if VR20[i] > VR21[i], 1 ≤ i ≤ p, then VR1[i] = 1, otherwise VR1[i] = 0;
S2.3: Use the conditional vector move instruction VMOV: the values VR20[i] in the VPEs for which the condition register of step S2.2 satisfies VR1[i] = 1 are assigned to the corresponding VR21[i], while the values in VR21[i] that are already larger than VR20[i] remain unchanged.
S2.4: Obtain the per-column maxima of the k rows of elements through k-1 comparisons.
S2.5: Configure the shuffle pattern and compare to obtain the maxima of the corresponding groups of k columns from step S2.4;
S2.6: Finally obtain, simultaneously, the max pooling results of p/k pooling windows of size kx × ky;
S3: Repeat step S1 and step S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the entire matrix A.
The present invention is mainly applicable to vector processors; Fig. 2 is a schematic diagram of the general structure of a vector processor. In a concrete application example, the formula for computing one max pooling result c0,0 in step S2.5 is:

c0,0 = max{ ai,j : 0 ≤ i ≤ kx - 1, 0 ≤ j ≤ ky - 1 }

where c0,0 is the first element of the max pooling result matrix, kx and ky are the dimensions of the pooling window (in convolutional neural networks the pooling window is generally square, i.e. kx = ky = k), and ai,j is an element of the matrix A to be max pooled; the max pooling flow is shown schematically in Fig. 3.
In a concrete application example, the pooling window size defined in the above steps is sizeX × sizeY, the horizontal or vertical displacement between two adjacent pooling windows is the stride, and the pooling windows do not overlap in the max pooling operation, i.e. sizeX = sizeY = stride, as shown in Fig. 7.
As shown in Fig. 5 and Fig. 6, the detailed steps of the present invention in a concrete application example are:
S100: Compute the ReLU activation values of matrix A;
S1.1: Let the matrix that needs activation-function processing after the convolution operation be A(16, 16), the ReLU activation function be f(x) = max(0, x), the number of VPEs p be 16, and the max pooling window be kx = ky = 2;
S1.2: Use vector VLOAD instructions to load the 16 elements of row 1 of matrix A into vector register VR10 and the 16 elements of row 2 into VR11, and use the vector move-immediate instruction VMOVI to initialize two vector registers VR20 and VR21 to 0, i.e. VMOVI 0, VR20 and VMOVI 0, VR21;
S1.3: Use the vector compare instruction VFCMPGD to compare VR10 with VR20 and VR11 with VR21, placing the logical values of the comparison results in condition status registers VR0 and VR1 respectively: VFCMPGD VR10, VR20, VR0 and VFCMPGD VR11, VR21, VR1; if VR10[i] > VR20[i] (1 ≤ i ≤ 16) then VR0[i] = 1, otherwise VR0[i] = 0, and similarly for VR1[i];
S1.4: Use conditional vector move instructions VMOV to take out the values greater than 0 from step S1.3 and place them in vector registers. The instructions are [VR0] VMOV VR10, VR20 and [VR1] VMOV VR11, VR21; these conditional move instructions simultaneously compute the ReLU activation values of the 32 elements in the first two rows of matrix A;
S1.5: The ReLU activation values of the first two rows of matrix A are now in vector registers VR20 and VR21;
S200: Compute the max pooling of the matrix processed by the ReLU activation in step S100;
S2.1: According to the max pooling window size kx = ky = 2, take the two rows of elements computed in step S1.5, i.e. VR20 and VR21, as the input of the max pooling layer;
S2.2: Compare the 1st row VR20 with the 2nd row VR21, placing the logical values of the comparison in condition status register VR2. The instruction is VFCMPGD VR20, VR21, VR2; if VR20[i] > VR21[i], 1 ≤ i ≤ p, then VR2[i] = 1, otherwise VR2[i] = 0;
S2.3: Use the conditional vector move instruction VMOV: the values VR20[i] in the VPEs for which the condition register of step S2.2 satisfies VR2[i] = 1 are assigned to the corresponding VR21[i], while the values in VR21[i] that are already larger than VR20[i] remain unchanged;
S2.4: After 1 comparison, the per-column maxima of the 2 rows of elements are obtained;
S2.5: Configure the corresponding shuffle pattern and compare to obtain the maximum of each pair of adjacent columns from step S2.4;
S2.6: Finally obtain, simultaneously, the 8 (= 16/2) max pooling results for pooling windows of size 2 × 2;
S300: Repeat step S100 and step S200 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the entire matrix A (a worked sketch of this example follows below).
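A worked sketch of this 16 × 16, p = 16, kx = ky = 2 example (our NumPy emulation: vector registers are modeled as length-16 arrays, and the shuffle of step S2.5 as a reshape):

```python
import numpy as np

p, k = 16, 2
A = np.random.randn(16, 16)
out = np.empty((8, 8))

for r in range(0, 16, k):                  # S300: loop over 2-row strips
    vr20 = np.maximum(A[r], 0.0)           # S1.2-S1.5: ReLU of row r
    vr21 = np.maximum(A[r + 1], 0.0)       # ReLU of row r + 1
    vr2 = vr20 > vr21                      # S2.2: VFCMPGD VR20, VR21, VR2
    vr21[vr2] = vr20[vr2]                  # S2.3: [VR2] VMOV VR20, VR21
    out[r // k] = vr21.reshape(p // k, k).max(axis=1)  # S2.5-S2.6

# Check against the unfused reference: ReLU, then 2 x 2 max pooling
ref = np.maximum(A, 0).reshape(8, 2, 8, 2).max(axis=(1, 3))
assert np.allclose(out, ref)
```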
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions under the inventive concept belong to the protection scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications that do not depart from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims (5)

CN201710201376.2A | Priority date: 2017-03-30 | Filing date: 2017-03-30 | A vectorized implementation method fusing the ReLU activation function and max pooling | Pending | CN106991472A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710201376.2A | 2017-03-30 | 2017-03-30 | A vectorized implementation method fusing the ReLU activation function and max pooling (CN106991472A, en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710201376.2A | 2017-03-30 | 2017-03-30 | A vectorized implementation method fusing the ReLU activation function and max pooling (CN106991472A, en)

Publications (1)

Publication Number | Publication Date
CN106991472A | 2017-07-28

Family

ID=59411852

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710201376.2A | A vectorized implementation method fusing the ReLU activation function and max pooling (Pending, CN106991472A, en) | 2017-03-30 | 2017-03-30

Country Status (1)

Country | Link
CN (1) | CN106991472A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108205703A (en)* | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix average value pooling vectorization implementation method
CN109165733A (en)* | 2018-07-11 | 2019-01-08 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix maximum pooling vectorization implementation method
CN109583561A (en)* | 2017-09-28 | 2019-04-05 | 杭州海康威视数字技术股份有限公司 | An activation quantization method and device for a deep neural network
CN109685058A (en)* | 2017-10-18 | 2019-04-26 | 杭州海康威视数字技术股份有限公司 | An image object recognition method, apparatus and computer device
CN109727376A (en)* | 2018-12-29 | 2019-05-07 | 北京沃东天骏信息技术有限公司 | Method and apparatus for generating a configuration file, and vending device
CN109754359A (en)* | 2017-11-01 | 2019-05-14 | 腾讯科技(深圳)有限公司 | A pooling processing method and system applied to convolutional neural networks
CN110796236A (en)* | 2019-10-21 | 2020-02-14 | 中国人民解放军国防科技大学 | A vectorized implementation method for pooling in multi-sample multi-channel convolutional neural networks
WO2021035621A1 (en)* | 2019-08-29 | 2021-03-04 | 深圳市大疆创新科技有限公司 | Extreme point extraction method and apparatus, and computer-readable storage medium
CN112598640A (en)* | 2020-12-22 | 2021-04-02 | 哈尔滨市科佳通用机电股份有限公司 | Water filling port cover plate loss detection method based on deep learning
US10970604B2 | 2018-09-27 | 2021-04-06 | Industrial Technology Research Institute | Fusion-based classifier, classification method, and classification system
CN113762452A (en)* | 2020-06-04 | 2021-12-07 | 合肥君正科技有限公司 | Method for quantizing the PReLU activation function
CN113892092A (en)* | 2019-02-06 | 2022-01-04 | 瀚博控股公司 | Method and system for a convolution model hardware accelerator

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109583561B (en)* | 2017-09-28 | 2021-05-07 | 杭州海康威视数字技术股份有限公司 | An activation quantization method and device for a deep neural network
CN109583561A (en)* | 2017-09-28 | 2019-04-05 | 杭州海康威视数字技术股份有限公司 | An activation quantization method and device for a deep neural network
CN109685058A (en)* | 2017-10-18 | 2019-04-26 | 杭州海康威视数字技术股份有限公司 | An image object recognition method, apparatus and computer device
US11347977B2 | 2017-10-18 | 2022-05-31 | Hangzhou Hikvision Digital Technology Co., Ltd. | Lateral and longitudinal feature based image object recognition method, computer device, and non-transitory computer readable storage medium
CN109754359A (en)* | 2017-11-01 | 2019-05-14 | 腾讯科技(深圳)有限公司 | A pooling processing method and system applied to convolutional neural networks
US11537857B2 | 2017-11-01 | 2022-12-27 | Tencent Technology (Shenzhen) Company Limited | Pooling processing method and system applied to convolutional neural network
US11734554B2 | 2017-11-01 | 2023-08-22 | Tencent Technology (Shenzhen) Company Limited | Pooling processing method and system applied to convolutional neural network
CN108205703A (en)* | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix average value pooling vectorization implementation method
CN109165733A (en)* | 2018-07-11 | 2019-01-08 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix maximum pooling vectorization implementation method
US10970604B2 | 2018-09-27 | 2021-04-06 | Industrial Technology Research Institute | Fusion-based classifier, classification method, and classification system
CN109727376A (en)* | 2018-12-29 | 2019-05-07 | 北京沃东天骏信息技术有限公司 | Method and apparatus for generating a configuration file, and vending device
CN113892092A (en)* | 2019-02-06 | 2022-01-04 | 瀚博控股公司 | Method and system for a convolution model hardware accelerator
WO2021035621A1 (en)* | 2019-08-29 | 2021-03-04 | 深圳市大疆创新科技有限公司 | Extreme point extraction method and apparatus, and computer-readable storage medium
CN110796236B (en)* | 2019-10-21 | 2022-06-17 | 中国人民解放军国防科技大学 | A vectorized implementation method for pooling in multi-sample multi-channel convolutional neural networks
CN110796236A (en)* | 2019-10-21 | 2020-02-14 | 中国人民解放军国防科技大学 | A vectorized implementation method for pooling in multi-sample multi-channel convolutional neural networks
CN113762452A (en)* | 2020-06-04 | 2021-12-07 | 合肥君正科技有限公司 | Method for quantizing the PReLU activation function
CN113762452B (en)* | 2020-06-04 | 2024-01-02 | 合肥君正科技有限公司 | Method for quantizing the PReLU activation function
CN112598640B (en)* | 2020-12-22 | 2021-09-14 | 哈尔滨市科佳通用机电股份有限公司 | Water filling port cover plate loss detection method based on deep learning
CN112598640A (en)* | 2020-12-22 | 2021-04-02 | 哈尔滨市科佳通用机电股份有限公司 | Water filling port cover plate loss detection method based on deep learning

Similar Documents

Publication | Title
CN106991472A (en) | A vectorized implementation method fusing the ReLU activation function and max pooling
CN107578098B (en) | Neural network processor based on systolic array
US12307252B2 | Deep vision processor
EP4123515B1 (en) | Data processing method and data processing device
Cong et al. | Minimizing computation in convolutional neural networks
US10846591B2 | Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks
CN107844826B (en) | Neural network processing unit and processing system comprising the same
CN107862374B (en) | Pipeline-based neural network processing system and processing method
US12327181B2 | Neural network apparatus performing floating-point operation and operating method of the same
CN106970896B (en) | Vector processor-oriented vectorized implementation method for two-dimensional matrix convolution
CN113449859B (en) | A data processing method and device
CN110121721A (en) | Architecture for accelerating sparse neural networks
Liu et al. | An FPGA-based processor for training convolutional neural networks
CN107609641A (en) | Sparse neural network architecture and implementation method thereof
CN109325591A (en) | A neural network processor for Winograd convolution
CN108629406B (en) | Arithmetic device for convolutional neural networks
CN109086244A (en) | A vectorized implementation method of matrix convolution based on a vector processor
US11836463B2 | Method and apparatus with neural network processing
CN107301456A (en) | Multi-core acceleration method for deep neural networks based on a vector processor
CN106959937B (en) | A vectorized implementation method of deconvolution matrices oriented to GPDSP
CN108205703B (en) | Multi-input multi-output matrix average value pooling vectorization implementation method
CN110796236B (en) | A vectorized implementation method for pooling in multi-sample multi-channel convolutional neural networks
CN109165733A (en) | Multi-input multi-output matrix maximum pooling vectorization implementation method
CN118211623A (en) | Software and hardware accelerated recurrent neural network acceleration system, method and storage medium
Li et al. | A CPU-based algorithm for traffic optimization based on sparse convolutional neural networks

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-07-28
