Technical Field
The invention belongs to the technical fields of digital image processing and pattern recognition, and in particular relates to a method for implementing a deep convolutional neural network model on an FPGA hardware platform.
Background Art
With the rapid development of computer technology and the Internet, data volumes are growing explosively, and the intelligent analysis and processing of massive data has become the key to exploiting its value. Artificial intelligence is an effective means of discovering valuable information in massive data and has in recent years achieved breakthroughs in application fields such as computer vision, speech recognition and natural language processing. Deep learning models based on deep convolutional neural networks are a typical representative.
Convolutional neural networks (CNNs) were inspired by neuroscience research. After more than 20 years of evolution, they have achieved remarkable theoretical and practical results in fields such as pattern recognition and human-machine competition. In the well-known human-machine Go match, AlphaGo, an artificial intelligence system based on a CNN combined with Monte Carlo tree search, defeated the world Go champion Lee Sedol by a score of 4:1. A typical CNN model consists of two parts: a feature extractor and a classifier. The feature extractor generates a low-dimensional feature vector of the input data that is robust to variations in the data. This vector is fed to the classifier (usually based on a traditional artificial neural network), which produces the classification result for the input data.
In implementations of CNN models, convolution accounts for 90% of the computation of the whole model [1]. Efficient computation of the convolutional layers is therefore the key to substantially improving the computational efficiency of CNN models, and hardware acceleration of the convolution is an effective approach.
At present, the industry commonly uses GPU clusters to implement deep learning models, running deep neural networks through massively parallel computation with notable efficiency and performance. However, the high power consumption of GPUs restricts their large-scale deployment and has become a bottleneck for the practical adoption of deep CNN models. FPGAs offer high-performance parallel computing at very low power consumption, so implementing deep learning models on FPGAs is an inevitable direction of development in this field.
At present, there are three main schemes for implementing a CNN with an FPGA:
(1) Use a soft-core CPU for the control part, with the FPGA providing algorithm acceleration;
(2) Use the hard-core ARM Cortex A9 CPU embedded in a hard-core SoC for the control part, with the FPGA providing algorithm acceleration;
(3) Use a cloud server together with an FPGA for algorithm acceleration.
Each scheme has its advantages and disadvantages, and the acceleration scheme can be chosen according to the application.
In a deep CNN, the convolutional layers account for more than 90% of the computation and are the key link that connects the preceding and following stages of the network, so their computational efficiency directly determines the performance of the implemented model. However, implementing convolution on an FPGA is very difficult, mainly for the following reasons:
(1) Deep learning models are still largely at the stage of academic research, and large-scale industrial application requires a great deal of algorithm and model optimization. The models therefore have to be optimized continually to suit different application scenarios, which requires a very deep understanding of deep learning theory and algorithms;
(2) FPGA development is based on low-level hardware description languages and suits situations in which the algorithm model is relatively stable; constantly changing deep learning models are therefore very difficult to implement on an FPGA;
(3) Implementing a deep CNN on an FPGA requires extensive experience in FPGA engineering. The FPGA operating clock frequency and the output latency of the modules used, such as multipliers, are in tension: the higher the clock frequency, the longer the module output latency, and the lower the clock frequency, the shorter the output latency. Reasonably balanced parameters have to be found through manual experiments guided by engineering experience.
Summary of the Invention
The purpose of the invention is to provide a high-efficiency, low-power method for implementing a deep convolutional neural network model, so as to solve the high power consumption and low efficiency of current GPU-based or CPU-based deep learning models.
The invention optimizes the FPGA hardware design, effectively reduces resource consumption, and makes it possible to implement a deep CNN model on a low-end FPGA hardware platform.
In the method provided by the invention, the hardware platform is a Xilinx ZYNQ-7030 programmable SoC with a built-in FPGA and an ARM Cortex A9 processor. The trained network model parameters are first loaded to the FPGA side. The input data are then preprocessed on the ARM side and the result is transferred to the FPGA side, where the convolution and downsampling calculations of the deep CNN are carried out. The resulting feature vector is transferred back to the ARM side, which completes the feature classification. The method comprises four processes: model parameter loading, input data preprocessing, convolution and downsampling calculation, and classification calculation:
1. The model parameter loading process is:
(1) Train the deep CNN model offline;
(2) Load the trained model parameters on the ARM side;
(3) Transfer the model parameters to the FPGA;
2. The input data preprocessing process is:
(1) Normalize the input data;
(2) Transfer the result to the FPGA;
(3) Store it in Block RAM on the FPGA side;
3. The convolution and downsampling calculation process is:
(1) Initialize the convolution pipeline;
(2) Perform the convolution calculation;
(3) Perform the pooling (downsampling) calculation;
(4) Reinitialize the convolution pipeline and repeat the convolution and downsampling calculations for each subsequent layer;
4. The classification calculation process is:
(1) Send the feature vector back to the ARM side;
(2) Evaluate the classification model;
(3) Output the classification result.
The details are as follows:
Step 1. Load the trained model parameters
(1) On the ARM side, load the parameters of the offline-trained deep CNN model;
(2) Transfer the trained model parameters to the FPGA side;
(3) On the FPGA side, buffer the parameters in a FIFO and then store them in Block RAM (block random access memory);
Step 2. Preprocess the input data of the deep CNN model
(1) Normalize the input data so that they meet the requirements of the model's convolution operation;
(2) Transfer the normalized data from the ARM side to the FPGA side over the APB bus;
(3) On the FPGA side, buffer the normalized data in a FIFO and then store them in Block RAM;
Step 3. Convolution and downsampling calculation
A deep pipeline is designed for the convolutional layers, which carry the largest computational load in the deep CNN model. Suppose the network model has H convolutional layers and pooling layers. The input of the h-th (h = 1, 2, …, H) convolutional layer is T m×m matrices of 32-bit floating-point numbers, its output is S (m-n+1)×(m-n+1) matrices of 32-bit floating-point numbers, and its convolution kernels are K n×n matrices of 32-bit floating-point numbers (n ≤ m). The sliding window over the input data is n×n, with a horizontal stride of 1 and a vertical stride of 1.
(1) Initialize the convolution pipeline
Define n+1 data buffer registers P_0, P_1, …, P_{n-1}, P_n, each holding m values. Of these, n registers (P_{(i-1)%(n+1)+0}, P_{(i-1)%(n+1)+1}, …, P_{(i-1)%(n+1)+n-1}) hold the i-th (i = 1, 2, …, m-n+1) n×m sub-matrix of the t-th (t = 1, 2, …, T) input data matrix, where % denotes the remainder operation. Register indices wrap around: if (i-1)%(n+1)+x > n, the index restarts from 0, so that (i-1)%(n+1)+x becomes 0, (i-1)%(n+1)+x+1 becomes 1, and so on, where x = 0, 1, …, n-1. If n < m, register P_{(i-1)%(n+1)+n} holds row i+n of the input data matrix; it is initialized in parallel with the convolution calculation, which reduces FPGA idle cycles and improves computational efficiency.
Define one convolution kernel buffer register W, which holds the n×n weights of the k-th (k = 1, 2, …, K) convolution kernel.
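The register indexing described above can be sketched in a few lines of Python. This is only an illustration of the index arithmetic; the function name `window_registers` is hypothetical and does not come from the source.

```python
def window_registers(i, n):
    """Register indices for sub-matrix i (1-based), given an n x n kernel.

    Returns (window, spare): `window` lists the n registers that hold the
    rows of sub-matrix i, in row order; `spare` is the register that
    receives row i+n while sub-matrix i is being convolved.
    Hypothetical helper, written only to illustrate the index rule.
    """
    base = (i - 1) % (n + 1)
    # Indices larger than n wrap around to 0, 1, ... (modulo n+1).
    window = [(base + x) % (n + 1) for x in range(n)]
    spare = (base + n) % (n + 1)
    return window, spare


if __name__ == "__main__":
    for i in range(1, 4):
        print(i, window_registers(i, 5))
```

With n = 5, sub-matrix 1 uses registers 0 to 4 with register 5 as the spare, and sub-matrix 2 uses registers 1 to 5 with register 0 as the spare, so one register is always free to be loaded during the computation.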
(2) Calculation of the h-th convolutional layer
Perform the convolution of the t-th input data matrix of the h-th convolutional layer with the k-th convolution kernel, and activate the result with the Sigmoid function.
Specifically, in parallel with each convolution calculation, the data buffer register P_{(i-1)%(n+1)+n} is loaded with row i+n of the input, which serves as buffered input for the convolution of the (i+1)-th sub-matrix; the registers are thus reused cyclically throughout the convolution.
On the FPGA side, the Sigmoid function is built from floating-point IP cores to activate the convolution results. The Sigmoid function is f(x) = 1/(1 + e^(-x)). The specific steps are as follows:
As described above, the input data is an m×m floating-point matrix, the convolution kernel is an n×n floating-point matrix, the sliding window is n×n, and the horizontal and vertical strides are both 1, so the convolution result is an (m-n+1)×(m-n+1) floating-point matrix. The bias b11 (a parameter of the offline-trained model) is added to every element of this matrix, and after activation by the Sigmoid function the result is an (m-n+1)×(m-n+1) floating-point matrix, which is stored in Block RAM.
After one convolution has been completed, the convolution kernel buffer register W is reinitialized and the next convolution is performed. This cycle is repeated until the result consists of S (m-n+1)×(m-n+1) floating-point matrices, which are stored in Block RAM.
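As a software reference for the layer just described, the following minimal sketch computes one output map: a stride-1 sliding window, a bias added to every element, then Sigmoid activation. It assumes a plain multiply-accumulate over the window without flipping the kernel, which the source does not specify; the names `sigmoid` and `conv_layer` are illustrative only.

```python
import math


def sigmoid(x):
    """Sigmoid activation f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def conv_layer(image, kernel, bias):
    """Convolve an m x m input with an n x n kernel (stride 1, no padding),
    add the bias to every element and apply the Sigmoid function.

    Assumption: the window is multiplied element-wise with the kernel as
    given (no kernel flip). Returns an (m-n+1) x (m-n+1) list of lists.
    """
    m, n = len(image), len(kernel)
    out = m - n + 1
    result = []
    for r in range(out):
        row = []
        for c in range(out):
            acc = 0.0
            for u in range(n):
                for v in range(n):
                    acc += image[r + u][c + v] * kernel[u][v]
            row.append(sigmoid(acc + bias))
        result.append(row)
    return result
```

For a 28×28 input and a 5×5 kernel this returns a 24×24 map, matching the (m-n+1)×(m-n+1) size stated in the text.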
(3) Calculation of the h-th pooling layer
Perform the pooling calculation on the output of the h-th convolutional layer; the result is S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices, which are stored in Block RAM. The specific steps are as follows: the sliding window over the convolution result is 2×2 with a stride of 2, and pooling is implemented by average downsampling, that is, the elements of each 2×2 block are summed and the mean is taken. This yields S [(m-n+1)/2]×[(m-n+1)/2] floating-point matrices, which serve as the input matrices of the (h+1)-th convolutional layer.
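The average downsampling step can be illustrated with a short Python sketch. It assumes the feature map has an even side length, as in the 24×24 and 8×8 maps of the embodiment; the function name `avg_pool_2x2` is hypothetical.

```python
def avg_pool_2x2(fmap):
    """Average pooling with a 2 x 2 window and stride 2.

    Each non-overlapping 2 x 2 block is summed and divided by 4.
    Assumes a square input with an even side length.
    """
    size = len(fmap) // 2
    pooled = []
    for r in range(size):
        row = []
        for c in range(size):
            total = (fmap[2 * r][2 * c] + fmap[2 * r][2 * c + 1]
                     + fmap[2 * r + 1][2 * c] + fmap[2 * r + 1][2 * c + 1])
            row.append(total / 4.0)
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    demo = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(avg_pool_2x2(demo))  # [[3.5, 5.5], [11.5, 13.5]]
```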
Step 4. Classification calculation
The results of the convolution and pooling calculations are sent back to the ARM side for classification. The specific steps are as follows: the FPGA side transfers the convolution and pooling result matrices held in Block RAM to the ARM side through the FIFO buffer and the APB bus; the ARM side performs the classification using the Softmax operation, and the classification result of the input data is obtained and output.
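A minimal sketch of the ARM-side classification follows. The source states only that Softmax is used; the fully connected weights and biases, and the names `softmax` and `classify`, are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable Softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Classify a flattened feature vector.

    `weights` holds one row per class and `biases` one value per class;
    both are hypothetical trained parameters. Returns the index of the
    most probable class and the full probability vector.
    """
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs
```

In the embodiment, `features` would be the flattened pooling output received from the FPGA, and there would be ten classes for the MNIST digits.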
The main features of the method of the invention are:
(1) A deep CNN model is implemented on a low-end FPGA;
(2) The convolution calculation of the deep CNN model is accelerated by pipelined computation;
(3) Control is handled by the ARM processor embedded in the SoC, giving a small footprint, low power consumption and high efficiency, so the method is widely applicable to embedded systems.
The invention exploits the fast parallel processing and very low power consumption of the FPGA to implement the convolution calculation, the most computationally complex part of the deep CNN model, and substantially improves the efficiency of the algorithm while preserving its accuracy. Compared with conventional CPU-based or GPU-based implementations of deep CNNs, the method increases computation speed while greatly reducing power consumption, and thus addresses the long run times or high power consumption of CPU or GPU implementations.
Brief Description of the Drawings
Fig. 1: Flow chart of the FPGA-based deep convolutional neural network implementation.
Fig. 2: MNIST database (partial).
Fig. 3: Schematic diagram of the matrix transposition.
Fig. 4: Schematic diagram of the pipelined calculation.
Fig. 5: Schematic diagram of the convolution calculation.
Fig. 6: Structure of the deep convolutional neural network.
Fig. 7: Schematic diagram of the downsampling calculation.
Fig. 8: Simulation results of the FPGA-based deep convolutional neural network model.
Fig. 9: Measured classification result for the digit "7" (MNIST database).
Detailed Description
A specific embodiment is described below with reference to the drawings, in which the method of the invention is used to implement a handwritten character recognition algorithm on an FPGA hardware platform with a deep CNN model. (The deep CNN model consists of an input layer I, a first convolutional layer C1, a first downsampling layer S1, a second convolutional layer C2, a second downsampling layer S2 and a fully connected Softmax layer. The input image size is 28×28; the first convolutional layer contains one 5×5 convolution kernel and the second convolutional layer contains three 5×5 convolution kernels.)
The operation steps of the handwritten character recognition algorithm implemented on the FPGA with the deep CNN model are shown in Fig. 1.
1. Load the trained model parameters
First, the CNN functions in DeepLearnToolbox-master are taken as a reference and modified as follows: the convolution function is rewritten and the network is changed to 5 layers (one input layer, two convolutional layers and two downsampling layers); the first convolutional layer has one 5×5 convolution kernel and the second has three 5×5 convolution kernels; both downsampling layers use a 2×2 sliding window with a stride of 2; and the number of training iterations is set to 10. The deep CNN is trained in Matlab, the trained weight and bias parameters are then loaded on the ARM side, and finally the trained model parameters are transferred to the FPGA side, where they are buffered in a FIFO and stored in Block RAM.
2. Preprocessing
The MNIST handwritten digit image shown in Fig. 2 is read into memory, each pixel is divided by 255 for normalization, and the image is then transposed as shown in Fig. 3.
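The preprocessing step amounts to a scaling followed by a transpose. The sketch below assumes 8-bit grayscale pixels supplied as a list of rows; the function name `preprocess` is hypothetical.

```python
def preprocess(pixels):
    """Normalize 8-bit pixel values to [0, 1] and transpose the image.

    `pixels` is a list of rows of integers in the range 0..255.
    """
    normalized = [[p / 255.0 for p in row] for row in pixels]
    # zip(*rows) turns rows into columns, i.e. a matrix transpose.
    return [list(column) for column in zip(*normalized)]
```

For a 28×28 MNIST image the result is again 28×28, with rows and columns exchanged.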
3. Transfer the preprocessing result to the FPGA
The preprocessing result is transferred to the FPGA side over the APB bus of the ZYNQ-7030 SoC, buffered in a FIFO and stored in Block RAM.
4. Initialize the convolution pipeline
As shown in Fig. 4, define 6 data buffer registers P_0, P_1, P_2, P_3, P_4, P_5, each able to hold 28 floating-point values. Of these, 5 registers (P_{(i-1)%(5+1)+0}, P_{(i-1)%(5+1)+1}, …, P_{(i-1)%(5+1)+5-1}) hold the i-th (i = 1, 2, …, 24) 5×28 sub-matrix of the input image matrix, where % denotes the remainder operation. Register indices wrap around: if (i-1)%(5+1)+x > 5, the index restarts from 0, so that (i-1)%(5+1)+x becomes 0, (i-1)%(5+1)+x+1 becomes 1, and so on, where x = 0, 1, …, 4. Register P_{(i-1)%(5+1)+5} holds row i+5 of the input image matrix.
Define one convolution kernel buffer register W, which holds the 5×5 weights of the single convolution kernel of the first convolutional layer.
5. Perform the calculation of the first convolutional layer
Perform the convolution of the input image matrix of the first convolutional layer with the first convolution kernel of that layer, and activate the result with the Sigmoid function.
In parallel with the convolution calculation, the data buffer register P_{(i-1)%(5+1)+5} is loaded with row i+5 of the input, which serves as buffered input for the convolution of the (i+1)-th sub-matrix; the registers are thus reused cyclically throughout the convolution, as shown in Fig. 5.
On the FPGA side, the Sigmoid function is built from floating-point IP cores to activate the convolution results. The Sigmoid function is f(x) = 1/(1 + e^(-x)).
The specific steps are as follows:
As described above, the input image is a 28×28 floating-point matrix, the convolution kernel is a 5×5 floating-point matrix, the sliding window is 5×5, and the horizontal and vertical strides are both 1, so the convolution result is a 24×24 floating-point matrix. The bias b11 (a parameter of the offline-trained model) is added to every element of this matrix, and after activation by the Sigmoid function the result is a 24×24 floating-point matrix, which is stored in Block RAM.
After this convolution has been completed, the result is one 24×24 floating-point matrix, which is stored in Block RAM.
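To make the register rotation concrete, the following sketch simulates the n+1 row registers in software and produces the pre-activation convolution rows. It is a sequential model, so the load of row i+n and the computation of sub-matrix i happen one after the other rather than in parallel as on the FPGA. The kernel is applied without flipping, which is an assumption, and the function name `stream_convolve` is hypothetical.

```python
def stream_convolve(image, kernel):
    """Row-streaming convolution using n+1 rotating row registers.

    Returns the (m-n+1) x (m-n+1) matrix of window sums, before bias
    and activation. Software model only; not the source's HDL design.
    """
    m, n = len(image), len(kernel)
    regs = [None] * (n + 1)
    for r in range(n):
        regs[r] = image[r]              # rows 1..n fill P_0..P_{n-1}
    output = []
    for i in range(1, m - n + 2):       # sub-matrix index i = 1..m-n+1
        base = (i - 1) % (n + 1)
        if i + n <= m:
            # Load row i+n into the register not used by sub-matrix i.
            regs[(base + n) % (n + 1)] = image[i + n - 1]
        window = [regs[(base + x) % (n + 1)] for x in range(n)]
        out_row = []
        for c in range(m - n + 1):
            acc = 0
            for u in range(n):
                for v in range(n):
                    acc += window[u][c + v] * kernel[u][v]
            out_row.append(acc)
        output.append(out_row)
    return output
```

Because the spare register is never part of the current window, loading the next row cannot disturb the sub-matrix being convolved, which is what allows the two operations to overlap in hardware.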
6. Perform the calculation of the first pooling layer
Perform the pooling calculation on the output of the first convolutional layer, as shown in Fig. 6; the result is one 12×12 floating-point matrix, which is stored in Block RAM. The specific steps are as follows: the sliding window over the convolution result is 2×2 with a stride of 2, and pooling is implemented by average downsampling, that is, the elements of each 2×2 block are summed and the mean is taken. This yields one 12×12 floating-point matrix, which serves as the input matrix of the second convolutional layer, as shown in Fig. 7.
7. Reinitialize the convolution pipeline
As shown in Fig. 4, reinitialize the 6 data buffer registers P_0, P_1, P_2, P_3, P_4, P_5, each now holding 12 floating-point values. Of these, 5 registers (P_{(i-1)%(5+1)+0}, P_{(i-1)%(5+1)+1}, …, P_{(i-1)%(5+1)+5-1}) hold the i-th (i = 1, 2, …, 8) 5×12 sub-matrix of the input matrix, where % denotes the remainder operation. Register indices wrap around: if (i-1)%(5+1)+x > 5, the index restarts from 0, so that (i-1)%(5+1)+x becomes 0, (i-1)%(5+1)+x+1 becomes 1, and so on, where x = 0, 1, …, 4. Register P_{(i-1)%(5+1)+5} holds row i+5 of the input matrix.
Reinitialize the convolution kernel buffer register W with the 5×5 weights of the first convolution kernel of the second convolutional layer.
8. Perform the calculation of the second convolutional layer
Perform the convolution of the input data matrix of the second convolutional layer with the first convolution kernel of that layer, and activate the result with the Sigmoid function.
Reinitialize the convolution kernel buffer register W with the 5×5 weights of the second convolution kernel of the second convolutional layer, perform the convolution of the layer's input data matrix with that kernel, and activate the result with the Sigmoid function.
Reinitialize the convolution kernel buffer register W with the 5×5 weights of the third convolution kernel of the second convolutional layer, perform the convolution of the layer's input data matrix with that kernel, and activate the result with the Sigmoid function.
In parallel with each convolution calculation, the data buffer register P_{(i-1)%(5+1)+5} is loaded with row i+5 of the input, which serves as buffered input for the convolution of the (i+1)-th sub-matrix; the registers are thus reused cyclically throughout the convolution, as shown in Fig. 5.
The specific steps are as follows: as described above, the input is a 12×12 floating-point matrix, the convolution kernels are three 5×5 floating-point matrices, the sliding window is 5×5, and the horizontal and vertical strides are both 1, so the convolution result is three 8×8 floating-point matrices. The biases b21, b22 and b23 (parameters of the offline-trained model) are added to every element of the first, second and third matrix respectively, and after activation by the Sigmoid function the result is three 8×8 floating-point matrices, which are stored in Block RAM.
After the convolution calculations of the second layer have been completed, the result is three 8×8 floating-point matrices, which are stored in Block RAM.
9. Perform the calculation of the second pooling layer
Perform the pooling calculation on the output of the second convolutional layer, as shown in Fig. 6; the result is three 4×4 floating-point matrices, which are stored in Block RAM. The specific steps are as follows: the sliding window over the convolution result is 2×2 with a stride of 2, and pooling is implemented by average downsampling, that is, the elements of each 2×2 block are summed and the mean is taken. This yields three 4×4 floating-point matrices, which serve as the input of the Softmax layer, as shown in Fig. 7.
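The matrix sizes quoted in steps 5 to 9 follow from two rules: a convolution with an n×n kernel and stride 1 reduces the side length from m to m-n+1, and 2×2 pooling with stride 2 halves it. The short sketch below reproduces the sequence; the function name `layer_sizes` is hypothetical.

```python
def layer_sizes(input_size, kernel_size, num_layers):
    """Side lengths after each convolution and each 2 x 2 pooling step."""
    sizes = [input_size]
    size = input_size
    for _ in range(num_layers):
        size = size - kernel_size + 1   # convolution, stride 1, no padding
        sizes.append(size)
        size //= 2                      # 2 x 2 average pooling, stride 2
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(layer_sizes(28, 5, 2))  # [28, 24, 12, 8, 4]
```

With three 4×4 maps after the second pooling layer, the feature vector passed to the Softmax layer has 3 × 4 × 4 = 48 elements.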
10. Classification calculation
The results of the convolution and pooling calculations are sent back to the ARM side for classification. The specific steps are as follows: the FPGA side transfers the convolution and pooling result matrices held in Block RAM to the ARM side through the FIFO buffer and the APB bus; the ARM side performs the classification using the Softmax operation, and the classification result of the input image is obtained and output.
The simulation result of the above method for the image of the digit "7" from the MNIST database is shown in Fig. 8.
The measured classification result of the above method for the image of the digit "7" from the MNIST database is shown in Fig. 9.
References
[1] Cong J, Xiao B. Minimizing Computation in Convolutional Neural Networks[M]// Artificial Neural Networks and Machine Learning – ICANN 2014. Springer International Publishing, 2014:33-7.
[2] Farabet C, Poulet C, Han J Y, et al. CNP: An FPGA-based processor for Convolutional Networks[J]. International Conference on Field Programmable Logic & Applications, 2009:32-37.
[3] Gokhale V, Jin J, Dundar A, et al. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks[C]// IEEE Embedded Vision Workshop. 2014:696-701.
[4] Zhang C, Li P, Sun G, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks[C]// ACM/SIGDA International Symposium. 2015:161-170.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks[J]. Advances in Neural Information Processing Systems, 2012, 25(2):2012.
[6] Farabet C, Martini B, Corda B, et al. NeuFlow: A runtime reconfigurable dataflow processor for vision[J]. 2011, 9(6):109-116.
[7] Matai J, Irturk A, Kastner R. Design and Implementation of an FPGA-Based Real-Time Face Recognition System[C]// IEEE International Symposium on Field-Programmable Custom Computing Machines. 2011:97-100.
[8] Sankaradas M, Jakkula V, Cadambi S, et al. A Massively Parallel Coprocessor for Convolutional Neural Networks[C]// IEEE International Conference on Application-Specific Systems, Architectures and Processors. IEEE Computer Society, 2009:53-60.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610615714.2A (CN106228240B) | 2016-07-30 | 2016-07-30 | Deep convolution neural network implementation method based on FPGA |
| Publication Number | Publication Date |
|---|---|
| CN106228240A (en) | 2016-12-14 |
| CN106228240B (en) | 2020-09-01 |
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN201610615714.2A | Active | CN106228240B (en) | 2016-07-30 | 2016-07-30 | Deep convolution neural network implementation method based on FPGA |
| Country | Link |
|---|---|
| CN (1) | CN106228240B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106529517A (en)* | 2016-12-30 | 2017-03-22 | 北京旷视科技有限公司 | Image processing method and image processing device |
| CN106650691A (en)* | 2016-12-30 | 2017-05-10 | 北京旷视科技有限公司 | Image processing method and image processing device |
| CN106682702A (en)* | 2017-01-12 | 2017-05-17 | 张亮 | Deep learning method and system |
| CN106779060A (en)* | 2017-02-09 | 2017-05-31 | 武汉魅瞳科技有限公司 | Calculation method for a deep convolutional neural network suitable for hardware implementation |
| CN106875012A (en)* | 2017-02-09 | 2017-06-20 | 武汉魅瞳科技有限公司 | Pipelined acceleration system for an FPGA-based deep convolutional neural network |
| CN106875011A (en)* | 2017-01-12 | 2017-06-20 | 南京大学 | Hardware architecture of a binary-weight convolutional neural network accelerator and calculation flow thereof |
| CN106909970A (en)* | 2017-01-12 | 2017-06-30 | 南京大学 | Approximate-calculation-based computing module for a binary-weight convolutional neural network hardware accelerator |
| CN106991474A (en)* | 2017-03-28 | 2017-07-28 | 华中科技大学 | Data exchange method and system for parallel fully connected layers of a deep neural network model |
| CN106991999A (en)* | 2017-03-29 | 2017-07-28 | 北京小米移动软件有限公司 | Voice recognition method and device |
| CN107229969A (en)* | 2017-06-21 | 2017-10-03 | 郑州云海信息技术有限公司 | FPGA-based convolutional neural network implementation method and device |
| TWI607389B (en)* | 2017-02-10 | 2017-12-01 | 耐能股份有限公司 | Pooling operation device and method for convolutional neural network |
| CN107451653A (en)* | 2017-07-05 | 2017-12-08 | 深圳市自行科技有限公司 | Calculation method and device for a deep neural network, and readable storage medium |
| CN107451659A (en)* | 2017-07-27 | 2017-12-08 | 清华大学 | Neural network accelerator for bit-width partitioning and implementation method thereof |
| CN107451654A (en)* | 2017-07-05 | 2017-12-08 | 深圳市自行科技有限公司 | Acceleration operation method of convolutional neural network, server and storage medium |
| CN107564522A (en)* | 2017-09-18 | 2018-01-09 | 郑州云海信息技术有限公司 | Intelligent control method and device |
| CN107622305A (en)* | 2017-08-24 | 2018-01-23 | 中国科学院计算技术研究所 | Processor and processing method for neural network |
| CN107656899A (en)* | 2017-09-27 | 2018-02-02 | 深圳大学 | FPGA-based mask convolution method and system |
| CN107689223A (en)* | 2017-08-30 | 2018-02-13 | 北京嘉楠捷思信息技术有限公司 | Audio identification method and device |
| CN107749044A (en)* | 2017-10-19 | 2018-03-02 | 珠海格力电器股份有限公司 | Image information pooling method and device |
| CN107844833A (en)* | 2017-11-28 | 2018-03-27 | 郑州云海信息技术有限公司 | Data processing method, device and medium for a convolutional neural network |
| CN108009631A (en)* | 2017-11-30 | 2018-05-08 | 睿视智觉(深圳)算法技术有限公司 | FPGA-based VGG-16 general-purpose processing module and control method thereof |
| CN108108809A (en)* | 2018-03-05 | 2018-06-01 | 山东领能电子科技有限公司 | Hardware architecture for accelerating convolutional neural network inference and working method thereof |
| CN108154229A (en)* | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Image processing method based on FPGA accelerated convolutional neural network framework |
| CN108229645A (en)* | 2017-04-28 | 2018-06-29 | 北京市商汤科技开发有限公司 | Convolution acceleration and computing processing method, device, electronic device and storage medium |
| CN108229653A (en)* | 2016-12-22 | 2018-06-29 | 三星电子株式会社 | Convolutional neural network system and method of operating the same |
| CN108256636A (en)* | 2018-03-16 | 2018-07-06 | 成都理工大学 | Heterogeneous-computing-based design and implementation method for a convolutional neural network algorithm |
| WO2018130029A1 (en)* | 2017-01-13 | 2018-07-19 | 华为技术有限公司 | Calculating device and calculation method for neural network calculation |
| WO2018137177A1 (en)* | 2017-01-25 | 2018-08-02 | 北京大学 | Method for convolution operation based on nor flash array |
| CN108362628A (en)* | 2018-01-11 | 2018-08-03 | 天津大学 | The n cell flow-sorting methods of flow cytometer are imaged based on polarizing diffraction |
| CN108388943A (en)* | 2018-01-08 | 2018-08-10 | 中国科学院计算技术研究所 | Pooling device and method suitable for neural networks |
| CN108416422A (en)* | 2017-12-29 | 2018-08-17 | 国民技术股份有限公司 | FPGA-based convolutional neural network implementation method and device |
| CN108470211A (en)* | 2018-04-09 | 2018-08-31 | 郑州云海信息技术有限公司 | Implementation method and device for convolution calculation, and computer storage medium |
| CN108520300A (en)* | 2018-04-09 | 2018-09-11 | 郑州云海信息技术有限公司 | A method and device for implementing a deep learning network |
| CN108537330A (en)* | 2018-03-09 | 2018-09-14 | 中国科学院自动化研究所 | Convolutional calculation device and method applied to neural network |
| CN108549935A (en)* | 2018-05-03 | 2018-09-18 | 济南浪潮高新科技投资发展有限公司 | Device and method for implementing a neural network model |
| CN108595379A (en)* | 2018-05-08 | 2018-09-28 | 济南浪潮高新科技投资发展有限公司 | Parallelized convolution operation method and system based on multi-level cache |
| CN108615076A (en)* | 2018-04-08 | 2018-10-02 | 福州瑞芯微电子股份有限公司 | Deep learning chip-based data storage optimization method and device |
| CN108710892A (en)* | 2018-04-04 | 2018-10-26 | 浙江工业大学 | Cooperative immune defense method against multiple adversarial image attacks |
| CN108764182A (en)* | 2018-06-01 | 2018-11-06 | 阿依瓦(北京)技术有限公司 | Optimized acceleration method and device for artificial intelligence |
| CN108805270A (en)* | 2018-05-08 | 2018-11-13 | 华中科技大学 | Memory-based convolutional neural network system |
| CN108804974A (en)* | 2017-04-27 | 2018-11-13 | 上海鲲云信息科技有限公司 | Method and system for resource estimation and configuration of hardware architecture of target detection algorithm |
| CN108805267A (en)* | 2018-05-28 | 2018-11-13 | 重庆大学 | Data processing method for hardware acceleration of convolutional neural network |
| CN109036459A (en)* | 2018-08-22 | 2018-12-18 | 百度在线网络技术(北京)有限公司 | Voice endpoint detection method, device, computer equipment and computer storage medium |
| CN109032781A (en)* | 2018-07-13 | 2018-12-18 | 重庆邮电大学 | FPGA parallel system for a convolutional neural network algorithm |
| CN109086879A (en)* | 2018-07-05 | 2018-12-25 | 东南大学 | FPGA-based implementation method for a densely connected neural network |
| CN109102070A (en)* | 2018-08-22 | 2018-12-28 | 地平线(上海)人工智能技术有限公司 | Preprocessing method and device for convolutional neural network data |
| CN109117949A (en)* | 2018-08-01 | 2019-01-01 | 南京天数智芯科技有限公司 | Flexible data stream processor and processing method for artificial intelligence equipment |
| CN109146067A (en)* | 2018-11-19 | 2019-01-04 | 东北大学 | FPGA-based Policy convolutional neural network accelerator |
| CN109214506A (en)* | 2018-09-13 | 2019-01-15 | 深思考人工智能机器人科技(北京)有限公司 | Device and method for establishing a convolutional neural network |
| CN109313723A (en)* | 2018-01-15 | 2019-02-05 | 深圳鲲云信息科技有限公司 | Artificial intelligence convolution processing method, device, readable storage medium, and terminal |
| CN109359732A (en)* | 2018-09-30 | 2019-02-19 | 阿里巴巴集团控股有限公司 | A chip and a data processing method based thereon |
| CN109376843A (en)* | 2018-10-12 | 2019-02-22 | 山东师范大学 | FPGA-based fast classification method, realization method and device of EEG signal |
| CN109416756A (en)* | 2018-01-15 | 2019-03-01 | 深圳鲲云信息科技有限公司 | Convolver and its applied artificial intelligence processing device |
| WO2019055224A1 (en)* | 2017-09-14 | 2019-03-21 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
| CN109615067A (en)* | 2019-03-05 | 2019-04-12 | 深兰人工智能芯片研究院(江苏)有限公司 | A data scheduling method and device for a convolutional neural network |
| CN109670578A (en)* | 2018-12-14 | 2019-04-23 | 北京中科寒武纪科技有限公司 | Data processing method and device for the first convolutional layer of a neural network, and computer equipment |
| CN109711539A (en)* | 2018-12-17 | 2019-05-03 | 北京中科寒武纪科技有限公司 | Operation method, device and related product |
| CN109740748A (en)* | 2019-01-08 | 2019-05-10 | 西安邮电大学 | An FPGA-based Convolutional Neural Network Accelerator |
| CN109754062A (en)* | 2017-11-07 | 2019-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related products |
| CN109784483A (en)* | 2019-01-24 | 2019-05-21 | 电子科技大学 | In-memory computing accelerator for binarized convolutional neural network based on FD-SOI process |
| CN109871939A (en)* | 2019-01-29 | 2019-06-11 | 深兰人工智能芯片研究院(江苏)有限公司 | Image processing method and image processing apparatus |
| CN109961133A (en)* | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
| WO2019136747A1 (en)* | 2018-01-15 | 2019-07-18 | 深圳鲲云信息科技有限公司 | Deconvolver and an artificial intelligence processing device applied by same |
| WO2019136756A1 (en)* | 2018-01-15 | 2019-07-18 | 深圳鲲云信息科技有限公司 | Design model establishing method and system for artificial intelligent processing device, storage medium, and terminal |
| CN110032374A (en)* | 2019-03-21 | 2019-07-19 | 深兰科技(上海)有限公司 | Parameter extraction method, device, equipment and medium |
| CN110084363A (en)* | 2019-05-15 | 2019-08-02 | 电科瑞达(成都)科技有限公司 | FPGA-platform-based deep learning model acceleration method |
| CN110134379A (en)* | 2018-02-08 | 2019-08-16 | 广达电脑股份有限公司 | Computer system, programming method, and non-transitory computer readable medium |
| CN110209627A (en)* | 2019-06-03 | 2019-09-06 | 山东浪潮人工智能研究院有限公司 | Intelligent-terminal-oriented SSD hardware acceleration method |
| CN110223687A (en)* | 2019-06-03 | 2019-09-10 | Oppo广东移动通信有限公司 | Instruction execution method, device, storage medium and electronic device |
| CN110399976A (en)* | 2018-04-25 | 2019-11-01 | 华为技术有限公司 | Calculation device and calculation method |
| CN110458279A (en)* | 2019-07-15 | 2019-11-15 | 武汉魅瞳科技有限公司 | An FPGA-based binary neural network acceleration method and system |
| CN110472442A (en)* | 2019-08-20 | 2019-11-19 | 厦门理工学院 | An IP Core for Automatically Detecting Hardware Trojans |
| WO2019233228A1 (en)* | 2018-06-08 | 2019-12-12 | Oppo广东移动通信有限公司 | Electronic device and device control method |
| CN110574371A (en)* | 2017-12-08 | 2019-12-13 | 百度时代网络技术(北京)有限公司 | Stereo Camera Depth Determination Using Hardware Accelerators |
| CN110619387A (en)* | 2019-09-12 | 2019-12-27 | 复旦大学 | Channel expansion method based on convolutional neural network |
| CN110689088A (en)* | 2019-10-09 | 2020-01-14 | 山东大学 | CNN-based LIBS ore spectral data classification method and device |
| CN110727634A (en)* | 2019-07-05 | 2020-01-24 | 中国科学院计算技术研究所 | Embedded intelligent computer architecture for object-end data processing |
| CN110880038A (en)* | 2019-11-29 | 2020-03-13 | 中国科学院自动化研究所 | System for accelerating convolution calculation based on FPGA and convolution neural network |
| WO2020052266A1 (en)* | 2018-09-14 | 2020-03-19 | Huawei Technologies Co., Ltd. | System and method for cascaded max pooling in neural networks |
| WO2020052265A1 (en)* | 2018-09-14 | 2020-03-19 | Huawei Technologies Co., Ltd. | System and method for cascaded dynamic max pooling in neural networks |
| CN110910434A (en)* | 2019-11-05 | 2020-03-24 | 东南大学 | An energy-efficient method for deep learning disparity estimation algorithm based on FPGA |
| CN110928318A (en)* | 2019-12-31 | 2020-03-27 | 苏州清研微视电子科技有限公司 | FPGA-based binocular vision assisted driving system |
| CN110991632A (en)* | 2019-11-29 | 2020-04-10 | 电子科技大学 | A Design Method of Heterogeneous Neural Network Computing Accelerator Based on FPGA |
| CN111008629A (en)* | 2019-12-07 | 2020-04-14 | 怀化学院 | Cortex-M3-based method for identifying number of tip |
| TWI696129B (en)* | 2019-03-15 | 2020-06-11 | 華邦電子股份有限公司 | Memory chip capable of performing artificial intelligence operation and operation method thereof |
| CN111310921A (en)* | 2020-03-27 | 2020-06-19 | 西安电子科技大学 | An FPGA Implementation Method of Lightweight Deep Convolutional Neural Network |
| CN111667053A (en)* | 2020-06-01 | 2020-09-15 | 重庆邮电大学 | Novel convolutional neural network accelerator and forward propagation calculation acceleration method thereof |
| CN109800867B (en)* | 2018-12-17 | 2020-09-29 | 北京理工大学 | Data calling method based on FPGA off-chip memory |
| CN111832718A (en)* | 2020-06-24 | 2020-10-27 | 上海西井信息科技有限公司 | Chip architecture |
| CN111860815A (en)* | 2017-08-31 | 2020-10-30 | 中科寒武纪科技股份有限公司 | Method and device for convolution operation |
| CN111860773A (en)* | 2020-06-30 | 2020-10-30 | 北京百度网讯科技有限公司 | Processing apparatus and method for information processing |
| CN112508184A (en)* | 2020-12-16 | 2021-03-16 | 重庆邮电大学 | Design method of fast image recognition accelerator based on convolutional neural network |
| TWI724515B (en)* | 2019-08-27 | 2021-04-11 | 聯智科創有限公司 | Machine learning service delivery method |
| CN113012689A (en)* | 2021-04-15 | 2021-06-22 | 成都爱旗科技有限公司 | Electronic equipment and deep learning hardware acceleration method |
| CN113762491A (en)* | 2021-08-10 | 2021-12-07 | 南京工业大学 | An FPGA-based Convolutional Neural Network Accelerator |
| CN113850814A (en)* | 2021-09-26 | 2021-12-28 | 华南农业大学 | A CNN model-based method for identification of litchi leaf diseases and insect pests |
| US11227086B2 (en) | 2017-01-04 | 2022-01-18 | Stmicroelectronics S.R.L. | Reconfigurable interconnect |
| CN114299514A (en)* | 2021-12-07 | 2022-04-08 | 南京理工大学 | Method for realizing handwritten number recognition |
| US11341398B2 (en)* | 2016-10-03 | 2022-05-24 | Hitachi, Ltd. | Recognition apparatus and learning system using neural networks |
| US11531873B2 (en) | 2020-06-23 | 2022-12-20 | Stmicroelectronics S.R.L. | Convolution acceleration with embedded vector decompression |
| US11562115B2 (en) | 2017-01-04 | 2023-01-24 | Stmicroelectronics S.R.L. | Configurable accelerator framework including a stream switch having a plurality of unidirectional stream links |
| US11593609B2 (en) | 2020-02-18 | 2023-02-28 | Stmicroelectronics S.R.L. | Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks |
| WO2023155369A1 (en)* | 2022-02-21 | 2023-08-24 | 山东浪潮科学研究院有限公司 | Depthwise convolution optimization method and system based on micro-architecture processor, and device |
| CN116718894A (en)* | 2023-06-19 | 2023-09-08 | 上饶市广强电子科技有限公司 | Circuit stability test method and system for corn lamp |
| US11874898B2 (en) | 2018-01-15 | 2024-01-16 | Shenzhen Corerain Technologies Co., Ltd. | Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7882164B1 (en)* | 2004-09-24 | 2011-02-01 | University Of Southern California | Image convolution engine optimized for use in programmable gate arrays |
| CN104035750A (en)* | 2014-06-11 | 2014-09-10 | 西安电子科技大学 | Field programmable gate array (FPGA)-based real-time template convolution implementing method |
| US20140269376A1 (en)* | 2013-03-15 | 2014-09-18 | DGS Global Systems, Inc. | Systems, methods, and devices for electronic spectrum management |
| CN105046681A (en)* | 2015-05-14 | 2015-11-11 | 江南大学 | Image salient region detecting method based on SoC |
| CN105469039A (en)* | 2015-11-19 | 2016-04-06 | 天津大学 | Target identification system based on AER image sensor |
| CN105491269A (en)* | 2015-11-24 | 2016-04-13 | 长春乙天科技有限公司 | High-fidelity video amplification method based on deconvolution image restoration |
| CN105678379A (en)* | 2016-01-12 | 2016-06-15 | 腾讯科技(深圳)有限公司 | CNN processing method and device |
| CN105678378A (en)* | 2014-12-04 | 2016-06-15 | 辉达公司 | Indirect access to sample data to perform multiple convolution operations in parallel processing systems |
| CN105740773A (en)* | 2016-01-25 | 2016-07-06 | 重庆理工大学 | Deep learning and multi-scale information based behavior identification method |
| Title |
|---|
| LI, NING et al.: "A Multistage Dataflow Implementation of a Deep Convolutional Neural Network Based on FPGA For High-Speed Object Recognition", 《2016 IEEE SOUTHWEST SYMPOSIUM ON IMAGE ANALYSIS AND INTERPRETATION》* |
| MOHAMMAD MOTAMEDI et al.: "Design space exploration of FPGA-based Deep Convolutional Neural Networks", 《2016 21ST ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE》* |
| YOUSEFZADEH, A et al.: "Fast Pipeline 128×128 pixel spiking convolution core for event-driven vision processing in FPGAs", 《2015 FIRST INTERNATIONAL CONFERENCE ON EVENT-BASED CONTROL, COMMUNICATION AND SIGNAL PROCESSING》* |
| 朱学亮 et al.: "Design and Implementation of an FPGA-Based Image Convolution IP Core", 《微电子学与计算机》* |
| 李明 et al.: "A New FPGA Implementation Method for Spatial Template Convolution Filtering Algorithms", 《计算机应用与软件》* |
| 桑红石 et al.: "FPGA Implementation of a Novel 2-D Convolver", 《微电子学与计算机》* |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11341398B2 (en)* | 2016-10-03 | 2022-05-24 | Hitachi, Ltd. | Recognition apparatus and learning system using neural networks |
| CN108229653B (en)* | 2016-12-22 | 2023-04-07 | 三星电子株式会社 | Convolutional neural network system and method of operating the same |
| CN108229653A (en)* | 2016-12-22 | 2018-06-29 | 三星电子株式会社 | Convolutional neural network system and method of operating the same |
| CN106529517A (en)* | 2016-12-30 | 2017-03-22 | 北京旷视科技有限公司 | Image processing method and image processing device |
| CN106529517B (en)* | 2016-12-30 | 2019-11-01 | 北京旷视科技有限公司 | Image processing method and image processing apparatus |
| CN106650691A (en)* | 2016-12-30 | 2017-05-10 | 北京旷视科技有限公司 | Image processing method and image processing device |
| US11562115B2 (en) | 2017-01-04 | 2023-01-24 | Stmicroelectronics S.R.L. | Configurable accelerator framework including a stream switch having a plurality of unidirectional stream links |
| US11227086B2 (en) | 2017-01-04 | 2022-01-18 | Stmicroelectronics S.R.L. | Reconfigurable interconnect |
| US12118451B2 (en) | 2017-01-04 | 2024-10-15 | Stmicroelectronics S.R.L. | Deep convolutional network heterogeneous architecture |
| US12073308B2 (en)* | 2017-01-04 | 2024-08-27 | Stmicroelectronics International N.V. | Hardware accelerator engine |
| US11675943B2 (en) | 2017-01-04 | 2023-06-13 | Stmicroelectronics S.R.L. | Tool to create a reconfigurable interconnect framework |
| CN106909970B (en)* | 2017-01-12 | 2020-04-21 | 南京风兴科技有限公司 | Approximate calculation-based binary weight convolution neural network hardware accelerator calculation device |
| CN106682702A (en)* | 2017-01-12 | 2017-05-17 | 张亮 | Deep learning method and system |
| CN106875011B (en)* | 2017-01-12 | 2020-04-17 | 南京风兴科技有限公司 | Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof |
| CN106909970A (en)* | 2017-01-12 | 2017-06-30 | 南京大学 | Approximate-calculation-based computing module for a binary-weight convolutional neural network hardware accelerator |
| CN106875011A (en)* | 2017-01-12 | 2017-06-20 | 南京大学 | Hardware architecture of a binary-weight convolutional neural network accelerator and calculation flow thereof |
| WO2018130029A1 (en)* | 2017-01-13 | 2018-07-19 | 华为技术有限公司 | Calculating device and calculation method for neural network calculation |
| WO2018137177A1 (en)* | 2017-01-25 | 2018-08-02 | 北京大学 | Method for convolution operation based on nor flash array |
| US11309026B2 (en) | 2017-01-25 | 2022-04-19 | Peking University | Convolution operation method based on NOR flash array |
| CN106779060B (en)* | 2017-02-09 | 2019-03-08 | 武汉魅瞳科技有限公司 | Calculation method for a deep convolutional neural network suitable for hardware implementation |
| CN106779060A (en)* | 2017-02-09 | 2017-05-31 | 武汉魅瞳科技有限公司 | Calculation method for a deep convolutional neural network suitable for hardware implementation |
| CN106875012A (en)* | 2017-02-09 | 2017-06-20 | 武汉魅瞳科技有限公司 | Pipelined acceleration system for an FPGA-based deep convolutional neural network |
| CN106875012B (en)* | 2017-02-09 | 2019-09-20 | 武汉魅瞳科技有限公司 | Pipelined acceleration system for an FPGA-based deep convolutional neural network |
| US10943166B2 (en) | 2017-02-10 | 2021-03-09 | Kneron, Inc. | Pooling operation device and method for convolutional neural network |
| TWI607389B (en)* | 2017-02-10 | 2017-12-01 | 耐能股份有限公司 | Pooling operation device and method for convolutional neural network |
| CN106991474B (en)* | 2017-03-28 | 2019-09-24 | 华中科技大学 | Data exchange method and system for parallel fully connected layers of a deep neural network model |
| CN106991474A (en)* | 2017-03-28 | 2017-07-28 | 华中科技大学 | Data exchange method and system for parallel fully connected layers of a deep neural network model |
| CN106991999B (en)* | 2017-03-29 | 2020-06-02 | 北京小米移动软件有限公司 | Voice recognition method and device |
| CN106991999A (en)* | 2017-03-29 | 2017-07-28 | 北京小米移动软件有限公司 | Voice recognition method and device |
| CN108804974A (en)* | 2017-04-27 | 2018-11-13 | 上海鲲云信息科技有限公司 | Method and system for resource estimation and configuration of hardware architecture of target detection algorithm |
| CN108229645A (en)* | 2017-04-28 | 2018-06-29 | 北京市商汤科技开发有限公司 | Convolution acceleration and computing processing method, device, electronic device and storage medium |
| CN108229645B (en)* | 2017-04-28 | 2021-08-06 | 北京市商汤科技开发有限公司 | Convolution acceleration and computing processing method, device, electronic device and storage medium |
| US11429852B2 (en) | 2017-04-28 | 2022-08-30 | Beijing Sensetime Technology Development Co., Ltd. | Convolution acceleration and computing processing method and apparatus, electronic device, and storage medium |
| WO2018196863A1 (en)* | 2017-04-28 | 2018-11-01 | 北京市商汤科技开发有限公司 | Convolution acceleration and calculation processing methods and apparatuses, electronic device and storage medium |
| CN107229969A (en)* | 2017-06-21 | 2017-10-03 | 郑州云海信息技术有限公司 | FPGA-based convolutional neural network implementation method and device |
| CN107451654B (en)* | 2017-07-05 | 2021-05-18 | 深圳市自行科技有限公司 | Acceleration operation method of convolutional neural network, server and storage medium |
| CN107451653A (en)* | 2017-07-05 | 2017-12-08 | 深圳市自行科技有限公司 | Calculation method and device for a deep neural network, and readable storage medium |
| CN107451654A (en)* | 2017-07-05 | 2017-12-08 | 深圳市自行科技有限公司 | Acceleration operation method of convolutional neural network, server and storage medium |
| CN107451659A (en)* | 2017-07-27 | 2017-12-08 | 清华大学 | Neural network accelerator for bit-width partitioning and implementation method thereof |
| CN107622305A (en)* | 2017-08-24 | 2018-01-23 | 中国科学院计算技术研究所 | Processor and processing method for neural network |
| CN107689223A (en)* | 2017-08-30 | 2018-02-13 | 北京嘉楠捷思信息技术有限公司 | Audio identification method and device |
| CN111860815A (en)* | 2017-08-31 | 2020-10-30 | 中科寒武纪科技股份有限公司 | Method and device for convolution operation |
| WO2019055224A1 (en)* | 2017-09-14 | 2019-03-21 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
| US10839286B2 (en) | 2017-09-14 | 2020-11-17 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
| EP3682378A1 (en)* | 2017-09-14 | 2020-07-22 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
| CN107564522A (en)* | 2017-09-18 | 2018-01-09 | 郑州云海信息技术有限公司 | Intelligent control method and device |
| CN107656899A (en)* | 2017-09-27 | 2018-02-02 | 深圳大学 | FPGA-based mask convolution method and system |
| CN107749044A (en)* | 2017-10-19 | 2018-03-02 | 珠海格力电器股份有限公司 | Image information pooling method and device |
| CN109754062A (en)* | 2017-11-07 | 2019-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related products |
| CN109754062B (en)* | 2017-11-07 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related products |
| CN107844833A (en)* | 2017-11-28 | 2018-03-27 | 郑州云海信息技术有限公司 | Data processing method, device and medium for a convolutional neural network |
| CN108009631A (en)* | 2017-11-30 | 2018-05-08 | 睿视智觉(深圳)算法技术有限公司 | FPGA-based VGG-16 general-purpose processing module and control method thereof |
| CN110574371B (en)* | 2017-12-08 | 2021-12-21 | 百度时代网络技术(北京)有限公司 | Stereo camera depth determination using hardware accelerators |
| CN110574371A (en)* | 2017-12-08 | 2019-12-13 | 百度时代网络技术(北京)有限公司 | Stereo Camera Depth Determination Using Hardware Accelerators |
| US11182917B2 (en) | 2017-12-08 | 2021-11-23 | Baidu Usa Llc | Stereo camera depth determination using hardware accelerator |
| CN109961133A (en)* | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
| CN108416422B (en)* | 2017-12-29 | 2024-03-01 | 国民技术股份有限公司 | FPGA-based convolutional neural network implementation method and device |
| CN108416422A (en)* | 2017-12-29 | 2018-08-17 | 国民技术股份有限公司 | FPGA-based convolutional neural network implementation method and device |
| CN108388943B (en)* | 2018-01-08 | 2020-12-29 | 中国科学院计算技术研究所 | Pooling device and method suitable for neural networks |
| CN108388943A (en)* | 2018-01-08 | 2018-08-10 | 中国科学院计算技术研究所 | Pooling device and method suitable for neural networks |
| CN108154229A (en)* | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Image processing method based on FPGA accelerated convolutional neural network framework |
| CN108154229B (en)* | 2018-01-10 | 2022-04-08 | 西安电子科技大学 | Image processing method based on FPGA accelerated convolutional neural network framework |
| CN108362628A (en)* | 2018-01-11 | 2018-08-03 | 天津大学 | The n cell flow-sorting methods of flow cytometer are imaged based on polarizing diffraction |
| WO2019136756A1 (en)* | 2018-01-15 | 2019-07-18 | 深圳鲲云信息科技有限公司 | Design model establishing method and system for artificial intelligent processing device, storage medium, and terminal |
| WO2019136747A1 (en)* | 2018-01-15 | 2019-07-18 | 深圳鲲云信息科技有限公司 | Deconvolver and an artificial intelligence processing device applied by same |
| US11874898B2 (en) | 2018-01-15 | 2024-01-16 | Shenzhen Corerain Technologies Co., Ltd. | Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal |
| CN110178146B (en)* | 2018-01-15 | 2023-05-12 | 深圳鲲云信息科技有限公司 | Deconvolutor and artificial intelligence processing device applied by deconvolutor |
| CN109313723B (en)* | 2018-01-15 | 2022-03-15 | 深圳鲲云信息科技有限公司 | Artificial intelligence convolution processing method and device, readable storage medium and terminal |
| CN110178146A (en)* | 2018-01-15 | 2019-08-27 | 深圳鲲云信息科技有限公司 | Deconvolution device and its applied artificial intelligence process device |
| CN109313723A (en)* | 2018-01-15 | 2019-02-05 | 深圳鲲云信息科技有限公司 | Artificial intelligence convolution processing method, device, readable storage medium, and terminal |
| CN109416756A (en)* | 2018-01-15 | 2019-03-01 | 深圳鲲云信息科技有限公司 | Convolver and its applied artificial intelligence processing device |
| CN110134379A (en)* | 2018-02-08 | 2019-08-16 | 广达电脑股份有限公司 | Computer system, programming method, and non-transitory computer readable medium |
| US11568232B2 (en) | 2018-02-08 | 2023-01-31 | Quanta Computer Inc. | Deep learning FPGA converter |
| TWI709088B (en)* | 2018-02-08 | 2020-11-01 | 廣達電腦股份有限公司 | Computing system, programming method, and non-transitory computer-readable medium |
| CN110134379B (en)* | 2018-02-08 | 2022-11-22 | 广达电脑股份有限公司 | Computer system, programming method and non-transitory computer readable medium |
| CN108108809B (en)* | 2018-03-05 | 2021-03-02 | 山东领能电子科技有限公司 | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof |
| CN108108809A (en)* | 2018-03-05 | 2018-06-01 | 山东领能电子科技有限公司 | A kind of hardware structure and its method of work that acceleration is made inferences for convolutional Neural metanetwork |
| CN108537330A (en)* | 2018-03-09 | 2018-09-14 | 中国科学院自动化研究所 | Convolutional calculation device and method applied to neural network |
| CN108256636A (en)* | 2018-03-16 | 2018-07-06 | 成都理工大学 | A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing |
| CN108710892B (en)* | 2018-04-04 | 2020-09-01 | 浙江工业大学 | Cooperative immune defense method for multiple anti-picture attacks |
| CN108710892A (en)* | 2018-04-04 | 2018-10-26 | 浙江工业大学 | Synergetic immunity defence method towards a variety of confrontation picture attacks |
| CN108615076B (en)* | 2018-04-08 | 2020-09-11 | 瑞芯微电子股份有限公司 | Deep learning chip-based data storage optimization method and device |
| CN108615076A (en)* | 2018-04-08 | 2018-10-02 | 福州瑞芯微电子股份有限公司 | A kind of data store optimization method and apparatus based on deep learning chip |
| CN108520300A (en)* | 2018-04-09 | 2018-09-11 | 郑州云海信息技术有限公司 | A method and device for implementing a deep learning network |
| CN108470211A (en)* | 2018-04-09 | 2018-08-31 | 郑州云海信息技术有限公司 | A kind of implementation method of convolutional calculation, equipment and computer storage media |
| CN110399976A (en)* | 2018-04-25 | 2019-11-01 | 华为技术有限公司 | Calculation device and calculation method |
| CN108549935A (en)* | 2018-05-03 | 2018-09-18 | 济南浪潮高新科技投资发展有限公司 | Device and method for implementing a neural network model |
| US11531880B2 (en) | 2018-05-08 | 2022-12-20 | Huazhong University Of Science And Technology | Memory-based convolutional neural network system |
| CN108805270A (en)* | 2018-05-08 | 2018-11-13 | 华中科技大学 | Memory-based convolutional neural network system |
| CN108595379A (en)* | 2018-05-08 | 2018-09-28 | 济南浪潮高新科技投资发展有限公司 | Parallelized convolution operation method and system based on multi-level cache |
| WO2019227518A1 (en)* | 2018-05-08 | 2019-12-05 | 华中科技大学 | Convolutional neural network system based on memory |
| CN108805267A (en)* | 2018-05-28 | 2018-11-13 | 重庆大学 | Data processing method for hardware acceleration of convolutional neural network |
| CN108805267B (en)* | 2018-05-28 | 2021-09-10 | 重庆大学 | Data processing method for hardware acceleration of convolutional neural network |
| CN108764182A (en)* | 2018-06-01 | 2018-11-06 | 阿依瓦(北京)技术有限公司 | Optimized acceleration method and device for artificial intelligence |
| WO2019233228A1 (en)* | 2018-06-08 | 2019-12-12 | Oppo广东移动通信有限公司 | Electronic device and device control method |
| CN109086879A (en)* | 2018-07-05 | 2018-12-25 | 东南大学 | FPGA-based implementation method for a densely connected neural network |
| CN109032781A (en)* | 2018-07-13 | 2018-12-18 | 重庆邮电大学 | FPGA parallel system for a convolutional neural network algorithm |
| CN109117949A (en)* | 2018-08-01 | 2019-01-01 | 南京天数智芯科技有限公司 | Flexible data stream processor and processing method for artificial intelligence devices |
| CN109036459A (en)* | 2018-08-22 | 2018-12-18 | 百度在线网络技术(北京)有限公司 | Voice endpoint detection method, device, computer equipment, and computer storage medium |
| CN109102070A (en)* | 2018-08-22 | 2018-12-28 | 地平线(上海)人工智能技术有限公司 | Preprocessing method and device for convolutional neural network data |
| CN109214506A (en)* | 2018-09-13 | 2019-01-15 | 深思考人工智能机器人科技(北京)有限公司 | Convolutional neural network establishing device and method |
| CN109214506B (en)* | 2018-09-13 | 2022-04-15 | 深思考人工智能机器人科技(北京)有限公司 | Convolutional neural network establishing device and method based on pixels |
| WO2020052266A1 (en)* | 2018-09-14 | 2020-03-19 | Huawei Technologies Co., Ltd. | System and method for cascaded max pooling in neural networks |
| WO2020052265A1 (en)* | 2018-09-14 | 2020-03-19 | Huawei Technologies Co., Ltd. | System and method for cascaded dynamic max pooling in neural networks |
| CN109359732A (en)* | 2018-09-30 | 2019-02-19 | 阿里巴巴集团控股有限公司 | A chip and a data processing method based thereon |
| US11361217B2 (en) | 2018-09-30 | 2022-06-14 | Advanced New Technologies Co., Ltd. | Chip and chip-based data processing method |
| US11062201B2 (en) | 2018-09-30 | 2021-07-13 | Advanced New Technologies Co., Ltd. | Chip and chip-based data processing method |
| CN109376843A (en)* | 2018-10-12 | 2019-02-22 | 山东师范大学 | FPGA-based fast classification method, realization method and device of EEG signal |
| CN109146067A (en)* | 2018-11-19 | 2019-01-04 | 东北大学 | An FPGA-based Policy Convolutional Neural Network Accelerator |
| CN109146067B (en)* | 2018-11-19 | 2021-11-05 | 东北大学 | An FPGA-based Policy Convolutional Neural Network Accelerator |
| CN109670578A (en)* | 2018-12-14 | 2019-04-23 | 北京中科寒武纪科技有限公司 | Neural network first-layer convolution data processing method, device, and computer equipment |
| CN109711539A (en)* | 2018-12-17 | 2019-05-03 | 北京中科寒武纪科技有限公司 | Operation method, device, and related product |
| CN109800867B (en)* | 2018-12-17 | 2020-09-29 | 北京理工大学 | Data calling method based on FPGA off-chip memory |
| CN109740748A (en)* | 2019-01-08 | 2019-05-10 | 西安邮电大学 | An FPGA-based Convolutional Neural Network Accelerator |
| CN109784483B (en)* | 2019-01-24 | 2022-09-09 | 电子科技大学 | In-memory computing accelerator for binarized convolutional neural network based on FD-SOI process |
| CN109784483A (en)* | 2019-01-24 | 2019-05-21 | 电子科技大学 | In-memory computing accelerator for binarized convolutional neural network based on FD-SOI process |
| CN109871939A (en)* | 2019-01-29 | 2019-06-11 | 深兰人工智能芯片研究院(江苏)有限公司 | Image processing method and image processing apparatus |
| CN109615067A (en)* | 2019-03-05 | 2019-04-12 | 深兰人工智能芯片研究院(江苏)有限公司 | A data scheduling method and device for a convolutional neural network |
| TWI696129B (en)* | 2019-03-15 | 2020-06-11 | 華邦電子股份有限公司 | Memory chip capable of performing artificial intelligence operation and operation method thereof |
| CN110032374B (en)* | 2019-03-21 | 2023-04-07 | 深兰科技(上海)有限公司 | Parameter extraction method, device, equipment and medium |
| CN110032374A (en)* | 2019-03-21 | 2019-07-19 | 深兰科技(上海)有限公司 | Parameter extraction method, device, equipment and medium |
| CN110084363B (en)* | 2019-05-15 | 2023-04-25 | 电科瑞达(成都)科技有限公司 | A Method of Accelerating Deep Learning Model Based on FPGA Platform |
| CN110084363A (en)* | 2019-05-15 | 2019-08-02 | 电科瑞达(成都)科技有限公司 | Deep learning model acceleration method based on FPGA platform |
| CN110223687A (en)* | 2019-06-03 | 2019-09-10 | Oppo广东移动通信有限公司 | Instruction execution method, device, storage medium and electronic device |
| CN110209627A (en)* | 2019-06-03 | 2019-09-06 | 山东浪潮人工智能研究院有限公司 | Hardware acceleration method for SSD oriented to intelligent terminals |
| CN110727634A (en)* | 2019-07-05 | 2020-01-24 | 中国科学院计算技术研究所 | Embedded intelligent computer architecture for object-end data processing |
| CN110727634B (en)* | 2019-07-05 | 2021-10-29 | 中国科学院计算技术研究所 | Embedded intelligent computer system for object-end data processing |
| CN110458279B (en)* | 2019-07-15 | 2022-05-20 | 武汉魅瞳科技有限公司 | An FPGA-based binary neural network acceleration method and system |
| CN110458279A (en)* | 2019-07-15 | 2019-11-15 | 武汉魅瞳科技有限公司 | An FPGA-based binary neural network acceleration method and system |
| CN110472442A (en)* | 2019-08-20 | 2019-11-19 | 厦门理工学院 | An IP Core for Automatically Detecting Hardware Trojans |
| TWI724515B (en)* | 2019-08-27 | 2021-04-11 | 聯智科創有限公司 | Machine learning service delivery method |
| CN110619387B (en)* | 2019-09-12 | 2023-06-20 | 复旦大学 | Channel expansion method based on convolutional neural network |
| CN110619387A (en)* | 2019-09-12 | 2019-12-27 | 复旦大学 | Channel expansion method based on convolutional neural network |
| CN110689088A (en)* | 2019-10-09 | 2020-01-14 | 山东大学 | CNN-based LIBS ore spectral data classification method and device |
| CN110910434A (en)* | 2019-11-05 | 2020-03-24 | 东南大学 | An energy-efficient method for deep learning disparity estimation algorithm based on FPGA |
| CN110910434B (en)* | 2019-11-05 | 2023-05-12 | 东南大学 | Energy-efficient FPGA-based implementation method for a deep learning disparity estimation algorithm |
| CN110880038A (en)* | 2019-11-29 | 2020-03-13 | 中国科学院自动化研究所 | FPGA-based system for accelerating convolution computation, and convolutional neural network |
| CN110991632A (en)* | 2019-11-29 | 2020-04-10 | 电子科技大学 | A Design Method of Heterogeneous Neural Network Computing Accelerator Based on FPGA |
| CN110880038B (en)* | 2019-11-29 | 2022-07-01 | 中国科学院自动化研究所 | FPGA-based system for accelerating convolution computation, and convolutional neural network |
| CN111008629A (en)* | 2019-12-07 | 2020-04-14 | 怀化学院 | Cortex-M3-based method for identifying number of tip |
| CN110928318A (en)* | 2019-12-31 | 2020-03-27 | 苏州清研微视电子科技有限公司 | FPGA-based binocular vision assisted driving system |
| US11593609B2 (en) | 2020-02-18 | 2023-02-28 | Stmicroelectronics S.R.L. | Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks |
| CN111310921B (en)* | 2020-03-27 | 2022-04-19 | 西安电子科技大学 | FPGA implementation method of lightweight deep convolutional neural network |
| CN111310921A (en)* | 2020-03-27 | 2020-06-19 | 西安电子科技大学 | An FPGA Implementation Method of Lightweight Deep Convolutional Neural Network |
| CN111667053A (en)* | 2020-06-01 | 2020-09-15 | 重庆邮电大学 | Novel convolutional neural network accelerator and forward propagation calculation acceleration method thereof |
| CN111667053B (en)* | 2020-06-01 | 2023-05-09 | 重庆邮电大学 | A Forward Propagation Calculation Acceleration Method for Convolutional Neural Network Accelerator |
| US11531873B2 (en) | 2020-06-23 | 2022-12-20 | Stmicroelectronics S.R.L. | Convolution acceleration with embedded vector decompression |
| CN111832718A (en)* | 2020-06-24 | 2020-10-27 | 上海西井信息科技有限公司 | Chip architecture |
| CN111860773A (en)* | 2020-06-30 | 2020-10-30 | 北京百度网讯科技有限公司 | Processing apparatus and method for information processing |
| CN111860773B (en)* | 2020-06-30 | 2023-07-28 | 北京百度网讯科技有限公司 | Processing apparatus and method for information processing |
| CN112508184B (en)* | 2020-12-16 | 2022-04-29 | 重庆邮电大学 | Design method of fast image recognition accelerator based on convolutional neural network |
| CN112508184A (en)* | 2020-12-16 | 2021-03-16 | 重庆邮电大学 | Design method of fast image recognition accelerator based on convolutional neural network |
| CN113012689A (en)* | 2021-04-15 | 2021-06-22 | 成都爱旗科技有限公司 | Electronic equipment and deep learning hardware acceleration method |
| CN113012689B (en)* | 2021-04-15 | 2023-04-07 | 成都爱旗科技有限公司 | Electronic equipment and deep learning hardware acceleration method |
| CN113762491B (en)* | 2021-08-10 | 2023-06-30 | 南京工业大学 | Convolutional neural network accelerator based on FPGA |
| CN113762491A (en)* | 2021-08-10 | 2021-12-07 | 南京工业大学 | An FPGA-based Convolutional Neural Network Accelerator |
| CN113850814A (en)* | 2021-09-26 | 2021-12-28 | 华南农业大学 | A CNN model-based method for identification of litchi leaf diseases and insect pests |
| CN114299514A (en)* | 2021-12-07 | 2022-04-08 | 南京理工大学 | Method for realizing handwritten number recognition |
| WO2023155369A1 (en)* | 2022-02-21 | 2023-08-24 | 山东浪潮科学研究院有限公司 | Depthwise convolution optimization method and system based on micro-architecture processor, and device |
| CN116718894A (en)* | 2023-06-19 | 2023-09-08 | 上饶市广强电子科技有限公司 | Circuit stability test method and system for corn lamp |
| CN116718894B (en)* | 2023-06-19 | 2024-03-29 | 上饶市广强电子科技有限公司 | Circuit stability test method and system for corn lamp |

| Publication number | Publication date |
|---|---|
| CN106228240B (en) | 2020-09-01 |

| Publication | Publication Date | Title |
|---|---|---|
| CN106228240B (en) | | Deep convolution neural network implementation method based on FPGA |
| CN109284817B (en) | | Depthwise separable convolutional neural network processing architecture/method/system and medium |
| CN111967468B (en) | | Implementation method of lightweight target detection neural network based on FPGA |
| CN108427990B (en) | | Neural network computing system and method |
| EP3407266B1 (en) | | Artificial neural network calculating device and method for sparse connection |
| CN110383300B (en) | | A computing device and method |
| CN110390385A (en) | | A Configurable Parallel General Convolutional Neural Network Accelerator Based on BNRP |
| CN113051216B (en) | | MobileNet-SSD target detection device and method based on FPGA acceleration |
| CN113298237B (en) | | An on-chip training accelerator for convolutional neural networks based on FPGA |
| CN109447241B (en) | | A Dynamic Reconfigurable Convolutional Neural Network Accelerator Architecture for the Internet of Things |
| CN111210019B (en) | | A neural network inference method based on software and hardware co-acceleration |
| CN108090565A (en) | | Parallelized training acceleration method for convolutional neural networks |
| CN108665059A (en) | | Convolutional neural networks acceleration system based on field programmable gate array |
| CN107239824A (en) | | Apparatus and method for realizing sparse convolutional neural network accelerator |
| CN104915322A (en) | | Method for accelerating convolutional neural network hardware and AXI bus IP core thereof |
| CN111582451B (en) | | Inter-layer parallel pipelined binary convolutional neural network array architecture for image recognition |
| CN111126590B (en) | | Device and method for artificial neural network operation |
| CN108665063A (en) | | Dual-path parallel processing convolution acceleration system for BNN hardware accelerators |
| CN113392963B (en) | | FPGA-based CNN hardware acceleration system design method |
| CN108960414B (en) | | Method for realizing single broadcast multiple operations based on deep learning accelerator |
| CN110991630A (en) | | Convolutional neural network processor for edge computing |
| CN110991631A (en) | | Neural network acceleration system based on FPGA |
| CN111275167A (en) | | Energy-efficient systolic array architecture for binary convolutional neural network |
| CN116187407A (en) | | A realization system and method based on systolic array self-attention mechanism |
| CN112052941B (en) | | Efficient in-memory computing system applied to CNN convolution layers and operation method thereof |

| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |