CN109102065B - Convolutional neural network accelerator based on PSoC - Google Patents

Convolutional neural network accelerator based on PSoC

Info

Publication number
CN109102065B
CN109102065B (application CN201810689938.7A)
Authority
CN
China
Prior art keywords
memory
module
neural network
convolutional neural
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810689938.7A
Other languages
Chinese (zh)
Other versions
CN109102065A (en)
Inventor
熊晓明
李子聪
曾宇航
胡湘宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chipeye Microelectronics Foshan Ltd
Guangdong University of Technology
Original Assignee
Chipeye Microelectronics Foshan Ltd
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chipeye Microelectronics Foshan Ltd, Guangdong University of Technology
Priority to CN201810689938.7A
Publication of CN109102065A
Application granted
Publication of CN109102065B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese

This patent discloses a convolutional neural network accelerator built on a PSoC device. It comprises an off-chip memory, a CPU, a feature-map input memory, a feature-map output memory, a bias memory, a weight memory, direct memory access (DMA), and as many computing units as there are neurons. Each computing unit contains a first-in-first-out queue, a state machine, data selectors, an average-pooling module, a max-pooling module, a multiply-add calculation module, and an activation function module. Because the multiply-add modules compute in parallel, the accelerator can serve convolutional neural network systems of various architectures. The invention uses the programmable fabric of the PSoC (Programmable System on Chip) device to implement the computation-heavy, highly parallel convolutional portion, and uses the CPU for serial algorithms and state control.

Description

Convolutional neural network accelerator based on PSoC
Technical Field
The invention relates to a convolutional neural network structure technology, in particular to a convolutional neural network accelerator based on PSoC.
Background
Through local weight sharing, convolutional neural networks have a unique advantage in image processing: their layout is closer to that of real biological neural networks, and sharing weights reduces both the complexity of the network and its computational load. Convolutional neural networks are now widely applied in video surveillance, machine vision, pattern recognition, image search, and related fields.
However, hardware implementations of convolutional networks require large amounts of hardware resources and suffer from low bandwidth utilization and little data reuse. A convolutional neural network must support convolution, pooling, and fully-connected operations of different sizes, and many applications also include an image-processing stage, so a pure hardware-logic implementation on an FPGA (Field Programmable Gate Array) limits extensibility: the network realized in hardware is fixed, bandwidth utilization is low, and the design cannot be extended to convolutional neural networks with other structures. A PSoC device, which combines a hardware-programmable fabric with software programmability, is therefore a suitable platform for implementing convolutional neural networks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a PSoC-based convolutional neural network accelerator in which the hardware-programmable part of the whole accelerator reduces to a multiply-add calculation module, an activation function module, a max-pooling module, and an average-pooling module. The multiply-add operations in all multiply-add modules are computed in parallel, and convolutions with different kernel sizes are supported, addressing the large computational load and bandwidth demand of convolutional neural networks. The software part implements the softmax classifier, the non-maximum-suppression algorithm, and the image-processing algorithms that hardware logic cannot realize, and handles the configuration of convolutional neural networks with different network structures.
The purpose of the invention is realized by the following technical scheme. A PSoC-based convolutional neural network accelerator comprises: an off-chip memory, a CPU, a feature-map input memory, a feature-map output memory, a bias memory, a weight memory, a direct memory access (DMA) engine, and as many computing units as there are neurons.
Under CPU control, the DMA reads data from the off-chip memory into the feature-map input memory, the bias memory, and the weight memory, or writes data from the feature-map output memory back to the off-chip memory. The CPU controls the storage locations of the input feature maps, biases, weights, and output feature maps in the off-chip memory, and the parameter transfers of the multi-layer convolutional neural network, so as to accommodate neural networks of various architectures.
Further, each computing unit comprises a first-in-first-out queue, a state machine, a first data selector, a second data selector, an average-pooling module, a max-pooling module, a multiply-add calculation module, and an activation function module.
The first data selector communicates with the feature-map input memory; input feature-map data passes through the first data selector into the average-pooling module, the max-pooling module, the multiply-add calculation module, and the activation function module.
The second data selector communicates with the feature-map output memory; the output of the average-pooling module, the max-pooling module, or the multiply-add calculation module is selected by the second data selector and written to the feature-map output memory.
Further, the multiply-add calculation module is built on a combination of a multiply-add tree and multiply-add registers, and takes an input feature-map matrix, a weight matrix, and a bias matrix.
Further, the activation function module comprises a first configuration register, a first selector, a first multiplier, and a first adder, and implements the tangent, sigmoid, and ReLU functions; the CPU writes the first configuration register to select which activation function the hardware logic applies.
Further, the average-pooling module comprises a second configuration register, a second multiplier, and a second adder; configured by the CPU, it performs average pooling over a matrix and outputs the matrix mean.
Further, the max-pooling module comprises a third configuration register, a comparator, and a second selector; configured by the CPU, it performs max pooling over a matrix, comparing every element to find the maximum.
Compared with the prior art, the invention has the following advantages and effects. The CPU controls data-storage allocation and data transfer for the whole convolutional neural network; the data selectors, under state-machine control, route data to the multiply-add calculation module, the activation function module, the max-pooling module, and the average-pooling module; meanwhile the CPU runs the image-processing, softmax-classifier, and non-maximum-suppression algorithms.
Drawings
FIG. 1 is a diagram of a PSoC-based convolutional neural network accelerator of the present invention;
FIG. 2 is a block diagram of a multiply-add calculation module according to the present invention;
FIG. 3 is a block diagram of an activation function module of the present invention;
FIG. 4 is a block diagram of the mean pooling module of the present invention;
FIG. 5 is a block diagram of a maximum pooling module of the present invention;
FIG. 6 is a CPU software flow diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example one
To handle the large computational load of a convolutional neural network, raise parallel-processing efficiency, and reduce the bandwidth requirement, the invention provides the PSoC-based convolutional neural network accelerator 100 shown in fig. 1, which includes: an off-chip memory 101, a CPU 102, a feature-map input memory 103, a feature-map output memory 104, a bias memory 105, a weight memory 106, a direct memory access (DMA) engine 107, and computing units 108, one per neuron.
Under the control of the CPU 102, the DMA 107 reads data from the off-chip memory 101 into the feature-map input memory 103, the bias memory 105, and the weight memory 106, or writes data from the feature-map output memory 104 back to the off-chip memory 101. The CPU 102 controls the storage locations of the input feature maps, biases, weights, and output feature maps in the off-chip memory, and the parameter transfers of the multi-layer convolutional neural network, so as to accommodate neural networks of various architectures.
Each computing unit 108 (one per neuron) includes a first-in-first-out queue, a state machine 109, a first data selector 110, a second data selector 111, an average-pooling module 112, a max-pooling module 113, a multiply-add calculation module 114, and an activation function module 115. The first data selector 110 communicates with the feature-map input memory 103; input feature-map data passes through it into the average-pooling module 112, the max-pooling module 113, the multiply-add calculation module 114, and the activation function module 115. The second data selector 111 communicates with the feature-map output memory 104; the outputs of the average-pooling module 112, the max-pooling module 113, and the multiply-add calculation module 114 are selected by the second data selector 111 and written to the feature-map output memory 104.
As shown in fig. 2, the multiply-add calculation module combines a multiply-add tree with multiply-add registers, and takes an input feature-map matrix, a weight matrix, and a bias matrix. This structure completes convolution operations in parallel and efficiently, and multiplier utilization does not drop when convolution kernels of different sizes are used.
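The multiply-add tree can be illustrated with a small behavioral model (Python, not the hardware; the function names and the pairwise-reduction detail are illustrative assumptions, not taken from the patent):

```python
# Behavioral model of the multiply-add tree: each output value is the
# elementwise product of a kernel-sized window with the weights, reduced
# by a balanced adder tree, plus the bias.
def adder_tree_sum(values):
    """Reduce a list pairwise, mirroring a hardware adder tree."""
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

def conv2d_mac(feature_map, weights, bias, stride=1):
    """One-channel convolution using the adder tree for each window."""
    kh, kw = len(weights), len(weights[0])
    out = []
    for r in range(0, len(feature_map) - kh + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - kw + 1, stride):
            products = [feature_map[r + i][c + j] * weights[i][j]
                        for i in range(kh) for j in range(kw)]
            row.append(adder_tree_sum(products) + bias)
        out.append(row)
    return out
```

In hardware every product in `products` is computed by its own multiplier in the same cycle; the loop form here only models the arithmetic, not the parallelism.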
As shown in fig. 3, the activation function module includes a first configuration register, a first selector, a first multiplier, and a first adder, and implements the tangent, sigmoid, and ReLU functions; the CPU writes the first configuration register to select which activation function the hardware logic applies.
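A behavioral sketch of the configurable activation select; the register encodings `ACT_RELU`, `ACT_SIGMOID`, and `ACT_TANH` are assumptions, since the patent does not specify them:

```python
import math

# The configuration register value picks which activation the module
# applies; in hardware this drives the first selector.
ACT_RELU, ACT_SIGMOID, ACT_TANH = 0, 1, 2

def activation(x, config_register):
    """Apply the activation selected by the (assumed) register encoding."""
    if config_register == ACT_RELU:
        return max(0.0, x)
    if config_register == ACT_SIGMOID:
        return 1.0 / (1.0 + math.exp(-x))
    if config_register == ACT_TANH:
        return math.tanh(x)
    raise ValueError("unknown activation select")
```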
As shown in fig. 4, the average-pooling module includes a second configuration register, a second multiplier, and a second adder. The CPU configures the average-pooling module, including the value of m, to perform m × m average pooling and output the mean of each m × m matrix.
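The m × m average pooling can be modeled as below. Multiplying by the precomputed constant 1/(m × m) instead of dividing is consistent with the module containing a multiplier and an adder; the function itself is an illustrative sketch, not the hardware:

```python
def avg_pool(feature_map, m):
    """m x m average pooling with stride m; m comes from the CPU-written
    configuration register (behavioral model, names assumed)."""
    scale = 1.0 / (m * m)   # hardware multiplies by a constant, no divider
    out = []
    for r in range(0, len(feature_map) - m + 1, m):
        row = []
        for c in range(0, len(feature_map[0]) - m + 1, m):
            total = sum(feature_map[r + i][c + j]
                        for i in range(m) for j in range(m))
            row.append(total * scale)
        out.append(row)
    return out
```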
As shown in fig. 5, the max-pooling module includes a third configuration register, a comparator, and a second selector. The CPU configures the max-pooling module, including the value of k, to perform k × k max pooling: every element of each k × k matrix is compared to find its maximum.
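A behavioral sketch of k × k max pooling built from a comparator and a selector; the running-maximum loop mirrors the compare-and-select datapath, and all names are illustrative:

```python
def max_pool(feature_map, k):
    """k x k max pooling with stride k; the hardware compares each
    element of the window against a running maximum (behavioral model)."""
    out = []
    for r in range(0, len(feature_map) - k + 1, k):
        row = []
        for c in range(0, len(feature_map[0]) - k + 1, k):
            best = feature_map[r][c]
            for i in range(k):
                for j in range(k):
                    if feature_map[r + i][c + j] > best:   # comparator + selector
                        best = feature_map[r + i][c + j]
            row.append(best)
        out.append(row)
    return out
```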
Example two
Correspondingly, with reference to FIG. 6, the invention further describes how the PSoC-based convolutional neural network accelerator carries out a convolutional neural network computation.
The CPU is programmed in embedded software. The software constructs the deep convolutional neural network and configures the accelerator by writing command values to its control registers over the bus.
Examples of configuration commands are given in a table in the original patent (reproduced there only as a figure). For instance, the first layer takes x1 input feature-map data and x3 weight data; the computation results pass through the max-pooling module and the activation function module to produce x2 output feature-map data.
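A hypothetical register map for such configuration commands might look like the following; the actual command encoding appears only in the patent's figure, so every field name and offset here is an assumption:

```python
# Assumed register map: field names and byte offsets are illustrative,
# chosen to match the configuration items listed in the description.
CONFIG = {
    "conv_stride": 0x00,
    "kernel_size": 0x04,
    "act_select":  0x08,
    "avg_pool_m":  0x0C,
    "max_pool_k":  0x10,
}

registers = {}  # stand-in for the accelerator's control registers

def write_config(field, value):
    """Model of the CPU writing a command value over the bus."""
    registers[CONFIG[field]] = value

# Example: configure a 3x3 convolution with activation select 0.
write_config("kernel_size", 3)
write_config("act_select", 0)
```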
In the off-chip memory, the convolutional layers' output feature maps are stored layer by layer; let M index a layer, with M taking the values 1, 3, 5, 7, …. The output feature map of layer M is the input feature map of layer M + 1. Layer M's output feature map is stored in the address space starting at address A1, and layer M + 1's output feature map is stored in the address space starting at address A2, so the two regions alternate across layers.
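The alternating layout can be sketched as follows; the concrete values of A1 and A2 are illustrative, not taken from the patent:

```python
# Ping-pong layout: odd-numbered layers write their output feature maps
# to the region starting at A1, even-numbered layers to the region at A2,
# so layer M+1 always reads where layer M wrote. Addresses are assumed.
A1, A2 = 0x1000_0000, 0x2000_0000

def output_base_address(m):
    """Base address of the output feature map of layer m (1-based)."""
    return A1 if m % 2 == 1 else A2

def input_base_address(m):
    """For m > 1, layer m reads the region the previous layer wrote."""
    return output_base_address(m - 1)
```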
In a particular application, the computations within the convolutional neural network layers are performed in parallel. The whole network implementation process is as follows:
(1) The software on the processor 102 controls the image processing, and the sample data is stored in the off-chip memory 101.
(2) The processor 102 controls the DMA 107 to read off-chip memory data toward the first data selector 110, while configuring the multiply-add calculation unit 114, the average-pooling module 112, the max-pooling module 113, the activation function module 115, and the state machine 109. Configuration information includes, but is not limited to, the convolution stride, convolution kernel size, activation function type, average-pooling size, and max-pooling block size.
(3) Under control of the state machine 109, data is transferred from the DMA to the feature-map input memory 103, the bias memory 105, and the weight memory 106.
(4) The data is fed into the multiply-add calculation unit 114, the activation function module 115, the average-pooling module 112, or the max-pooling module 113 to obtain the computation result.
(5) Under state-machine control, the result is transferred from the multiply-add calculation unit 114, the activation function module 115, the average-pooling module 112, or the max-pooling module 113 through the data selector and on to the off-chip memory 101.
At this point the network has completed one layer; the remaining layers are completed by repeating this loop.
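Steps (1)-(5) can be condensed into a self-contained behavioral sketch of one layer pass. Stride 1, ReLU activation, and max pooling are assumed for brevity; the real hardware routes data through the on-chip memories under state-machine control rather than through Python lists:

```python
def run_layer(fmap, weights, bias, pool_k):
    """Behavioral sketch of one layer: convolution, ReLU, then k x k
    max pooling, mirroring steps (2)-(5). All names are illustrative."""
    kh, kw = len(weights), len(weights[0])
    # (4a) multiply-add: one window per output element
    conv = [[sum(fmap[r + i][c + j] * weights[i][j]
                 for i in range(kh) for j in range(kw)) + bias
             for c in range(len(fmap[0]) - kw + 1)]
            for r in range(len(fmap) - kh + 1)]
    # (4b) activation (ReLU assumed)
    act = [[max(0, v) for v in row] for row in conv]
    # (4c) max pooling with stride pool_k
    return [[max(act[r + i][c + j] for i in range(pool_k) for j in range(pool_k))
             for c in range(0, len(act[0]) - pool_k + 1, pool_k)]
            for r in range(0, len(act) - pool_k + 1, pool_k)]
```

Looping `run_layer` over the layers, with the output feature map of one call feeding the next, models the per-layer cycle described above.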
In summary, the hardware-programmable part of the whole convolutional neural network accelerator reduces to a multiply-add calculation module, an activation function module, a max-pooling module, and an average-pooling module. The multiply-add operations in all multiply-add modules are computed in parallel, and convolutions and pooling of different sizes are supported. The accelerator's CPU software implements the Softmax classifier and the non-maximum-suppression algorithm, which hardware logic cannot realize, and completes the computation by configuring convolutional neural networks of different network structures. This addresses the large computational load and bandwidth demand of convolutional neural networks while remaining configurable for convolutional neural network algorithms of different structures.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (5)

1. A PSoC-based convolutional neural network accelerator, comprising: an off-chip memory, a CPU, a feature-map input memory, a feature-map output memory, a bias memory, a weight memory, a direct memory access (DMA) engine, and as many computing units as there are neurons;
wherein, under CPU control, the DMA reads data from the off-chip memory into the feature-map input memory, the bias memory, and the weight memory, or writes data from the feature-map output memory back to the off-chip memory; the CPU controls the storage locations of the input feature maps, biases, weights, and output feature maps in the off-chip memory, and the parameter transfers of the multi-layer convolutional neural network, so as to accommodate neural networks of various architectures;
each computing unit comprises a first-in-first-out queue, a state machine, a first data selector, a second data selector, an average-pooling module, a max-pooling module, a multiply-add calculation module, and an activation function module;
the first data selector communicates with the feature-map input memory, and input feature-map data passes through the first data selector into the average-pooling module, the max-pooling module, the multiply-add calculation module, and the activation function module;
the second data selector communicates with the feature-map output memory, and the outputs of the average-pooling module, the max-pooling module, and the multiply-add calculation module are selected by the second data selector and written to the feature-map output memory.
2. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the multiply-add calculation module is built on a combination of a multiply-add tree and multiply-add registers, and takes an input feature-map matrix, a weight matrix, and a bias matrix.
3. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the activation function module comprises a first configuration register, a first selector, a first multiplier, and a first adder for implementing the tangent, sigmoid, and ReLU functions, and the CPU writes the first configuration register to select which activation function the hardware logic applies.
4. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the average-pooling module comprises a second configuration register, a second multiplier, and a second adder, and the CPU configures the average-pooling module to perform average pooling over a matrix and output the matrix mean.
5. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the max-pooling module comprises a third configuration register, a comparator, and a second selector, and the CPU configures the max-pooling module to perform max pooling over a matrix, comparing every element to find the maximum.
CN201810689938.7A · Priority 2018-06-28 · Filed 2018-06-28 · Convolutional neural network accelerator based on PSoC · Active · CN109102065B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810689938.7A | 2018-06-28 | 2018-06-28 | Convolutional neural network accelerator based on PSoC

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810689938.7A | 2018-06-28 | 2018-06-28 | Convolutional neural network accelerator based on PSoC

Publications (2)

Publication Number | Publication Date
CN109102065A (en) | 2018-12-28
CN109102065B (granted) | 2022-03-11

Family

ID=64845331

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810689938.7A (Active, CN109102065B (en)) | Convolutional neural network accelerator based on PSoC | 2018-06-28 | 2018-06-28

Country Status (1)

Country | Link
CN (1) | CN109102065B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109948785B (en)* | 2019-01-31 | 2020-11-20 | 瑞芯微电子股份有限公司 | High-efficiency neural network circuit system and method
CN109918281B (en)* | 2019-03-12 | 2022-07-12 | 中国人民解放军国防科技大学 | Multi-bandwidth target accelerator efficiency testing method
CN110222815B (en)* | 2019-04-26 | 2021-09-07 | 上海酷芯微电子有限公司 | Configurable activation function device and method suitable for deep learning hardware accelerator
CN110263925B (en)* | 2019-06-04 | 2022-03-15 | 电子科技大学 | A hardware acceleration implementation device for forward prediction of convolutional neural network based on FPGA
CN110390384B (en)* | 2019-06-25 | 2021-07-06 | 东南大学 | A Configurable Universal Convolutional Neural Network Accelerator
CN110687392B (en)* | 2019-09-02 | 2024-05-31 | 北京智芯微电子科技有限公司 | Power system fault diagnosis device and method based on neural network
CN110689122B (en)* | 2019-09-25 | 2022-07-12 | 苏州浪潮智能科技有限公司 | Storage system and method
CN111047008B (en)* | 2019-11-12 | 2023-08-01 | 天津大学 | Convolutional neural network accelerator and acceleration method
CN111445018B (en)* | 2020-03-27 | 2023-11-14 | 国网甘肃省电力公司电力科学研究院 | Ultraviolet imaging real-time information processing method based on accelerating convolutional neural network algorithm
CN111626403B (en)* | 2020-05-14 | 2022-05-10 | 北京航空航天大学 | Convolutional neural network accelerator based on CPU-FPGA memory sharing
CN111563483B (en)* | 2020-06-22 | 2024-06-11 | 武汉芯昌科技有限公司 | Image recognition method and system based on compact lenet model
CN111860781B (en)* | 2020-07-10 | 2024-06-28 | 逢亿科技(上海)有限公司 | Convolutional neural network feature decoding system based on FPGA
CN111860540B (en)* | 2020-07-20 | 2024-01-12 | 深圳大学 | Neural network image feature extraction system based on FPGA
CN111966076B (en)* | 2020-08-11 | 2023-06-09 | 广东工业大学 | Fault Location Method Based on Finite State Machine and Graph Neural Network
CN112101538B (en)* | 2020-09-23 | 2023-11-17 | 成都市深思创芯科技有限公司 | Graphic neural network hardware computing system and method based on memory computing
CN112561034A (en)* | 2020-12-04 | 2021-03-26 | 深兰人工智能(深圳)有限公司 | Neural network accelerating device
CN112784977B (en)* | 2021-01-15 | 2023-09-08 | 北方工业大学 | A convolutional neural network accelerator for target detection
CN112905530B (en)* | 2021-03-29 | 2023-05-26 | 上海西井信息科技有限公司 | On-chip architecture, pooled computing accelerator array, unit and control method
CN113344178A (en)* | 2021-05-10 | 2021-09-03 | 井芯微电子技术(天津)有限公司 | Method and hardware structure capable of realizing convolution calculation in various neural networks
CN113692592B (en)* | 2021-07-08 | 2022-06-28 | 香港应用科技研究院有限公司 | Dynamic Tile Parallel Neural Network Accelerator
CN114492729B (en)* | 2021-12-21 | 2025-09-05 | 杭州未名信科科技有限公司 | Convolutional neural network processor, implementation method, electronic device and storage medium
CN114265696B (en)* | 2021-12-28 | 2024-12-20 | 北京航天自动控制研究所 | Pooler and pooling acceleration circuit for the maximum pooling layer of convolutional neural network
CN114781629B (en)* | 2022-04-06 | 2024-03-05 | 合肥工业大学 | Hardware accelerator and parallel multiplexing method of convolutional neural network based on parallel multiplexing
CN114780481A (en)* | 2022-04-29 | 2022-07-22 | 中国科学技术大学 | Reconfigurable Processing Units for Deep Learning
CN116109908A (en)* | 2022-12-07 | 2023-05-12 | 上海无线电设备研究所 | Neural network real-time image recognition system and method based on PSoC chip

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107239824A (en)* | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing a sparse convolutional neural network accelerator
CN107392308A (en)* | 2017-06-20 | 2017-11-24 | 中国科学院计算技术研究所 | A convolutional neural network acceleration method and system based on programmable devices

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10515304B2 (en)* | 2015-04-28 | 2019-12-24 | Qualcomm Incorporated | Filter specificity as training criterion for neural networks
KR102835519B1 (en)* | 2016-09-28 | 2025-07-17 | 에스케이하이닉스 주식회사 | Apparatus and method for test operation of convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107239824A (en)* | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing a sparse convolutional neural network accelerator
CN107392308A (en)* | 2017-06-20 | 2017-11-24 | 中国科学院计算技术研究所 | A convolutional neural network acceleration method and system based on programmable devices

Also Published As

Publication number | Publication date
CN109102065A (en) | 2018-12-28

Similar Documents

Publication | Publication Date | Title
CN109102065B (en) Convolutional neural network accelerator based on PSoC
CN110046700B (en) Hardware implementation of convolutional layers of deep neural networks
CN107403221B (en) Method and hardware for implementing convolutional neural network, manufacturing method and system
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
US11775430B1 (en) Memory access for multiple circuit components
US11294599B1 (en) Registers for restricted memory
CN111465943B (en) Integrated circuit and method for neural network processing
JP6823495B2 (en) Information processing device and image recognition device
WO2020073211A1 (en) Operation accelerator, processing method, and related device
US20190286974A1 (en) Processing circuit and neural network computation method thereof
US11449459B2 (en) Systems and methods for implementing a machine perception and dense algorithm integrated circuit and enabling a flowing propagation of data within the integrated circuit
CN108647773A (en) A hardwired interconnection architecture for reconfigurable convolutional neural networks
US10733498B1 (en) Parametric mathematical function approximation in integrated circuits
GB2602524A (en) Neural network comprising matrix multiplication
US20190340152A1 (en) Reconfigurable reduced instruction set computer processor architecture with fractured cores
CN110929854A (en) A data processing method, device and hardware accelerator
JP2022137247A (en) Processing for a plurality of input data sets
WO2023109748A1 (en) Neural network adjustment method and corresponding apparatus
CN112639726A (en) Method and system for performing parallel computing
US11367498B2 (en) Multi-level memory hierarchy
US11467973B1 (en) Fine-grained access memory controller
CN116648694A (en) Data processing method in chip and chip
CN113673690B (en) Underwater noise classification convolutional neural network accelerator
CN111291871A (en) Computing device and related product
CN111209230B (en) Data processing device, method and related products

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
