CN111291871B - Computing device and related product


Publication number: CN111291871B
Authority: CN (China)
Prior art keywords: weights, unit, data, instruction, weight
Legal status: Active
Application number: CN201811507488.1A
Other languages: Chinese (zh)
Other versions: CN111291871A (en)
Inventor: Name withheld on request
Current assignee: Cambricon Technologies Corp Ltd
Original assignee: Cambricon Technologies Corp Ltd
Application filed by Cambricon Technologies Corp Ltd
Priority to CN201811507488.1A (patent CN111291871B)
Priority to CN201811538782.9A (patent CN111291884B)
Publication of CN111291871A
Application granted
Publication of CN111291871B


Abstract

The application provides a computing device and related products. The computing device comprises a load balancing unit, an operation unit, and a controller unit. The controller unit is configured to obtain a pruning request for first input data and, according to the pruning request, instruct the load balancing unit to prune the first input data, where the first input data includes first weight data. The load balancing unit is configured to adjust the first weight data into second weight data. The controller unit is further configured to execute the neural network calculation according to second input data (which includes the second weight data) and a calculation instruction. The application solves the load imbalance that arises when sparsity gives each neuron a different amount of computation, and improves operation speed.

Description

A computing device and related products

Technical Field

The present application relates to the field of information processing technology, and in particular to a computing device and related products.

Background Art

A neural network is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed, parallel information processing. The network consists of a large number of interconnected nodes (or neurons). It processes information by adjusting the interconnections among its many internal nodes and by using input neuron data and weights to produce output data, thereby simulating the information processing of the human brain and producing pattern-recognition results.

At present, neural networks are widely used in computer vision, for example in image recognition, object detection, and image segmentation. In practice, however, neural network models often have a very large number of parameters (for example, ultra-large-scale weights), so they require substantial computing and storage resources. This overhead slows down neural network computation and greatly raises the demands on hardware transmission bandwidth and on the arithmetic units. It is therefore important to reduce the amount of computation of the neural network while also reducing the number of model parameters.

In the prior art, pruning is used to adjust the parameters of a neural network model, which reduces both the parameter count and the amount of computation. However, pruning tends to make the model sparse. This sparsity introduces irregularity: the originally regular topology becomes irregular, so different output neurons require different amounts of computation, which in turn causes load imbalance.
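A toy example makes this imbalance concrete. The sketch below makes its own assumptions: unstructured magnitude pruning with a single global threshold, and a hypothetical 3x4 weight matrix in which rows are output neurons and columns are input neurons. It illustrates the effect described above and is not taken from the patent.

```python
# Illustrative sketch (assumption: one global magnitude threshold, as in plain
# unstructured pruning). Rows = output neurons, columns = input neurons.

def global_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of all weights.
    Weights tied with the threshold are pruned too."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)              # number of weights to remove
    threshold = flat[k - 1] if k > 0 else -1.0
    return [[w if abs(w) > threshold else 0.0 for w in row] for row in weights]


def macs_per_output(weights):
    """Work per output neuron = multiply-accumulates = nonzero weights in its row."""
    return [sum(1 for w in row if w != 0.0) for row in weights]


W = [
    [0.9, 0.8, 0.7, 0.6],    # output neuron 0: mostly large weights
    [0.1, 0.2, 0.9, 0.1],    # output neuron 1: mostly small weights
    [0.5, 0.05, 0.4, 0.3],   # output neuron 2
]
pruned = global_prune(W, sparsity=0.5)
print(macs_per_output(pruned))  # [4, 1, 1]
```

Output neuron 0 keeps four weights, while neurons 1 and 2 keep one each. If each output neuron were assigned to its own processing element, the first would do four times the work of the others.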

Summary of the Invention

Embodiments of the present application provide a computing device and related products. They solve the load imbalance caused by sparsity, under which each neuron requires a different amount of computation, and thereby improve computing speed.

In a first aspect, a computing device is provided for performing machine learning calculations of a machine learning model. The computing device comprises a load balancing unit, an operation unit, and a controller unit;

The controller unit is configured to obtain a pruning request for first input data and, according to the pruning request, instruct the load balancing unit to prune the first input data, wherein the first input data includes first weight data;

The load balancing unit is configured to adjust the first weight data into second weight data;

The controller unit is further configured to obtain second input data and a calculation instruction, wherein the second input data includes the second weight data and input neuron data;

The controller unit is further configured to parse the calculation instruction into a plurality of operation instructions, and to send the plurality of operation instructions and the second input data to the operation unit;

The operation unit is configured to obtain the operation instructions and perform the neural network calculation according to the operation instructions and the second input data.

With the present application, the load balancing unit prunes the first weight data to obtain the second weight data, and the neural network calculation is then performed on the second weight data and the input neuron data. This solves the load imbalance caused by sparsity, under which each neuron requires a different amount of computation, and can improve computing speed.

In a second aspect, an embodiment of the present application provides a machine learning operation device that includes one or more computing devices according to the first aspect. The machine learning operation device is configured to obtain data to be operated on and control information from other processing devices, perform the specified machine learning operations, and transmit the execution results to the other processing devices through an I/O interface;

When the machine learning operation device includes a plurality of the computing devices, these computing devices may be linked through a specific structure and transmit data to one another;

Specifically, the plurality of computing devices are interconnected and transmit data through a PCIe bus to support larger-scale machine learning operations. They may share one control system or each have their own. They may share memory or each have their own memory. They may be interconnected in any topology.

In a third aspect, an embodiment of the present application provides a combined processing device. It includes the machine learning operation device according to the second aspect, a universal interconnection interface, and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete operations specified by the user. The combined processing device may further include a storage device. The storage device is connected to both the machine learning operation device and the other processing devices and stores data for each of them.

In a fourth aspect, an embodiment of the present application provides a neural network chip. The chip includes the computing device according to the first aspect, the machine learning operation device according to the second aspect, or the combined processing device according to the third aspect.

In a fifth aspect, an embodiment of the present application provides a neural network chip packaging structure that includes the neural network chip according to the fourth aspect.

In a sixth aspect, an embodiment of the present application provides a board card that includes the neural network chip packaging structure according to the fifth aspect.

In a seventh aspect, an embodiment of the present application provides an electronic device that includes the neural network chip according to the fourth aspect or the board card according to the sixth aspect.

In an eighth aspect, an embodiment of the present application further provides a calculation method for executing a machine learning model. The method is applied to a computing device that performs machine learning calculations and comprises a load balancing unit, an operation unit, and a controller unit. The method includes:

The controller unit obtains first input data and a load balancing instruction, wherein the first input data includes first weight data. The controller unit parses the load balancing instruction into a plurality of operation instructions and sends them, together with the first weight data, to the load balancing unit;

The load balancing unit adjusts the first weight data into second weight data according to the plurality of operation instructions;

The controller unit obtains second input data and a calculation instruction, wherein the second input data includes the second weight data and input neuron data;

The controller unit parses the calculation instruction into a plurality of operation instructions and sends them, together with the second input data, to the operation unit;

The operation unit obtains the operation instructions and performs the neural network calculation according to the operation instructions and the second input data.

In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, a headset, mobile storage, a wearable device, a means of transportation, a household appliance, and/or a medical device.

In some embodiments, the means of transportation include airplanes, ships, and/or vehicles. The household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods. The medical devices include magnetic resonance imaging scanners, B-mode ultrasound scanners, and/or electrocardiographs.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application. A person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a computing device according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a controller unit according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a neural network operation method according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a balanced pruning method for a neural network according to an embodiment of the present application;

FIG. 5A is a schematic diagram of a neural network architecture according to an embodiment of the present application;

FIG. 5B is a schematic diagram of a fully connected layer weight matrix according to an embodiment of the present application;

FIG. 5C is a schematic diagram of continuous grouping of a fully connected layer weight matrix according to an embodiment of the present application;

FIG. 5D is a schematic diagram of cross grouping of a fully connected layer weight matrix according to an embodiment of the present application;

FIG. 5E is a schematic structural diagram of convolution kernels in a convolutional layer according to an embodiment of the present application;

FIG. 5F is a schematic diagram of continuous grouping of the convolution kernels of a convolutional layer according to an embodiment of the present application;

FIG. 5G is a schematic diagram of cross grouping of the convolution kernels of a convolutional layer according to an embodiment of the present application;

FIG. 5H is a schematic diagram of a fully connected layer weight matrix according to another embodiment of the present application;

FIG. 5I is a schematic diagram of pruning a fully connected layer according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of another computing device according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a master processing circuit according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of another computing device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a tree module according to an embodiment of the present application;

FIG. 10 is a structural diagram of yet another computing device according to an embodiment of the present application;

FIG. 11 is a structural diagram of still another computing device according to an embodiment of the present application;

FIG. 12 is a structural diagram of another computing device according to an embodiment of the present application;

FIG. 13 is a structural diagram of a combined processing device according to an embodiment of the present application;

FIG. 14 is a structural diagram of another combined processing device according to an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a board card according to an embodiment of the present application;

FIG. 16 is a schematic flowchart of a neural network pruning method according to an embodiment of the present application;

FIG. 17A is a schematic structural diagram of a neural network pruning apparatus according to an embodiment of the present application;

FIG. 17B is a schematic structural diagram of another neural network pruning apparatus according to an embodiment of the present application;

FIG. 18 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present application.

The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of the present application distinguish different objects; they do not describe a specific order. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units. It may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. Appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment. Nor do they refer to independent or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The present application provides a load balancing unit that prunes first weight data to obtain second weight data. This solves the load imbalance caused by sparsity, under which each neuron requires a different amount of computation. In practice, the load balancing unit can be used in neural network computation, specifically in a computing device that performs neural network calculations. The invention is described below with reference to the computing device shown in FIG. 1.

Referring to FIG. 1, which is a schematic structural diagram of a computing device according to an embodiment of the present invention, the computing device performs machine learning calculations. It includes a controller unit 11, an operation unit 12, and a load balancing unit 13, where the controller unit 11 is connected to both the operation unit 12 and the load balancing unit 13;

The controller unit 11 is configured to obtain a pruning request for first input data and, according to the pruning request, instruct the load balancing unit to prune the first input data, wherein the first input data includes first weight data. In an optional solution, the pruning request may be triggered through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins;

The load balancing unit 13 is configured to adjust the first weight data into second weight data;

In a specific implementation, the load balancing unit 13 includes a grouping unit 131, a threshold calculation unit 132, and a pruning unit 133. The grouping unit 131 is configured to divide the first weight data into M groups of weights, where M is a positive integer. The threshold calculation unit 132 is configured to determine a threshold for each of the M groups according to a preset sparsity P. The pruning unit 133 is configured to prune each of the M groups according to its threshold, thereby obtaining the second weight data. In one embodiment, P is a real number greater than 0 and less than 1.
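The three sub-units can be sketched in Python as follows. This is a minimal sketch, not the patented circuit, and it rests on several assumptions. First, the weights are a list of rows, one row per output neuron. Second, each group's threshold is the magnitude at the P-quantile of that group, so weights at or below it are zeroed. Third, the "continuous" and "cross" modes are one reading of the grouping shown in FIG. 5C and FIG. 5D: contiguous blocks of rows versus assigning row i to group i mod M. All function names are hypothetical.

```python
# Hypothetical sketch of the grouping unit (131), threshold calculation unit (132)
# and pruning unit (133). `m` plays the role of M and `sparsity` the role of P.

def group_rows(num_rows, m, mode="continuous"):
    """Grouping unit: split row indices into m equal groups.
    continuous: contiguous blocks of rows; cross: row i goes to group i % m."""
    if num_rows % m != 0:
        raise ValueError("this sketch assumes the row count is a multiple of m")
    if mode == "continuous":
        size = num_rows // m
        return [list(range(g * size, (g + 1) * size)) for g in range(m)]
    if mode == "cross":
        return [list(range(g, num_rows, m)) for g in range(m)]
    raise ValueError(f"unknown grouping mode: {mode}")


def group_threshold(values, sparsity):
    """Threshold unit: magnitude at or below which a fraction `sparsity` of the group lies."""
    mags = sorted(abs(v) for v in values)
    k = int(len(mags) * sparsity)           # number of weights to remove from this group
    return mags[k - 1] if k > 0 else -1.0   # -1.0 means nothing is pruned


def balanced_prune(weights, m, sparsity, mode="continuous"):
    """Pruning unit: zero every weight whose magnitude is <= its group's threshold."""
    if not 0 < sparsity < 1:
        raise ValueError("sparsity P must satisfy 0 < P < 1")
    pruned = [row[:] for row in weights]
    for rows in group_rows(len(weights), m, mode):
        t = group_threshold([w for r in rows for w in weights[r]], sparsity)
        for r in rows:
            pruned[r] = [w if abs(w) > t else 0.0 for w in weights[r]]
    return pruned


def nonzeros_per_group(weights, m, mode="continuous"):
    """Work per group = number of surviving (nonzero) weights."""
    return [sum(1 for r in rows for w in weights[r] if w != 0.0)
            for rows in group_rows(len(weights), m, mode)]


W = [
    [0.9, 0.8, 0.7, 0.6],
    [0.5, 0.4, 0.3, 0.2],
    [0.1, 0.2, 0.9, 0.1],
    [0.05, 0.35, 0.6, 0.15],
]
for mode in ("continuous", "cross"):
    pruned = balanced_prune(W, m=2, sparsity=0.5, mode=mode)
    print(mode, nonzeros_per_group(pruned, m=2, mode=mode))
```

Pruning the same matrix with one global threshold at 50% sparsity would leave the four rows with 4, 2, 1, and 1 nonzero weights. The per-group thresholds instead leave exactly 4 nonzero weights in each group in both modes, so groups mapped to different processing elements carry equal work. The balance is exact only when no two magnitudes tie at a group's threshold, because tied weights are all pruned.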

The controller unit 11 is further configured to obtain second input data and a calculation instruction, wherein the second input data includes the second weight data and input neuron data. In an optional solution, the second input data and the calculation instruction may be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The controller unit 11 is further configured to parse the calculation instruction into a plurality of operation instructions, and to send the plurality of operation instructions and the second input data to the operation unit;

The operation unit 12 is configured to obtain the operation instructions and perform the neural network calculation according to the operation instructions and the second input data.

In one implementation, the computing device supports a "load balancing instruction". In this case, the controller unit 11 is configured to obtain first input data and the load balancing instruction, wherein the first input data includes first weight data. In an optional solution, the first input data and the load balancing instruction may be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The controller unit 11 is further configured to parse the load balancing instruction into a plurality of operation instructions, and to send the plurality of operation instructions and the first weight data to the load balancing unit;

The load balancing unit 13 is configured to adjust the first weight data into second weight data according to the plurality of operation instructions;

The controller unit 11 is further configured to obtain second input data and a calculation instruction, wherein the second input data includes the second weight data and input neuron data. In an optional solution, the second input data and the calculation instruction may be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The controller unit 11 is further configured to parse the calculation instruction into a plurality of operation instructions, and to send the plurality of operation instructions and the second input data to the operation unit;

The operation unit 12 is configured to obtain the operation instructions and perform the neural network calculation according to the operation instructions and the second input data.

In a specific implementation, the operation unit 12 includes a master processing circuit 101 and a plurality of slave processing circuits 102. The master processing circuit 101 is configured to preprocess the second input data and to exchange data and operation instructions with the plurality of slave processing circuits;

The plurality of slave processing circuits 102 are configured to perform intermediate operations in parallel, according to the data and operation instructions transmitted from the master processing circuit, to obtain a plurality of intermediate results. They then transmit the intermediate results to the master processing circuit;

The master processing circuit 101 is configured to perform subsequent processing on the plurality of intermediate results to obtain the calculation result of the calculation instruction.
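The master/slave division of labour can be sketched for a single fully connected layer. In this hypothetical Python model, threads stand in for the slave processing circuits. The "preprocessing" step is assumed to be splitting the weight rows among the slaves. The "subsequent processing" is assumed to be concatenation followed by a ReLU activation. The patent does not specify either step.

```python
from concurrent.futures import ThreadPoolExecutor


def slave_compute(weight_rows, x):
    """Slave circuit: intermediate results = dot products for its share of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]


def master_run(weights, x, num_slaves):
    """Master circuit: distribute work, gather intermediate results, post-process.
    Python threads only model the concurrency; they do not imply a real speed-up."""
    # Preprocessing (assumed): split the output rows into contiguous chunks.
    chunk = -(-len(weights) // num_slaves)  # ceiling division
    parts = [weights[i:i + chunk] for i in range(0, len(weights), chunk)]
    # Slaves run concurrently; pool.map returns results in submission order.
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediate = list(pool.map(slave_compute, parts, [x] * len(parts)))
    # Subsequent processing (assumed): concatenate, then apply ReLU.
    return [max(0.0, y) for part in intermediate for y in part]


W = [[1.0, 0.0, 2.0], [0.0, -1.0, 0.0], [0.5, 0.5, 0.5], [2.0, 0.0, 0.0]]
x = [1.0, 2.0, 3.0]
print(master_run(W, x, num_slaves=2))  # [7.0, 0.0, 3.0, 2.0]
```

Because `pool.map` preserves submission order, the concatenated output lines up with the original row order regardless of which slave finishes first.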

Optionally, the second input data may specifically include the second weight data and the input neuron data. The calculation result may specifically be the result of the neural network operation, that is, the output neuron data.

In one embodiment, the computing device may further include a storage unit 10 and a direct memory access unit 50. The storage unit 10 may include a register, a cache, or any combination of them. Specifically, the cache stores the calculation instructions and is a scratchpad cache, and the register stores the input data and scalars. The direct memory access unit 50 is configured to read data from and store data to the storage unit 10.

In the embodiment of the present application, as shown in FIG. 2, the controller unit 11 includes an instruction cache unit 110, an instruction processing unit 111, a dependency processing unit 112, and a storage queue unit 113;

The instruction cache unit 110 is configured to store the calculation instructions associated with the artificial neural network operation. While the zeroth calculation instruction is being executed, other instructions that have not yet been committed for execution are cached in the instruction cache unit 110. After the zeroth calculation instruction finishes, the first calculation instruction is committed if it is the earliest uncommitted instruction in the instruction cache unit 110. Once an instruction is committed, the changes its operation makes to the device state cannot be undone;
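The commit rule above can be modelled in a few lines of Python. The class and method names are hypothetical. The model interprets "commit" as the irrevocable hand-off of the oldest waiting instruction once the previously committed one has finished; this reading comes from the text and is not a documented hardware interface.

```python
from collections import deque


class InstructionCacheUnit:
    """Hypothetical model of in-order commit: uncommitted instructions wait in
    program order; the oldest is committed only after the previously committed
    instruction has finished, and a committed instruction is never rolled back."""

    def __init__(self):
        self.uncommitted = deque()  # oldest instruction on the left
        self.executing = None       # committed instruction that is still running
        self.committed = []         # history; entries are never removed

    def add(self, inst):
        self.uncommitted.append(inst)

    def finish_current(self):
        self.executing = None

    def try_commit(self):
        """Commit the earliest uncommitted instruction if nothing is executing."""
        if self.executing is None and self.uncommitted:
            self.executing = self.uncommitted.popleft()
            self.committed.append(self.executing)
            return self.executing
        return None


unit = InstructionCacheUnit()
for name in ["inst0", "inst1", "inst2"]:
    unit.add(name)
unit.try_commit()                  # inst0 (the zeroth instruction) starts executing
assert unit.try_commit() is None   # inst1 must wait while inst0 runs
unit.finish_current()
print(unit.try_commit())           # inst1, the earliest uncommitted instruction
```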

所述指令处理单元111,用于从所述指令缓存单元获取所述计算指令,并对所述计算指令解析得到多个操作指令;The instruction processing unit 111 is used to obtain the calculation instruction from the instruction cache unit, and parse the calculation instruction to obtain multiple operation instructions;

所述依赖关系处理单元112,用于在具有多个操作指令时,确定第一操作指令与所述第一操作指令之前的第零操作指令是否存在关联关系,如所述第一操作指令与所述第零操作指令存在关联关系,则将所述第一操作指令存储到存储队列单元113内,在所述第零操作指令执行完毕后,所述第一操作指令与所述第零操作指令的关联关系解除,则从所述存储队列单元113中提取所述第一操作指令传输至所述运算单元;The dependency processing unit 112 is used to determine whether a first operation instruction has an association relationship with a zeroth operation instruction before the first operation instruction when there are multiple operation instructions. If the first operation instruction has an association relationship with the zeroth operation instruction, the first operation instruction is stored in the storage queue unit 113. After the zeroth operation instruction is executed, the association relationship between the first operation instruction and the zeroth operation instruction is released, and the first operation instruction is extracted from the storage queue unit 113 and transmitted to the operation unit.

所述确定该第一操作指令与第一操作指令之前的第零操作指令是否存在关联关系包括:The determining whether the first operation instruction has an association relationship with the zeroth operation instruction before the first operation instruction comprises:

依据所述第一操作指令提取所述第一操作指令中所需数据(例如矩阵)的第一存储地址区间,依据所述第零操作指令提取所述第零操作指令中所需矩阵的第零存储地址区间,如所述第一存储地址区间与所述第零存储地址区间具有重叠的区域,则确定所述第一操作指令与所述第零操作指令具有关联关系,如所述第一存储地址区间与所述第零存储地址区间不具有重叠的区域,则确定所述第一操作指令与所述第零操作指令不具有关联关系。The first storage address interval of the data (e.g., a matrix) required in the first operation instruction is extracted according to the first operation instruction, and the zeroth storage address interval of the matrix required in the zeroth operation instruction is extracted according to the zeroth operation instruction. If the first storage address interval and the zeroth storage address interval have an overlapping area, it is determined that the first operation instruction and the zeroth operation instruction have an associated relationship. If the first storage address interval and the zeroth storage address interval do not have an overlapping area, it is determined that the first operation instruction and the zeroth operation instruction have no associated relationship.

The storage queue unit 113 is configured to store an instruction queue. The queue holds a plurality of operation instructions or calculation instructions, which are executed in queue order.

In an embodiment of the present application, as shown in FIG. 2, the instruction processing unit 111 includes an instruction fetch module, a decoding module, and an instruction queue:
- The instruction fetch module obtains computing instructions of the neural network from the instruction cache unit 110.
- The decoding module decodes those computing instructions into operation instructions of the neural network.
- The instruction queue stores the decoded operation instructions sequentially, in execution order.

For example, in one optional technical solution, the main operation processing circuit may also include a controller unit. That controller unit may include a main instruction processing unit, specifically used to decode instructions into microinstructions. In another optional solution, the slave operation processing circuit may likewise include its own controller unit. That controller unit includes a slave instruction processing unit, specifically used to receive and process microinstructions.

A microinstruction is an instruction one level below an instruction. It can be obtained by splitting or decoding an instruction, and it can be further decoded into control signals for each component, unit, or processing circuit.

In one optional solution, the calculation instruction may have the structure shown in Table 1.

Table 1

Operation code | Register or immediate value | Register/immediate value | ...

The ellipsis in the table indicates that further register or immediate-value fields may follow.

In another optional solution, the calculation instruction may include one or more operation domains and an operation code. Calculation instructions include neural network operation instructions and may also include the load balancing instruction described above.

Take a neural network operation instruction as an example, as shown in Table 1. Register numbers 0, 1, 2, 3, and 4 may serve as operation domains, and each of them may identify one or more registers.

Table 2

The above registers may be off-chip memory or, in practical applications, on-chip memory, and are used to store data. The data may be n-dimensional, where n is an integer greater than or equal to 1:
- For n = 1, it is 1-dimensional data, i.e., a vector.
- For n = 2, it is 2-dimensional data, i.e., a matrix.
- For n ≥ 3, it is a multi-dimensional tensor.

In an embodiment of the present invention, the process by which the computing device performs the neural network operation is shown in FIG. 3. It includes the following steps:

Step S1: The controller unit receives a load balancing instruction and decodes it into a plurality of operation instructions. It then sends these operation instructions to the load balancing unit.

After reading the load balancing instruction from the storage unit, the controller unit parses it into operation instructions and sends them to the load balancing unit. In detail:
1. The instruction fetch module of the instruction processing unit 111 in the controller unit 11 obtains the load balancing instruction from the instruction cache unit 110 and passes it to the decoding module.
2. The decoding module decodes the load balancing instruction into operation instructions. According to preset instruction rules, it splits each operation instruction into an operation code and its operation domains, whose composition and function are as described above.
3. The decoding module sends the decoded operation instructions to the instruction queue, which stores them sequentially.
4. In the instruction queue, the address of the data each instruction will process is derived from the instruction's operation code and operation domains. This address is sent to the dependency processing unit 112.
5. The dependency processing unit checks whether the instruction is associated with any instruction currently being executed. If it is, the operation instruction is held in the storage queue unit 113 until the association is released. If it is not, the operation instruction is sent to the load balancing unit, which performs the corresponding operation.
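The dispatch behaviour described above can be modelled with a small queue simulation. This is a hedged sketch with several assumptions:
- The `OpInstruction` and `DependencyUnit` names are hypothetical.
- Address ranges are half-open.
- Parked instructions are released strictly in queue order. This is our own simplifying choice, not something the text specifies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class OpInstruction:
    name: str
    addr: Tuple[int, int]  # half-open [start, end) range of the data it reads/writes


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


class DependencyUnit:
    """Toy model of the dependency processing unit plus the storage queue unit."""

    def __init__(self) -> None:
        self.in_flight: List[OpInstruction] = []            # instructions being executed
        self.storage_queue: Deque[OpInstruction] = deque()  # parked instructions
        self.dispatched: List[str] = []                     # order sent for execution

    def _conflicts(self, inst: OpInstruction) -> bool:
        return any(overlaps(inst.addr, f.addr) for f in self.in_flight)

    def _dispatch(self, inst: OpInstruction) -> None:
        self.in_flight.append(inst)
        self.dispatched.append(inst.name)

    def issue(self, inst: OpInstruction) -> None:
        # Park the instruction if it is associated with an in-flight one; also
        # park it behind any earlier parked instructions to keep queue order.
        if self.storage_queue or self._conflicts(inst):
            self.storage_queue.append(inst)
        else:
            self._dispatch(inst)

    def retire(self, name: str) -> None:
        # Execution finished: drop the instruction, then release parked
        # instructions whose association has been resolved, in queue order.
        self.in_flight = [f for f in self.in_flight if f.name != name]
        while self.storage_queue and not self._conflicts(self.storage_queue[0]):
            self._dispatch(self.storage_queue.popleft())


unit = DependencyUnit()
unit.issue(OpInstruction("op0", (0, 100)))
unit.issue(OpInstruction("op1", (50, 150)))   # overlaps op0, so it is parked
unit.issue(OpInstruction("op2", (200, 300)))  # independent, but waits behind op1
print(unit.dispatched)  # ['op0']
unit.retire("op0")
print(unit.dispatched)  # ['op0', 'op1', 'op2']
```

The in-order release keeps the model simple. A real design might let independent instructions such as `op2` bypass parked ones.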

S2. The load balancing unit receives the operation instructions sent by the controller unit. It then performs load balancing on the first weight data read from the storage unit, producing second weight data.

FIG. 4 is a flowchart of the balanced pruning method provided by an embodiment of the present invention. The following describes, with reference to it, how the embodiment performs balanced pruning on the first weight data to obtain the second weight data. The method may include, but is not limited to, the following steps:

Step S21: Group the first weight data into M groups of weights, where M is a positive integer.

In a specific implementation, each weight in the first weight data may be any real number.

In an embodiment of the present invention, the first weight data may be grouped either contiguously or in an interleaved (cross) manner.

Take a fully connected layer of a neural network as an example. For layers n-1 and n, a fully connected layer means that every node in layer n-1 is connected to every node in layer n.

FIG. 5A is a schematic diagram of the structure of a one-dimensional fully connected layer of a neural network provided in an embodiment of the present invention. As shown in FIG. 5A, the network includes an input layer, a hidden layer, and an output layer. The fully connected layer between the input layer and the hidden layer has a two-dimensional parameter shape of (3, 4). This means it has 3 input neurons, 4 output neurons, and 12 weights.

In a specific implementation, these 12 weights can be represented as a weight matrix with 4 rows and 3 columns, as shown in FIG. 5B.

In practical applications, when the weights of the fully connected layer are divided into M groups, M is a positive integer greater than 1 and less than Nout.

In one embodiment, the weight matrix is grouped contiguously, so that each run of Nout/M consecutive rows forms one group. The i-th group contains rows (i-1)·Nout/M + 1, (i-1)·Nout/M + 2, ..., i·Nout/M of the weight matrix. Here i is a positive integer with 1 ≤ i ≤ M, and Nout is the number of output neurons.
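Contiguous grouping reduces to simple index arithmetic. The sketch below returns 1-based row numbers for each group. It assumes, as the formula implies, that M divides Nout, and the function name is hypothetical.

```python
from typing import List


def contiguous_groups(n_out: int, m: int) -> List[List[int]]:
    """Split rows 1..n_out into m groups of n_out/m consecutive rows.

    Group i (1-based) holds rows (i-1)*n_out/m + 1 through i*n_out/m.
    """
    if n_out % m != 0:
        raise ValueError("this sketch assumes m divides n_out")
    size = n_out // m
    return [list(range((i - 1) * size + 1, i * size + 1)) for i in range(1, m + 1)]


# FIG. 5C case: 4 output neurons split into 4 groups of one row each.
print(contiguous_groups(4, 4))  # [[1], [2], [3], [4]]
print(contiguous_groups(4, 2))  # [[1, 2], [3, 4]]
```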

In another embodiment, the weight matrix is grouped in an interleaved manner, so that rows spaced M apart form one group. The i-th group contains rows i, i+M, i+2M, ..., i+Nout-M of the weight matrix. Here 1 ≤ i ≤ M, and Nout is the number of output neurons.
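Interleaved grouping can be written as a strided range over the row numbers. This is again an illustrative sketch that assumes M divides Nout, and the function name is hypothetical.

```python
from typing import List


def interleaved_groups(n_out: int, m: int) -> List[List[int]]:
    """Group i (1-based) holds rows i, i+m, i+2m, ..., i+n_out-m."""
    if n_out % m != 0:
        raise ValueError("this sketch assumes m divides n_out")
    return [list(range(i, n_out + 1, m)) for i in range(1, m + 1)]


# FIG. 5D case: 4 rows in 2 interleaved groups.
print(interleaved_groups(4, 2))  # [[1, 3], [2, 4]]
```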

Using the weight matrix of FIG. 5B, suppose the 12 weights are divided into 4 groups of 3 weights each. FIG. 5C shows the contiguous grouping: groups 1, 2, 3, and 4 are rows 1, 2, 3, and 4 of the weight matrix, respectively.

Similarly, suppose the 12 weights are divided into 2 groups of 6 weights each. FIG. 5D shows the interleaved grouping: the first group consists of rows 1 and 3 of the weight matrix, and the second group consists of rows 2 and 4.

Take a convolutional layer of a neural network as an example. As shown in FIG. 5E, a convolutional layer can be regarded as a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where:
- Nfin is the number of input feature maps.
- Nfout is the number of output feature maps.
- (Kx, Ky) is the size of the convolution kernel in the layer.

In practical applications, when the weights of the convolutional layer are divided into M groups, M is a positive integer greater than 1 and less than Nfout.

In one embodiment, the weight matrix is grouped contiguously, so that each run of Nfout/M consecutive convolution kernels forms one group. The i-th group contains kernels (i-1)·Nfout/M + 1, (i-1)·Nfout/M + 2, ..., i·Nfout/M of the weight matrix, where 1 ≤ i ≤ M.

In another embodiment, the weight matrix is grouped in an interleaved manner, so that convolution kernels spaced M apart form one group. The i-th group contains kernels i, i+M, i+2M, ..., i+Nfout-M of the weight matrix, where 1 ≤ i ≤ M.

With the convolution kernels shown in FIG. 5E, there are 4 kernels. Suppose they are divided into 2 groups of 2 kernels each. FIG. 5F shows the contiguous grouping: the first group consists of kernels 1 and 2 of the weight matrix, and the second group consists of kernels 3 and 4.

Similarly, suppose the 4 kernels are divided into 2 groups of 2 kernels each. FIG. 5G shows the interleaved grouping: the first group consists of kernels 1 and 3 of the weight matrix, and the second group consists of kernels 2 and 4.
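To make the kernel grouping concrete, the sketch below stores convolution weights as a nested list indexed [fin][fout][kx][ky]. It treats "convolution kernel o" as every weight that feeds output feature map o. That layout, the 0-based indices, and the function names are assumptions made for illustration.

```python
from typing import List

# Convolution weights as nested lists indexed [fin][fout][kx][ky] (assumed layout).
Tensor4D = List[List[List[List[float]]]]


def kernel_weights(w: Tensor4D, o: int) -> List[float]:
    """Flatten every weight that feeds output feature map o (0-based)."""
    return [w[c][o][x][y]
            for c in range(len(w))
            for x in range(len(w[c][o]))
            for y in range(len(w[c][o][x]))]


def group_kernels(w: Tensor4D, m: int, interleaved: bool) -> List[List[float]]:
    """Split the Nfout kernels into m groups and concatenate each group's weights."""
    n_fout = len(w[0])
    if n_fout % m != 0:
        raise ValueError("this sketch assumes m divides Nfout")
    size = n_fout // m
    if interleaved:
        index_groups = [range(g, n_fout, m) for g in range(m)]
    else:
        index_groups = [range(g * size, (g + 1) * size) for g in range(m)]
    return [[v for o in idx for v in kernel_weights(w, o)] for idx in index_groups]


# Nfin = 1, Nfout = 4, Kx = Ky = 1; kernel o holds the single value o + 1.
w = [[[[1.0]], [[2.0]], [[3.0]], [[4.0]]]]
print(group_kernels(w, 2, interleaved=False))  # [[1.0, 2.0], [3.0, 4.0]]  (FIG. 5F pattern)
print(group_kernels(w, 2, interleaved=True))   # [[1.0, 3.0], [2.0, 4.0]]  (FIG. 5G pattern)
```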

Take the long short-term memory (LSTM) layer of a neural network as an example. The weights of an LSTM layer consist of the weights of several fully connected layers. Suppose there are t such fully connected layers, where t is a positive integer. The j-th fully connected layer has shape (Nin_j, Nout_j), where Nin_j is its number of input neurons and Nout_j is its number of output neurons. It therefore has Nin_j × Nout_j weights.

In practical applications, each of the t fully connected layers is grouped separately. Take the j-th fully connected layer as an example. If its weights are divided into M groups, each group contains Nin_j × Nout_j / M weights, where M is a positive integer greater than 1 and less than Nout_j.

In one embodiment, the weight matrix of the j-th fully connected layer is grouped contiguously, so that each run of Nout_j/M consecutive rows forms one group. The i-th group contains rows (i-1)·Nout_j/M + 1, (i-1)·Nout_j/M + 2, ..., i·Nout_j/M of the weight matrix. Here 1 ≤ i ≤ M, and Nout_j is the number of output neurons of the j-th fully connected layer.

In another embodiment, the weight matrix of the j-th fully connected layer is grouped in an interleaved manner, so that rows spaced M apart form one group. The i-th group contains rows i, i+M, i+2M, ..., i+Nout_j-M of the weight matrix. Here 1 ≤ i ≤ M, and Nout_j is the number of output neurons of the j-th fully connected layer.
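Because each fully connected sub-layer of the LSTM is grouped on its own, the per-group weight count Nin_j × Nout_j / M is computed layer by layer. This is a minimal sketch that assumes M divides every Nout_j, and the function name is hypothetical.

```python
from typing import List, Tuple


def lstm_group_sizes(layers: List[Tuple[int, int]], m: int) -> List[int]:
    """Weights per group for each fully connected sub-layer (Nin_j, Nout_j) split into m groups."""
    sizes = []
    for n_in, n_out in layers:
        if n_out % m != 0:
            raise ValueError("this sketch assumes m divides every Nout_j")
        sizes.append(n_in * n_out // m)  # Nin_j * Nout_j / M
    return sizes


# Two sub-layers shaped like those in FIG. 5A, (3, 4) and (4, 2), each split into 2 groups.
print(lstm_group_sizes([(3, 4), (4, 2)], 2))  # [6, 4]
```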

Take the neural network architecture of FIG. 5A as an example. The network includes an input layer, a hidden layer, and an output layer. The first fully connected layer lies between the input layer and the hidden layer, and the second lies between the hidden layer and the output layer. The first layer has already been described above.

As FIG. 5A shows, the second fully connected layer has a two-dimensional parameter shape of (4, 2). This means it has 4 input neurons, 2 output neurons, and 8 weights. In a specific implementation, these 8 weights can be represented as a weight matrix with 2 rows and 4 columns, as shown in FIG. 5H. This matrix is grouped (contiguously or in an interleaved manner) as described above.

Step S22: Determine a threshold for each of the M groups of weights according to a preset sparsity P.

In a specific implementation, the sparsity P is the proportion of non-zero elements in the sparse coefficient vector. P is a real number greater than 0 and less than 1, for example 0.7.

For a fully connected layer, determining the threshold of each of the M groups of weights according to the preset sparsity P includes:

determining the Q-th weight in the i-th of the M groups as that group's threshold. The terms are as follows:
- Nin is the number of input neurons.
- Nout is the number of output neurons.
- The weights in the i-th group are sorted in ascending order of absolute value.
- i is a positive integer less than or equal to M.

In practical applications, the formula for Q may yield either an integer or a non-integer. In one optional implementation, a non-integer Q is rounded, either up or down.
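The text does not reproduce the formula for Q. The sketch below assumes Q = P · Nin · Nout / M, i.e., the sparsity P times the number of weights per group. This is our reconstruction, not the patent's formula. It is consistent with the worked example that follows: with the example sparsity P = 0.7 and 3 weights per group, rounding down gives Q = 2. The function names are hypothetical.

```python
import math
from typing import List


def threshold_index(p: float, n_in: int, n_out: int, m: int, round_up: bool = False) -> int:
    """ASSUMED formula: Q = P * Nin * Nout / M, rounded down (default) or up."""
    q = p * n_in * n_out / m
    return math.ceil(q) if round_up else math.floor(q)


def group_threshold(group: List[float], q: int) -> float:
    """Magnitude of the q-th (1-based) weight after sorting by ascending absolute value."""
    if not 1 <= q <= len(group):
        raise ValueError("q must lie between 1 and the group size")
    return abs(sorted(group, key=abs)[q - 1])


q = threshold_index(0.7, n_in=3, n_out=4, m=4)  # 2.1 rounds down to 2
print(q, group_threshold([0.9, -0.2, 0.5], q))   # 2 0.5
```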

For example, take the fully connected layer from the input layer to the hidden layer in FIG. 5A. It has 3 input neurons, 4 output neurons, and 12 weights, and its weight matrix is divided into 4 groups by contiguous grouping. According to the preset sparsity P, the 2nd weight in each group is taken as that group's threshold:
- Group 1: the 2nd weight is 0.5, so the threshold is 0.5.
- Group 2: the 2nd weight is 0.4, so the threshold is 0.4.
- Group 3: the 2nd weight is 0.65, so the threshold is 0.65.
- Group 4: the 2nd weight is 0.45, so the threshold is 0.45.

For a convolutional layer, determining the threshold of each of the M groups of weights according to the preset sparsity P includes:

determining the R-th weight in the i-th of the M groups as that group's threshold. The terms are as follows:
- Nfin is the number of input feature maps.
- Nfout is the number of output feature maps.
- Kx and Ky are the convolution kernel sizes of the convolutional layer.
- The weights in the i-th group are sorted in ascending order of absolute value.
- i is a positive integer less than or equal to M.

In one optional implementation, R is obtained by rounding, and again the rounding may be up or down.
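The sketch below applies the same assumed pattern to the convolutional case: R = P · Nfin · Nfout · Kx · Ky / M. This formula is our reconstruction, not quoted from the source, and the layer dimensions in the usage lines are hypothetical. The example shows how the rounding direction changes the selected index.

```python
import math


def conv_threshold_index(p: float, n_fin: int, n_fout: int, kx: int, ky: int,
                         m: int, round_up: bool = False) -> int:
    """ASSUMED formula: R = P * Nfin * Nfout * Kx * Ky / M, then rounded."""
    r = p * n_fin * n_fout * kx * ky / m
    return math.ceil(r) if round_up else math.floor(r)


# Hypothetical layer: Nfin = 1, Nfout = 4, 2x2 kernels, M = 2, so 8 weights per group.
print(conv_threshold_index(0.7, 1, 4, 2, 2, 2))                 # 5 (5.6 rounded down)
print(conv_threshold_index(0.7, 1, 4, 2, 2, 2, round_up=True))  # 6 (5.6 rounded up)
```

Weights below the R-th smallest magnitude are pruned, so R - 1 weights are removed from each group. Rounding up therefore prunes slightly more, and rounding down keeps slightly more.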

For example, take the convolutional layer of FIG. 5G, whose weight matrix is divided into 2 groups by interleaved grouping. The first group consists of kernels 1 and 3 of the weight matrix, and the second consists of kernels 2 and 4. According to the preset sparsity, the 6th weight in each group is taken as that group's threshold:
- First group: the 6th weight is 0.7, so the threshold is 0.7.
- Second group: the 6th weight is 0.45, so the threshold is 0.45.

For an LSTM layer, determining the threshold of each of the M groups of weights according to the preset sparsity P includes:

determining the S-th weight in the i-th group of the j-th fully connected layer as that group's threshold. The terms are as follows:
- Nin_j is the number of input neurons of the j-th fully connected layer.
- Nout_j is the number of output neurons of the j-th fully connected layer.
- The weights in the i-th group are sorted in ascending order of absolute value.
- i is a positive integer less than or equal to M.
- j is a positive integer less than or equal to t, the number of fully connected layers.

In one optional implementation, S is obtained by rounding, and again the rounding may be up or down.

Take the neural network architecture of FIG. 5A as an example. It contains two fully connected layers: one between the input layer and the hidden layer, and one between the hidden layer and the output layer.
- First fully connected layer: as described above, with contiguous grouping its weight matrix is divided into 2 groups. The threshold of the first group is 0.5 and that of the second group is 0.4.
- Second fully connected layer: with contiguous grouping its weight matrix is also divided into 2 groups, the first consisting of row 1 and the second of row 2. The threshold of the first group is 0.65 and that of the second group is 0.5.

Step S23: Prune each of the M groups of weights against its threshold to obtain the second weight data.

Informally, pruning a group of weights means removing the weights that are not needed.

In a specific implementation, pruning each of the M groups of weights against its threshold to obtain the second weight data includes:

pruning, in the i-th group (i = 1, 2, ..., M), every weight smaller than that group's threshold to obtain the second weight data.
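A minimal pruning sketch follows. It rests on two assumptions: "removing" a weight means setting it to zero, and weights are compared by magnitude, consistent with the groups being sorted by absolute value. The `density` helper measures the proportion of non-zero weights, which is the quantity the sparsity P describes.

```python
from typing import List


def prune_group(group: List[float], threshold: float) -> List[float]:
    """Zero every weight whose magnitude is below the group's threshold (assumed semantics)."""
    return [w if abs(w) >= threshold else 0.0 for w in group]


def density(group: List[float]) -> float:
    """Proportion of non-zero weights in a group."""
    return sum(1 for w in group if w != 0.0) / len(group)


pruned = prune_group([0.9, -0.2, 0.5], 0.5)
print(pruned, density(pruned))  # [0.9, 0.0, 0.5] 0.6666666666666666
```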

Consider again the fully connected layer of FIG. 5C with contiguous grouping. Its 12 weights are divided into 4 groups, and groups 1 to 4 are rows 1 to 4 of the weight matrix. The thresholds determined from the preset sparsity P are 0.5, 0.4, 0.65, and 0.45 for groups 1 to 4, respectively.

Each group is then pruned against its own threshold:
- Weights less than 0.5 are removed from group 1.
- Weights less than 0.4 are removed from group 2.
- Weights less than 0.65 are removed from group 3.
- Weights less than 0.45 are removed from group 4.

FIG. 5I shows the four groups of FIG. 5C after pruning. As FIG. 5I shows, pruning leaves all four groups with the same sparsity.

In one implementation, M = Nout, i.e., the number of groups equals the number of output neurons. Every output neuron then carries the same amount of computation, which resolves the load imbalance.
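The load-balancing effect of M = Nout can be checked numerically. In the sketch below, each row (output neuron) is its own group and keeps its `keep` largest-magnitude weights. This is equivalent to pruning everything below that row's threshold, assuming no ties among magnitudes. The names and data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # Nout rows (output neurons) x Nin columns


def balanced_prune_rows(w: Matrix, keep: int) -> Matrix:
    """M = Nout: each row is its own group and keeps its `keep` largest-magnitude weights."""
    pruned = []
    for row in w:
        # The keep-th largest magnitude serves as this row's threshold (assumes no ties).
        threshold = sorted((abs(x) for x in row), reverse=True)[keep - 1]
        pruned.append([x if abs(x) >= threshold else 0.0 for x in row])
    return pruned


def macs_per_neuron(w: Matrix) -> List[int]:
    """Multiply-accumulate operations per output neuron when zero weights are skipped."""
    return [sum(1 for x in row if x != 0.0) for row in w]


w = [[0.1, 0.8, 0.3],
     [0.9, 0.2, 0.7],
     [0.05, 0.6, 0.4],
     [0.5, 0.55, 0.01]]
print(macs_per_neuron(balanced_prune_rows(w, keep=2)))  # [2, 2, 2, 2]
```

For contrast, a single global threshold of 0.45 applied to the same matrix would leave 1, 2, 1, and 2 weights in the four rows. Two neurons would then do twice the work of the others.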

Consider again the convolutional layer of FIG. 5G with interleaved grouping. Its weight matrix is divided into 2 groups: the first consists of kernels 1 and 3 of the weight matrix, and the second consists of kernels 2 and 4. The thresholds determined from the preset sparsity P are 0.7 for the first group and 0.45 for the second.

Each group is then pruned against its own threshold: weights less than 0.7 are removed from the first group, and weights less than 0.45 are removed from the second. After pruning, the two groups have the same sparsity.

For an LSTM layer, each of the M groups of weights is pruned against its threshold in the same way as for the fully connected layer described above, producing the second weight data.

In the embodiments of the present invention, the controller unit first parses the load balancing instruction into multiple operation instructions. It then sends these instructions, together with the first weight data, to the load balancing unit.

The load balancing unit groups the first weight data, determines a threshold for each group, and prunes each group against its threshold. This adjusts the first weight data into second weight data in which every group has the same sparsity.

Sparsity otherwise gives different neurons different amounts of computation, which causes load imbalance. Equal per-group sparsity removes that imbalance and improves computation speed.

S3. The controller unit obtains second input data and a calculation instruction, where the second input data includes the second weight data and input neuron data.

S4. The controller unit parses the calculation instruction into operation instructions and sends them, together with the second input data, to the operation unit.

In a specific implementation, the controller unit obtains the calculation instruction and parses it into multiple operation instructions in the same way as described above for the load balancing instruction.

S5. The operation unit receives the operation instructions sent by the controller unit. It then performs the neural network computation according to these instructions and the second input data.

In practical applications, this neural network computation may include artificial neural network operations, convolutional neural network operations, and so on.

Take a multi-layer artificial neural network operation as an example. Here, the input neurons and output neurons of the operation do not mean the neurons in the input layer and output layer of the whole network. Rather, for any two adjacent layers, the neurons in the lower layer of the forward operation are the input neurons, and the neurons in the upper layer are the output neurons.

For example, suppose a convolutional neural network has L layers, and let K = 1, 2, ..., L-1. For the K-th and (K+1)-th layers, the K-th layer is called the input layer and its neurons are the input neurons. The (K+1)-th layer is called the output layer and its neurons are the output neurons. In other words, every layer except the top one can serve as an input layer, with the next layer as its output layer.

具体实现中,对于神经网络中的运算可以为神经网络中的一层的运算,对于多层神经网络,其实现过程是,在正向运算中,当上一层人工神经网络执行完成之后,下一层的运算指令会将运算单元中计算出的输出神经元作为下一层的输入神经元进行运算(或者是对该输出神经元进行某些操作再作为下一层的输入神经元),同时,将权值也替换为下一层的权值;在反向运算中,当上一层人工神经网络的反向运算执行完成后,下一层运算指令会将运算单元中计算出的输入神经元梯度作为下一层的输出神经元梯度进行运算(或者是对该输入神经元梯度进行某些操作再作为下一层的输出神经元梯度),同时将权值替换为下一层的权值。In a specific implementation, the operation in the neural network can be the operation of a layer in the neural network. For a multi-layer neural network, the implementation process is that in the forward operation, after the execution of the previous layer of artificial neural network is completed, the operation instruction of the next layer will use the output neuron calculated in the operation unit as the input neuron of the next layer for operation (or perform certain operations on the output neuron and then use it as the input neuron of the next layer), and at the same time, the weights are also replaced by the weights of the next layer; in the reverse operation, when the reverse operation of the previous layer of artificial neural network is completed, the operation instruction of the next layer will use the input neuron gradient calculated in the operation unit as the output neuron gradient of the next layer for operation (or perform certain operations on the input neuron gradient and then use it as the output neuron gradient of the next layer), and at the same time, the weights are replaced by the weights of the next layer.
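The layer-to-layer hand-off described above can be sketched in Python. This is a minimal illustration, not the device's implementation: it assumes plain linear layers stored as nested lists (one row per output neuron), omits activation derivatives from the reverse pass, and models the optional "certain operations" with a hypothetical `post` callback. All function names are illustrative.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """w has one row per output neuron; returns w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matvec_t(w: Matrix, g: Vector) -> Vector:
    """Returns w^T @ g, mapping output-neuron gradients to input-neuron gradients."""
    return [sum(w[r][c] * g[r] for r in range(len(w))) for c in range(len(w[0]))]


def forward(layers: List[Matrix], x: Vector,
            post: Callable[[Vector], Vector] = lambda v: v) -> Vector:
    for w in layers:             # weights are replaced by the next layer's weights
        x = post(matvec(w, x))   # outputs of this layer become inputs of the next
    return x


def backward(layers: List[Matrix], grad: Vector) -> Vector:
    for w in reversed(layers):    # reverse pass visits layers from last to first
        grad = matvec_t(w, grad)  # input-neuron gradient feeds the next layer processed
    return grad


layers = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
print(forward(layers, [1.0, 1.0]))    # [3.0, 7.0]
print(backward(layers, [1.0, 0.0]))   # [1.0, 2.0]
```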

Taking the forward operation of the neural network as an example: first, the operation unit reads the second input data from the storage unit, where the second input data includes the second weight data and the input neuron data.

Second, the master processing circuit reads the corresponding neuron data and broadcasts it to each slave processing circuit in a specified order. In practice, the neuron data may be broadcast only once, in which case each slave processing circuit temporarily stores the received data in a cache or register so that it can be reused. Alternatively, the neuron data may be broadcast multiple times, in which case each slave processing circuit uses the data directly on receipt without reusing it. In one possible implementation, the master processing circuit broadcasts the neuron data immediately after reading it.

Then, each slave processing circuit performs an inner product operation on the neuron data it has read in and the second weight data according to the operation instruction, and transmits the inner product result back to the master processing circuit.

In one embodiment, the slave processing circuit transmits the partial sum obtained from each inner product operation back to the master processing circuit for accumulation. In another embodiment, the partial sums obtained from the inner product operations of the slave processing circuit are stored and accumulated in that circuit's registers and/or on-chip cache, and the result is transmitted back to the master processing circuit once accumulation is complete. In yet another embodiment, the partial sums are in some cases accumulated in the slave processing circuit's registers and/or on-chip cache and in other cases transmitted to the master processing circuit for accumulation, with the result transmitted back to the master processing circuit once accumulation is complete.
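The broadcast-then-inner-product flow and the first two accumulation options can be sketched as a software model. This is a hypothetical illustration, not the circuit: each weight row stands for the work of one slave processing circuit, the broadcast neuron data is assumed to be consumed in fixed-size chunks, and the function names are invented. Both strategies produce the same sums; only where the additions happen differs. The hybrid option would mix the two on a per-chunk basis.

```python
from typing import List


def partial_sums(neurons: List[float], row: List[float], chunk: int) -> List[float]:
    """Inner products of successive chunks of the broadcast neurons with one weight row."""
    return [sum(n * w for n, w in zip(neurons[i:i + chunk], row[i:i + chunk]))
            for i in range(0, len(neurons), chunk)]


def accumulate_on_master(neurons, weights, chunk):
    # Every partial sum is sent back; the master performs all the additions.
    return [sum(partial_sums(neurons, row, chunk)) for row in weights]


def accumulate_on_slave(neurons, weights, chunk):
    # Each slave keeps a running sum in a local register and sends only the total.
    results = []
    for row in weights:              # one row per slave processing circuit
        register = 0.0
        for p in partial_sums(neurons, row, chunk):
            register += p
        results.append(register)
    return results


neurons = [1.0, 2.0, 3.0, 4.0]       # broadcast once, reused by every slave
weights = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, -1.0, 0.0]]
print(accumulate_on_master(neurons, weights, chunk=2))   # [10.0, -2.0]
print(accumulate_on_slave(neurons, weights, chunk=2))    # [10.0, -2.0]
```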

Finally, the master processing circuit performs accumulation, activation, and other operations on the results from the slave processing circuits, and this continues until the forward operation of the neural network is complete. The error between the predicted result and the actual result, that is, the neuron gradient data of the last layer, is then obtained and saved to the storage unit.

In an embodiment of the present invention, the operation unit 12 may adopt a one-master, multiple-slave structure. In an optional embodiment, as shown in FIG. 6, the operation unit 12 may include one master processing circuit 101 and multiple slave processing circuits 102. In one embodiment, as shown in FIG. 6, the slave processing circuits are arranged in an array of m rows and n columns; each slave processing circuit is connected to its adjacent slave processing circuits, and the master processing circuit is connected to k of the slave processing circuits, namely the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column. It should be noted that the k slave processing circuits shown in FIG. 6 include only these circuits; that is, the k slave processing circuits are those slave processing circuits that are directly connected to the master processing circuit.
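Read literally, the k directly connected circuits are the union of row 1, row m, and column 1 of the m-by-n array. A small sketch with a hypothetical helper and 1-indexed (row, column) positions makes the count explicit: because the corner circuits (1, 1) and (m, 1) belong to both a row and the first column, for m >= 2 the union contains 2n + m - 2 distinct circuits rather than n + n + m.

```python
from typing import Set, Tuple


def directly_connected(m: int, n: int) -> Set[Tuple[int, int]]:
    """(row, col) positions, 1-indexed, of slaves wired directly to the master."""
    first_row = {(1, c) for c in range(1, n + 1)}
    last_row = {(m, c) for c in range(1, n + 1)}
    first_col = {(r, 1) for r in range(1, m + 1)}
    return first_row | last_row | first_col


k = len(directly_connected(4, 4))
print(k)   # 10, i.e. 2*4 + 4 - 2
```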

The k slave processing circuits forward data and instructions between the master processing circuit and the other slave processing circuits.

Optionally, as shown in FIG. 7, the master processing circuit may further include one or any combination of a conversion processing circuit 120, an activation processing circuit 121, and an addition processing circuit 122.

The conversion processing circuit 120 converts data blocks or intermediate results received by the master processing circuit between a first data structure and a second data structure (for example, between continuous data and discrete data), or between a first data type and a second data type (for example, between fixed-point and floating-point types).
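As an illustration of the fixed-point/floating-point conversion attributed to the conversion processing circuit 120, the sketch below assumes a signed fixed-point format with a configurable number of fractional bits and saturation on overflow. The 16-bit width and 8 fractional bits are assumptions for the example, not values from the source. Note that Python's `round` rounds halves to even.

```python
def float_to_fixed(x: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer, saturating at the range limits."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def fixed_to_float(q: int, frac_bits: int = 8) -> float:
    """Interpret a fixed-point integer with `frac_bits` fractional bits as a float."""
    return q / (1 << frac_bits)


print(float_to_fixed(1.5))      # 384
print(fixed_to_float(384))      # 1.5
print(float_to_fixed(1000.0))   # 32767 (saturated)
```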

The activation processing circuit 121 performs activation operations on data in the master processing circuit.

The addition processing circuit 122 performs addition or accumulation operations.

The master processing circuit determines the input neurons to be broadcast data and the weights to be distribution data, splits the distribution data into multiple data blocks, and sends at least one of the data blocks and at least one of the multiple operation instructions to the slave processing circuits.

The slave processing circuits perform operations on the received data blocks according to the operation instructions to obtain intermediate results, and transmit the intermediate results to the master processing circuit.

The master processing circuit processes the intermediate results sent by the slave processing circuits to obtain the result of the calculation instruction, and sends that result to the controller unit.

Each slave processing circuit includes a multiplication processing circuit.

The multiplication processing circuit performs a product operation on the received data block to obtain a product result.

A forwarding processing circuit (optional) forwards the received data block or product result.

An accumulation processing circuit performs an accumulation operation on the product results to obtain the intermediate result.
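The slave-side pipeline (multiply, optionally forward, then accumulate) can be modeled in a few lines. This is a hypothetical sketch: a data block is assumed to be one weight vector matched element-wise with the input vector, and forwarding is modeled as appending a copy of the block to a neighbor's inbox list; `slave_compute` and `neighbor_inbox` are illustrative names.

```python
from typing import List, Optional


def slave_compute(block: List[float], x: List[float],
                  neighbor_inbox: Optional[List[List[float]]] = None) -> float:
    products = [w * xi for w, xi in zip(block, x)]  # multiplication processing circuit
    if neighbor_inbox is not None:                  # optional forwarding processing circuit
        neighbor_inbox.append(list(block))          # pass the data block on unchanged
    return sum(products)                            # accumulation circuit -> intermediate result


inbox: List[List[float]] = []
print(slave_compute([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], inbox))  # 32.0
print(inbox)                                                   # [[1.0, 2.0, 3.0]]
```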

In another embodiment, the operation instruction is a computational instruction such as a matrix-by-matrix multiplication instruction, an accumulation instruction, or an activation instruction.

The specific calculation method of the computing device shown in FIG. 1 is explained below using a neural network operation instruction. For such an instruction, the formula actually executed may be $s = s\left(\sum_i w_i x_i + b\right)$: each weight $w_i$ is multiplied by the corresponding input datum $x_i$, the products are summed, the bias $b$ is added, and the activation function $s(h)$ is applied to obtain the final output $s$.
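The formula can be evaluated directly. The sketch below assumes sigmoid as the default activation $s(h)$; any other scalar activation can be passed in instead. The function names are illustrative.

```python
import math
from typing import Callable, Sequence


def sigmoid(h: float) -> float:
    return 1.0 / (1.0 + math.exp(-h))


def neuron_output(w: Sequence[float], x: Sequence[float], b: float,
                  s: Callable[[float], float] = sigmoid) -> float:
    h = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum plus bias
    return s(h)                                    # activation s(h)


print(neuron_output([0.5, -0.5], [2.0, 2.0], 0.0))                        # 0.5
print(neuron_output([1.0, 2.0], [3.0, 4.0], 1.0, lambda h: max(0.0, h)))  # 12.0
```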

In an optional implementation, as shown in FIG. 8, the operation unit includes a tree module 40. The tree module includes one root port 401 and multiple branch ports 404; the root port is connected to the master processing circuit, and each branch port is connected to one of the slave processing circuits. The tree module has a transceiver function and forwards data blocks, weights, and operation instructions between the master processing circuit and the slave processing circuits; that is, it can transmit data from the master processing circuit to each slave processing circuit and from each slave processing circuit to the master processing circuit.

Optionally, the tree module is an optional component of the computing device. It may include at least one layer of nodes; each node is a line structure with a forwarding function and may itself have no computing function. If the tree module has zero layers of nodes, the tree module is not needed.

Optionally, the tree module may have an n-ary tree structure, for example the binary tree shown in FIG. 9, or a ternary tree; n may be any integer greater than or equal to 2. The specific implementations of the present application do not limit the value of n. The number of layers may also be 2, and the slave processing circuits may be connected to nodes in layers other than the penultimate layer, for example to the nodes in the last layer shown in FIG. 9.
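For an n-ary tree module, the number of forwarding levels needed to reach a given number of slave processing circuits, and the branch taken at each level, can be computed as below. This is a hypothetical model assuming the slaves sit at the leaves of a complete n-ary tree and are numbered left to right from 0; the source does not specify a numbering or routing scheme, and "levels" here counts hops below the root rather than the source's node layers.

```python
from typing import List


def tree_levels(num_slaves: int, n: int) -> int:
    """Fewest levels below the root so a complete n-ary tree has >= num_slaves leaves."""
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    levels, leaves = 0, 1
    while leaves < num_slaves:
        leaves *= n
        levels += 1
    return levels


def route(slave_index: int, levels: int, n: int) -> List[int]:
    """Child port chosen at each level, root first, to reach leaf `slave_index`."""
    path = []
    for _ in range(levels):
        path.append(slave_index % n)   # base-n digits of the index, least significant first
        slave_index //= n
    return path[::-1]


print(tree_levels(8, 2))   # 3
print(route(5, 3, 2))      # [1, 0, 1]
```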

Optionally, the operation unit may carry its own cache. As shown in FIG. 10, it may include a neuron cache unit 63, which caches the input neuron vector data and output neuron value data of the slave processing circuits.

As shown in FIG. 11, the operation unit may further include a weight cache unit 64, which caches the weight data that the slave processing circuits need during calculation.

In an optional embodiment, as shown in FIG. 12, the operation unit 12 may include a branch processing circuit 103, connected as follows:

The master processing circuit 101 is connected to one or more branch processing circuits 103, and each branch processing circuit 103 is connected to one or more slave processing circuits 102.

The branch processing circuit 103 forwards data or instructions between the master processing circuit 101 and the slave processing circuits 102.

In an optional embodiment, taking the fully connected operation in a neural network as an example, the process may be $y = f(wx + b)$, where $x$ is the input neuron matrix, $w$ is the weight matrix, $b$ is a bias scalar, and $f$ is the activation function, which may be any one of the sigmoid, tanh, ReLU, and softmax functions. Assuming a binary tree structure with 8 slave processing circuits, the method may be implemented as follows:

The controller unit obtains the input neuron matrix x, the weight matrix w, and the fully connected operation instruction from the storage unit, and transmits them to the master processing circuit.

The master processing circuit determines that the input neuron matrix x is broadcast data and the weight matrix w is distribution data, splits w into 8 sub-matrices, distributes the 8 sub-matrices to the 8 slave processing circuits through the tree module, and broadcasts x to all 8 slave processing circuits.

The slave processing circuits perform, in parallel, the multiplication and accumulation of the 8 sub-matrices with the input neuron matrix x to obtain 8 intermediate results, and send them to the master processing circuit.

The master processing circuit arranges the 8 intermediate results in order to obtain the result of wx, adds the bias b, and applies the activation to obtain the final result y, which it sends to the controller unit; the controller unit outputs y or stores it in the storage unit.
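The 8-slave fully connected example can be simulated in plain Python. Assumptions not taken from the source: x is a single input vector, w is split into contiguous blocks of rows so that each slave produces one slice of wx and the master only has to concatenate the slices in slave order, and f is applied element-wise (which covers sigmoid, tanh, and ReLU but not softmax). All names are illustrative.

```python
from typing import Callable, List

Matrix = List[List[float]]


def split_rows(w: Matrix, parts: int) -> List[Matrix]:
    """Split w into `parts` contiguous row blocks whose sizes differ by at most 1."""
    q, r = divmod(len(w), parts)
    blocks, start = [], 0
    for p in range(parts):
        size = q + (1 if p < r else 0)
        blocks.append(w[start:start + size])
        start += size
    return blocks


def slave(block: Matrix, x: List[float]) -> List[float]:
    # Multiply-accumulate of one sub-matrix with the broadcast input x.
    return [sum(a * b for a, b in zip(row, x)) for row in block]


def fully_connected(w: Matrix, x: List[float], b: float,
                    f: Callable[[float], float], num_slaves: int = 8) -> List[float]:
    blocks = split_rows(w, num_slaves)                 # distribution data
    intermediates = [slave(blk, x) for blk in blocks]  # x is broadcast to every slave
    wx = [v for part in intermediates for v in part]   # master arranges results in order
    return [f(v + b) for v in wx]                      # add bias, then activate


w = [[float(i), 1.0] for i in range(8)]                # 8 x 2 weight matrix
print(fully_connected(w, [1.0, 2.0], 1.0, lambda v: max(0.0, v)))
# [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```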

The method by which the computing device shown in FIG. 1 executes a neural network forward operation instruction may specifically be as follows:

The controller unit extracts, from the instruction storage unit, the neural network forward operation instruction, the operation field corresponding to it, and at least one operation code. The controller unit transmits the operation field to the data access unit and sends the at least one operation code to the operation unit.

The controller unit extracts the weight w and bias b corresponding to the operation field from the storage unit (when b is 0, the bias need not be extracted) and transmits them to the master processing circuit of the operation unit. The controller unit also extracts the input data Xi from the storage unit and sends it to the master processing circuit.

Based on the at least one operation code, the master processing circuit determines that a multiplication operation is to be performed, determines that the input data Xi is broadcast data and the weight data is distribution data, and splits the weight w into n data blocks.

The instruction processing unit of the controller unit determines a multiplication instruction, a bias instruction, and an accumulation instruction according to the at least one operation code, and sends them to the master processing circuit. The master processing circuit broadcasts the multiplication instruction and the input data Xi to the slave processing circuits and distributes the n data blocks among them (for example, with n slave processing circuits, each slave processing circuit receives one data block). The slave processing circuits multiply the input data Xi by their received data blocks according to the multiplication instruction to obtain intermediate results and send them to the master processing circuit. The master processing circuit accumulates these intermediate results according to the accumulation instruction to obtain an accumulated result, adds the bias b to it according to the bias instruction to obtain the final result, and sends the final result to the controller unit.
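The multiplication, accumulation, and bias sequence can be sketched as follows. Here the n data blocks are assumed to be groups of contiguous input columns of w, so each slave's intermediate result is a partial wx vector that the master's accumulation step sums; each slave reads only the slice of the broadcast Xi that matches its block. The split direction and all names are illustrative assumptions, not the source's definition.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_columns(w: Matrix, n: int) -> List[Tuple[Matrix, Tuple[int, int]]]:
    """Split w into n blocks of contiguous input columns, keeping each block's span."""
    q, r = divmod(len(w[0]), n)
    blocks, start = [], 0
    for p in range(n):
        end = start + q + (1 if p < r else 0)
        blocks.append(([row[start:end] for row in w], (start, end)))
        start = end
    return blocks


def slave_multiply(block: Matrix, x: List[float], span: Tuple[int, int]) -> List[float]:
    xs = x[span[0]:span[1]]          # the part of the broadcast Xi this block needs
    return [sum(a * b for a, b in zip(row, xs)) for row in block]


def forward_op(w: Matrix, x: List[float], b: float, n: int) -> List[float]:
    partials = [slave_multiply(blk, x, span) for blk, span in split_columns(w, n)]  # multiply
    acc = [sum(vals) for vals in zip(*partials)]                                     # accumulate
    return [v + b for v in acc]                                                     # add bias


print(forward_op([[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]], [1.0] * 4, 0.5, n=2))  # [10.5, 2.5]
```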

In addition, the order of the addition and multiplication operations may be swapped.

With the technical solution provided in the present application, the multiplication and bias operations of the neural network are implemented by a single instruction, the neural network operation instruction. Intermediate results of the neural network calculation need not be stored or fetched, which reduces storage and fetch operations on intermediate data. This reduces the number of operation steps and improves the computational efficiency of the neural network.

The present application also discloses a machine learning computing device that includes one or more of the computing devices described in this application. It obtains the data to be computed and control information from other processing devices, performs the specified machine learning operations, and passes the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one computing device is included, the computing devices can be linked and exchange data through a specific structure, for example interconnected over a PCIe bus, to support larger-scale machine learning operations. In this case, they may share a single control system or each have an independent control system, and they may share memory or each accelerator may have its own memory. In addition, any interconnection topology may be used.

The machine learning computing device is highly compatible and can be connected to various types of servers through a PCIe interface.

The present application also discloses a combined processing device that includes the above machine learning computing device, a universal interconnection interface, and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete user-specified operations. FIG. 13 is a schematic diagram of the combined processing device.

The other processing devices include one or more types of general-purpose or special-purpose processors, such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors they include is not limited. The other processing devices serve as the interface between the machine learning computing device and external data and control, including data transfer, and perform basic control of the machine learning computing device such as starting and stopping it; they may also cooperate with the machine learning computing device to complete computing tasks.

The universal interconnection interface transmits data and control instructions between the machine learning computing device and the other processing devices. The machine learning computing device can obtain the required input data from the other processing devices and write it to its on-chip storage; it can obtain control instructions from the other processing devices and write them to its on-chip control cache; and it can also read data from its own storage module and transmit it to the other processing devices.

Optionally, as shown in FIG. 14, the structure may further include a storage device connected to both the machine learning computing device and the other processing devices. The storage device stores data of the machine learning computing device and the other processing devices, and is particularly suitable when the data to be computed cannot be held entirely in the internal storage of the machine learning computing device or the other processing devices.

The combined processing device can serve as the system on chip (SoC) of devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control portion, increasing processing speed, and lowering overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as cameras, displays, mice, keyboards, network cards, and Wi-Fi interfaces.

In some embodiments, a chip is disclosed that includes the above machine learning computing device or combined processing device.

In some embodiments, a chip package structure is disclosed that includes the above chip.

In some embodiments, a board card is disclosed that includes the above chip package structure. Referring to FIG. 15, which shows such a board card: in addition to the chip 389, the board card may include other supporting components, including but not limited to a storage device 390, an interface device 391, and a control device 392.

The storage device 390 is connected to the chip in the chip package structure through a bus and is used to store data. The storage device may include multiple groups of storage units 393, each connected to the chip through a bus. Each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

DDR doubles the speed of SDRAM without increasing the clock frequency, because it transfers data on both the rising and falling edges of the clock pulse; DDR is therefore twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include multiple DDR4 chips. In one embodiment, the chip may include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transfer and 8 bits for ECC checking. When DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s.
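The 25600 MB/s figure follows from simple arithmetic: DDR4-3200 performs 3200 million transfers per second (a 1600 MHz clock with two transfers per cycle), and each transfer carries 64 data bits, with the 8 ECC bits not counting toward payload. A quick check, using an illustrative helper:

```python
def ddr_bandwidth_mb_s(mega_transfers_per_s: float, data_bits: int = 64) -> float:
    """Theoretical peak payload bandwidth of one DDR channel in MB/s."""
    return mega_transfers_per_s * data_bits / 8   # bits -> bytes


per_group = ddr_bandwidth_mb_s(3200)   # DDR4-3200: 1600 MHz clock x 2 edges
print(per_group)                       # 25600.0
print(4 * per_group)                   # 102400.0 if all four groups transfer at once
```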

In one embodiment, each group of storage units includes multiple DDR SDRAMs arranged in parallel. DDR can transfer data twice in one clock cycle. A DDR controller is provided in the chip to control data transfer to, and data storage in, each storage unit.

The interface device is electrically connected to the chip in the chip package structure and is used to transfer data between the chip and an external device (such as a server or computer). For example, in one embodiment the interface device may be a standard PCIe interface, through which the data to be processed is transferred from the server to the chip. Preferably, when a PCIe 3.0 x16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another type of interface; the present application does not limit its specific form, provided that the interface unit can perform the transfer function. The calculation results of the chip are likewise transmitted back to the external device (such as a server) by the interface device.
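Similarly for PCIe 3.0 x16: each lane runs at 8 GT/s, so 16 lanes give 16000 MB/s per direction before encoding overhead, which matches the figure above; with PCIe 3.0's 128b/130b line encoding the usable payload rate is about 15754 MB/s. The helper below is an illustrative sketch for checking such numbers.

```python
def pcie_bandwidth_mb_s(gt_per_s: float, lanes: int,
                        payload_bits: int = 128, line_bits: int = 130) -> float:
    """Per-direction PCIe bandwidth in MB/s after line-encoding overhead."""
    raw = gt_per_s * 1000 * lanes / 8        # one bit per lane per transfer
    return raw * payload_bits / line_bits    # 128b/130b encoding for PCIe 3.0


print(pcie_bandwidth_mb_s(8, 16, 1, 1))      # 16000.0 (encoding ignored)
print(round(pcie_bandwidth_mb_s(8, 16)))     # 15754
```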

The control device is electrically connected to the chip and is used to monitor the chip's status. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits and can drive multiple loads, so it can be in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.

In some embodiments, an electronic device is disclosed that includes the above board card.

The electronic device may be a data processing device, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, camera module, server, cloud server, camera, camcorder, projector, watch, headphones, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle includes an airplane, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, or range hood; the medical device includes a magnetic resonance imaging scanner, B-mode ultrasound scanner, and/or electrocardiograph.

In embodiments of the present invention, the pruning method for a neural network can be applied in, but is not limited to, the computing device described above; it can also be applied in other scenarios, for example to reduce the accuracy loss of a neural network. On this basis, the following describes, with reference to the flowchart of the neural network pruning method shown in FIG. 16, how the present invention performs balanced pruning of the first weight data to obtain the second weight data. The method may include, but is not limited to, the following steps:

Step S100: Obtain first input data, where the first input data includes first weight data.

In a specific implementation, the first weight data may be any real numbers. Here, weight data refers to the connection values between layers of the neural network, that is, the strength of information transfer between neurons.

Step S102: Adjust the first weight data to second weight data.

In one implementation, adjusting the first weight data to the second weight data includes:

grouping the first weight data to obtain M groups of weights, where M is a positive integer;

determining a threshold for at least one of the M groups of weights according to a preset sparsity P; and

pruning at least one of the M groups of weights according to the determined threshold to obtain the second weight data.

In another implementation, adjusting the first weight data to the second weight data includes:

grouping the first weight data to obtain M groups of weights, where M is a positive integer;

determining a threshold for each of the M groups of weights according to a preset sparsity P; and

pruning each of the M groups of weights according to its determined threshold to obtain the second weight data.
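A minimal sketch of per-group threshold pruning follows. The source does not state how the threshold is derived from P, so this sketch assumes 0 <= P <= 1, takes the threshold to be the k-th smallest weight magnitude in the group with k = floor(P * group size), and sets to zero every weight whose magnitude is at or below the threshold. Because each group gets its own threshold, every group keeps the same fraction of its weights (barring ties), which is what balances the work across processing circuits. The function names are hypothetical.

```python
import math
from typing import List, Optional


def group_threshold(group: List[float], p: float) -> Optional[float]:
    """Assumed rule: the k-th smallest |w| in the group, with k = floor(p * len)."""
    k = math.floor(p * len(group) + 1e-9)   # small epsilon guards against float error
    if k == 0:
        return None                          # sparsity too low to prune anything here
    return sorted(abs(w) for w in group)[k - 1]


def prune_groups(groups: List[List[float]], p: float) -> List[List[float]]:
    pruned = []
    for g in groups:
        t = group_threshold(g, p)            # one threshold per group
        pruned.append([0.0 if t is not None and abs(w) <= t else w for w in g])
    return pruned


groups = [[0.1, -0.5, 0.3, -0.05], [2.0, -1.0, 0.2, 0.4]]
print(prune_groups(groups, p=0.5))
# [[0.0, -0.5, 0.3, 0.0], [2.0, -1.0, 0.0, 0.0]]
```

For contrast, a single global threshold at the same sparsity (the 4th smallest of all 8 magnitudes, 0.3) would leave 1 surviving weight in the first group and 3 in the second, so the two groups would carry unequal work.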

In this embodiment of the present invention, the first weight data may be grouped either continuously (consecutive grouping) or in a cross (interleaved) manner.

In a specific implementation, the adjustment from the first weight data to the second weight data can be applied to different types of neural network layers, for example a fully connected layer, a convolutional layer, or an LSTM layer. The grouping of the first weight data, the determination of each group's threshold, and the threshold-based pruning differ between these cases. Each case is described below:

(1) Fully connected layer:

In a fully connected layer, every node in layer n-1 is connected to every node in layer n. FIG. 5A is a schematic structural diagram of a one-dimensional fully connected layer of a neural network according to an embodiment of the present invention. As shown in FIG. 5A, the neural network includes an input layer, a hidden layer, and an output layer. The two-dimensional parameter matrix of the fully connected layer between the input layer and the hidden layer is (3, 4), meaning that this layer has 3 input neurons, 4 output neurons, and 12 weights. In a specific implementation, the 12 weights can be represented as a weight matrix with 4 rows and 3 columns, as shown in FIG. 5B.

In practical applications, when the weights of the fully connected layer are divided into M groups, M is a positive integer greater than 1 and no greater than Nout.

In one implementation, when the weight matrix is grouped continuously, each group consists of $N_{out}/M$ consecutive rows of the weight matrix. The i-th group includes rows $(i-1)\frac{N_{out}}{M}+1$, $(i-1)\frac{N_{out}}{M}+2$, $(i-1)\frac{N_{out}}{M}+3$, ..., $i\frac{N_{out}}{M}$ of the weight matrix, where i is a positive integer no greater than M and Nout is the number of output neurons.

In another implementation, when the weight matrix is cross-grouped, each group consists of $N_{out}/M$ interleaved rows of the weight matrix. The i-th group includes rows $i$, $i+M$, $i+2M$, ..., $i+\left(\frac{N_{out}}{M}-1\right)M$ of the weight matrix, where i is a positive integer no greater than M and Nout is the number of output neurons.
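The two grouping rules for the fully connected layer can be written as a small index helper. This Python sketch is illustrative only: `fc_row_groups` is a hypothetical name, rows are numbered from 1 as in the text, and the sketch assumes M divides Nout because the text does not say how a remainder would be handled.

```python
def fc_row_groups(n_out, m, mode="continuous"):
    """Return the 1-based row indices of each of the M groups.

    Assumes M divides Nout (a simplification made for this sketch).
    """
    if n_out % m != 0:
        raise ValueError("this sketch assumes M divides Nout")
    size = n_out // m  # rows per group: Nout / M
    if mode == "continuous":
        # group i holds rows (i-1)*size+1 .. i*size
        return [list(range((i - 1) * size + 1, i * size + 1)) for i in range(1, m + 1)]
    if mode == "interleaved":
        # group i holds rows i, i+M, i+2M, ..., i+(size-1)*M
        return [[i + k * m for k in range(size)] for i in range(1, m + 1)]
    raise ValueError(f"unknown mode: {mode}")


print(fc_row_groups(4, 4))                 # [[1], [2], [3], [4]]
print(fc_row_groups(4, 2, "interleaved"))  # [[1, 3], [2, 4]]
```

With Nout = 4, the first call gives one row per group, matching the continuous grouping described for FIG. 5C, and the second gives rows 1 and 3 versus rows 2 and 4, matching the cross grouping described for FIG. 5D.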

As described above, when the weight matrix takes the form shown in FIG. 5B, suppose the 12 weights are divided into 4 groups, so each group contains 3 weights. With continuous grouping, the resulting groups are shown in FIG. 5C: the first group is row 1 of the weight matrix, the second group is row 2, the third group is row 3, and the fourth group is row 4.

Similarly, suppose the 12 weights are divided into 2 groups, so each group contains 6 weights. With cross grouping, the resulting groups are shown in FIG. 5D: the first group consists of rows 1 and 3 of the weight matrix, and the second group consists of rows 2 and 4.

After the first weight data is grouped, determining the threshold of each of the M groups of weights according to the preset sparsity P includes:

Determining the Q-th weight of the i-th group of the M groups as that group's threshold, where $Q = \frac{P \cdot N_{in} \cdot N_{out}}{M}$, Nin is the number of input neurons, Nout is the number of output neurons, the weights in the i-th group are sorted in ascending order of absolute value, and i is a positive integer no greater than M.

In practical applications, the value of Q calculated with the above formula may or may not be an integer. In an optional implementation, when Q is not an integer, it is rounded; the rounding may be either rounding up or rounding down.
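A worked version of the rank calculation, assuming the form Q = P·Nin·Nout/M (P times the number of weights per group). The function name and the value P = 0.5 are assumptions for illustration; the text fixes neither.

```python
import math


def threshold_rank_fc(n_in, n_out, m, p, round_up=True):
    """Rank Q of the threshold weight inside each group of an FC layer.

    Assumed formula: Q = P * Nin * Nout / M, rounded up or down
    when it is not an integer.
    """
    q = p * n_in * n_out / m
    return math.ceil(q) if round_up else math.floor(q)


# FIG. 5A layer: Nin = 3, Nout = 4, M = 4; P = 0.5 is an assumed value.
print(threshold_rank_fc(3, 4, 4, 0.5))                  # 2 (1.5 rounded up)
print(threshold_rank_fc(3, 4, 4, 0.5, round_up=False))  # 1 (1.5 rounded down)
```

Rounding up here gives Q = 2, the same rank used in the example that follows. The rounding direction an implementation chooses changes how many weights each group loses.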

For example, take the fully connected layer between the input layer and the hidden layer in the neural network architecture of FIG. 5A. It has 3 input neurons, 4 output neurons, and 12 weights, and the weight matrix is divided into 4 groups by continuous grouping. According to the preset sparsity P, the 2nd weight of each group (in ascending order of absolute value) is determined to be that group's threshold. The 2nd weight of group 1 is 0.5, so the threshold of group 1 is 0.5; the 2nd weight of group 2 is 0.4, so its threshold is 0.4; the 2nd weight of group 3 is 0.65, so its threshold is 0.65; and the 2nd weight of group 4 is 0.45, so its threshold is 0.45.

After the threshold of each group is determined according to the preset sparsity P, each group of weights is pruned according to its threshold. Put simply, pruning each of the M groups of weights means removing unnecessary weights to reduce the number of parameters of the neural network.

In a specific implementation, pruning each of the M groups of weights according to the determined threshold to obtain the second weight data includes:

Pruning, in the i-th group (i = 1, 2, ..., M), the weights whose absolute values are smaller than the determined threshold, to obtain the second weight data.

As described above, take the fully connected layer of the neural network as an example. As shown in FIG. 5C, with continuous grouping the 12 weights are divided into 4 groups: the first group is row 1 of the weight matrix, the second group is row 2, the third group is row 3, and the fourth group is row 4. According to the preset sparsity P, the thresholds of groups 1 to 4 are determined to be 0.5, 0.4, 0.65, and 0.45 respectively, and each group is then pruned with its own threshold: weights below 0.5 are removed from group 1, weights below 0.4 from group 2, weights below 0.65 from group 3, and weights below 0.45 from group 4. The four groups of FIG. 5C after pruning are shown in FIG. 5I; after pruning, all four groups have the same sparsity.

In one implementation, when M = Nout, that is, when the number of groups equals the number of output neurons, every output neuron has the same amount of computation, which resolves the load imbalance problem.
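The load-balancing effect of choosing M = Nout can be checked by counting, for each output neuron, the nonzero weights it still multiplies after pruning. The sketch below compares per-row pruning (one group per output neuron) with a single global magnitude threshold. The global threshold is included only as a contrasting baseline and is not part of the described method. The 4 x 3 matrix is hypothetical, P = 0.5 is assumed, and the rank rule ⌈P × group size⌉ is an assumption.

```python
import math


def prune_rows_balanced(matrix, p):
    """M = Nout: every row (output neuron) is its own group with its own threshold."""
    pruned = []
    for row in matrix:
        q = max(1, math.ceil(p * len(row)))  # assumed rank rule
        t = sorted(abs(w) for w in row)[q - 1]
        pruned.append([w if abs(w) >= t else 0.0 for w in row])
    return pruned


def prune_global(matrix, p):
    """Contrasting baseline only: one threshold shared by the whole matrix."""
    mags = sorted(abs(w) for row in matrix for w in row)
    t = mags[max(1, math.ceil(p * len(mags))) - 1]
    return [[w if abs(w) >= t else 0.0 for w in row] for row in matrix]


def macs_per_neuron(matrix):
    """Nonzero weights per row = multiply-accumulates each output neuron performs."""
    return [sum(1 for w in row if w != 0.0) for row in matrix]


W = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.6], [0.05, 0.3, 0.4], [0.95, 0.15, 0.5]]  # hypothetical
print(macs_per_neuron(prune_rows_balanced(W, 0.5)))  # [2, 2, 2, 2]
print(macs_per_neuron(prune_global(W, 0.5)))         # [3, 1, 1, 2]
```

With the global threshold, per-neuron work ranges from 1 to 3 multiply-accumulates, so parallel processing elements would wait on the busiest neuron. The per-row version gives every output neuron the same two multiply-accumulates, which is the balance the text attributes to M = Nout.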

(2) Convolutional layer:

Taking a convolutional layer of a neural network as an example, as shown in FIG. 5G, the convolutional layer can be regarded as a four-dimensional matrix (Nfin, Nfout, Kx, Ky), where Nfin is the number of input feature maps, Nfout is the number of output feature maps, and (Kx, Ky) is the size of the convolution kernels in the layer.

In practical applications, when the weights of the convolutional layer are divided into M groups, M is a positive integer greater than 1 and less than Nfout.

In one implementation, when the weight matrix is grouped continuously, each group consists of $N_{fout}/M$ consecutive convolution kernels. The i-th group includes kernels $(i-1)\frac{N_{fout}}{M}+1$, $(i-1)\frac{N_{fout}}{M}+2$, $(i-1)\frac{N_{fout}}{M}+3$, ..., $i\frac{N_{fout}}{M}$ of the weight matrix, where i is a positive integer no greater than M.

In another implementation, when the weight matrix is cross-grouped, each group consists of $N_{fout}/M$ interleaved convolution kernels. The i-th group includes kernels $i$, $i+M$, $i+2M$, ..., $i+\left(\frac{N_{fout}}{M}-1\right)M$ of the weight matrix, where i is a positive integer no greater than M.

As described above, when the convolution kernels in the weight matrix take the form shown in FIG. 5E, there are 4 convolution kernels. Suppose they are divided into 2 groups, so each group contains 2 kernels. With continuous grouping, the resulting groups are shown in FIG. 5F: the first group consists of kernels 1 and 2 of the weight matrix, and the second group consists of kernels 3 and 4.

Similarly, suppose the 4 convolution kernels are divided into 2 groups of 2 kernels each. With cross grouping, the resulting groups are shown in FIG. 5G: the first group consists of kernels 1 and 3 of the weight matrix, and the second group consists of kernels 2 and 4.

After the first weight data is grouped, determining the threshold of each of the M groups of weights according to the preset sparsity P includes:

Determining the R-th weight of the i-th group of the M groups as that group's threshold, where $R = \frac{P \cdot N_{fin} \cdot N_{fout} \cdot K_x \cdot K_y}{M}$, Nfin is the number of input feature maps, Nfout is the number of output feature maps, Kx and Ky are the dimensions of the convolution kernels of the convolutional layer, the weights in the i-th group are sorted in ascending order of absolute value, and i is a positive integer no greater than M.

In an optional implementation, R is obtained by rounding. As before, the rounding may be either rounding up or rounding down.

For example, for the convolutional layer shown in FIG. 5G, the weight matrix is divided into 2 groups by cross grouping: the first group consists of kernels 1 and 3 of the weight matrix, and the second group consists of kernels 2 and 4. According to the preset sparsity, the 6th weight of each group is determined to be that group's threshold. The 6th weight of the first group is 0.7, so the threshold of the first group is 0.7; the 6th weight of the second group is 0.45, so the threshold of the second group is 0.45.

After the threshold of each group is determined according to the preset sparsity P, each group of weights is pruned according to its threshold. In a specific implementation, pruning each of the M groups of weights according to the determined threshold to obtain the second weight data includes:

Pruning, in the i-th group (i = 1, 2, ..., M), the weights whose absolute values are smaller than the determined threshold, to obtain the second weight data.

Taking the convolutional layer shown in FIG. 5G as an example, with cross grouping the weight matrix is divided into 2 groups: the first group consists of kernels 1 and 3 of the weight matrix, and the second group consists of kernels 2 and 4. The thresholds determined according to the preset sparsity P are 0.7 for the first group and 0.45 for the second group. Each group is then pruned with its own threshold: weights below 0.7 are removed from the first group, and weights below 0.45 are removed from the second group. After pruning, the two groups have the same sparsity.
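For the convolutional layer, the same procedure operates on whole kernels. The Python sketch below makes several assumptions: weights are stored as nested lists indexed [output map][input map][x][y]; Python indices are 0-based internally, so interleaved group i takes kernels i, i+M, i+2M, ...; M must divide Nfout; R = P·Nfin·Nfout·Kx·Ky/M is rounded up; and the four 1 x 2 x 2 kernels are hypothetical, not the values of FIG. 5E.

```python
import math


def group_kernels(weights, m, mode="interleaved"):
    """Split Nfout kernels into M groups and flatten each group.

    weights[o][c][x][y]: o = output map, c = input map, (x, y) = kernel position.
    Assumes M divides Nfout.
    """
    n_fout = len(weights)
    if n_fout % m:
        raise ValueError("this sketch assumes M divides Nfout")
    size = n_fout // m
    groups = []
    for i in range(m):  # 0-based group index
        if mode == "continuous":
            kernel_ids = range(i * size, (i + 1) * size)
        elif mode == "interleaved":
            kernel_ids = range(i, n_fout, m)
        else:
            raise ValueError(f"unknown mode: {mode}")
        groups.append([w for o in kernel_ids for plane in weights[o]
                       for row in plane for w in row])
    return groups


def prune_conv(weights, m, p, mode="interleaved"):
    """Return (R, pruned groups); weights with |w| below the R-th smallest are zeroed."""
    n_fout, n_fin = len(weights), len(weights[0])
    kx, ky = len(weights[0][0]), len(weights[0][0][0])
    r = max(1, math.ceil(p * n_fin * n_fout * kx * ky / m))  # assumed formula for R
    pruned = []
    for group in group_kernels(weights, m, mode):
        t = sorted(abs(w) for w in group)[r - 1]
        pruned.append([w if abs(w) >= t else 0.0 for w in group])
    return r, pruned


# Four hypothetical kernels (Nfin = 1, Kx = Ky = 2).
kernels = [
    [[[0.1, 0.2], [0.3, 0.4]]],
    [[[0.5, 0.6], [0.7, 0.8]]],
    [[[0.9, 1.0], [1.1, 1.2]]],
    [[[1.3, 1.4], [1.5, 1.6]]],
]
r, pruned = prune_conv(kernels, m=2, p=0.5)
print(r, pruned[0])  # 4 [0.0, 0.0, 0.0, 0.4, 0.9, 1.0, 1.1, 1.2]
```

Interleaved grouping pairs kernels 1 and 3 and kernels 2 and 4, as in FIG. 5G. Each group of eight weights keeps five nonzero entries, so both groups end up with the same sparsity.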

(3) LSTM layer:

In a specific implementation, the weights of an LSTM layer consist of the weights of several fully connected layers. Assume the LSTM layer's weights consist of t fully connected layers, where t is a positive integer. For example, the weights of the j-th fully connected layer are (Nin_j, Nout_j), where Nin_j is the number of input neurons of the j-th fully connected layer and Nout_j is its number of output neurons, so the j-th fully connected layer has Nin_j × Nout_j weights.

In practical applications, each of the t fully connected layers is grouped separately. Taking the j-th fully connected layer as an example, if its weights are divided into M groups, each group contains $\frac{N_{in\_j} \cdot N_{out\_j}}{M}$ weights, where M is a positive integer greater than 1 and less than Nout_j.

In one implementation, when the weight matrix of the j-th fully connected layer is grouped continuously, each group consists of $N_{out\_j}/M$ consecutive rows of the weight matrix. The i-th group includes rows $(i-1)\frac{N_{out\_j}}{M}+1$, $(i-1)\frac{N_{out\_j}}{M}+2$, $(i-1)\frac{N_{out\_j}}{M}+3$, ..., $i\frac{N_{out\_j}}{M}$ of the weight matrix, where i is a positive integer no greater than M and Nout_j is the number of output neurons of the j-th fully connected layer.

In another implementation, when the weight matrix of the j-th fully connected layer is cross-grouped, each group consists of $N_{out\_j}/M$ interleaved rows of the weight matrix. The i-th group includes rows $i$, $i+M$, $i+2M$, ..., $i+\left(\frac{N_{out\_j}}{M}-1\right)M$ of the weight matrix, where i is a positive integer no greater than M and Nout_j is the number of output neurons of the j-th fully connected layer.

After the first weight data is grouped, determining the threshold of each group of weights according to the preset sparsity P includes:

Determining the S-th weight of the i-th group of the j-th fully connected layer as that group's threshold, where $S = \frac{P \cdot N_{in\_j} \cdot N_{out\_j}}{M}$, Nin_j is the number of input neurons of the j-th fully connected layer, Nout_j is the number of output neurons of the j-th fully connected layer, the weights in the i-th group are sorted in ascending order of absolute value, i is a positive integer no greater than M, and j is a positive integer no greater than t. In an optional implementation, S is obtained by rounding. As before, the rounding may be either rounding up or rounding down.
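Because each fully connected part of the LSTM layer is grouped on its own, the group size and the threshold rank are computed per sub-layer. A minimal sketch, assuming the group size Nin_j·Nout_j/M, the rank S = P·Nin_j·Nout_j/M rounded up, that M divides every Nout_j, and hypothetical sub-layer shapes:

```python
import math


def lstm_group_ranks(sublayers, m, p, round_up=True):
    """For each (Nin_j, Nout_j), return (weights per group, threshold rank S_j)."""
    result = []
    for n_in, n_out in sublayers:
        if n_out % m:
            raise ValueError("this sketch assumes M divides every Nout_j")
        group_size = n_in * n_out // m  # Nin_j * Nout_j / M
        s = p * group_size              # assumed S_j before rounding
        s = math.ceil(s) if round_up else math.floor(s)
        result.append((group_size, max(1, s)))
    return result


# Two hypothetical fully connected parts: (Nin_j, Nout_j) = (3, 4) and (5, 4).
print(lstm_group_ranks([(3, 4), (5, 4)], m=2, p=0.5))  # [(6, 3), (10, 5)]
```

Each group is then pruned against its own S_j-th smallest magnitude, exactly as in the fully connected case.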

After the threshold of each group is determined according to the preset sparsity P, each group of weights is pruned according to its threshold. In a specific implementation, pruning each of the M groups of weights according to the determined threshold to obtain the second weight data includes:

Pruning, in the i-th group (i = 1, 2, ..., M), the weights whose absolute values are smaller than the determined threshold, to obtain the second weight data.

In practical applications, when the balanced pruning method described in this application is applied to an LSTM layer, each of the M groups of weights is pruned according to its determined threshold to obtain the second weight data in the same way as described above for the fully connected layer. The details are not repeated here.

In this embodiment of the present invention, the first weight data is grouped, a threshold is calculated for each group, and each group is pruned according to its threshold. This ensures that all groups have the same sparsity. It resolves the load imbalance that arises when sparsity gives different neurons different amounts of computation, and thereby improves computation speed.

To facilitate implementation of the above solution of the embodiments of the present invention, the present invention further provides a neural network pruning apparatus, which is described in detail below with reference to the accompanying drawings:

FIG. 17A is a schematic structural diagram of a neural network pruning apparatus according to an embodiment of the present invention. The apparatus includes an acquisition unit 300, a load balancing unit 13, and a calculation unit 304.

The acquisition unit 300 is configured to acquire first input data, where the first input data includes first weight data;

The load balancing unit 13 is configured to adjust the first weight data to second weight data;

The calculation unit 304 is configured to perform neural network computation according to second input data, where the second input data includes the second weight data and input neuron data.

In one implementation, as shown in FIG. 17B, the load balancing unit 13 includes a grouping unit 130, a threshold calculation unit 131, and a pruning unit 132.

The grouping unit 130 is configured to group the first weight data to obtain M groups of weights, where M is a positive integer;

The threshold calculation unit 131 is configured to determine a threshold for at least one of the M groups of weights according to a preset sparsity P;

The pruning unit 132 is configured to prune at least one of the M groups of weights according to the determined threshold to obtain the second weight data.

In another implementation, the load balancing unit 13 includes a grouping unit 130, a threshold calculation unit 131, and a pruning unit 132.

The grouping unit 130 is configured to group the first weight data to obtain M groups of weights, where M is a positive integer;

The threshold calculation unit 131 is configured to determine a threshold for each of the M groups of weights according to a preset sparsity P;

The pruning unit 132 is configured to prune each of the M groups of weights according to its determined threshold to obtain the second weight data.
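As a software analogue of the apparatus, the units can be mapped onto a small class. This is a hypothetical sketch, not the hardware of FIG. 17A and FIG. 17B. The class and method names are invented; the grouping unit is assumed to use continuous row grouping; the threshold rank is assumed to be ⌈P × group size⌉; and the calculation unit is reduced to a plain matrix-vector product.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadBalancingUnit:
    """Hypothetical analogue of load balancing unit 13."""
    m: int    # number of groups M
    p: float  # preset sparsity P

    def group(self, matrix):
        """Grouping unit 130: continuous row grouping (assumes M divides the row count)."""
        size = len(matrix) // self.m
        return [matrix[i * size:(i + 1) * size] for i in range(self.m)]

    def thresholds(self, groups):
        """Threshold calculation unit 131: Q-th smallest |w| per group."""
        result = []
        for g in groups:
            mags = sorted(abs(w) for row in g for w in row)
            q = max(1, math.ceil(self.p * len(mags)))
            result.append(mags[q - 1])
        return result

    def prune(self, groups, thresholds):
        """Pruning unit 132: zero weights below the group threshold, return the rows."""
        return [[w if abs(w) >= t else 0.0 for w in row]
                for g, t in zip(groups, thresholds) for row in g]

    def adjust(self, first_weights):
        """First weight data -> second weight data."""
        groups = self.group(first_weights)
        return self.prune(groups, self.thresholds(groups))


def calculation_unit(weights, neurons):
    """Calculation unit 304, reduced to y = W x for one fully connected layer."""
    return [sum(w * x for w, x in zip(row, neurons)) for row in weights]


first = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.6], [0.05, 0.3, 0.4], [0.95, 0.15, 0.5]]
second = LoadBalancingUnit(m=4, p=0.5).adjust(first)
print(calculation_unit(second, [1.0, 0.0, 0.0]))  # [0.9, 0.0, 0.0, 0.95]
```

Keeping the three sub-units as separate methods mirrors the split into units 130, 131, and 132, so any one stage (for example, switching to cross grouping) can be replaced without touching the others.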

Optionally, the neural network is a fully connected layer, and the threshold calculation unit 131 is specifically configured to:

Determine the Q-th weight of the i-th group of the M groups as that group's threshold, where $Q = \frac{P \cdot N_{in} \cdot N_{out}}{M}$, Nin is the number of input neurons, Nout is the number of output neurons, the weights in the i-th group are sorted in ascending order of absolute value, and i is a positive integer no greater than M.

Optionally, Q is obtained by rounding.

Optionally, the neural network is a convolutional layer, and the threshold calculation unit 131 is further specifically configured to:

Determine the R-th weight of the i-th group of the M groups as that group's threshold, where $R = \frac{P \cdot N_{fin} \cdot N_{fout} \cdot K_x \cdot K_y}{M}$, Nfin is the number of input feature maps, Nfout is the number of output feature maps, Kx and Ky are the dimensions of the convolution kernels of the convolutional layer, the weights in the i-th group are sorted in ascending order of absolute value, and i is a positive integer no greater than M.

Optionally, the neural network is an LSTM layer that includes N fully connected layers, where N is a positive integer, and the threshold calculation unit 131 is further specifically configured to:

Determine the S-th weight of the i-th group of the j-th fully connected layer as that group's threshold, where $S = \frac{P \cdot N_{in\_j} \cdot N_{out\_j}}{M}$, Nin_j is the number of input neurons of the j-th fully connected layer, Nout_j is the number of output neurons of the j-th fully connected layer, the weights in the i-th group are sorted in ascending order of absolute value, i is a positive integer no greater than M, and j is a positive integer no greater than N.

Optionally, the pruning unit 132 is specifically configured to:

Prune, in the i-th group (i = 1, 2, ..., M), the weights whose absolute values are smaller than the determined threshold, to obtain the second weight data.

In this embodiment of the present invention, the first weight data is grouped, a threshold is calculated for each group, and each group is pruned according to its threshold. This ensures that all groups have the same sparsity. It resolves the load imbalance that arises when sparsity gives different neurons different amounts of computation, and thereby improves computation speed.

To facilitate implementation of the above solution of the embodiments of the present invention, the present invention further provides another electronic device, which is described in detail below with reference to the accompanying drawings:

FIG. 18 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 40 may include a processor 401, a memory 404, and a communication module 405, which may be interconnected via a bus 406. The memory 404 may be a high-speed random access memory (RAM) or a non-volatile memory, such as at least one disk storage device. Optionally, the memory 404 may also be at least one storage system located remotely from the processor 401. The memory 404 stores application program code, which may include an operating system, a network communication module, a user interface module, and a data processing program. The communication module 405 exchanges information with external devices. The processor 401 is configured to call the program code to perform the following steps:

Acquiring first input data, where the first input data includes first weight data;

Adjusting the first weight data to second weight data;

Performing neural network computation according to second input data, where the second input data includes the second weight data and input neuron data.

Adjusting, by the processor 401, the first weight data to the second weight data may include:

Grouping the first weight data to obtain M groups of weights, where M is a positive integer;

Determining a threshold for at least one of the M groups of weights according to a preset sparsity P;

Pruning at least one of the M groups of weights according to the determined threshold to obtain the second weight data.

Alternatively, adjusting, by the processor 401, the first weight data to the second weight data may include:

Grouping the first weight data to obtain M groups of weights, where M is a positive integer;

Determining a threshold for each of the M groups of weights according to a preset sparsity P;

Pruning each of the M groups of weights according to its determined threshold to obtain the second weight data.

When the neural network is a fully connected layer:

Determining, by the processor 401, the threshold of each of the M groups of weights according to the preset sparsity P may include:

Determining the Q-th weight of the i-th group of the M groups as that group's threshold, where $Q = \frac{P \cdot N_{in} \cdot N_{out}}{M}$, Nin is the number of input neurons, Nout is the number of output neurons, the weights in the i-th group are sorted in ascending order of absolute value, and i is a positive integer no greater than M.

When the neural network is a convolutional layer:

Determining, by the processor 401, the threshold of each of the M groups of weights according to the preset sparsity P may include:

Determining the R-th weight of the i-th group of the M groups as that group's threshold, where $R = \frac{P \cdot N_{fin} \cdot N_{fout} \cdot K_x \cdot K_y}{M}$, Nfin is the number of input feature maps, Nfout is the number of output feature maps, Kx and Ky are the dimensions of the convolution kernels of the convolutional layer, the weights in the i-th group are sorted in ascending order of absolute value, and i is a positive integer no greater than M.

When the neural network is an LSTM layer that includes N fully connected layers, N being a positive integer:

In this case, the processor 401 determining the threshold of each of the M groups of weights according to the preset sparsity P may include:

Determining the S-th weight in the i-th group of weights of the j-th fully connected layer as the threshold of that group, where Nin_j is the number of input neurons of the j-th fully connected layer, Nout_j is the number of output neurons of the j-th fully connected layer, the weights in the i-th group are sorted in ascending order of absolute value, i is a positive integer less than or equal to M, and j is a positive integer less than or equal to N.
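For the LSTM case, each of the N fully connected layers gets its own index S_j, because Nin_j and Nout_j can differ from layer to layer. The sketch below makes two assumptions, since the expression for S is not given here. First, every layer is split into the same M equal contiguous groups. Second, S_j = floor(P·G_j) + 1 with G_j = Nin_j·Nout_j/M. The name `lstm_thresholds` is hypothetical.

```python
import math


def lstm_thresholds(layers, p, m):
    """Per-group thresholds for the N fully connected layers inside an LSTM layer.

    layers: list of N weight matrices; matrix j has Nout_j rows of Nin_j weights.
    Assumptions: each layer is split into m equal contiguous groups and
    S_j = floor(p * G_j) + 1, where G_j = Nin_j * Nout_j / m.
    Returns [(S_j, [threshold of each of the m groups]) for j = 1..N].
    """
    result = []
    for weights in layers:
        flat = [w for row in weights for w in row]
        if len(flat) % m:
            raise ValueError("Nin_j * Nout_j must be divisible by M in this sketch")
        g = len(flat) // m
        s = min(math.floor(p * g) + 1, g)          # assumed S_j, 1-based
        thresholds = [sorted(abs(w) for w in flat[i * g:(i + 1) * g])[s - 1]
                      for i in range(m)]           # S_j-th smallest |w| per group
        result.append((s, thresholds))
    return result


gates = [[[0.1, -0.5], [0.3, 0.2]],                        # layer 1: Nout = 2, Nin = 2
         [[0.9, 0.4, -0.1, 0.6], [0.2, -0.3, 0.7, 0.05]]]  # layer 2: Nout = 2, Nin = 4
print(lstm_thresholds(gates, 0.5, 2))  # → [(2, [0.5, 0.3]), (3, [0.6, 0.3])]
```

The two layers end up with different S_j (2 and 3) even though P and M are shared. Each layer is nonetheless pruned to the same fraction.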

The processor 401 pruning each of the M groups of weights according to the determined thresholds to obtain the second weight data includes:

Pruning the weights in the i-th group (i = 1, 2, ..., M) whose absolute values are smaller than the determined threshold, to obtain the second weight data.
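The pruning step can be sketched as follows. The sketch assumes three things. First, there is one threshold per group, for groups i = 1..M. Second, a weight is removed when its absolute value is strictly smaller than its group's threshold. Third, the number of surviving nonzero weights is a reasonable proxy for the work a group leaves for its compute unit. The name `prune_with_thresholds` is hypothetical.

```python
def prune_with_thresholds(groups, thresholds):
    """Prune group i (i = 1..M) against its own threshold.

    A weight is zeroed when its absolute value is strictly smaller than the
    group's threshold (comparing |w| is an assumption of this sketch).
    Returns the pruned groups and the count of surviving nonzero weights per
    group, used here as a proxy for the work each group leaves behind.
    """
    if len(groups) != len(thresholds):
        raise ValueError("need exactly one threshold per group")
    pruned, workload = [], []
    for group, t in zip(groups, thresholds):
        kept = [w if abs(w) >= t else 0.0 for w in group]
        pruned.append(kept)
        workload.append(sum(1 for w in kept if w != 0.0))
    return pruned, workload


print(prune_with_thresholds([[0.1, -0.5, 0.7], [0.3, 0.2, -0.9]], [0.5, 0.3]))
# → ([[0.0, -0.5, 0.7], [0.3, 0.0, -0.9]], [2, 2])
```

Because the comparison is strict, several weights equal to the threshold all survive. A group with ties can therefore keep slightly more weights than its neighbours.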

It should be noted that, for the steps executed by the processor in the electronic device 40 of this embodiment of the present invention, reference may be made to the specific implementation of the electronic device in the embodiment of FIG. 16 among the foregoing method embodiments. Details are not repeated here.

In practical applications, the electronic device 40 is not limited to a single processor 401. In one implementation, the electronic device 40 further includes a graphics processing unit (GPU) for image processing, and may also include an embedded neural-network processing unit (NPU). In this case, the neural network pruning method may be integrated into the NPU. In one implementation, the processor 401 may control the NPU to execute the pruning method on the first weight data.

In a specific implementation, as mentioned above, the electronic device 40 may include a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a dashboard camera, a navigation device, a sensor, a camera module, a server, a cloud server, a still camera, a video camera, a projector, a watch, headphones, a removable storage device, a wearable device, a vehicle, a household appliance, and/or a medical device. This is not specifically limited in the embodiments of the present invention.

An embodiment of the present invention further provides a computer storage medium for storing the computer software instructions used by the electronic device shown in FIG. 16. The instructions include a program for executing the foregoing method embodiments. Executing the stored program achieves balanced pruning of the first weight data. This resolves the load imbalance that arises when sparsity leaves each neuron with a different amount of computation.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, a person skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. In addition, a person skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.

Each of the foregoing embodiments is described with its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are only illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical or take other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in hardware or as a software program module.

If the integrated unit is implemented as a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application may be embodied as a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the solution. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

A person of ordinary skill in the art will understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, which may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The description of the embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (22)

CN201811507488.1A | 2018-12-10 | 2018-12-10 | Computing device and related product | Active | CN111291871B (en)

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
CN201811507488.1A | CN111291871B (en) | 2018-12-10 | 2018-12-10 | Computing device and related product
CN201811538782.9A | CN111291884B (en) | 2018-12-10 | 2018-12-10 | Neural network pruning method, device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201811507488.1A | CN111291871B (en) | 2018-12-10 | 2018-12-10 | Computing device and related product

Related Child Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
CN201811538782.9A | Division | CN111291884B (en) | 2018-12-10 | 2018-12-10 | Neural network pruning method, device, electronic equipment and computer readable medium

Publications (2)

Publication Number | Publication Date
CN111291871A (en) | 2020-06-16
CN111291871B (en) | 2024-08-23

Family

Family ID: 71026468

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201811538782.9A | Active | CN111291884B (en) | 2018-12-10 | 2018-12-10 | Neural network pruning method, device, electronic equipment and computer readable medium
CN201811507488.1A | Active | CN111291871B (en) | 2018-12-10 | 2018-12-10 | Computing device and related product

Family Applications Before (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201811538782.9A | Active | CN111291884B (en) | 2018-12-10 | 2018-12-10 | Neural network pruning method, device, electronic equipment and computer readable medium

Country Status (1)

Country | Link
CN (2) | CN111291884B (en)





Also Published As

Publication Number | Publication Date
CN111291884A (en) | 2020-06-16
CN111291871A (en) | 2020-06-16
CN111291884B (en) | 2024-08-20


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
TG01 | Patent term adjustment

