CN111831356A - Weight precision configuration method, apparatus, device and storage medium - Google Patents

Weight precision configuration method, apparatus, device and storage medium
Download PDF

Info

Publication number
CN111831356A
CN111831356A (Application CN202010659069.0A)
Authority
CN
China
Prior art keywords
layer
weight
neural network
precision
target layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010659069.0A
Other languages
Chinese (zh)
Other versions
CN111831356B (en)
Inventor
祝夭龙
何伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd
Priority to CN202010659069.0A (patent CN111831356B)
Publication of CN111831356A
Priority to PCT/CN2021/105172 (WO2022007879A1)
Priority to US18/015,065 (US11797850B2)
Application granted
Publication of CN111831356B
Legal status: Active
Anticipated expiration

Abstract

The embodiments of the present invention disclose a weight precision configuration method, apparatus, device, and storage medium. The method comprises the following steps: determining a current target layer, wherein all layers are sorted according to their degree of influence on the recognition rate and the layer with the highest influence is preferentially determined as the target layer; reducing the weight precision corresponding to the current target layer to a preset minimum precision; then increasing the weight precision corresponding to the current target layer, and, if the current recognition rate of the neural network is greater than a preset threshold, locking the weight precision corresponding to the current target layer to the weight precision before the increase; and re-determining the current target layer when a target-layer switching condition is satisfied. With this technical solution, the upper limit of the weight precision of each layer can be reasonably controlled while the recognition rate of the neural network is taken into account, which improves the resource utilization of the artificial intelligence chip carrying the neural network, improves chip performance, and reduces chip power consumption.

Description

Translated from Chinese
Weight precision configuration method, apparatus, device and storage medium

TECHNICAL FIELD

The embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a weight precision configuration method, apparatus, device, and storage medium.

BACKGROUND

With the vigorous development of big data information networks and smart mobile devices, massive amounts of unstructured information have been generated, accompanied by a sharp increase in the demand for high-efficiency processing of this information. In recent years, deep learning technology has developed rapidly and has achieved high accuracy in many fields such as image recognition, speech recognition, and natural language processing. However, most of today's deep learning research is still carried out on traditional von Neumann computers. Because their processors and memory are separated, von Neumann computers not only suffer from high energy consumption and low efficiency when dealing with large, complex problems; their orientation toward numerical computation also makes software programming for non-formalized problems highly complex, or even infeasible.

With the development of brain science, it has become clear that, compared with traditional von Neumann computers, the brain features ultra-low power consumption and high fault tolerance, and has significant advantages in processing unstructured information and intelligent tasks. Building new artificial intelligence systems and artificial intelligence chips that borrow the brain's computing model has therefore become an emerging direction of development, and artificial intelligence technology inspired by the human brain has arisen accordingly. A neural network in artificial intelligence technology consists of a large number of neurons. Through distributed storage of information and parallel collaborative processing, a neural network can simulate the brain's adaptive learning process simply by defining basic learning rules, without explicit programming, and thus has advantages in handling certain non-formalized problems. Artificial intelligence technology can be implemented using large-scale integrated analog, digital, or mixed analog-digital circuits and software systems, that is, based on neuromorphic devices.

At present, deep learning algorithms can work at different data precisions. High precision achieves better performance (such as accuracy or recognition rate), but once applied to an artificial intelligence chip it incurs relatively high storage and computation costs, whereas low precision trades a certain performance loss for significant savings in storage and computation, giving the chip high power efficiency. In today's common artificial intelligence chips, because computing-precision requirements differ, the processing chip also needs to provide storage support for multiple data precisions, including integer (Int) and floating-point (FP) formats such as 8-bit integer (Int8), 16-bit floating point (FP16), 32-bit floating point (FP32), and 64-bit floating point (FP64). However, the weight precision of every layer of the neural network carried on a brain-inspired chip is the same, which makes the weight precision configuration scheme in the artificial intelligence chip inflexible and in need of improvement.

SUMMARY OF THE INVENTION

The embodiments of the present invention provide a weight precision configuration method, apparatus, device, and storage medium, which can optimize the existing weight precision configuration scheme.

In a first aspect, an embodiment of the present invention provides a weight precision configuration method, including:

determining a current target layer in the neural network, wherein all layers in the neural network are sorted according to their degree of influence on the recognition rate, and the layer with a higher degree of influence is preferentially determined as the target layer;

reducing the weight precision corresponding to the current target layer to a preset minimum precision;

increasing the weight precision corresponding to the current target layer, and determining whether the current recognition rate of the neural network is greater than a preset threshold; if so, locking the weight precision corresponding to the current target layer to the weight precision before this increase; and

re-determining the current target layer when a target-layer switching condition is satisfied.
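The four steps of the first aspect can be read as a simple search loop over the layers. The sketch below is illustrative only: it assumes the one-level-at-a-time precision ladder and the continue-raising behavior described later in the detailed embodiments, and `evaluate_recognition_rate` is a hypothetical stand-in for testing the network at a candidate configuration (in practice the recognition rate depends on the whole network, not a single layer).

```python
# Hedged sketch of the claimed configuration loop; helper names are
# hypothetical, not the patent's API.
PRECISION_LADDER = ["Int4", "Int8", "FP16", "FP32"]  # low -> high

def configure_weight_precisions(layers, evaluate_recognition_rate, threshold,
                                min_precision="Int4"):
    """layers: layer names pre-sorted so the layer with the greatest
    influence on the recognition rate comes first (step 101)."""
    locked = {}
    for layer in layers:                     # steps 101/104: pick target layer
        precision = min_precision            # step 102: drop to preset minimum
        while True:
            idx = PRECISION_LADDER.index(precision)
            if idx + 1 == len(PRECISION_LADDER):
                locked[layer] = precision    # already at the top of the ladder
                break
            candidate = PRECISION_LADDER[idx + 1]    # step 103: raise one level
            if evaluate_recognition_rate(layer, candidate) > threshold:
                locked[layer] = precision    # lock the pre-increase precision
                break
            precision = candidate            # rate still <= threshold: keep raising
    return locked
```

With a toy evaluator where `conv1` first exceeds a 0.95 threshold at FP16 and `fc` already exceeds it at Int8, the loop would lock `conv1` at Int8 and `fc` at Int4, mirroring the lock-to-pre-increase rule in the claim.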

In a second aspect, an embodiment of the present invention provides a weight precision configuration apparatus, including:

a target-layer determination module, configured to determine a current target layer in a neural network, wherein all layers in the neural network are sorted according to their degree of influence on the recognition rate, and the layer with a higher degree of influence is preferentially determined as the target layer;

a weight precision reduction module, configured to reduce the weight precision corresponding to the current target layer to a preset minimum precision;

a weight precision increase module, configured to increase the weight precision corresponding to the current target layer and determine whether the current recognition rate of the neural network is greater than a preset threshold, and if so, lock the weight precision corresponding to the current target layer to the weight precision before this increase; and

a target-layer switching module, configured to re-determine the current target layer when a target-layer switching condition is satisfied.

In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the weight precision configuration method provided by the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the weight precision configuration method provided by the embodiments of the present invention.

In the weight precision configuration scheme provided by the embodiments of the present invention, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and the layer with a higher degree of influence is preferentially determined as the target layer. The weight precision corresponding to the current target layer is first reduced to a preset minimum precision and then increased; if the current recognition rate of the neural network is greater than a preset threshold, the weight precision corresponding to the current target layer is locked to the weight precision before the increase, and the current target layer is re-determined when a target-layer switching condition is satisfied. With this technical solution, all layers in the neural network are sorted by their influence on the recognition rate, and for each target layer in turn the weight precision is first reduced to the minimum and then raised. While the recognition rate of the neural network is taken into account, the upper limit of the weight precision of each layer is reasonably controlled, which improves the resource utilization of the artificial intelligence chip carrying the neural network, improves chip performance, and reduces chip power consumption.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of a weight precision configuration method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a precision configuration scheme for output data in the prior art;

FIG. 3 is a schematic diagram of a precision configuration scheme for output data provided by an embodiment of the present invention;

FIG. 4 is a schematic flowchart of another weight precision configuration method provided by an embodiment of the present invention;

FIG. 5 is a schematic flowchart of another weight precision configuration method provided by an embodiment of the present invention;

FIG. 6 is a schematic flowchart of still another weight precision configuration method provided by an embodiment of the present invention;

FIG. 7 is a structural block diagram of a weight precision configuration apparatus provided by an embodiment of the present invention;

FIG. 8 is a structural block diagram of a computer device provided by an embodiment of the present invention.

DETAILED DESCRIPTION

The technical solutions of the present invention are further described below with reference to the accompanying drawings and through specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only some, but not all, of the structures related to the present invention.

Before the exemplary embodiments are discussed in more detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps may be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the figures. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.

It should be noted that concepts such as "first" and "second" mentioned in the embodiments of the present invention are only used to distinguish different apparatuses, modules, units, or other objects, and are not used to limit the order or interdependence of the functions performed by these objects.

To better understand the embodiments of the present invention, the related technologies are introduced below.

Artificial intelligence generally refers to drawing on the basic laws of information processing in the brain to make essential changes to existing computing architectures and systems at multiple levels, such as hardware implementation and software algorithms, so as to achieve substantial improvements in computing energy consumption, computing capability, computing efficiency, and many other respects; it is an interdisciplinary field that integrates brain science with computer science, information science, and artificial intelligence. Artificial intelligence chips generally refer to chips with non-von Neumann architectures, such as spiking neural network chips, memristors, memcapacitors, and meminductors.

The artificial intelligence chip in the embodiments of the present invention may include multiple processing cores. Each processing core may contain a processor and its own storage area, so that computation data can be operated on locally, and each processing core may correspond to one layer of the neural network; that is, the neural network can be deployed or mapped onto the corresponding processing cores in units of layers. The neural network in the embodiments of the present invention may include an artificial neural network (ANN), a spiking neural network (SNN), or other types of neural networks. The specific type of neural network is not limited; for example, it may be an acoustic model, a speech recognition model, an image recognition model, and so on, and it may be applied to data centers, the security field, smart healthcare, autonomous driving, intelligent transportation, smart homes, and other related fields. The technical solutions provided by the embodiments of the present invention do not improve the neural network algorithm itself, but improve the control or application of the hardware platform used to implement the neural network; they belong to neuromorphic circuits and systems, also known as neuromorphic engineering.

In the prior art, the weight precision of every layer of the neural network carried on an artificial intelligence chip is the same. If the weight precision of all layers is configured as the lower Int4, then in order to guarantee the recognition rate, parameter tuning is difficult, training time increases substantially, and a large loss of accuracy often results. If the weight precision of all layers is configured as FP32 or higher, the operation precision meets the requirements and the recognition rate is high, but the neural network model is generally large, which results in low resource utilization of the artificial intelligence chip and high power consumption, affecting chip performance.

In the embodiments of the present invention, the prior-art restriction that the weight precision of every layer in the neural network must be the same is discarded: a different weight precision can be configured for each layer, that is, mixed precision is used, so as to better balance storage capacity and computational energy consumption against the recognition rate (or accuracy) of the neural network. The weight precision is configured based on this mixed-precision idea, and specific configuration schemes are provided.

FIG. 1 is a schematic flowchart of a weight precision configuration method provided by an embodiment of the present invention. The method may be executed by a weight precision configuration apparatus, which may be implemented in software and/or hardware and may generally be integrated in a computer device. As shown in FIG. 1, the method includes:

Step 101: determine the current target layer in the neural network, wherein all layers in the neural network are sorted according to their degree of influence on the recognition rate, and the layer with a higher degree of influence is preferentially determined as the target layer.

In the embodiments of the present invention, the specific structure of the neural network is not limited; for example, the number of neuron layers in the neural network may be any number of two or more. Different layers in the neural network may influence the network's recognition rate to different degrees, and many factors may contribute to this influence, such as the number of weight parameters, the values of the weight parameters (weight values), and the weight precision (the precision of the weight values). The degree of influence of each layer on the recognition rate can be evaluated separately in advance, and the layers can be sorted in a certain order (for example, from high influence to low). In this step, the layer with the highest degree of influence may first be determined as the current target layer; when the target layer needs to be switched, the layer with the second-highest degree of influence is determined as the new current target layer.
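The patent lists factors that can drive this influence ranking but does not fix a procedure. One plausible proxy, sketched below under that assumption, is to degrade each layer's precision to the minimum in isolation and measure how far the recognition rate falls; all helper names are hypothetical.

```python
# Hypothetical influence-ranking sketch (the patent does not specify a
# ranking procedure; a larger rate drop is taken to mean greater influence).
def rank_layers_by_influence(layers, recognition_rate_with_layer_degraded):
    """recognition_rate_with_layer_degraded(None) -> baseline rate;
    recognition_rate_with_layer_degraded(layer) -> rate with only that
    layer at the preset minimum precision."""
    baseline = recognition_rate_with_layer_degraded(None)
    drop = {layer: baseline - recognition_rate_with_layer_degraded(layer)
            for layer in layers}
    # Highest influence first: this is the order in which target layers
    # are selected in step 101.
    return sorted(layers, key=lambda name: drop[name], reverse=True)
```

The resulting list feeds directly into the target-layer selection of step 101: the head of the list is the first target layer, and switching moves to the next entry.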

Step 102: reduce the weight precision corresponding to the current target layer to a preset minimum precision.

Exemplarily, the initial weight precision of all layers in the neural network can be set according to actual requirements and may be the same or different across layers. The preset minimum precision can likewise be set according to actual requirements, for example determined by the hardware configuration of the artificial intelligence chip. The benefit of this arrangement is that, before the neural network is deployed to the artificial intelligence chip, it may be provided by a third party with application requirements, and the third party may not have considered the specifics of the artificial intelligence chip when designing the network, so the weight precision of each layer may be relatively high. Therefore, before the weight precision is configured, it can first be reduced to the preset minimum weight precision matching the artificial intelligence chip, and then gradually increased.

Step 103: increase the weight precision corresponding to the current target layer, and determine whether the current recognition rate of the neural network is greater than a preset threshold; if so, lock the weight precision corresponding to the current target layer to the weight precision before this increase.

Exemplarily, when the weight precision corresponding to the current target layer is increased, the magnitude of the increase is not limited, and the magnitude may be the same or different from one increase to the next. The magnitude of an increase can be measured in precision levels, where a precision level indicates how high the data precision is: the higher the precision, the higher the corresponding level, and the precision values corresponding to different levels can be set according to actual requirements. Exemplarily, the precision may be raised in the order Int4, Int8, FP16, FP32, one level at a time, for example from Int4 to Int8. The benefit of raising one level at a time is that the weight precision can be determined more accurately, that is, the configuration is finer-grained. If the precision were instead raised by two or more levels at a time, then when the current recognition rate exceeds the preset threshold, the locked weight precision would differ from the currently raised precision by two or more levels, and somewhere in between there may be one or more weight precisions whose corresponding recognition rate is below the preset threshold.
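The one-level step on the ladder named above can be captured in a few lines. This is a minimal sketch: the ladder contents follow the examples in the text (Int4, Int8, FP16, FP32), and other levels could be inserted without changing the logic.

```python
# Minimal one-level precision step over the ladder named in the text.
PRECISION_LADDER = ("Int4", "Int8", "FP16", "FP32")

def raise_one_level(precision):
    """Return the next-higher precision level, or None at the top."""
    idx = PRECISION_LADDER.index(precision)
    if idx + 1 >= len(PRECISION_LADDER):
        return None          # already at the highest level; cannot raise further
    return PRECISION_LADDER[idx + 1]
```

Stepping one level at a time is what guarantees the lock lands exactly one level below the first precision whose recognition rate exceeds the threshold, avoiding the skipped-intermediate problem described above.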

Exemplarily, when the neural network is deployed on an artificial intelligence chip, the network is deployed or mapped onto the corresponding processing cores in units of layers, and the current target layer is mapped to its corresponding processing core. The weight precision corresponding to the current target layer can therefore be understood as the core precision of that processing core; that is, the solution of the embodiments of the present invention can be understood as configuring the core precision of the processing cores in the artificial intelligence chip.
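The one-layer-per-core mapping described above can be sketched as a simple data structure. The `Core` class and the helper are illustrative assumptions, not the patent's API; they only show how a layer's locked weight precision becomes, in effect, the precision of its core.

```python
# Hedged sketch of the layer-to-core mapping: one processing core per layer,
# so core precision == that layer's weight precision.
from dataclasses import dataclass

@dataclass
class Core:
    layer: str       # name of the network layer mapped onto this core
    precision: str   # core precision = locked weight precision of that layer

def map_network_to_cores(layer_precisions):
    """layer_precisions: {layer_name: weight_precision} in network order
    (dicts preserve insertion order in Python 3.7+)."""
    return [Core(layer=name, precision=prec)
            for name, prec in layer_precisions.items()]
```

Under this reading, configuring the per-layer weight precisions (steps 101-104) and configuring the per-core precisions of the chip are the same operation viewed from two sides.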

Exemplarily, the recognition rate of the neural network can be used to measure its performance. For example, a preset number of samples can be used to test the neural network to obtain the current recognition rate. The preset threshold can be set according to actual usage requirements such as the application scenario of the neural network; it can be understood as the highest recognition rate achievable when chip performance tolerance is taken into account, and its specific value is not limited, for example 0.95. If, after the weight precision of the current target layer is increased, the current recognition rate of the neural network is greater than the preset threshold, this increase is inappropriate and may have a large impact on chip performance, so the weight precision corresponding to the current target layer can be locked to the weight precision before this increase. For example, if the precision was FP16 before the increase and FP32 after it, the weight precision corresponding to the current target layer can be locked to FP16.

Step 104: re-determine the current target layer when the target-layer switching condition is satisfied.

Exemplarily, for each layer in the neural network, the weight precision may first be reduced to the minimum and then raised, and whether to move on and raise the weight precision of the next target layer may be decided according to the target-layer switching condition. After the weight precisions corresponding to all layers are locked, the weight precision configuration of the neural network can be considered complete. At this point the weight precision of the neural network has been reasonably configured: it satisfies the recognition-rate requirement while improving the resource utilization of the artificial intelligence chip carrying the neural network. Optionally, the weight precision may be adjusted for only some of the layers, which improves the resource utilization of the artificial intelligence chip to a certain extent while ensuring the efficiency of the weight precision configuration. The specific number of layers in this subset can be set according to actual requirements, for example the product of the total number of layers and a preset ratio.

In the weight precision configuration method provided by the embodiments of the present invention, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and the layer with a higher degree of influence is preferentially determined as the target layer. The weight precision corresponding to the current target layer is first reduced to a preset minimum precision and then increased; if the current recognition rate of the neural network is greater than a preset threshold, the weight precision corresponding to the current target layer is locked to the weight precision before the increase, and the current target layer is re-determined when a target-layer switching condition is satisfied. With this technical solution, all layers in the neural network are sorted by their influence on the recognition rate, and for each target layer in turn the weight precision is first reduced to the minimum and then raised. While the recognition rate of the neural network is taken into account, the upper limit of the weight precision of each layer is reasonably controlled, which improves the resource utilization of the artificial intelligence chip carrying the neural network, improves chip performance, and reduces chip power consumption.

Optionally, after it is determined whether the current recognition rate of the neural network is greater than the preset threshold, the method may further include: if the current recognition rate is less than or equal to the threshold, locking the weight precision corresponding to the current target layer to the increased weight precision. The benefit of this arrangement is that, after a single increase, the weight precision of the current target layer can already satisfy the recognition-rate requirement to a certain extent while effectively controlling the model area of the neural network; to improve the efficiency of the weight precision configuration, the increased weight precision can be locked and the weight precisions of the other layers raised in turn. In this way, once the weight precisions corresponding to all layers of the neural network are locked, an increase has been attempted for every layer, and the weight precision configuration of the neural network ends. The weight precision of the neural network can thus be configured quickly while guaranteeing its recognition rate, improving the resource utilization of the artificial intelligence chip carrying the neural network, improving chip performance, and reducing chip power consumption.
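This faster, one-shot variant can be sketched as follows, under the assumptions of a one-level raise from the preset minimum and a hypothetical `evaluate_recognition_rate` helper: each target layer's precision is raised once, and if the recognition rate stays at or below the threshold the raised precision is locked immediately, with no further raises for that layer.

```python
# Hedged sketch of the one-shot variant: raise once, lock, switch layer.
PRECISION_LADDER = ["Int4", "Int8", "FP16", "FP32"]

def configure_one_shot(layers, evaluate_recognition_rate, threshold,
                       min_precision="Int4"):
    locked = {}
    for layer in layers:
        raised = PRECISION_LADDER[PRECISION_LADDER.index(min_precision) + 1]
        if evaluate_recognition_rate(layer, raised) > threshold:
            locked[layer] = min_precision   # raise was too much: keep pre-raise
        else:
            locked[layer] = raised          # lock the raised precision, move on
    return locked
```

Compared with raising repeatedly until the threshold is crossed, this trades some precision headroom for fewer evaluation rounds, which is the configuration-efficiency benefit the paragraph describes.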

Exemplarily, the weight precision may be locked by rewriting the bit-width flag of the current target layer or by rewriting the name of the operator called by the current target layer.

In some embodiments, after judging whether the current recognition rate of the neural network is greater than the preset threshold, the method further includes: if the rate is less than or equal to the threshold, continuing to raise the weight precision of the current target layer and continuing to judge whether the current recognition rate of the neural network is greater than the preset threshold. Correspondingly, the target-layer switching condition includes: the current recognition rate of the neural network is greater than the preset threshold; and determining a layer with a higher degree of influence as the target layer first includes: among the layers whose weight precision has not been locked, determining the layer with the highest degree of influence as the target layer first. The benefit of this arrangement is improved configuration efficiency. When the current recognition rate of the neural network is less than or equal to the preset threshold, there is still room to raise the weight precision of the current target layer, so the raise can be attempted again and the recognition rate checked again, until the current recognition rate exceeds the preset threshold. At that point the weight precision of the current target layer can no longer be raised, so the target layer is switched and a raise is attempted on the next layer.

In some embodiments, multiple rounds of raising operations are performed on the weight precisions of all layers of the neural network, and in each round the weight precision of each layer is raised at most once. After judging whether the current recognition rate of the neural network is greater than the preset threshold, the method further includes: if the rate is less than or equal to the threshold, temporarily storing the raised weight precision. Correspondingly, the target-layer switching condition includes: the weight precision of the current target layer has already been raised once in the current round. Reducing the weight precision of the current target layer to the preset minimum precision includes: if the weight precision of the current target layer has never been adjusted, reducing it to the preset minimum precision. For the current target layer, if it is determined as the target layer for the first time, its weight precision has not been adjusted yet, so it is first reduced to the preset minimum precision and then raised once; if it is not the first time, it has already been raised before, and in the current round another raise is performed on top of the raised weight precision temporarily stored in the previous round. The benefit of this arrangement is that the weight precisions of the layers are raised evenly. For example, suppose the neural network has four layers, L1, L2, L3 and L4, sorted by their influence on the recognition rate from highest to lowest as L1, L3, L2 and L4. In each round of raising operations, L1 is determined as the target layer first, that is, the current target layer is L1 and its weight precision is raised; then the target layer is switched so that the current target layer is L3 and its weight precision is raised; and the weight precisions of L2 and L4 are raised in turn afterwards.
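This round-based variant can be sketched as follows. The names and the three-level precision ladder are illustrative assumptions, and `rate_fn` stands in for measuring the recognition rate of the network.

```python
def round_based_raise(order, layers, rate_fn, threshold, precisions):
    """order: layer names sorted by influence on the recognition rate,
    highest first; layers: {name: precision}; precisions: low -> high."""
    locked, lowered = set(), set()
    while len(locked) < len(order):
        for name in order:                       # one round: each layer at most once
            if name in locked:
                continue
            if name not in lowered:              # first visit: lower to the minimum
                layers[name] = precisions[0]
                lowered.add(name)
            prev = layers[name]
            idx = precisions.index(prev)
            if idx == len(precisions) - 1:       # nothing left to raise
                locked.add(name)
                continue
            layers[name] = precisions[idx + 1]   # raise once in this round
            if rate_fn(layers) > threshold:
                layers[name] = prev              # lock the pre-raise precision
                locked.add(name)
            # otherwise the raised precision is kept (temporarily stored)
            # and the next layer in the order becomes the target layer
    return layers
```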

In some embodiments, re-determining the current target layer includes: re-determining the current target layer until the weight precisions of all layers have been locked. The benefit of this arrangement is that a weight precision adjustment is attempted for every layer of the neural network, which makes the configuration result more reasonable and better improves chip performance.

In some embodiments, after judging whether the current recognition rate of the neural network is greater than the preset threshold, the method further includes: if the rate is less than or equal to the threshold, training the neural network to adjust the weight parameter values of the current target layer, where the training objective is to improve the recognition rate of the neural network. The benefit of this arrangement is that, although raising the weight precision generally raises the recognition rate by itself, training the neural network to adjust the weight parameter values of the current target layer improves the recognition rate further and optimizes the performance of the network; it also brings the network closer to the preset threshold after the next raise of the weight precision, improving configuration efficiency.

In some embodiments, training the neural network includes training the neural network on an artificial intelligence chip. The benefit of this arrangement is that the neural network in the embodiments of the present invention can be mapped onto an artificial intelligence chip for application; training it on the chip is equivalent to mapping the network onto the chip before actual deployment, so the training process better matches the actual application scenario and the network is trained more accurately and efficiently.

In some embodiments, the process of training the neural network includes: obtaining the precision of the data to be output by a first layer of the neural network, where the first layer is any one or more layers other than the last layer of the neural network; obtaining the weight precision of a second layer, where the second layer is the layer following the first layer; and configuring the precision of the data to be output according to the weight precision of the second layer. The benefit of this arrangement is that the precision of the output data of one or more layers of the neural network deployed in the artificial intelligence chip can be configured flexibly, thereby optimizing the performance of the chip.

At present, a neural network in artificial intelligence is usually organized with a number of neurons per layer, and each layer usually corresponds to one processing core of the artificial intelligence chip. The core computation of a neural network is the matrix-vector multiplication. When data enters a layer of the network, the computation involves the product of the data precision and the weight precision (that is, the precision of the weight values), and the precision of the computation result (that is, the output data of the processing core corresponding to the current layer) is determined by the higher of the data precision and the weight precision. Figure 2 is a schematic diagram of an output data precision configuration scheme in the prior art, in which the weight precision of every layer of the neural network carried by the artificial intelligence chip is the same. As shown in Figure 2, for ease of description only four layers of the neural network are shown, L1, L2, L3 and L4. The precision of the input data of L1 (the data precision) is FP32 (32-bit floating point) and the weight precision of L1 is FP32, so the precision obtained after the multiply-accumulate operation is FP32. In the embodiments of the present invention, by contrast, the precision of the computation result is no longer determined by the higher of the data precision and the weight precision; instead, the precision of the output data of the current layer is determined according to the weight precision of the next layer.
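The difference between the two rules can be stated in a few lines of code. The precision ordering and function names below are illustrative assumptions, not taken from the patent text.

```python
LEVELS = {"Int8": 0, "FP16": 1, "FP32": 2}   # assumed low-to-high ordering

def prior_art_output_precision(data_p, weight_p):
    # prior art: follow the higher of the input-data and weight precisions
    return data_p if LEVELS[data_p] >= LEVELS[weight_p] else weight_p

def proposed_output_precision(next_layer_weight_p):
    # this embodiment: follow the next layer's weight precision instead
    return next_layer_weight_p
```

With FP16 data and FP16 weights, the prior-art rule keeps FP16, while the proposed rule yields Int8 whenever the weights of the next layer are Int8.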

In the embodiments of the present invention, the first layer is not necessarily the first layer of the neural network; it may be any layer other than the last layer. If the processing core corresponding to the first layer is denoted the first processing core, then the first processing core obtains the precision of the data to be output by the first layer of the neural network, obtains the weight precision of the second layer, and configures the precision of the data to be output by the first layer according to the weight precision of the second layer; any processing core other than the one corresponding to the last layer can serve as the first processing core. Exemplarily, the processor in the first processing core corresponding to the first layer performs the data computation, for example computing the data to be output from the input data of the first layer and the weight parameters of the first layer (such as a weight matrix). In general, the precision of the data to be output is greater than or equal to the higher of the input data precision and the weight precision. If the input data precision and the weight precision are themselves low (such as Int2, Int4 or Int8), the number of bits may be insufficient after the multiply-accumulate operation (for example, it may fail to meet the hardware requirements of the corresponding processing core) and the precision needs to be increased; the precision of the data to be output then generally increases considerably (for example to Int8, Int16 or Int16 respectively), and the lower the higher of the input data precision and the weight precision is, the more precision levels need to be added. Conversely, if the input data precision and the weight precision are already high (such as FP16, FP32 or FP64), the precision of the data to be output may not increase, or may increase only slightly (for example from FP16 to FP32), because the precision after the multiply-accumulate operation is already high enough.

Optionally, obtaining the precision of the data to be output by the first layer of the neural network may include: obtaining the precision of the input data of the first layer of the neural network and the weight precision of the first layer; and determining the precision of the data to be output by the first layer according to the precision of the input data and the weight precision of the first layer, where the precision of the data to be output is greater than or equal to the higher of the precision of the input data and the weight precision of the first layer.

In the embodiments of the present invention, the weight precisions of different layers may differ, and the specific way of obtaining the weight precision of the second layer is not limited. For example, the weight precision of the second layer may be stored in a storage area inside the first processing core during the compilation phase of the chip, and read from that storage area after the data to be output by the first layer is obtained. As another example, assuming the processing core corresponding to the second layer is the second processing core, the weight precision of the second layer may be stored in a storage area inside the second processing core, and the first processing core may obtain it from the second processing core through inter-core communication.

In the embodiments of the present invention, the precision of the data to be output by the first layer is configured with reference to the weight precision of the second layer; the specific manner of reference and configuration is not limited. Exemplarily, the precision of the data to be output may be configured to be lower than the weight precision of the second layer, or higher than it, to obtain the precision of the output data, and the gap in precision levels between the weight precision of the second layer and the precision of the output data may be a first preset precision-level difference. For example, Int8 lies between Int4 and FP16, so the gap between Int4 and FP16 is two precision levels, while the gap between Int4 and Int8 is one level. Assuming the weight precision of the second layer is FP16 and the first preset precision-level difference is 2, if the precision of the data to be output is configured to be lower than the weight precision of the second layer, it is configured as Int4.
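Counting "precision levels" as positions on an ordered ladder, the Int4/Int8/FP16 example above can be reproduced as follows. The ladder itself is an assumption; the text only fixes the relative order of the precisions it names.

```python
LADDER = ["Int4", "Int8", "FP16", "FP32"]   # assumed ladder, low -> high

def level_gap(a, b):
    """Number of precision levels separating two precisions."""
    return abs(LADDER.index(a) - LADDER.index(b))

def lower_by(precision, levels):
    """Precision obtained by going down a given number of levels,
    clamped at the lowest rung of the ladder."""
    return LADDER[max(LADDER.index(precision) - levels, 0)]
```

Here `level_gap("Int4", "FP16")` is 2 and `level_gap("Int4", "Int8")` is 1, and lowering FP16 by a first preset difference of 2 gives `lower_by("FP16", 2) == "Int4"`, matching the example in the text.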

In some embodiments, configuring the precision of the data to be output according to the weight precision of the second layer includes: when the weight precision of the second layer is lower than the precision of the data to be output, determining a target precision according to the weight precision of the second layer, where the target precision is lower than the precision of the data to be output; and configuring the precision of the data to be output as the target precision. Optionally, the target precision is equal to or higher than the weight precision of the second layer. The benefit of this arrangement is that it amounts to truncating the precision of the data to be output according to the weight precision of the second layer, which lowers the precision of the data to be output and thus reduces the amount of data transmitted; when the second layer performs its computation, the amount of computation is also reduced, which in turn reduces the energy consumed by data processing.

In some embodiments, determining the target precision according to the weight precision of the second layer includes: determining the weight precision of the second layer as the target precision. The benefit of this arrangement is that it amounts to truncating the precision of the data to be output to match the weight precision of the second layer, which further reduces data transmission and the energy consumed by data processing, and increases the computing power of the chip. Optionally, the weight precision of the second layer may also be determined as the target precision directly, without comparing it against the precision of the data to be output by the first layer.

In some embodiments, the method may include: judging whether the weight precision of the second layer is lower than the precision of the data to be output by the first layer; if so, determining the weight precision of the second layer as the target precision and configuring the precision of the data to be output by the first layer as the target precision, to obtain the output data; otherwise, keeping the precision of the data to be output by the first layer unchanged, or configuring it as the weight precision of the second layer, to obtain the output data. Keeping the precision of the data to be output by the first layer unchanged reduces the amount of transmission between the first layer and the second layer.
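A minimal sketch of this decision, under an assumed precision ordering and with illustrative names:

```python
ORDER = ["Int4", "Int8", "FP16", "FP32"]    # assumed low-to-high ordering

def configure_output(to_output_p, second_layer_weight_p, keep_unchanged=True):
    """Pick the output precision of the first layer from the precision of its
    data to be output and the weight precision of the second layer."""
    if ORDER.index(second_layer_weight_p) < ORDER.index(to_output_p):
        return second_layer_weight_p        # truncate to the target precision
    # the second layer's weights are not lower: either keep the precision
    # (less inter-core traffic) or pad it up to the weight precision
    return to_output_p if keep_unchanged else second_layer_weight_p
```

For example, FP16 data headed for an Int8-weight layer is truncated to Int8, while Int8 data headed for an FP16-weight layer is either kept at Int8 or padded to FP16, depending on the chosen branch.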

In some embodiments, after configuring the precision of the data to be output according to the weight precision of the second layer, the method further includes: outputting the configured output data to the processing core corresponding to the second layer. The benefit of this arrangement is that the output data is sent to the processing core corresponding to the second layer through inter-core communication so that that core can perform the computations of the second layer.

In some embodiments, the artificial intelligence chip is implemented on a many-core architecture. The many-core architecture can support multi-core regrouping; there is no master-slave distinction between cores, tasks can be configured flexibly in software, and different tasks can be placed on different cores at the same time for parallel multi-task processing. An array of cores can be formed to carry out the computation of the neural network, which efficiently supports various neural network algorithms and improves chip performance. Exemplarily, the artificial intelligence chip may adopt a 2D mesh network-on-chip structure for core-to-core communication and interconnection, and communication between the chip and the outside may be implemented through a high-speed serial port.

Figure 3 is a schematic diagram of an output data precision configuration scheme provided by an embodiment of the present invention. As shown in Figure 3, for ease of description only four layers of the neural network are shown, L1, L2, L3 and L4.

For L1, the precision of the input data is Int8 and the weight precision of L1 is Int8, so the precision obtained after the multiply-accumulate operation is Int8; however, precision saturation may occur during the multiply-accumulate process, causing information loss. In the prior art, the computation result is determined by the higher of the data precision and the weight precision: since the weight precision of L2 is FP16, the truncated Int8 result has to be padded back up before being output, and the precision truncated away earlier is lost in the process. In the embodiment of the present invention, the weight precision of L2 is obtained first, so it is known that the precision of the data to be output by L1 is the same as the weight precision of L2; no precision truncation is performed, which reduces the precision loss in data conversion.

For L3, the precision of the input data is FP16 and the weight precision is FP16; in the prior art the precision of the output data would also be FP16. In the embodiment of the present invention, the weight precision of L4, Int8, is obtained first, so it is known that the precision of the data to be output by L3 is higher than the weight precision of L4, and the precision of the data to be output can be configured as Int8. Compared with the prior art, this further lowers the precision of the output data and reduces the amount of data transmitted between layer L3 and layer L4, that is, the data traffic between the processing core where L3 resides and the processing core where L4 resides, without affecting the computation precision of L4, which greatly improves chip performance.

In some embodiments, all layers of the neural network are sorted by their degree of influence on the recognition rate as follows: computing an initial recognition rate of the neural network; for each layer of the neural network, reducing the weight precision of the current layer from a first precision to a second precision and computing the drop of the recognition rate of the neural network relative to the initial recognition rate; and sorting all layers by the drop value, where a larger drop value indicates a higher degree of influence on the recognition rate. The benefit of this arrangement is that the influence of the different layers on the recognition rate can be evaluated quickly and accurately. The first precision and the second precision can be set according to actual requirements; the first precision may be, for example, the initial precision of the neural network, and the gap in precision levels between the first precision and the second precision is not limited. For example, the first precision may be FP32 and the second precision FP16.
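This ranking procedure can be sketched as follows, with `rate_fn` as an assumed oracle for the recognition rate; the tie-break by distance to the input layer is approximated here by the order of the layer names.

```python
def rank_by_influence(layers, rate_fn, second_precision):
    """Sort layer names by the recognition-rate drop caused by lowering each
    layer alone from its current (first) precision to second_precision."""
    names = list(layers)                     # assumed to follow network order
    base = rate_fn(layers)                   # initial recognition rate
    drops = {}
    for name in names:
        saved = layers[name]
        layers[name] = second_precision      # lower only this layer
        drops[name] = base - rate_fn(layers)
        layers[name] = saved                 # restore before the next test
    # larger drop first; ties broken by distance to the input layer
    return sorted(names, key=lambda n: (-drops[n], names.index(n)))
```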

In some embodiments, if there are at least two layers with the same drop value, the at least two layers are sorted by their distance from the input layer of the neural network, where a smaller distance indicates a higher degree of influence on the recognition rate. The benefit of this arrangement is that the layers are sorted more reasonably.

Figure 4 is a schematic flowchart of another weight precision configuration method provided by an embodiment of the present invention. As shown in Figure 4, the method includes:

Step 401: Determine the current target layer in the neural network.

All layers of the neural network are sorted by their degree of influence on the recognition rate, and among the layers whose weight precision has not been locked, the layer with the highest degree of influence is determined as the target layer first.

Exemplarily, before this step the method may further include: computing the initial recognition rate of the neural network; for each layer of the neural network, reducing the weight precision of the current layer from the first precision to the second precision and computing the drop of the recognition rate of the neural network relative to the initial recognition rate; and sorting all layers by the drop value to obtain a sorting result, where a larger drop value indicates a higher degree of influence on the recognition rate.

In this step, the current target layer is determined according to the sorting result. When this step is performed for the first time, the layer with the largest drop value, that is, the layer with the greatest influence on the recognition rate, is determined as the current target layer. When this step is performed again, a new current target layer is selected according to the sorting result; if the weight precision of a layer has already been locked, that layer will not become the target layer again, that is, it will not become the new current target layer.

Step 402: Reduce the weight precision of the current target layer to the preset minimum precision.

Step 403: Raise the weight precision of the current target layer.

Exemplarily, the weight precision of the current target layer may be raised by one precision level, and each raise mentioned below may likewise be a raise of one precision level.

Step 404: Judge whether the current recognition rate of the neural network is greater than the preset threshold; if so, go to step 405; otherwise, go to step 407.

Step 405: Lock the weight precision of the current target layer to the weight precision before this raise.

Step 406: Judge whether the weight precisions of all layers have been locked; if so, end the procedure; otherwise, return to step 401.

Exemplarily, the locked weight precision is marked in the bit-width flag of the current target layer or in the name of the operator it calls.

Step 407: Train the neural network to adjust the weight parameter values of the current target layer, and return to step 403.

The training objective is to improve the recognition rate of the neural network.

Optionally, the neural network is trained on the artificial intelligence chip; for the training process, reference may be made to the relevant description above, which is not repeated here.

In the embodiment of the present invention, all layers of the neural network are sorted by their degree of influence on the recognition rate, and for each target layer in turn the weight precision is first reduced to the minimum and then raised gradually until the recognition rate of the neural network exceeds the preset threshold. This allows the weight precision configuration to be completed quickly and, while the recognition rate of the neural network is taken into account, effectively controls the model area of the neural network, improves resource utilization in the artificial intelligence chip carrying the network, improves chip performance, and reduces chip power consumption.

Figure 5 is a schematic flowchart of another weight precision configuration method provided by an embodiment of the present invention, in which multiple rounds of raising operations are performed on the weight precisions of all layers of the neural network, and in each round the weight precision of each layer is raised at most once. As shown in Figure 5, the method includes:

Step 501: Determine the current target layer in the neural network.

All layers of the neural network are sorted by their degree of influence on the recognition rate, and a layer with a higher degree of influence is determined as the target layer first.

Exemplarily, before this step the method may further include: computing the initial recognition rate of the neural network; for each layer of the neural network, reducing the weight precision of the current layer from the first precision to the second precision and computing the drop of the recognition rate of the neural network relative to the initial recognition rate; and sorting all layers by the drop value to obtain a sorting result, where a larger drop value indicates a higher degree of influence on the recognition rate.

In this step, the current target layer is determined according to the sorting result. When this step is performed for the first time, the layer with the largest drop value, that is, the layer with the greatest influence on the recognition rate, is determined as the current target layer. When this step is performed again, a new current target layer is selected according to the sorting result. Optionally, when the weight precision of the determined current target layer has already been locked, that layer can be skipped and the next layer in the sorting result determined as the current target layer.

Step 502: Determine whether the weight precision corresponding to the current target layer has been adjusted before; if so, go to Step 504; otherwise, go to Step 503.

Step 503: Reduce the weight precision corresponding to the current target layer to a preset minimum precision.

Step 504: Raise the weight precision corresponding to the current target layer.

Exemplarily, the weight precision corresponding to the current target layer may be raised by one precision level. Each raise below may likewise be a raise of one precision level.
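
Raising by one precision level can be modeled as stepping up an ordered ladder of precisions. The concrete ladder below (INT8 → FP16 → FP32) is only an assumed example; the patent does not fix a particular set of levels:

```python
PRECISION_LEVELS = ["INT8", "FP16", "FP32"]  # assumed ladder, lowest to highest

def raise_one_level(precision):
    """Return the next-higher precision level, or the same level if the
    given precision is already at the top of the ladder."""
    i = PRECISION_LEVELS.index(precision)
    return PRECISION_LEVELS[min(i + 1, len(PRECISION_LEVELS) - 1)]
```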

Step 505: Determine whether the current recognition rate of the neural network is greater than a preset threshold; if so, go to Step 506; otherwise, go to Step 508.

Step 506: Lock the weight precision corresponding to the current target layer to the weight precision before this raise.

Step 507: Determine whether the weight precisions corresponding to all layers have been locked; if so, end the procedure; otherwise, return to Step 501.

Exemplarily, the locked weight precision is marked in the bit-width flag of the current target layer or in the name of the called operator.

Step 508: Temporarily store the raised weight precision.

Exemplarily, the temporarily stored weight precision is marked in the bit-width flag of the current target layer or in the name of the called operator.

Step 509: Train the neural network to adjust the weight parameter values of the current target layer, then return to Step 501.

The training objective is to improve the recognition rate of the neural network.

Optionally, the training of the neural network is performed on an artificial intelligence chip. For the training process, reference may be made to the relevant content above, which is not repeated here.
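
The flow of Steps 501 to 509 can be condensed into the following sketch. The evaluation and training callbacks, the precision ladder, and the termination guard for a layer that reaches the highest level are assumptions made for illustration; they are not prescribed by the patent:

```python
def configure_weight_precisions(sorted_layers, levels, eval_rate, train, threshold):
    """sorted_layers: layers by descending influence on the recognition rate.
    levels: ordered precision ladder, lowest first.
    eval_rate: callback returning the network's recognition rate for a
    given per-layer precision assignment. train: one training round."""
    precision = {layer: levels[-1] for layer in sorted_layers}  # assumed initial state
    adjusted, locked = set(), set()
    while len(locked) < len(sorted_layers):
        for layer in sorted_layers:                  # step 501: one round over layers
            if layer in locked:
                continue
            if layer not in adjusted:                # steps 502-503: first visit,
                precision[layer] = levels[0]         # drop to the preset minimum
                adjusted.add(layer)
            prev = precision[layer]
            idx = levels.index(prev)
            if idx + 1 == len(levels):               # guard (assumption): already at
                locked.add(layer)                    # the highest level, lock and stop
                continue
            precision[layer] = levels[idx + 1]       # step 504: raise one level
            if eval_rate(precision) > threshold:     # step 505
                precision[layer] = prev              # step 506: lock pre-raise value
                locked.add(layer)
            else:
                train(layer)                         # steps 508-509: keep raised value,
    return precision                                 # retrain, move to the next layer
```

With a toy evaluation in which the network's rate is limited by its lowest-precision layer, the highest-influence layer is visited first and the others are raised round by round until every precision is locked.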

In this embodiment of the present invention, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and multiple rounds of raising operations are performed on the weight precisions corresponding to all layers. In each round, the weight precision corresponding to each layer is first reduced to the minimum and then raised once, until the recognition rate of the neural network is greater than the preset threshold. The weight precision configuration can thus be completed quickly and evenly; while the recognition rate of the neural network is taken into account, the model area of the neural network is effectively controlled, the resource utilization of the artificial intelligence chip carrying the neural network is improved, chip performance is improved, and chip power consumption is reduced.

Figure 6 is a schematic flowchart of still another weight precision configuration method provided by an embodiment of the present invention. Taking the case where the neural network is an image recognition model as an example, and assuming that the image recognition model is a convolutional neural network model, the method may include:

Step 601: Determine the current target layer in the image recognition model.

All layers in the image recognition model are sorted according to their degree of influence on the recognition rate, and layers with a higher degree of influence are determined as the target layer first.

Exemplarily, before this step, the method may further include: calculating the initial recognition rate of the image recognition model; for each layer in the image recognition model, reducing the weight precision of the current layer from a first precision to a second precision and calculating the drop of the recognition rate of the image recognition model relative to the initial recognition rate; and sorting all layers by the drop value to obtain a sorting result, where a larger drop value indicates a higher degree of influence on the recognition rate. Exemplarily, the image recognition model may include a convolutional layer, a pooling layer and a fully connected layer. For example, the initial recognition rate is 0.98, and the initial weight precision of the convolutional, pooling and fully connected layers is FP32. After the weight precision of the convolutional layer is reduced to FP16, the recognition rate becomes 0.90, giving a drop value of 0.08; after the weight precision of the pooling layer is reduced to FP16, the recognition rate becomes 0.94, giving a drop value of 0.04; and after the weight precision of the fully connected layer is reduced to FP16, the recognition rate becomes 0.96, giving a drop value of 0.02. The sorting result, in descending order of drop value, is therefore: convolutional layer, pooling layer, fully connected layer.

In this step, the current target layer is determined according to the sorting result. When this step is performed for the first time, the convolutional layer is determined as the current target layer; after the weight precision of the convolutional layer is locked, the pooling layer is determined as the current target layer; and after the weight precision of the pooling layer is locked, the fully connected layer is determined as the current target layer.
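
With the numbers from the example above (initial rate 0.98; rates of 0.90, 0.94 and 0.96 after lowering each layer in turn), computing the drop values and the sorting result can be sketched as follows; the function name is illustrative:

```python
def rank_layers_by_drop(initial_rate, rate_with_layer_lowered):
    """Drop value = initial recognition rate minus the rate measured after
    lowering only that layer's weight precision (e.g. FP32 -> FP16)."""
    drops = {layer: round(initial_rate - rate, 2)
             for layer, rate in rate_with_layer_lowered.items()}
    order = sorted(drops, key=drops.get, reverse=True)
    return order, drops

order, drops = rank_layers_by_drop(
    0.98, {"conv": 0.90, "pool": 0.94, "fc": 0.96})
# order: conv (drop 0.08) > pool (0.04) > fc (0.02)
```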

Step 602: Reduce the weight precision corresponding to the current target layer to a preset minimum precision.

Step 603: Raise the weight precision corresponding to the current target layer.

Exemplarily, the weight precision corresponding to the current target layer may be raised by one precision level. Each raise below may likewise be a raise of one precision level.

Step 604: Determine whether the current recognition rate of the image recognition model is greater than a preset threshold; if so, go to Step 605; otherwise, go to Step 607.

Step 605: Lock the weight precision corresponding to the current target layer to the weight precision before this raise.

Step 606: Determine whether the weight precisions corresponding to all layers have been locked; if so, end the procedure; otherwise, return to Step 601.

Exemplarily, the locked weight precision is marked in the bit-width flag of the current target layer or in the name of the called operator.

Step 607: Train the image recognition model to adjust the weight parameter values of the current target layer, then return to Step 603.

The training objective is to improve the recognition rate of the image recognition model. During training, a preset number of images may be used as training samples, and the image training samples are input into the image recognition model to train it.

Optionally, the training of the image recognition model is performed on an artificial intelligence chip; for the training process, reference may be made to the relevant content above. Exemplarily, image training sample data is obtained through a first processing core, and the to-be-output feature map data of the convolutional layer is calculated according to the image training sample data and the weight parameters of the convolutional layer; the weight precision of the pooling layer is obtained, and the precision of the to-be-output feature map data of the convolutional layer is configured to the weight precision of the pooling layer (assuming the current target layer is the pooling layer, the weight precision here is the raised weight precision), so as to obtain the output feature map data of the convolutional layer, which is output to a second processing core. The second processing core calculates the to-be-output feature vector data of the pooling layer according to the output feature map data of the convolutional layer and the weight parameters of the pooling layer; the weight precision of the fully connected layer is obtained, and the precision of the to-be-output feature vector data of the pooling layer is configured to the weight precision of the fully connected layer, so as to obtain the output feature vector data of the pooling layer, which is output to a third processing core. The third processing core calculates and outputs the image recognition result according to the output feature vector data of the pooling layer and the weight parameters of the fully connected layer, and the weight values of the fully connected layer are adjusted with the objective of improving the recognition rate of the image recognition model.
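
The inter-core data flow above — each core casting its layer's output to the weight precision of the layer that consumes it next — can be sketched with NumPy dtypes standing in for the hardware precisions. The per-layer dtype assignments here are assumptions for illustration only:

```python
import numpy as np

# Assumed weight precisions per layer; each layer's to-be-output data is
# configured to the weight precision of the next layer in the pipeline.
WEIGHT_DTYPE = {"conv": np.float16, "pool": np.float16, "fc": np.float32}
PIPELINE = ["conv", "pool", "fc"]

def emit_to_next_core(layer, data):
    """Cast a layer's to-be-output data to the next layer's weight precision."""
    i = PIPELINE.index(layer)
    if i + 1 < len(PIPELINE):
        return data.astype(WEIGHT_DTYPE[PIPELINE[i + 1]])
    return data  # the last layer emits the recognition result as-is

feature_map = np.ones((4, 4), dtype=np.float32)    # conv's internal result
sent = emit_to_next_core("conv", feature_map)      # carried at pool's precision
```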

In this embodiment of the present invention, all layers in the image recognition model are sorted according to their degree of influence on the recognition rate, and for each current target layer in turn the weight precision is first reduced to the minimum and then raised until the recognition rate of the image recognition model is greater than the preset threshold. The weight precision configuration can thus be completed quickly; while the recognition rate of the image recognition model is taken into account, the model area of the image recognition model is effectively controlled, the resource utilization of the artificial intelligence chip carrying the image recognition model is improved, chip performance is improved, and chip power consumption is reduced.

Figure 7 is a structural block diagram of a weight precision configuration apparatus provided by an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, may generally be integrated in a computer device, and performs weight precision configuration by executing a weight precision configuration method. As shown in Figure 7, the apparatus includes:

a target layer determination module 701, configured to determine the current target layer in the neural network, where all layers in the neural network are sorted according to their degree of influence on the recognition rate, and layers with a higher degree of influence are determined as the target layer first;

a weight precision reduction module 702, configured to reduce the weight precision corresponding to the current target layer to a preset minimum precision;

a weight precision raising module 703, configured to raise the weight precision corresponding to the current target layer and determine whether the current recognition rate of the neural network is greater than a preset threshold, and if so, lock the weight precision corresponding to the current target layer to the weight precision before this raise; and

a target layer switching module 704, configured to re-determine the current target layer when a target layer switching condition is satisfied.

In the weight precision configuration apparatus provided in this embodiment of the present invention, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and layers with a higher degree of influence are determined as the target layer first; the weight precision corresponding to the current target layer is first reduced to a preset minimum precision and then raised; if the current recognition rate of the neural network is greater than a preset threshold, the weight precision corresponding to the current target layer is locked to the weight precision before this raise; and the current target layer is re-determined when a target layer switching condition is satisfied. With this technical solution, all layers in the neural network are sorted according to their degree of influence on the recognition rate, and for each current target layer in turn the weight precision is first reduced to the minimum and then raised, so that the upper limit of the weight precision of each layer is reasonably controlled while the recognition rate of the neural network is taken into account, thereby improving the resource utilization of the artificial intelligence chip carrying the neural network, improving chip performance and reducing chip power consumption.

Optionally, the weight precision raising module is further configured to: after determining whether the current recognition rate of the neural network is greater than the preset threshold, if the current recognition rate is less than or equal to the preset threshold, continue to raise the weight precision corresponding to the current target layer and continue to determine whether the current recognition rate of the neural network is greater than the preset threshold.

Furthermore, the target layer switching condition includes: the current recognition rate of the neural network is greater than the preset threshold. That layers with a higher degree of influence are determined as the target layer first includes: among the layers whose corresponding weight precisions have not been locked, layers with a higher degree of influence are determined as the target layer first.

Optionally, multiple rounds of raising operations are performed on the weight precisions corresponding to all layers in the neural network, and in each round the weight precision corresponding to each layer is raised at most once.

The apparatus further includes: a weight precision temporary storage module, configured to, after it is determined whether the current recognition rate of the neural network is greater than the preset threshold, temporarily store the raised weight precision if the current recognition rate is less than or equal to the preset threshold.

Furthermore, the target layer switching condition includes: the weight precision corresponding to the current target layer has already been raised once in the current round of raising operations. Reducing the weight precision corresponding to the current target layer to the preset minimum precision includes: if the weight precision corresponding to the current target layer has not been adjusted before, reducing the weight precision corresponding to the current target layer to the preset minimum precision.

Optionally, re-determining the current target layer includes:

re-determining the current target layer until the weight precisions corresponding to all layers have been locked.

Optionally, the apparatus further includes:

a training module, configured to, after it is determined whether the current recognition rate of the neural network is greater than the preset threshold, train the neural network to adjust the weight parameter values of the current target layer if the current recognition rate is less than or equal to the preset threshold, where the training objective is to improve the recognition rate of the neural network.

Optionally, training the neural network includes training the neural network on an artificial intelligence chip.

The process of training the neural network includes:

acquiring the precision of the to-be-output data of a first layer in the neural network, where the first layer includes any one or more layers other than the last layer in the neural network;

acquiring the weight precision of a second layer, where the second layer is the layer following the first layer; and

configuring the precision of the to-be-output data according to the weight precision of the second layer.

Optionally, all layers in the neural network are sorted according to their degree of influence on the recognition rate in the following manner:

calculating the initial recognition rate of the neural network;

for each layer in the neural network, reducing the weight precision of the current layer from a first precision to a second precision, and calculating the drop of the recognition rate of the neural network relative to the initial recognition rate; and

sorting all layers by the drop value, where a larger drop value indicates a higher degree of influence on the recognition rate.

Optionally, if there are at least two layers with the same drop value, the at least two layers are sorted according to their distance from the input layer of the neural network, where a smaller distance indicates a higher degree of influence on the recognition rate.
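
The full sort key — descending drop value, with ties broken by ascending distance to the input layer — can be sketched as follows; the names are illustrative:

```python
def sort_layers(drops, distance_to_input):
    """Sort layers by descending drop value; ties are broken by ascending
    distance to the input layer (closer layers count as more influential)."""
    return sorted(drops, key=lambda l: (-drops[l], distance_to_input[l]))
```

For example, with drop values `{"a": 0.05, "b": 0.05, "c": 0.01}` and distances `{"a": 2, "b": 1, "c": 0}`, layer `"b"` sorts ahead of `"a"` despite the equal drop values.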

An embodiment of the present invention provides a computer device in which the weight precision configuration apparatus provided by the embodiments of the present invention may be integrated. Figure 8 is a structural block diagram of a computer device provided by an embodiment of the present invention. The computer device 800 may include a memory 801, a processor 802, and a computer program stored in the memory 801 and executable on the processor; when executing the computer program, the processor 802 implements the weight precision configuration method described in the embodiments of the present invention. It should be noted that, if the neural network is trained on an artificial intelligence chip, the computer device 800 may also include the artificial intelligence chip. Alternatively, if the computer device 800 is denoted as a first computer device, the training may also be performed on a second computer device that includes an artificial intelligence chip, and the second computer device sends the training result to the first computer device.

The computer device provided by this embodiment of the present invention sorts all layers in the neural network according to their degree of influence on the recognition rate, and for each current target layer in turn first reduces the weight precision to the minimum and then raises it, so that the upper limit of the weight precision of each layer is reasonably controlled while the recognition rate of the neural network is taken into account, thereby improving the resource utilization of the artificial intelligence chip carrying the neural network, improving chip performance and reducing chip power consumption.

An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions, when executed by a computer processor, are used to perform the weight precision configuration method.

The weight precision configuration apparatus, device and storage medium provided in the above embodiments can perform the weight precision configuration method provided by any embodiment of the present invention, and have the corresponding functional modules and beneficial effects for performing the method. For technical details not described in detail in the above embodiments, reference may be made to the weight precision configuration method provided by any embodiment of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, the present invention is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

CN202010659069.0A (priority/filing date 2020-07-09) — Weight precision configuration method, device, equipment and storage medium — Active — granted as CN111831356B

Priority Applications (3)

- CN202010659069.0A (CN111831356B), priority date 2020-07-09, filed 2020-07-09: Weight precision configuration method, device, equipment and storage medium
- PCT/CN2021/105172 (WO2022007879A1), filed 2021-07-08: Weight precision configuration method and apparatus, computer device, and storage medium
- US18/015,065 (US11797850B2), filed 2021-07-08: Weight precision configuration method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

- CN202010659069.0A, priority date 2020-07-09, filed 2020-07-09: Weight precision configuration method, device, equipment and storage medium
Publications (2)

- CN111831356A, published 2020-10-27
- CN111831356B, granted 2023-04-07

Family ID: 72901207




Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant
