CN110929862B - Fixed-point neural network model quantization device and method - Google Patents

Fixed-point neural network model quantization device and method

Info

Publication number
CN110929862B
Authority
CN
China
Prior art keywords
operator
model
processor
input
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911174616.XA
Other languages
Chinese (zh)
Other versions
CN110929862A (en)
Inventor
陈子祺
田甲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201911174616.XA
Publication of CN110929862A
Application granted
Publication of CN110929862B
Active
Anticipated expiration

Abstract

The present application relates to a fixed-point neural network model quantization method and device, comprising the following steps. Verification stage: verify that the graph model is a directed acyclic graph, and convert multi-input graph models into single-input models. Preparation stage: apply equivalence transformations to the graph model to facilitate subsequent quantization. Scaling stage: feed all samples through the floating-point model, collect statistics on the output of each operator inside the model, and predict the likely output range of each operator over all samples from the characteristics of the output data. Quantization stage: convert the model's operators to fixed point in topological order. The application can effectively reduce the storage and computation overhead of the model, eliminate the uncertainty caused by rounding errors in floating-point operations, and improve the efficiency, transparency, and security of deep neural network models.

Description

Fixed-point neural network model quantization device and method

Technical Field

The present application relates to a fixed-point neural network model quantization device and method, applicable to the technical field of artificial intelligence.

Background

Deep neural network models are widely used in machine vision tasks such as image classification and object detection, and have achieved great success. However, due to constraints on storage space and power consumption, storing and computing neural network models on embedded chips and purpose-built neural network chips remains a major challenge. At the same time, existing neural network models are usually designed only for accuracy, without considering the reproducibility and consistency of computation; their results may differ across architectures, or even within the same computing environment. As a consequence, the application of neural network algorithms in fields with higher security requirements, such as finance, trusted computing, blockchain, and smart contracts, has been greatly restricted.

Model fixed-pointing, i.e., converting the floating-point operations of a deep neural network into integer operations, can address the above problems from two directions. First, as one of the most widely adopted model compression methods in deep learning, it reduces the storage and computation overhead of the model. Second, fixed-point integer operations avoid the rounding errors of floating-point arithmetic and eliminate nondeterminism in computation.

Existing mainstream quantization methods map the parameter domain onto a discrete integer domain, for example mapping convolution kernel parameters onto the INT8 integer domain. Depending on whether the discrete integer domain is symmetric, they divide into symmetric and asymmetric quantization; one may also choose whether to quantize per model channel, and whether to add a zero-point offset to improve quantization capability. However, on the one hand, existing model quantization techniques are not mature enough: while model performance improves, model accuracy cannot be effectively guaranteed. On the other hand, most existing model quantization devices only accelerate certain specific operators (such as convolution and matrix multiplication), while a large number of floating-point intermediate values remain in the computation; such semi-integer quantization still cannot completely avoid rounding errors in model execution.

Summary of the Invention

The purpose of this application is to provide a fixed-point neural network model quantization device and method that can effectively reduce the storage and computation overhead of the model, eliminate the uncertainty caused by rounding errors in floating-point operations, and improve the efficiency, transparency, and security of deep neural network models.

This application relates to a fixed-point neural network model quantization device, comprising:

Model memory: configured to store at least one model;

Data memory: configured to store data;

Operator fixed-point processor: configured to execute at least one program to convert the operators in the neural network to fixed point;

Central processor: reads the model and data from the model memory and the data memory, calls the corresponding operator in the operator fixed-point processor, and collects statistics on the output of each internal operator while the sample data is actually executed.

Preferably, the central processor comprises the following program units:

a reading program unit, which reads the model from the model memory and reads the sample data from the data memory;

a verification program unit, which topologically sorts the model's operators and calls, in order, the verification configuration of the corresponding operator in the operator execution processor;

a preparation program unit, which topologically sorts the model's operators and calls, in order, the preparation configuration of the corresponding operator in the operator execution processor;

a scaling program unit, which, from the sample data read, collects statistics on the output of each internal operator while the sample data is actually executed;

a quantization program unit, which topologically sorts the model's operators and calls, in order, the quantization configuration of the corresponding operator in the operator fixed-point processor.

Preferably, the device further comprises a requantization device configured to execute an integer-data precision requantization program; each operator is configured as a pluggable operator execution processor. In the quantization stage, the tanh, sigmoid, and exp operator processors use a table-lookup method: the original floating-point numbers corresponding to the integer input data are mapped onto the discrete domain INT16 and an index table is built, so that fixed-pointing in the processor replaces the operator with an index instruction and the index table. More preferably, the tanh, sigmoid, and exp operator processors are configured with a maximum input precision of 16.

Preferably, in the quantization stage, the softmax operator processor combines the table-lookup method with integer data operations: it first quantizes via the operator's lookup table and then performs integer addition and division. Since the discrete counterpart of the floating-point division in the original expression should round to nearest, half of the denominator must be added to the numerator after conversion to integer division. More preferably, the softmax operator processor is configured with a maximum precision of 16.

Preferably, in the verification stage, the convolution and matrix-multiplication processors support only the 2D-NCHW input format; in the preparation stage, when the result of a matrix product would overflow, a matrix decomposition operation is required that converts the large matrix-product operator into many small matrix operators whose results are summed; and in the quantization stage, the original floating-point convolution and matrix-multiplication operations can be equivalently converted into their integer counterparts. More preferably, the convolution and matrix-multiplication processors are configured with a maximum precision of 8.

Preferably, in the preparation stage of the normalization operator processor, the normalization operation can be equivalently converted into matrix multiplication and addition; alternatively, in the preparation stage, if the input data is the result of a convolution operation, the normalization operation can be folded into the convolution operation.

Preferably, the relu operator processor places no limit on input precision; in the preparation stage, if its child node is a transpose operation, the node and the child node are swapped; other stages use the default operations.

Preferably, the broadcast matrix-multiplication operator processor is configured with an input precision of 16, other stages using the default operations; or

the sum-over-axis operator processor is configured with an input precision of 8, other stages using the default operations; or

the matrix addition and subtraction operator processors are configured with an input precision of 16, other stages using the default operations; or

the broadcast matrix addition and subtraction operator processors are configured with an input precision of 16, other stages using the default operations; or

the matrix concatenation operator processor places no limit on input precision, other stages using the default operations; or

the embedding operator processor places no limit on input precision, other stages using the default operations; or

the max-pooling operator processor places no limit on input precision, other stages using the default operations; or

in the verification stage of the average-pooling operator processor, the average computed as the pooling kernel window slides includes at least the surrounding padding panes; or

the matrix slice and clip operator processors place no limit on input precision, other stages using the default operations; or

the matrix negation, dimension-repeat, and tile operator processors place no limit on input precision, other stages using the default operations; or

the expand-dims, squeeze, and reshape operator processors place no limit on input precision, other stages using the default operations; or

the transpose operator processor places no limit on input precision, and in the preparation stage, if the input data is the result of a transpose operation, the two can be merged into a single transpose, other stages using the default operations; or

the flatten, max, min, and upsampling operator processors place no limit on input precision, other stages using the default operations.

The present application also relates to a fixed-point neural network model quantization method that uses the neural network model quantization device described above, comprising the following steps:

Verification stage: verify that the graph model is a directed acyclic graph, and convert multi-input graph models into single-input models;

Preparation stage: apply equivalence transformations to the graph model to facilitate subsequent quantization;

Scaling stage: feed all samples through the floating-point model, collect statistics on the output of each operator inside the model, and predict the likely output range of each operator over all samples from the characteristics of the output data;

Quantization stage: convert the model's operators to fixed point in topological order.

Preferably, the method further comprises a requantization stage, in which a maximum input-data precision is set for each operator; when the precision of an operator's input data exceeds the configured maximum, the input data is reduced in precision.

The present application also relates to a computer-readable medium storing instructions that cause a computer to:

(1) verify that the graph model is a directed acyclic graph, and convert multi-input graph models into single-input models;

(2) apply equivalence transformations to the graph model to facilitate subsequent quantization;

(3) feed all samples through the floating-point model, collect statistics on the output of each operator inside the model, and predict the likely output range of each operator over all samples from the characteristics of the output data;

(4) convert the model's operators to fixed point in topological order.

Preferably, the medium also stores instructions that cause the computer to set a maximum input-data precision for each operator and, when the precision of an operator's input data exceeds the configured maximum, to reduce the precision of the input data.

The fixed-point neural network model quantization device and method of the present application offer the following technical advantages:

(1) The various graph equivalence transformations effectively reduce the computation required by the graph and improve the execution efficiency and transparency of the deep neural network model, so that deep neural network models can be better applied to embedded chips and neural network inference chips;

(2) the device interfaces with the operator protocol of the fixed-point model, realizing the conversion from an ordinary floating-point model to a fixed-point model;

(3) the quantization process achieves fully integer quantization, with no floating-point data anywhere in execution, eliminating the rounding errors of floating-point arithmetic and making model computation deterministic;

(4) the quantization process attains better accuracy without further complex model fine-tuning, making it convenient to use.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the data discretization method in this application.

Figure 2 is a schematic diagram of the fixed-point neural network model quantization device of this application.

Figure 3 is a schematic diagram of the processing-flow modules of the central processor of this application.

Detailed Description

To make the purpose, technical solution, and advantages of the application clearer, embodiments of the application are described in detail below with reference to the accompanying drawings. Note that, where no conflict arises, the embodiments of this application and the features within them may be combined with one another arbitrarily.

According to the first aspect of the present application, a fixed-point neural network model quantization method proceeds as follows:

Verification stage: verify that the graph model is a directed acyclic graph, and convert multi-input graph models into single-input models. The graph model must contain no duplicate naming symbols, and useless model parameters are removed; if the graph model fails verification, an error message can be returned to the user. Operator-specific verification can also be performed here; for its details, see the configurations in the operator fixed-point processors described later.
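
As an illustrative sketch rather than part of the patent text, the DAG check of the verification stage can be implemented with Kahn's algorithm; the graph representation below (a node list plus an edge list) is our own assumption:

```python
from collections import deque

def verify_dag(nodes, edges):
    """Check that the graph model is a directed acyclic graph and return a
    topological order of its nodes; raise an error to report back to the
    user if a cycle is found."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)

    if len(order) != len(nodes):
        raise ValueError("graph model is not a directed acyclic graph")
    return order
```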

Preparation stage: apply equivalence transformations to the graph model to facilitate subsequent quantization; for the operator-specific equivalence transformations, see the configuration of the operator fixed-point processors described later.

Scaling stage: feed all samples through the floating-point model, collect statistics on the output of each operator inside the model, and predict the likely output range of each operator over all samples from features of the output data such as maximum, minimum, mean, and variance. Considering practical factors and the time cost of quantization, a small subset of samples (16 samples in testing) is collected to approximate the prediction characteristics of the full data; the resulting quantization quality is no worse than using all sample data.
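
A minimal sketch of the statistics collection in the scaling stage, assuming a `run_model` callback that returns each operator's floating-point output for one sample; the callback and the exact feature set are illustrative assumptions:

```python
import numpy as np

def collect_output_stats(run_model, samples):
    """Run the calibration samples through the floating-point model and
    summarize each operator's outputs (max, min, mean, variance)."""
    stats = {}
    for x in samples:  # e.g. the 16 calibration samples mentioned above
        for name, out in run_model(x).items():
            s = stats.setdefault(name, {"max": -np.inf, "min": np.inf,
                                        "sum": 0.0, "sqsum": 0.0, "n": 0})
            s["max"] = max(s["max"], float(out.max()))
            s["min"] = min(s["min"], float(out.min()))
            s["sum"] += float(out.sum())
            s["sqsum"] += float((out ** 2).sum())
            s["n"] += out.size
    for s in stats.values():  # derive the mean/variance features
        s["mean"] = s["sum"] / s["n"]
        s["var"] = s["sqsum"] / s["n"] - s["mean"] ** 2
    return stats
```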

Quantization stage: convert the model's operators to fixed point in topological order. Fixed-pointing changes an operator from accepting floating-point data to accepting integer data that carries a fixed scaling factor m. When processing an operator, one can therefore assume inductively that its input data X has already been fixed-pointed and mapped onto the integer domain INTp, where p is defined as the precision of the input X and m is the fixed scaling factor; the input the operator would have received in the original floating-point model is then x = X/m. An obvious example is an operator whose logical abstraction is a homogeneous operation, i.e., one satisfying m*f(x) = f(m*x): feeding it the integer data yields output data carrying the same scaling factor m.

Requantization stage: after an operator processes data, the numeric range grows; an obvious example is matrix multiplication, where the precision of the result is in theory twice that of the inputs. The memory of this quantization device uses INT32 slots by default, and as the model input passes through successive operators, the data precision grows until it exceeds INT32 and overflows, making the result nondeterministic. Each operator therefore sets a maximum input-data precision q to prevent overflow while the operator executes; when the input precision p exceeds q, the input must be reduced in precision. This step is defined as the requantization operation and is a preferred, optional step.

This application draws on ideas from mainstream quantization methods and maps input and output data onto a discrete integer domain. The discretization method is shown in Figure 1: the diagonal line is the actual data and the horizontal steps are the quantized data distribution; in strict mathematical terms, it is floating-point rounding to the nearest integer.
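
A minimal sketch of this discretization, with symmetric saturation at the edge of the integer domain added as our own assumption:

```python
import math

def quantize(x, scale, precision):
    """Map a float onto the integer domain INT(precision) with scaling
    factor `scale`, rounding to nearest as in Figure 1."""
    limit = (1 << (precision - 1)) - 1  # e.g. 127 for INT8
    q = math.floor(x * scale + 0.5)     # round to nearest
    return max(-limit, min(limit, q))   # saturate to the domain

def dequantize(q, scale):
    """Approximate float value that the integer q stands for."""
    return q / scale
```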

According to the second aspect of the present application, a fixed-point neural network model quantization device, as shown in Figure 2, comprises:

Model memory: configured to store at least one model;

Data memory: configured to store data, including sample data, intermediate results, final results, and so on;

Operator fixed-point processor: configured to execute at least one program to convert the operators in the neural network to fixed point; a model contains many operators, and each operator is configured as a pluggable operator execution processor;

Requantization device: configured to execute the integer-data precision requantization program, computing the integer stand-in parameters used for the floating-point operation. Given a floating-point input mapped onto the integer domain M: INT(p) with scaling factor sp, and data expected on the integer domain N: INT(q) with scaling factor sq, we have N = M*(sq/sp), i.e., M is multiplied by the scaling factor (sq/sp). After performing the floating-point division, the requantization device extracts the hardware binary representation of the floating-point result directly, per the IEEE floating-point standard, as sq/sp = a/2^b, and returns the operation instruction (M*a) >> b.
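
A minimal sketch of deriving such a multiply-and-shift instruction, using math.frexp to read the binary representation of the ratio; the number of significand bits kept (16 here) is our own assumption:

```python
import math

def requantize_params(sp, sq, bits=16):
    """Approximate the float ratio sq/sp as a/2**b, so that requantization
    becomes the pure integer instruction (M*a) >> b."""
    frac, exp = math.frexp(sq / sp)     # sq/sp = frac * 2**exp, 0.5 <= frac < 1
    a = int(round(frac * (1 << bits)))  # integer significand
    b = bits - exp                      # then a / 2**b ~= sq/sp
    return a, b

def requantize(M, a, b):
    """Map M from domain INT(p) with factor sp to INT(q) with factor sq,
    using only an integer multiply and an arithmetic shift."""
    return (M * a) >> b

# Example: data quantized with sp = 512.0 moved to a domain with sq = 16.0
a, b = requantize_params(512.0, 16.0)
assert requantize(512, a, b) == 16  # the float 1.0 is re-encoded as 16
```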

Central processor: reads the model and data from the model memory and the data memory, calls the corresponding operator in the operator fixed-point processor, collects statistics on the output of each internal operator while the sample data is actually executed, and requantizes the data; the detailed configuration is shown in Figure 3.

Figure 3 shows the processing-flow modules of the central processor, which coordinates the other components to process the neural network and carry out the fixed-point conversion of the model; internally it consists of a series of program units.

The reading program unit 31 reads the model from the model memory and reads the sample data from the data memory.

The verification program unit 32 implements the verification stage: it topologically sorts the model's operators and calls, in order, the verification configuration of the corresponding operator in the operator execution processor.

The preparation program unit 33 implements the preparation stage: it topologically sorts the model's operators and calls, in order, the preparation configuration of the corresponding operator in the operator execution processor.

The scaling program unit 34 implements the scaling stage: from the sample data read, it collects the output of each internal operator while the sample data is actually executed and summarizes features such as maximum, minimum, mean, and variance, so that scaling factors can be computed in the subsequent requantization operation.

The quantization program unit 35 implements the quantization stage: it topologically sorts the model's operators and calls, in order, the quantization configuration of the corresponding operator in the operator fixed-point processor. As described above, it first obtains the maximum input precision from the operator processor; if the input data precision is found to exceed this maximum, the requantization device is called to reduce the data precision, after which the operator processor's quantization configuration is called to process the reduced-precision data.

Referring to Figure 2, the neural network fixed-pointing device quantizes the model into a fully integer network through the central processor's interaction with the model memory and the data memory, in concert with the pluggable operator fixed-point processors, by executing the software methods configured in the central processor. The operator fixed-point processors are made pluggable for two reasons. First, the design is kept simple: all operators expose the same interface, improving the extensibility of the device's fixed-pointing, so that more operators can be configured and ever newer models supported. Second, hot-swappable operator processors ease installation and deployment: different fixed-point conversion devices can be configured for different application scenarios.

Every operator fixed-point processor must be configured with the three stage-specific software methods described above: the verification stage, the preparation stage, and the quantization stage. Some concepts involved in the program methods configured in the existing operator fixed-point processors are introduced below.

Constant elimination: performed in the preparation stage. A neural network contains three types of nodes: inputs, parameters, and operators, where a parameter is a known variable and an operator is a logical abstraction of a data operation. If all of an operator's inputs are parameters, or it has no inputs at all, its result can be computed ahead of time on the processor; that is, the operator can be eliminated as a constant.
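
A minimal sketch of constant elimination over a topologically ordered graph; the node layout and the `evaluate` callback are illustrative assumptions:

```python
def fold_constants(graph, evaluate):
    """Mark operators whose inputs are all parameters (or that have no
    inputs) as precomputed constants.  `graph` maps node id ->
    (kind, inputs) in topological order; `evaluate(node_id)` computes
    the operator's result ahead of time."""
    kinds = {}
    for nid, (kind, inputs) in graph.items():
        if kind != "operator":
            kinds[nid] = kind  # "input" or "parameter"
            continue
        # all() is True for an empty input list, covering no-input operators
        if all(kinds.get(i) == "parameter" for i in inputs):
            evaluate(nid)             # result computed on the processor now
            kinds[nid] = "parameter"  # the operator is eliminated as a constant
        else:
            kinds[nid] = "operator"
    return kinds
```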

Transpose elimination: a transpose transforms a matrix of dimensions (M1, M2, M3, ..., Mk) into one of dimensions (N1, N2, N3, ..., Nk), where N is a rearrangement of the array M. In certain neural networks, a large number of transpose operations can be eliminated; for example, two consecutive transposes can be collapsed into one.
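
For instance, two consecutive transposes compose into a single one by composing their axis permutations, as in this sketch:

```python
def merge_transposes(perm_outer, perm_inner):
    """Collapse transpose(transpose(x, perm_inner), perm_outer) into one
    transpose: axis i of the result is axis perm_inner[perm_outer[i]]
    of the original tensor."""
    return [perm_inner[p] for p in perm_outer]

# NCHW -> NHWC (0,2,3,1) followed by NHWC -> NCHW (0,3,1,2) composes to
# the identity, so both transposes can be eliminated outright.
assert merge_transposes([0, 3, 1, 2], [0, 2, 3, 1]) == [0, 1, 2, 3]
```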

The above are concepts commonly used in the neural network fixed-pointing method. The configuration methods of the operator fixed-point processors in the neural network fixed-pointing device are now described in detail. The default operations assume that the operator is a homogeneous operation as introduced above: the output scaling factor equals the input scaling factor, and the operator's logic is left unchanged. Note that the operator settings in this application can be freely combined and selected as needed for different processing targets and workloads; they need not all be configured at once.

The relu operator processor places no limit on input precision; in the preparation stage, if its child node is a transpose operation, the node and the child node are swapped; other stages use the default operations.

The tanh, sigmoid, and exp operator processors are configured with a maximum input precision of 16.

In the quantization stage, the tanh, sigmoid, and exp operator processors are configured as follows. These operators are nonlinear functions and cannot be emulated by ordinary integer operators, so a table-lookup method is used: the original floating-point numbers corresponding to the integer input data are mapped onto the discrete domain INT16 and an index table TABLE is built, so that fixed-pointing in the processor replaces the operator with an index instruction and the index table; other stages use the default operations. For example, suppose the input integer data is in the INT8 range. Each input INT8 value corresponds to an original floating-point input, which the original floating-point operator maps to an original floating-point output, and that output is by default mapped onto the INT16 discrete domain, i.e., to an output integer value. In short, an array table can be built that maps every INT8 input value directly to its integer result; this is called the table-lookup method.
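
A minimal sketch of building such a table for sigmoid, assuming INT8 input data with a known scaling factor; the particular output scale chosen below is our own assumption:

```python
import math

def build_sigmoid_table(in_scale, out_bits=16):
    """Map every possible INT8 input X (standing for the float X/in_scale)
    through the floating-point sigmoid, then onto the INT16 domain."""
    out_limit = (1 << (out_bits - 1)) - 1  # 32767 for INT16
    out_scale = out_limit                  # sigmoid outputs lie in (0, 1)
    table = {}
    for X in range(-128, 128):             # every INT8 input value
        x = X / in_scale                   # original float input
        y = 1.0 / (1.0 + math.exp(-x))     # original float operator
        table[X] = int(round(y * out_scale))  # INT16 output value
    return table, out_scale

table, s = build_sigmoid_table(in_scale=16.0)
assert abs(table[0] / s - 0.5) < 1e-4  # sigmoid(0) = 0.5
```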

The softmax operator processor is configured with a maximum precision of 16.

In the quantization stage, the softmax operator processor combines the table-lookup method with integer data operations. The operator's logical abstraction is Y[i] = exp(X[i]) / sum(exp(X[j]), j in X); the exp term is quantized via the exp operator's table-lookup method, followed by integer addition and division. Since the discrete counterpart of the floating-point division in the original expression should round to nearest, half of the denominator is added to the numerator after conversion to integer division; the quantized expression is Y[i] = ROUND(TABLE(I)/TOTAL) = (TABLE(I) + TOTAL/2)/TOTAL, where TOTAL = sum(TABLE(j), j in X). Other stages use the default operations.
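
A sketch of the resulting integer softmax; the half-denominator rounding trick is the patent's, while the out_scale factor that keeps the result in a usable integer range is our own addition:

```python
def div_round_nearest(num, den):
    """Integer division rounding to nearest: add half the denominator
    to the numerator before dividing."""
    return (num + den // 2) // den

def int_softmax(X, exp_table, out_scale=1 << 15):
    """Fully integer softmax: quantize exp() by table lookup, then use
    integer addition and round-to-nearest integer division."""
    t = [exp_table[x] for x in X]  # TABLE(I): integer lookups
    TOTAL = sum(t)                 # sum(TABLE(j), j in X)
    return [div_round_nearest(ti * out_scale, TOTAL) for ti in t]
```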

The convolution and matrix-multiplication (FullyConnected, Dense) processors are configured with a maximum precision of 8.

In the verification stage, the convolution and matrix-multiplication (FullyConnected, Dense) processors support only the 2D-NCHW input format.

In the preparation stage, the convolution and matrix-multiplication (FullyConnected, Dense) processors apply the matrix decomposition method. Vector multiplication introduces a large number of multiply-accumulate operations: the product of two INT8 inputs is INT16, and in a 32-bit slot, in theory, K > 65536 additions are enough to overflow. When the inner dimension K of a matrix product A*B meets this condition, a matrix decomposition operation is required that converts the large matrix-product operator into many small matrix operators whose results are summed.
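
A minimal sketch of this decomposition along the inner dimension K, assuming INT8-ranged operands held in NumPy arrays; widening the accumulator for the final merge is our own assumption:

```python
import numpy as np

def split_matmul(A, B, k_max=65536):
    """Compute A @ B as a sum of small products so that no more than
    k_max INT16 partial products are accumulated into any 32-bit slot."""
    K = A.shape[1]
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for k0 in range(0, K, k_max):
        a = A[:, k0:k0 + k_max].astype(np.int32)
        b = B[k0:k0 + k_max, :].astype(np.int32)
        acc += a @ b  # one small, overflow-safe matrix product
    return acc
```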

In the quantization stage of the convolution and matrix-multiplication (FullyConnected, Dense) processors, the original mathematical expression is Y = X*W + B. Suppose X = Xi*Sx and W = Wi*Sw, where Xi and Wi are integer data and Sx and Sw are scaling factors. The original expression is then equivalent to Y = Xi*Sx*Wi*Sw + B = Xi*Wi*(Sx*Sw) + B. Letting the scaling factor of the bias B be Sx*Sw, so that B = Bi*(Sx*Sw), the expression simplifies to Y = (Xi*Wi + Bi)*(Sx*Sw), where the parenthesized part is an integer convolution and Sx*Sw is the scaling factor carried by the convolution's output. The original floating-point convolution and matrix-multiplication operations can thus be equivalently converted into their integer counterparts, with the inputs X, W, and B carrying scaling factors Sx, Sw, and (Sx*Sw) respectively, and the output carrying the scaling factor Sx*Sw.
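
A minimal sketch of this equivalence for a dense (matrix-multiplication) layer; choosing Sw so that the weights fill the INT8 range is our own assumption:

```python
import numpy as np

def quantize_dense(W, B, Sx, Xi):
    """Integer equivalent of Y = X*W + B: with X = Xi*Sx and W = Wi*Sw,
    and the bias given the scaling factor Sx*Sw, the layer becomes
    Y = (Xi*Wi + Bi) * (Sx*Sw)."""
    Sw = 127.0 / np.abs(W).max()                   # weight scaling factor
    Wi = np.round(W * Sw).astype(np.int32)         # INT8-ranged weights
    Bi = np.round(B * (Sx * Sw)).astype(np.int32)  # bias scaled by Sx*Sw
    Yi = Xi.astype(np.int32) @ Wi + Bi             # fully integer compute
    return Yi, Sx * Sw                             # output carries factor Sx*Sw
```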

In the preparation stage of the normalization (BatchNorm) operator processor, the operator's logical abstraction is Y = (X - mean)/var*gamma + beta = X*(gamma/var) + (beta - mean*gamma/var) = X*a + b; that is, the normalization operation can be equivalently converted into matrix multiplication and addition. The other stages require no configuration.

Also in the preparation stage of the normalization (BatchNorm) operator processor, if the input data is the result of a convolution operation, the expression above can further be written as Y = X*a + b = (D*W + B)*a + b = D*(W*a) + (B*a + b), a convolution with weights W*a and bias B*a + b; that is, the normalization operation can be folded into the convolution operation.
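
A minimal sketch of this folding for a convolution with OIHW-layout weights; treating var as the running variance and adding the usual eps stabilizer is our own reading:

```python
import numpy as np

def fold_batchnorm(W, B, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding convolution, following the
    identity Y = D*(W*a) + (B*a + b) with a = gamma/sqrt(var+eps) and
    b = beta - mean*a."""
    a = gamma / np.sqrt(var + eps)         # per-output-channel scale
    b = beta - mean * a                    # per-output-channel shift
    W_folded = W * a[:, None, None, None]  # scale each output channel of W
    B_folded = B * a + b                   # new bias
    return W_folded, B_folded
```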

The broadcast-multiply operator processor is configured with an input precision of 16; other stages use the default operations.

The sum-over-axis operator processor is configured with an input precision of 8; other stages use the default operations.

The matrix addition and subtraction operator processors are configured with an input precision of 16; other stages use the default operations.

The broadcast matrix addition and subtraction (broadcast_add, broadcast_sub) operator processors are configured with an input precision of 16; other stages use the default operations.

The matrix concatenation (concatenate) operator processor places no limit on input precision; other stages use the default operations.

The embedding operator processor places no limit on input precision; other stages use the default operations.

The max-pooling operator processor places no limit on input precision; other stages use the default operations.

In the verification stage of the average-pooling operator processor, the average computed as the pooling kernel window slides must include at least the surrounding padding panes.

In the preparation stage of the average-pooling operator processor, the operator's logical abstraction is Y = sum{kernel(X)} / size_of_kernel, which is equivalently converted into a convolution whose kernel is of that size with all values equal to 1/size_of_kernel; the other stages require no configuration.

The matrix slice and clip operator processors place no limit on input precision; other stages use the default operations.

The matrix negation (negative), dimension-repeat (repeat), and tile operator processors place no limit on input precision; other stages use the default operations.

The expand-dims, squeeze, and reshape operator processors place no limit on input precision; other stages use the default operations.

The transpose operator processor places no limit on input precision; in the preparation stage, if the input data is the result of a transpose operation, the two can be merged into a single transpose; other stages use the default operations.

The flatten, max, min, and upsampling operator processors place no limit on input precision; other stages use the default operations.

Although embodiments of the present application are disclosed above, they are described only to aid understanding of the application and are not intended to limit it. Any person skilled in the art to which this application belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed herein, but the scope of patent protection of this application remains as defined by the appended claims.

Claims (12)

1. A fixed-point neural network model quantization device, characterized by comprising:
a model memory, configured to store at least one model;
a data memory, configured to store data;
an operator fixed-point processor, configured to execute at least one program to convert the operators in the neural network to fixed point; and
a central processor, which reads the model and data from the model memory and the data memory, calls the corresponding operator in the operator fixed-point processor, and collects statistics on the output of each internal operator while the sample data is actually executed; wherein the central processor comprises the following program units:
a reading program unit, which reads the model from the model memory and reads the sample data from the data memory;
a verification program unit, which topologically sorts the model's operators and calls, in order, the verification configuration of the corresponding operator in the operator execution processor;
a preparation program unit, which topologically sorts the model's operators and calls, in order, the preparation configuration of the corresponding operator in the operator execution processor;
a scaling program unit, which, from the sample data read, collects statistics on the output of each internal operator while the sample data is actually executed; and
a quantization program unit, which topologically sorts the model's operators and calls, in order, the quantization configuration of the corresponding operator in the operator fixed-point processor.

2. The neural network model quantization device according to claim 1, characterized by further comprising a requantization device configured to execute an integer-data precision requantization program.

3. The neural network model quantization device according to claim 1 or 2, characterized in that each operator is configured as a pluggable operator execution processor.

4. The neural network model quantization device according to claim 1 or 2, characterized in that, in the quantization stage, the tanh, sigmoid, and exp operator processors use a table-lookup method that maps the original floating-point numbers corresponding to the integer input data onto the discrete domain INT16 and builds an index table, fixed-pointing in the processor replacing the operator with an index instruction and the index table.

5. The neural network model quantization device according to claim 1 or 2, characterized in that, in the quantization stage, the softmax operator processor combines the table-lookup method with integer data operations, first quantizing via the operator's lookup table and then performing integer addition and division; wherein, since the discrete counterpart of the floating-point division in the original expression should round to nearest, half of the denominator is added to the numerator after conversion to integer division.

6. The neural network model quantization device according to claim 1 or 2, characterized in that, in the verification stage, the convolution and matrix-multiplication processors support only the 2D-NCHW input format; in the preparation stage, when the result of a matrix product would overflow, a matrix decomposition operation converts the large matrix-product operator into many small matrix operators whose results are summed; and, in the quantization stage, the original floating-point convolution and matrix-multiplication operations are equivalently converted into their integer counterparts.

7. The neural network model quantization device according to claim 1 or 2, characterized in that, in the preparation stage of the normalization operator processor, the normalization operation is equivalently converted into matrix multiplication and addition; or, in the preparation stage of the normalization operator processor, if the input data is the result of a convolution operation, the normalization operation is folded into the convolution operation.

8. The neural network model quantization device according to claim 1 or 2, characterized in that the relu operator processor places no limit on input precision; in the preparation stage, if its child node is a transpose operation, the node and the child node are swapped; and other stages use the default operations.

9. The neural network model quantization device according to claim 1 or 2, characterized in that the broadcast matrix-multiplication operator processor is configured with an input precision of 16, other stages using the default operations; or
the tanh, sigmoid, and exp operator processors are configured with a maximum input precision of 16; or
the softmax operator processor is configured with a maximum precision of 16; or
the convolution and matrix-multiplication processors are configured with a maximum precision of 8; or
the sum-over-axis operator processor is configured with an input precision of 8, other stages using the default operations; or
the matrix addition and subtraction operator processors are configured with an input precision of 16, other stages using the default operations; or
the broadcast matrix addition and subtraction operator processors are configured with an input precision of 16, other stages using the default operations; or
the matrix concatenation operator processor places no limit on input precision, other stages using the default operations; or
the embedding operator processor places no limit on input precision, other stages using the default operations; or
the max-pooling operator processor places no limit on input precision, other stages using the default operations; or
in the verification stage of the average-pooling operator processor, the average computed as the pooling kernel window slides includes at least the surrounding padding panes; or
the matrix slice and clip operator processors place no limit on input precision, other stages using the default operations; or
the matrix negation, dimension-repeat, and tile operator processors place no limit on input precision, other stages using the default operations; or
the expand-dims, squeeze, and reshape operator processors place no limit on input precision, other stages using the default operations; or
the transpose operator processor places no limit on input precision, and, in the preparation stage, if the input data is the result of a transpose operation, the two are merged into a single transpose, other stages using the default operations; or
the flatten, max, min, and upsampling operator processors place no limit on input precision, other stages using the default operations.

10. A fixed-point neural network model quantization method using the neural network model quantization device according to any one of claims 1-9, characterized by comprising the following steps:
a verification stage: verifying that the graph model is a directed acyclic graph, and converting multi-input graph models into single-input models;
a preparation stage: applying equivalence transformations to the graph model to facilitate subsequent quantization;
a scaling stage: feeding all samples through the floating-point model, collecting statistics on the output of each operator inside the model, and predicting the likely output range of each operator over all samples from the characteristics of the output data; and
a quantization stage: converting the model's operators to fixed point in topological order.

11. The neural network model quantization method according to claim 10, characterized by further comprising a requantization stage, wherein a maximum input-data precision is set for each operator and, when the precision of an operator's input data exceeds the configured maximum, the input data is reduced in precision.

12. A computer-readable medium, characterized by storing operation instructions that cause a computer to execute the neural network model quantization method according to claim 10 or 11.
CN201911174616.XA | 2019-11-26 | Fixed-point neural network model quantization device and method | Active | CN110929862B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911174616.XA | 2019-11-26 | 2019-11-26 | Fixed-point neural network model quantization device and method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911174616.XA | 2019-11-26 | 2019-11-26 | Fixed-point neural network model quantization device and method

Publications (2)

Publication Number | Publication Date
CN110929862A (en) | 2020-03-27
CN110929862B (en) | 2023-08-01

Family

ID=69852012

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911174616.XA | Fixed-point neural network model quantization device and method | 2019-11-26 | 2019-11-26

Country Status (1)

Country | Link
CN | CN110929862B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113468935B (en)* | 2020-05-08 | 2024-04-02 | 上海齐感电子信息科技有限公司 | Face recognition method
CN112200296B (en)* | 2020-07-31 | 2024-04-05 | 星宸科技股份有限公司 | Network model quantization method, device, storage medium and electronic device
CN114442993B (en)* | 2020-10-30 | 2025-04-11 | 北京晶视智能科技有限公司 | Floating point function calculation table lookup device
CN114492778B (en)* | 2022-02-16 | 2024-09-06 | 安谋科技(中国)有限公司 | Operation method of neural network model, readable medium and electronic equipment
CN117454945A (en)* | 2022-07-15 | 2024-01-26 | 北京有竹居网络技术有限公司 | Method, chip, electronic device and medium for optimizing calculation force of neural network module
CN115019150B (en)* | 2022-08-03 | 2022-11-04 | 深圳比特微电子科技有限公司 | Target detection fixed point model establishing method and device and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105760933A (en)* | 2016-02-18 | 2016-07-13 | 清华大学 | Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network
CN109409514A (en)* | 2018-11-02 | 2019-03-01 | 广州市百果园信息技术有限公司 | Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks
CN109697083A (en)* | 2018-12-27 | 2019-04-30 | 深圳云天励飞技术有限公司 | Fixed point accelerated method, device, electronic equipment and the storage medium of data
CN109902745A (en)* | 2019-03-01 | 2019-06-18 | 成都康乔电子有限责任公司 | A CNN-based low-precision training and 8-bit integer quantization inference method
CN110084739A (en)* | 2019-03-28 | 2019-08-02 | 东南大学 | A kind of parallel acceleration system of FPGA of the picture quality enhancement algorithm based on CNN
CN110135580A (en)* | 2019-04-26 | 2019-08-16 | 华中科技大学 | A convolutional network full-integer quantization method and its application method
CN110163359A (en)* | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | A kind of computing device and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11562208B2 (en)* | 2018-05-17 | 2023-01-24 | Qualcomm Incorporated | Continuous relaxation of quantization for discretized deep neural networks


Also Published As

Publication number | Publication date
CN110929862A (en) | 2020-03-27

Similar Documents

Publication | Title
CN110929862B (en) | Fixed-point neural network model quantization device and method
CN110378468B (en) | A neural network accelerator based on structured pruning and low-bit quantization
CN113595993B (en) | A joint learning method for vehicle sensing equipment based on model structure optimization under edge computing
CN116681127B (en) | Neural network model training method and device, electronic equipment and storage medium
US11120101B2 (en) | Matrix multiplication system and method
CN114897133B (en) | A universal and configurable Transformer hardware accelerator and its implementation method
CN111814973A (en) | An In-Memory Computing System Applicable to Network Computation of Regular Differential Equations
CN117521752A (en) | Neural network acceleration method and system based on FPGA
CN110490308B (en) | Design method of acceleration library, terminal equipment and storage medium
CN117973443A (en) | Neural network acceleration method and device, accelerator and storage medium
CN112561050A (en) | Neural network model training method and device
CN115186796A (en) | Automatic deployment method of convolutional neural network based on FPGA
CN112884146A (en) | Method and system for training model based on data quantization and hardware acceleration
Vaithianathan et al. | FPGA prototyping of DSP algorithms for wireless communication systems
CN108647780A (en) | Reconfigurable pooling operation module structure for neural networks and its implementation
CN111260049A (en) | A neural network implementation method based on domestic embedded system
CN115222028A (en) | One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method
CN115660048A (en) | Model deployment method and device, vehicle control unit and storage medium
CN115496181A (en) | Chip adaptation method, device, chip and medium of deep learning model
CN118520917A (en) | Heterogeneous hardware accelerator for lightweight deep neural network and design method
WO2022143789A1 (en) | Quantum preprocessing method and apparatus, storage medium, and electronic apparatus
CN117436546A (en) | A method for optimizing terminal deployment of deep learning models based on integration of storage and calculation
CN110084362B (en) | A kind of logarithmic quantization device and method for neural network
CN118540024A (en) | Semantic communication coding and decoding method, device, equipment and storage medium
CN116992032B (en) | Text classification method, system and storage medium based on model automatic quantification

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
