CN110135580B - Convolution network full integer quantization method and application method thereof - Google Patents

Convolution network full integer quantization method and application method thereof

Info

Publication number
CN110135580B
CN110135580B (application CN201910344069.9A)
Authority
CN
China
Prior art keywords
output
weight
integer
network
quantization
Prior art date
Legal status
Active
Application number
CN201910344069.9A
Other languages
Chinese (zh)
Other versions
CN110135580A (en)
Inventor
Zhong Sheng (钟胜)
Zhou Xixiong (周锡雄)
Wang Jianhui (王建辉)
Shang Xiong (商雄)
Cai Zhi (蔡智)
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201910344069.9A
Publication of CN110135580A
Application granted
Publication of CN110135580B
Legal status: Active
Anticipated expiration


Abstract

(Translated from Chinese)

The invention discloses a convolutional network full-integer quantization method, belonging to the technical field of quantization and compression of convolutional networks. In the present invention, the input feature map, network weights, and output feature map of the convolutional network are all expressed as integers, and the forward inference of each network layer involves only integer computation. To guarantee performance after integer quantization, the network must be retrained, with the result of full-integer inference simulated during training. The invention also realizes an application method of the full-integer quantized convolutional network. Compared with a convolutional network expressed in single-precision floating point, the present scheme occupies fewer resources and infers faster; compared with a fixed-point quantized network, it expresses the network's inputs, outputs, and weights as fixed-length integers, so the bit width of each layer's output need not be considered. The scheme is therefore more regular and better suited to resource-constrained platforms such as FPGA/ASIC.

Description

Convolution network full integer quantization method and application method thereof
Technical Field
The invention belongs to the technical field of quantization compression of a convolutional network, and particularly relates to a convolutional network full integer quantization method and an application method thereof.
Background
Since Alex-Net was published in 2012, deep learning methods represented by convolutional neural networks have achieved year-by-year breakthroughs in target discrimination and recognition, and the accuracy of existing complex networks can exceed 95%; however, these networks were not designed with deployment on resource-limited embedded platforms in mind. For resource-constrained applications such as AR/VR, smart phones, and FPGA/ASIC devices, models must be quantized and compressed to reduce model size and computing-resource demands so that they can be deployed on such embedded platforms.
Facing the model quantization and compression problem, there are mainly two approaches. The first is to design a more efficient, lightweight network structure that accommodates constrained computing resources, such as MobileNet or ShuffleNet. The second is to apply low-bit quantization to the intermediate results of an existing network structure, including weights, inputs, and outputs, reducing the network's computing-resource requirements and latency while keeping the structure unchanged and preserving accuracy.
For the second approach, existing low-bit quantization methods include TWN, BNN, and XNOR-Net. These methods reduce the weights and the input/output quantities of the network to 1 bit or 3 bits, so that the multiply-accumulate operations of convolution can be replaced by XOR and shift operations, reducing the use of computing resources. However, this approach has a significant drawback: large precision loss. Other quantization methods do not consider actual deployment in hardware; they quantize only the network weights, focusing on storage-resource requirements while ignoring computing-resource requirements.
Disclosure of Invention
In view of the above drawbacks or needs for improvement in the prior art, the present invention provides a convolutional network full-integer quantization method and an application method thereof, which aim to express the inputs, outputs, and weights of a network as fixed-length integers; the quantization method controls the accuracy loss of the network to about 5% while reducing the consumption of computing, storage, and network resources.
In order to achieve the above object, the present invention provides a convolution network full integer quantization method, which comprises the following steps:
(1) obtaining a model, a floating point type weight and a training data set of a convolutional network, and initializing the network;
(2) for each convolution layer, first obtaining the distribution ranges of the input IN, output OUT, and weight WT of the layer through a floating-point inference pass, and computing the maximum absolute extreme value of each of IN, OUT, and WT;
(3) updating the three maximum absolute extreme values during training of the current layer;
(4) performing integer quantization on the input and weight of the current layer according to the maximum absolute extreme values of IN, OUT, and WT;
(5) obtaining the integer-quantized output of the current layer from the integer-quantized input and weight;
(6) inverse-quantizing the integer-quantized output of the current layer back to floating point and passing it to the next layer; if the next layer is a batch norm layer, folding the batch norm parameters into the current layer; repeating steps (3) to (6) until the last layer of the network;
(7) back-propagating and continually updating the weights until the network converges, then saving the quantized weights and the additional parameters; the integer-quantized parameters are used in the full-integer forward inference, replacing the original floating-point operations with integer ones.
Further, updating the three maximum absolute extreme values during training in step (3) specifically uses an exponential moving average:
x_n = α·x_{n-1} + (1 - α)·x
where x_n is the maximum absolute extreme value of the input, output, or weight after this update, x_{n-1} is the value from the previous update, x is the extreme value computed in the current iteration, and α is a weight coefficient.
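As an illustrative aid (not part of the patent text), a minimal Python sketch of this update rule; the function name is ours, and α = 0.99 is taken from the embodiment described later:

```python
import numpy as np

def ema_update(x_prev, tensor, alpha=0.99):
    """x_n = alpha * x_{n-1} + (1 - alpha) * x, where x is this
    iteration's maximum absolute extreme value of the tensor."""
    x = float(np.max(np.abs(tensor)))   # this iteration's extreme value
    return alpha * x_prev + (1 - alpha) * x
```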
Further, step (4) is specifically as follows:
input integer quantization:
Q_IN = clamp(IN / S1)
where Q_IN is the integer-quantized input; S1 = max{|IN|} / Γ, Γ = 2^N; N is the number of quantization bits; clamp() truncates the fractional part; max{|IN|} is the maximum absolute extreme value of the input;
integer quantization of the weights:
Q_WT = clamp(WT / S2)
where Q_WT is the integer-quantized weight; S2 = max{|WT|} / Γ; max{|WT|} is the maximum absolute extreme value of the weight.
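A minimal Python sketch of this quantization step. One assumption to note: the text states Γ = 2^N with N = 8 yet the embodiment normalizes int8 values to [-127, 127], which corresponds to Γ = 2^7; the sketch therefore uses Γ = 2^7 for int8, and the function name is illustrative:

```python
import numpy as np

def quantize(tensor, max_abs, gamma=2**7):
    """Q = clamp(tensor / S) with S = max_abs / Gamma; clamp()
    truncates the fractional part (no round-to-nearest)."""
    scale = max_abs / gamma                     # S1 (input) or S2 (weight)
    q = np.trunc(tensor / scale)                # direct truncation
    q = np.clip(q, -(gamma - 1), gamma - 1)     # [-127, 127] for int8
    return q.astype(np.int8), scale
```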
Further, step (5) is specifically:
the integer-quantized output Q_OUT is:
Q_OUT = Q_IN × Q_WT × M
M = S1 × S2 / S3
where Q_IN is the integer-quantized input and Q_WT the integer-quantized weight; since M = S1 × S2 / S3 is floating point, let
M ≈ C / 2^S
The parameters C and S are derived as follows:
first, compute M = S1 × S2 / S3,
where S1 = max{|IN|} / Γ, Γ = 2^N, max{|IN|} being the maximum absolute extreme value of the input; S2 = max{|WT|} / Γ, max{|WT|} being the maximum absolute extreme value of the weight; S3 = max{|OUT|} / Γ, max{|OUT|} being the maximum absolute extreme value of the output; N is the number of quantization bits;
repeatedly multiply or divide M by 2 until 0 < M < 0.5; starting from a = 0, each multiplication by 2 sets a = a + 1 and each division by 2 sets a = a - 1, counting to obtain the final value of a;
then preset a value v with 0 < v ≤ 32 and solve S and C by:
S = v + a
C = round(M × 2^v)
0 < C ≤ 2^v
where round() denotes rounding to the nearest integer.
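A Python sketch of this derivation under the stated constraints; normalizing M into [0.25, 0.5) rather than merely (0, 0.5) is our assumption, chosen to minimize the rounding error:

```python
def approximate_m(m, v=24):
    """Approximate the floating-point M = S1*S2/S3 by C / 2**S so that
    Q_OUT = (Q_IN * Q_WT * C) >> S uses integer arithmetic only."""
    assert m > 0 and 0 < v <= 32
    a = 0
    while m >= 0.5:        # each division by 2: a = a - 1
        m /= 2.0
        a -= 1
    while m < 0.25:        # each multiplication by 2: a = a + 1
        m *= 2.0
        a += 1
    s = v + a
    c = round(m * 2**v)    # 0 < C <= 2**v
    return c, s

# Example: approximate_m(0.0037) -> C = 7945690, S = 31,
# and 7945690 / 2**31 = 0.00369999...
```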
Further, the integer-quantized output Q_OUT is:
Q_OUT = Q_IN × Q_WT × M
Before the output is integer-quantized, nonlinear activation is applied to Q_IN × Q_WT, the nonlinear activation using a shift approximation.
Further, the nonlinear activation of Q_IN × Q_WT is specifically:
nonlinear activation of Q_IN × Q_WT is performed with a leaky activation function of the following form (with y = Q_IN × Q_WT):
f(y) = y,      y > 0
f(y) = 0.1·y,  y ≤ 0
To ensure that Q_IN × Q_WT remains an integer after nonlinear activation, the above formula is shift-approximated as follows:
f(y) = y,                 y > 0
f(y) = (y + y<<1) >> 5,   y ≤ 0
where y<<1 denotes shifting the binary y left by one bit and (y + y<<1) >> 5 denotes shifting the binary (y + y<<1) right by 5 bits, so that the activated Q_IN × Q_WT remains an integer.
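A one-line Python sketch of the shift-approximated activation; the function name is illustrative:

```python
def leaky_shift(y):
    """Leaky activation on an integer y: the floating-point branch 0.1*y
    is replaced by (y + (y << 1)) >> 5, i.e. 3y/32 = 0.09375y."""
    return y if y > 0 else (y + (y << 1)) >> 5
```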
Further, if the next layer in step (6) is a batch norm layer, folding the batch norm parameters into the current layer specifically comprises:
the batch norm layer computes:
y = γ·(x - μ)/√(σ² + ε) + β
where x is the input, y the output, ε a small value added to the denominator, μ the output mean, σ the output standard deviation, γ a parameter produced in the batch norm computation, and β the bias;
since the batch norm follows the convolution, the convolution is expressed as:
y = ∑ w × fmap(i, j)
where fmap(i, j) is the image feature at input location (i, j), w the weight, and y the output;
the batch norm parameters are therefore folded into the convolution as follows:
folded weight:
w_fold = γ·w / √(σ² + ε)
folded bias:
β_fold = β - γ·μ / √(σ² + ε)
convolution after folding: y = ∑ w_fold × fmap(i, j) + β_fold.
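A Python sketch of this folding, assuming per-output-channel batch norm parameters and an (out_ch, in_ch, k, k) weight layout; the layout is our assumption, not stated in the patent text:

```python
import numpy as np

def fold_batchnorm(w, beta, gamma, mu, sigma, eps=1e-5):
    """w_fold = gamma*w/sqrt(sigma^2+eps);
    beta_fold = beta - gamma*mu/sqrt(sigma^2+eps).
    gamma, beta, mu, sigma are per-output-channel vectors (assumed)."""
    inv_std = gamma / np.sqrt(sigma**2 + eps)
    w_fold = w * inv_std[:, None, None, None]   # broadcast over (in_ch, k, k)
    beta_fold = beta - mu * inv_std
    return w_fold, beta_fold
```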
According to another aspect of the present invention, there is provided an application method of a full-integer quantized convolutional network, comprising the steps of:
S1, obtaining the model, floating-point weights, and training data set of the convolutional network, and initializing the network;
S2, for each convolution layer, first obtaining the distribution ranges of the input IN, output OUT, and weight WT of the layer through a floating-point inference pass, and computing the maximum absolute extreme value of each;
S3, updating the three maximum absolute extreme values during training of the current layer;
S4, performing integer quantization on the input and weight of the current layer according to the maximum absolute extreme values of IN, OUT, and WT;
S5, obtaining the integer-quantized output of the current layer from the integer-quantized input and weight;
S6, inverse-quantizing the integer-quantized output of the current layer back to floating point and passing it to the next layer; if the next layer is a batch norm layer, folding the batch norm parameters into the current layer; repeating steps S3 to S6 in sequence until the last layer of the network;
S7, back-propagating and continually updating the weights until the network converges, then saving the quantized weights and the additional parameters; the integer-quantized parameters are used in the full-integer forward inference, replacing the original floating-point operations with integer ones;
S8, inputting the image of the target to be detected into the full-integer quantized convolutional network and dividing the image into S × S grids;
S9, setting n anchor boxes with fixed aspect ratios; each grid cell predicts n anchor boxes, and each anchor box independently predicts the target coordinates (x, y, w, h), the confidence p, and the probabilities of m categories, where x, y are the target coordinates and w, h the width and height of the target;
S10, according to the per-category probabilities computed in the previous step, first performing a preliminary screening with a fixed threshold to filter out candidate boxes whose confidence in the corresponding category is below the threshold, and then removing overlapping target boxes by non-maximum suppression;
S11, for the retained target boxes, selecting the targets whose corresponding probability exceeds the threshold in each category for visual display, and outputting the target detection result.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
(1) the invention adopts a full-integer quantization method in which the input, output, and weights of the network are expressed as fixed-length integers; the quantization method controls the precision loss of the network to about 5%, and because forward propagation involves only fixed-length integer multiplication, it is friendlier to computing resources;
(2) the absolute extreme values of the network's inputs and outputs are computed with an exponential moving average, and quantization is performed against these extreme values; since the exponential moving average captures the distribution of a batch of data rather than one specific input, the quantization result matches the numerical characteristics of the batch, which is a necessary guarantee of generalization in practical applications;
(3) the batch norm layers are folded directly into the convolutional layers, so the batch norm layers need not be quantized at all, and the network need not compute them during forward inference;
(4) the shift-activation step is moved ahead of the quantization of the network output: the shift activation is applied to the intermediate output first, and the output is quantized afterwards. The rationale is as follows: if the output were quantized to 8 bits before the shift activation, the shift would operate on an 8-bit signed number, with a precision on the order of 1/2^7; before quantization the intermediate result is expressed as a 32-bit value, and the shift activation then operates with a precision on the order of 1/2^31. Performing the operations in this order therefore reduces the error introduced by the shift approximation in the activation layer.
Drawings
FIG. 1 is a training flow diagram of a full integer quantization method of the present invention;
FIG. 2 is a diagram illustrating an example of the structure of a convolutional neural network in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a batch norm integration method according to the present invention;
FIG. 4 is an exemplary diagram of the cancellation of quantization and dequantization between adjacent layers of a network in the present invention;
FIG. 5 is a schematic diagram of the full integer forward derivation process of the present invention;
FIG. 6 is a graph of target detection results before quantization;
FIG. 7 is a graph of target detection results after quantization.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the method of the present invention comprises the steps of:
(1) obtaining a model, a floating point type weight and a training data set of a convolutional network, and initializing the network;
specifically, the supporting embodiment of the invention adopts a network structure of YOLOV 2-tiny. Referring to FIG. 2, there are 6 max pool layers, 9 convolutional layers followed by a batch norm. The training framework employs a darknet, which is written in c language and opens open sources. The Yolo web author provides floating point type weights on the personal home page for download. The training data was trained using VOC2012 and VOC2007 data sets, which contain 20 classes of targets, and a total of 9963+ 11540-21503 labeled data. The width of an input image for initializing the network is 416 pixels, the height of the input image is 416 pixels, the number of channels of the image is 3, the number of pictures subjected to iterative training each time is 64, the momentum is 0.9, the learning rate is 0.001, the maximum iteration number is 60200, the network output is the position, the size and the confidence coefficient of a target in the image, and due to the fact that cross redundancy exists in detection results, the detection results need to be fused by a non-maximum suppression method, and therefore the output result of each detected target corresponds uniquely.
(2) For each convolution layer, the distribution ranges of its input, output, and weights are first obtained through a floating-point inference pass, and the maximum absolute extreme value |max| of each is computed; these extreme values are updated during training with an exponential moving average (EMA).
Specifically, each layer's weights comprise the parameters w and β, and the input and output must also be quantized, so the maximum absolute values of the 4 groups w, β, IN, and OUT must be tracked. So that the statistics reflect the characteristics of the data set rather than the maxima under one particular input image, these extreme values are updated with the EMA. The specific formula is: x_n = α·x_{n-1} + (1 - α)·x,
where x_n is the value retained after the current iteration, x_{n-1} the value retained from the previous iteration, and x the result of the current computation; α is a weight coefficient, generally chosen between 0.9 and 1, and in the embodiment of the invention α = 0.99.
(3) Quantizing the input and the weights of the network according to the obtained maximum absolute values with the following quantization formulas, so that both can be expressed in int8:
quantized input: Q_IN = clamp(IN / S1)
quantized weight: Q_WT = clamp(WT / S2)
quantization coefficients: S1 = |MAX| / Γ, |MAX| = max{|IN|}, Γ = 2^N
S2 = |MAX| / Γ, |MAX| = max{|WT|}, Γ = 2^N
where Γ = 2^N determines the number of quantization bits; IN is the input, WT the weight, max{|IN|} the maximum absolute extreme value of the input, and max{|WT|} that of the weight.
Specifically, experience shows that the absolute values of each layer's inputs and weights lie in the range 0 to 1; a linear transformation using the statistical maximum absolute value normalizes the weights and inputs to [-127, 127] by the formulas above. When values are rounded, direct truncation is used rather than round-to-nearest; in the formulas, clamp() denotes the truncation operation: int = clamp(float). In the embodiment of the invention, N = 8.
(4) The quantized output of the current layer is obtained from the quantized input and weights. To ensure that the network output is also an integer value, quantization is performed as follows:
floating-point output: OUT = IN × WT = Q_IN × Q_WT × S1 × S2
quantized output: Q_OUT = OUT / S3 = Q_IN × Q_WT × (S1 × S2 / S3)
where S3 is the output quantization coefficient. Since M = S1 × S2 / S3 is a floating-point number, it is approximated by a multiplication and a shift to keep the inference process integer-only, and the coefficients C and S generated by the approximation are stored as parameters, as follows:
approximate calculation:
M ≈ C / 2^S
Q_OUT = (Q_IN × Q_WT × C) >> S
Specifically, since M = S1·S2/S3 is a floating-point number, and since the quantized output must be representable as an integer with no floating-point operation in the calculation, M must be approximated; let M = M_Δ × 2^{-a}.
To keep the bit width of the integer multiplication as small as possible while making the approximation accurate, the numerical range of C must be chosen. In the embodiment of the invention, 0 < C ≤ 2^v with v = 24.
C and S are solved by repeatedly multiplying or dividing M by 2 until 0 < M_Δ < 0.5. Starting from a = 0, each multiplication of M by 2 adds 1 to a, and each division by 2 subtracts 1 from a. Finally, C = round(M_Δ × 2^v) and S = v + a, where round() denotes rounding.
(5) Before a layer's result is output to the next layer, a nonlinear activation process is required. This process is a floating-point operation, and to simulate the full-integer computation of forward propagation it must be implemented as a shift approximation. The shift-activated result (in int8 expression) is inverse-quantized back to a floating-point expression and output to the next layer; processes (2) to (5) are repeated until the last layer of the network. For a network with batch norm layers, a folding step is needed to merge the batch norm parameters directly into the preceding layer.
Specifically, for a network with a batch norm layer, a folding approach is needed, as shown in fig. 3. The specific implementation is as follows: the batch norm computation is described by
y = γ·(x - μ)/√(σ² + ε) + β
where μ is the output mean; ε is a small value added to the denominator to prevent division by zero (default 1e-5); σ is the output standard deviation; γ is a parameter produced by the batch norm process; and β is the bias. Since the batch norm follows the convolution, i.e., x = ∑ w × fmap(i, j), where w is the network weight and fmap(i, j) the input feature map, a simple transformation folds the batch norm into the convolution:
folded weight w:
w_fold = γ·w / √(σ² + ε)
folded bias β:
β_fold = β - γ·μ / √(σ² + ε)
convolution after folding: y = ∑ w_fold × fmap(i, j) + β_fold
The invention applies a shift approximation to the nonlinear activation function to keep the forward inference fully integer. A leaky activation function is used, of the following form (with y the intermediate value):
f(y) = y,      y > 0
f(y) = 0.1·y,  y ≤ 0
The activation involves two operations: a data comparison and a floating-point multiplication. To ensure that forward inference uses only integer computation, the invention replaces it with the shift approximation
f(y) = y,                 y > 0
f(y) = (y + y<<1) >> 5,   y ≤ 0
which is numerically equivalent to the approximation
0.1·y ≈ 3y / 32 = 0.09375·y
In the actual computation, the shift activation is performed before the quantization of the final output in step (4). The bit width of the final output value is kept consistent with that of the input value, ready for the next layer's forward inference; this ordering reduces the error caused by the shift approximation in the activation layer.
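Putting the pieces together, a sketch of one all-integer layer step in the order just described, under the same assumptions as the earlier sketches; a dot product stands in for the convolution and the helper names are illustrative:

```python
import numpy as np

def integer_layer(q_in, q_wt, c, s):
    """One all-integer layer step: integer accumulation, shift activation
    on the wide intermediate, then requantization by C / 2^S."""
    acc = np.tensordot(q_in.astype(np.int64), q_wt.astype(np.int64), axes=1)
    # leaky activation on the 32/64-bit intermediate: y or (y + y<<1) >> 5
    act = np.where(acc > 0, acc, (acc + (acc << 1)) >> 5)
    q_out = (act * c) >> s            # requantize: multiply by C, shift by S
    return np.clip(q_out, -127, 127).astype(np.int8)
```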
(6) Back-propagate and continually update the weights until the network converges, then store the quantized weights and the additional parameters; the integer-quantized parameters can then be used in the full-integer forward inference, replacing the original floating-point operations with integer ones.
Specifically, assuming the convolutional layer has L_M input channels, L_N output channels, and kernel size K, the storage required after integer quantization is about 1/4 of that required before, as shown below.
After quantization:
Storage_int8 = L_M × L_N × K × K + L_N + 2 × sizeof(int32)/sizeof(int8)
Before quantization:
Storage_float = (L_M × L_N × K × K + L_N + bn × L_N × 3) × sizeof(float), bn ∈ {0, 1}
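A worked instance of these two formulas, assuming sizeof(int8) = 1 and sizeof(int32) = sizeof(float) = 4; the example layer sizes are arbitrary:

```python
def layer_storage(l_m, l_n, k, bn=1):
    """Storage in bytes before/after quantization, per the formulas above."""
    storage_int8 = l_m * l_n * k * k + l_n + 2 * 4 // 1   # weights + bias + C, S
    storage_float = (l_m * l_n * k * k + l_n + bn * l_n * 3) * 4
    return storage_int8, storage_float

# Example: a 3x3 layer with 16 input and 32 output channels (bn = 1):
# storage_int8 = 4648 bytes, storage_float = 18944 bytes, ratio ~ 1/4.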
as shown in fig. 4, there is a quantization as well as an inverse quantization process between the two layers. In the actual forward derivation process, the two can cancel each other out, so in the actual calculation process, only the inverse quantization processing needs to be performed on the output of the last layer of the network, and only the full integer calculation exists in the middle layer, as shown in fig. 5.
In addition, the performance of the invention was measured with the darknet framework: quantization was performed on the YOLOv2-tiny network structure, and comparing the average mAP values before and after quantization, the loss was 5.1%, as shown in Table 1:
Category       Before quantization  After quantization  Error
Boat           0.1415               0.1657              0.0242
Bird           0.1807               0.1621              -0.0186
Train          0.5145               0.4441              -0.0704
Bus            0.5306               0.4669              -0.0637
Person         0.4633               0.4061              -0.0572
Dog            0.3379               0.3023              -0.0356
Diningtable    0.3433               0.238               -0.1053
Sheep          0.3322               0.2644              -0.0678
Pottedplant    0.0864               0.0756              -0.0108
Sofa           0.3187               0.2076              -0.1111
Car            0.5195               0.4358              -0.0837
Aeroplane      0.4157               0.2801              -0.1356
Bicycle        0.48                 0.4563              -0.0237
Tvmonitor      0.4029               0.3335              -0.0694
Bottle         0.0522               0.037               -0.0152
Motorbike      0.536                0.4221              -0.1139
Cat            0.3847               0.3633              -0.0214
Chair          0.1776               0.1235              -0.0541
Cow            0.3049               0.2972              -0.0077
Horse          0.5222               0.4384              -0.0838
Average mAP    0.3521               0.301               -0.0511
TABLE 1
The invention uses the parameters before and after quantization to detect and identify targets:
a given image is input into the convolutional network and divided into S × S grids;
n anchor boxes with fixed aspect ratios are set; each grid cell predicts n anchor boxes, and each anchor box independently predicts the target coordinates (x, y, w, h), the confidence p, and the probabilities of 20 categories;
non-maximum suppression (NMS) is applied to the extracted S × S × n targets, removing overlapping boxes and keeping the prediction boxes with high confidence;
the result is output and displayed visually.
For a given class of targets, the confidence of that class must be computed in every candidate box, as follows:
P(class) = P(class|obj) × P(obj)
where P(class) is the final confidence of a class of target in a candidate box, P(class|obj) is the value regressed for that class in the candidate box, and P(obj) is the probability of a target regressed in the candidate box. After the per-class probability is computed, a preliminary screening with a fixed threshold first filters out candidate boxes with low confidence in the corresponding class, and overlapping target boxes are then removed by non-maximum suppression (NMS).
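A minimal sketch of this scoring and prefiltering step; the threshold value is an arbitrary assumption:

```python
def class_scores(p_obj, p_class_given_obj, threshold=0.24):
    """P(class) = P(class|obj) * P(obj) for every category in one
    candidate box, followed by the fixed-threshold prefilter."""
    scores = [p_obj * p for p in p_class_given_obj]
    return [s if s >= threshold else 0.0 for s in scores]
```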
Non-maximum suppression (NMS) removes overlapping boxes per category; the process is summarized in the following steps, with a sketch after the list:
(1) sort the P(class) values of a given class across all candidate boxes in descending order and mark all boxes as unprocessed;
(2) compute the overlap rate between the box with the highest probability and every other box; if the overlap exceeds 0.5, keep the highest-probability box, remove the other box, and mark them as processed;
(3) find the box with the next-highest P(class) and process it as in step (2);
(4) repeat steps (2) to (3) until all boxes are marked as processed;
(5) among the retained target boxes, select the targets whose P(class) exceeds the threshold for visual display and output the result.
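A Python sketch of the per-class procedure above; the (x1, y1, x2, y2) box representation and the iou() helper are assumptions for illustration:

```python
def nms_per_class(boxes, scores, iou_thresh=0.5):
    """Per-class NMS following steps (1)-(5): sort by P(class) descending,
    keep the best box, drop boxes overlapping it by more than 0.5, repeat."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```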
Fig. 6 shows the recognition effect of an ordinary convolutional network on an image; fig. 7 shows target detection and recognition on the same picture using the fully integer-quantized convolutional network. It can be seen that the performance loss of the integer-quantized convolutional network is small and its recognition effect is nearly the same as that of the ordinary convolutional network, while its detection and recognition are faster and it consumes fewer computing resources.
It will be appreciated by those skilled in the art that the foregoing is only a preferred embodiment of the invention and is not intended to limit the invention; various modifications, equivalents, and improvements may be made without departing from the spirit and scope of the invention.

Claims (7)

1. An application method of a full-integer quantized convolutional network, the application method comprising the steps of:
S1, obtaining the model, floating-point weights, and training data set of the convolutional network, and initializing the network;
S2, for each convolution layer, first obtaining the distribution ranges of the input IN, output OUT, and weight WT of the layer through a floating-point inference pass, and computing the maximum absolute extreme value of each;
S3, updating the three maximum absolute extreme values during training of the current layer;
S4, performing integer quantization on the input and weight of the current layer according to the maximum absolute extreme values of IN, OUT, and WT;
S5, obtaining the integer-quantized output of the current layer from the integer-quantized input and weight;
S6, inverse-quantizing the integer-quantized output of the current layer back to floating point and passing it to the next layer; if the next layer is a batch norm layer, folding the batch norm parameters into the current layer; repeating steps S3 to S6 in sequence until the last layer of the network;
S7, back-propagating and continually updating the weights until the network converges, then saving the quantized weights and the additional parameters; the integer-quantized parameters are used in the full-integer forward inference, replacing the original floating-point operations with integer ones;
S8, inputting the image of the target to be detected into the full-integer quantized convolutional network and dividing the image into S × S grids;
S9, setting n anchor boxes with fixed aspect ratios; each grid cell predicts n anchor boxes, and each anchor box independently predicts the target coordinates (x, y, w, h), the confidence p, and the probabilities of m categories, where x, y are the target coordinates and w, h the width and height of the target;
S10, according to the per-category probabilities computed in the previous step, first performing a preliminary screening with a fixed threshold to filter out candidate boxes whose confidence in the corresponding category is below the threshold, and then removing overlapping target boxes by non-maximum suppression;
S11, for the retained target boxes, selecting the targets whose corresponding probability exceeds the threshold in each category for visual display, and outputting the target detection result.
2. The method of claim 1, wherein updating the maximum absolute extreme values during training in step S3 specifically uses an exponential moving average:
x_n = α·x_{n-1} + (1 - α)·x
where x_n is the maximum absolute extreme value of the input, output, or weight after this update, x_{n-1} is the value from the previous update, x is the extreme value computed in the current iteration, and α is a weight coefficient.
3. The method of claim 1, wherein step S4 is specifically as follows:
input integer quantization:
Q_IN = clamp(IN / S1)
where Q_IN is the integer-quantized input; S1 = max{|IN|} / Γ, Γ = 2^N; N is the number of quantization bits; clamp() truncates the fractional part; max{|IN|} is the maximum absolute extreme value of the input;
integer quantization of the weights:
Q_WT = clamp(WT / S2)
where Q_WT is the integer-quantized weight; S2 = max{|WT|} / Γ; max{|WT|} is the maximum absolute extreme value of the weight.
4. The method of claim 1, wherein step S5 is specifically:
the integer-quantized output Q_OUT is:
Q_OUT = Q_IN × Q_WT × M
M = S1 × S2 / S3
where Q_IN is the integer-quantized input and Q_WT the integer-quantized weight; since M = S1 × S2 / S3 is floating point, let
M ≈ C / 2^S
The parameters C and S are derived as follows:
first, compute M = S1 × S2 / S3,
where S1 = max{|IN|} / Γ, Γ = 2^N, max{|IN|} being the maximum absolute extreme value of the input; S2 = max{|WT|} / Γ, max{|WT|} being the maximum absolute extreme value of the weight; S3 = max{|OUT|} / Γ, max{|OUT|} being the maximum absolute extreme value of the output; N is the number of quantization bits;
repeatedly multiply or divide M by 2 until 0 < M < 0.5; starting from a = 0, each multiplication by 2 sets a = a + 1 and each division by 2 sets a = a - 1, counting to obtain the final value of a;
then preset a value v with 0 < v ≤ 32 and solve S and C by:
S = v + a
C = round(M × 2^v)
0 < C ≤ 2^v
where round() denotes rounding to the nearest integer.
5. The method of claim 4, wherein the integer-quantized output Q_OUT of the full-integer quantized convolutional network is:
Q_OUT = Q_IN × Q_WT × M
and before the output is integer-quantized, nonlinear activation is applied to Q_IN × Q_WT, the nonlinear activation using a shift approximation.
6. The method of claim 5, wherein the nonlinear activation of Q_IN × Q_WT is specifically:
nonlinear activation of Q_IN × Q_WT is performed with a leaky activation function of the following form (with y = Q_IN × Q_WT):
f(y) = y,      y > 0
f(y) = 0.1·y,  y ≤ 0
To ensure that Q_IN × Q_WT remains an integer after nonlinear activation, the above formula is shift-approximated as follows:
f(y) = y,                 y > 0
f(y) = (y + y<<1) >> 5,   y ≤ 0
where y<<1 denotes shifting the binary y left by one bit and (y + y<<1) >> 5 denotes shifting the binary (y + y<<1) right by 5 bits, so that the activated Q_IN × Q_WT remains an integer.
7. The method of claim 1, wherein if the next layer in step S6 is a batch norm layer, folding the batch norm parameters into the current layer specifically comprises:
the batch norm layer computes:
y = γ·(x - μ)/√(σ² + ε) + β
where x is the input, y the output, ε a small value added to the denominator, μ the output mean, σ the output standard deviation, γ a parameter produced in the batch norm computation, and β the bias;
since the batch norm follows the convolution, the convolution is expressed as:
y = ∑ w × fmap(i, j)
where fmap(i, j) is the image feature at input location (i, j), w the weight, and y the output;
the batch norm parameters are therefore folded into the convolution as follows:
folded weight:
w_fold = γ·w / √(σ² + ε)
folded bias:
β_fold = β - γ·μ / √(σ² + ε)
convolution after folding: y = ∑ w_fold × fmap(i, j) + β_fold.
CN110135580B: Convolution network full integer quantization method and application method thereof. Application CN201910344069.9A, filed 2019-04-26. Status: Active.

Priority Applications (1)

CN201910344069.9A (CN110135580B) · Priority date: 2019-04-26 · Filing date: 2019-04-26 · Title: Convolution network full integer quantization method and application method thereof

Applications Claiming Priority (1)

CN201910344069.9A (CN110135580B) · Priority date: 2019-04-26 · Filing date: 2019-04-26 · Title: Convolution network full integer quantization method and application method thereof

Publications (2)

Publication Number · Publication Date
CN110135580A (en) · 2019-08-16
CN110135580B (en) · 2021-03-26

Family

ID=67575312

Family Applications (1)

CN201910344069.9A · Active · CN110135580B (en) · Priority date: 2019-04-26 · Filing date: 2019-04-26 · Title: Convolution network full integer quantization method and application method thereof

Country Status (1)

Country: CN · CN110135580B (en)


Also Published As

Publication Number · Publication Date
CN110135580A (en) · 2019-08-16


Legal Events

Code · Title
PB01 · Publication
CB03 · Change of inventor or designer information
  Inventor after: Zhong Sheng; Zhou Xixiong; Wang Jianhui; Shang Xiong; Cai Zhi
  Inventor before: Zhong Sheng; Zhou Xixiong; Shang Xiong; Cai Zhi
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant
