CN115423796A

Info

Publication number: CN115423796A
Application number: CN202211155766.8A
Authority: CN
Inventors: 张恒; 赵洪坪; 杭芹; 程成; 何云玲; 郭家新
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Chongqing University of Posts and Telecommunications
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2022-12-02
Anticipated expiration: 2042-09-22
Also published as: CN115423796B

Abstract

The invention belongs to the technical field of chip defect detection, and in particular relates to a chip defect detection method and system based on TensorRT-accelerated inference. The method comprises: acquiring a chip image dataset and preprocessing it; training an improved YOLOv5 model on the chip image dataset to obtain a plurality of object detection models; converting all object detection models into TensorRT models and splicing the TensorRT models; processing the image of the chip to be inspected with the spliced TensorRT model to obtain an inference result; and performing dimensionality reduction and redundancy removal on the inference result to obtain the defect detection result for the chip. The invention fundamentally addresses the shortcomings of existing equipment in inspection speed and accuracy, offers higher detection efficiency and speed, saves labor costs, and is highly practical.

Description

Translated from Chinese

A chip defect detection method and system based on TensorRT-accelerated inference

Technical Field

The invention belongs to the technical field of chip defect detection, and in particular relates to a chip defect detection method and system based on TensorRT-accelerated inference.

Background Art

In recent years, with China's economic growth and the arrival of the artificial intelligence era, industries of all kinds have joined the digital transformation, and electronic devices have become an indispensable part of work and daily life. Because chips are at the core of the normal operation of these devices, demand for chips keeps rising, and a chip quality problem can cause enormous losses, so suppliers cannot neglect chip quality inspection. Object detection has been a research focus in computer vision in recent years and has advanced greatly with the rapid development of deep learning. The technique relies on convolutional neural networks: network parameters are trained on annotated datasets, latent features of the targets are extracted automatically, and the targets are finally classified and localized.

With the intelligent transformation of industry in recent years, object detection is being widely applied to production-line quality inspection, which has moved from time-consuming, labor-intensive manual inspection to AOI (automated optical inspection) equipment in place of manual work, raising the level of mechanization. However, the detection criteria of AOI algorithms are too uniform, efficiency is low, accuracy is problematic, and different batches have different acceptance standards, so generalization is very poor and the efficiency of manual re-inspection is not maximized. Combining deep learning with object detection and applying it to chip quality inspection can therefore greatly improve both the efficiency of production-line inspection and product quality.

At present, deep-learning-based object detection algorithms fall mainly into single-stage and two-stage algorithms. Single-stage algorithms, chiefly YOLO and SSD, are so called because they perform classification and regression at the same time as they generate prediction boxes during training. Two-stage methods, chiefly the R-CNN series, first generate a set of regions of interest with an RPN and then feed these regions into a convolutional neural network for classification and regression.

As deep-learning-based object detection has developed, quantization has also matured. Quantization converts a trained model to a lower-precision representation and computation. Commonly used tools include NVIDIA's TensorRT, a deep learning inference engine that provides a complete toolchain from obtaining a model, through model optimization and compilation, to model deployment. TensorRT supports mainstream frameworks such as PyTorch, TensorFlow, and Caffe.

Because demand for chips grows daily and quality requirements cannot be neglected, industrial production-line inspection places higher demands on detection speed and accuracy. A chip defect detection method that maintains detection speed while also ensuring detection accuracy is therefore a problem that remains to be solved.

Summary of the Invention

To address the deficiencies of the prior art, the present invention proposes a chip defect detection method and system based on TensorRT-accelerated inference. The method comprises:

S1: Acquire a chip image dataset and preprocess it to obtain the processed chip image dataset;

S2: Train the improved YOLOv5 model on the chip image dataset to obtain multiple object detection models;

S3: Convert all object detection models into TensorRT models and splice the TensorRT models;

S4: Acquire the image of the chip to be inspected and process it with the spliced TensorRT model to obtain the inference result;

S5: Perform dimensionality reduction on the inference result, then apply the improved NMS algorithm to the reduced result to remove redundancy, obtaining the defect detection result for the chip under inspection.

Preferably, training the improved YOLOv5 model comprises the following. The improved YOLOv5 model includes a backbone network, a neck network, and a head network;

The backbone network extracts features with deformable convolution and processes the chip image to obtain feature maps of different sizes;

The neck network fuses the feature maps of different sizes to obtain a fused feature map;

The head network processes the fused feature map to obtain the prediction result;

The parameters of the improved YOLOv5 model are adjusted using the total loss function to obtain the trained improved YOLOv5 model.

Further, the formula for extracting features with deformable convolution is:

where y(p₀) denotes the output feature map at position p₀, w(p_n) denotes the weight at position p_n, R denotes the regular grid, Δp_n denotes the offset, ω_n denotes the weight of the offset Δp_n, and x(·) denotes the input feature map at the given position.

Further, the total loss function is:

Loss = w₁·loss_cls + w₂·loss_reg + w₃·loss_obj

where loss_cls denotes the classification loss, loss_reg the localization loss, and loss_obj the confidence loss, and w₁, w₂, and w₃ are the weights of the three losses, respectively.
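The weighted sum above can be sketched in a few lines of Python. The function name, the default weights, and the sample loss values are assumptions for illustration only; the patent text does not state the values of w₁, w₂, and w₃.

```python
def total_loss(loss_cls, loss_reg, loss_obj, w1=1.0, w2=1.0, w3=1.0):
    """Loss = w1*loss_cls + w2*loss_reg + w3*loss_obj (weights are hypothetical)."""
    return w1 * loss_cls + w2 * loss_reg + w3 * loss_obj


# Hypothetical component losses and weights, chosen only to show the call.
value = total_loss(0.5, 0.25, 0.125, w1=0.5, w2=2.0, w3=4.0)
print(value)
```

Keeping the three weights as explicit parameters makes it easy to rebalance classification, localization, and confidence terms without touching the individual loss functions.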

Further, the classification loss is:

where n denotes the total number of samples, x denotes a sample, y_gt denotes the label, y_p denotes the predicted output, ω₁ denotes the first adjustment weight, ω₂ denotes the second adjustment weight, and sample() denotes summation.

Preferably, the formula for dimensionality reduction of the inference result is:

length = box_num * box_{pram_num} * memory_size + 1

where length denotes the length of the array that stores the inference result after dimensionality reduction, box_num denotes the number of prediction boxes, box_{pram_num} denotes the number of parameters per prediction box, and memory_size denotes the memory size of the data type.

Preferably, the inference result comprises information on multiple prediction boxes; the information for each prediction box includes its coordinates, length, width, and confidence.

Preferably, removing redundancy from the dimensionality-reduced inference result with the improved NMS algorithm comprises:

Set an overlap (IoU) threshold, compute the overlap between the prediction box with the highest confidence and every other prediction box, and remove the other boxes whose overlap exceeds the threshold;

Set the size ranges for small, medium, and large targets; classify the defects in the chip image dataset as small, medium, or large targets according to their longest side; for each of the three classes, take the smallest side length among all defects in that class as the small-target, medium-target, and large-target threshold size, respectively;

Select the prediction box with the highest confidence among the remaining prediction boxes and compare its shortest side with the threshold size closest to it; if the shortest side is smaller than that threshold size, extend the shortest side to the threshold size;

Compute the overlap between the resized prediction box and each remaining prediction box, and remove the remaining boxes whose overlap exceeds the overlap threshold.
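One possible reading of these steps is a two-pass procedure: a conventional greedy NMS pass, followed by a second greedy pass in which each selected box has its shortest side extended to the nearest size threshold before overlaps are recomputed. The sketch below follows that reading. The box layout `(cx, cy, w, h, score)`, the function names, and the threshold values in the example are assumptions, not details taken from the patent.

```python
def iou(a, b):
    """IoU of two boxes given as (cx, cy, w, h, score)."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay1, ay2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by1, by2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_suppress(boxes, iou_thr, reshape=lambda b: b):
    """Greedy NMS; `reshape` may alter the selected box before it is compared."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)  # the original (unexpanded) box is what gets reported
        probe = reshape(best)
        remaining = [b for b in remaining if iou(probe, b) <= iou_thr]
    return kept


def expand_min_side(box, size_thresholds):
    """Extend the shortest side to the closest threshold size if it is smaller."""
    cx, cy, w, h, score = box
    shortest = min(w, h)
    closest = min(size_thresholds, key=lambda t: abs(t - shortest))
    if shortest < closest:
        w = closest if w == shortest else w
        h = closest if h == shortest else h
    return (cx, cy, w, h, score)


def improved_nms(boxes, iou_thr, size_thresholds):
    """Pass 1: conventional NMS. Pass 2: NMS with expanded probe boxes."""
    survivors = greedy_suppress(boxes, iou_thr)
    return greedy_suppress(
        survivors, iou_thr, lambda b: expand_min_side(b, size_thresholds)
    )


# Hypothetical boxes: b sits just below a and does not overlap it until a is expanded.
a = (10.0, 10.0, 10.0, 2.0, 0.9)
b = (10.0, 12.5, 10.0, 2.0, 0.8)
c = (50.0, 50.0, 4.0, 4.0, 0.7)
print(greedy_suppress([a, b, c], 0.2))
print(improved_nms([a, b, c], 0.2, size_thresholds=[6.0, 20.0, 40.0]))
```

In this example the conventional pass keeps all three boxes, while the second pass drops `b` because the expanded version of `a` overlaps it by more than the threshold.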

A chip defect detection system based on TensorRT-accelerated inference comprises an image processing module, an object detection module, an inference result reprocessing module, and a detection result display module;

The image processing module preprocesses the image of the chip to be inspected;

The object detection module performs defect detection on the preprocessed image to obtain an inference result;

The inference result reprocessing module reprocesses the inference result, removing redundant prediction boxes to obtain the defect detection result;

The detection result display module analyzes the defect detection result against the chip acceptance standard and outputs the analysis result.

The beneficial effects of the present invention are as follows. Replacing ordinary convolution with deformable convolution strengthens the network's ability to extract features. Converting the object detection model from the .pt format of the PyTorch framework into a TensorRT model in .engine format speeds up detection. Running inference with multiple .engine models improves detection accuracy. By combining deep learning, object detection, and an accelerated inference framework, the invention detects chip defects accurately in real time, fundamentally addresses the shortcomings of existing equipment in inspection speed and accuracy, achieves higher detection efficiency and speed, saves labor costs, and is highly practical.

Brief Description of the Drawings

FIG. 1 is a flow chart of the chip defect detection method based on TensorRT-accelerated inference according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

The present invention proposes a chip defect detection method and system based on TensorRT-accelerated inference. As shown in FIG. 1, the method comprises the following:

S1: Acquire a chip image dataset and preprocess it to obtain the processed chip image dataset.

Chip images can be captured on the production line. The defect targets in the images are annotated with a labeling tool and the dataset is produced in VOC format. The dataset is then divided proportionally into a training set and a test set, and the training images are preprocessed by scaling them to a size suited to the network structure, giving the chip image dataset, in which every chip image has the same length and width.
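The proportional split described in S1 might look like the following sketch. The 80/20 ratio, the fixed seed, and the file names are assumptions; the patent only says the dataset is divided "according to a proportion".

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle reproducibly, then cut into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical image names standing in for annotated chip images.
images = [f"chip_{i:03d}.jpg" for i in range(10)]
train, test = split_dataset(images)
print(len(train), len(test))
```

Using a seeded `random.Random` instance keeps the split reproducible between training runs without touching the global random state.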

S2: Train the improved YOLOv5 model on the chip image dataset to obtain multiple object detection models.

The conventional YOLOv5 model was publicly released by Ultralytics on June 9, 2020, as an improvement on the YOLOv3 model. The present invention improves on the conventional YOLOv5 model. The improved YOLOv5 model comprises a backbone network (for feature extraction), a neck network (connecting the backbone and the head), and a head network (producing the final output prediction). First, the preprocessed chip image dataset is fed into the backbone, which applies convolution and pooling to the processed images to extract feature maps at different scales. These feature maps are then fed into the neck for feature fusion, and the fused features are finally fed into the head for classification and bounding-box regression, yielding the prediction boxes. The backbone uses deformable convolution to extract features; specifically, the present invention replaces the 3*3 ordinary convolutions in the last 3 layers of the backbone with deformable convolutions. Deformable convolution augments the spatial sampling locations in a module with additional offsets and learns the offsets from the target task without additional supervision. Adding a 2D offset to the regular grid sampling positions of standard convolution lets the sampling grid deform freely; the offsets are learned from the preceding feature maps by additional convolutional layers. The deformation therefore affects how the input features are sampled in a local, dense, and adaptive manner.

Sampling is performed on the input feature map x, i.e. the chip image data, using a regular grid R, which defines the size and dilation of the receptive field:

R = {(-1,-1), (-1,0), ..., (0,1), (1,1)}

For each position p₀ on the output map y of the convolution:

y(p₀) = Σ_{p_n∈R} w(p_n)·x(p₀+p_n)

where y(p₀) denotes the output feature map at position p₀ and x(·) denotes the input feature map at a given position; p_n enumerates the positions in R, w(p_n) is the weight at the n-th position, and x(p₀+p_n) is the input feature map sampled at p₀ shifted by the n-th point of the receptive field R. In deformable convolution, the regular grid R is augmented with offsets {Δp_n | n = 1, 2, ..., N}, where N = |R|. The offsets are computed in an additional convolutional layer: applying an ordinary convolution kernel to the input yields 2N offsets, N for the x direction and N for the y direction. A convolutional layer dedicated to computing offsets is added before the layer that uses deformable convolution, with one convolution per direction, giving Δp_n = (Δx_n, Δy_n). The formula above becomes:

y(p₀) = Σ_{p_n∈R} w(p_n)·x(p₀+p_n+Δp_n)

Sampling now takes place at the irregular, offset positions p_n+Δp_n. Because the offsets are usually fractional, x is evaluated by bilinear interpolation over the integer positions in the x and y directions:

x(p) = Σ_q G(q,p)·x(q)

where p denotes an arbitrary (fractional) position, here p = p₀+p_n+Δp_n, q enumerates all integer spatial positions on the feature map x, and G(·,·) is the bilinear interpolation kernel. G is two-dimensional and separates into two one-dimensional kernels:

G(q,p) = g(q_x,p_x)·g(q_y,p_y)

g(a,b) = max(0, 1-|a-b|)

Because G(q,p) is nonzero for only a few q, x(p) is fast to compute.
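The bilinear kernel g(a,b) = max(0, 1-|a-b|) and the offset sampling can be illustrated with a minimal, single-channel sketch in plain Python. It assumes the standard deformable convolution form y(p₀) = Σ w(p_n)·x(p₀+p_n+Δp_n) with x(p) = Σ_q G(q,p)·x(q), zero padding outside the map, and hand-supplied offsets; in a real network the offsets come from a learned convolutional layer, and the patent's additional per-offset weight ω_n is not modelled here. All function names are hypothetical.

```python
import math


def g(a, b):
    """One-dimensional bilinear kernel: g(a, b) = max(0, 1 - |a - b|)."""
    return max(0.0, 1.0 - abs(a - b))


def bilinear_sample(x, p):
    """x(p) = sum_q G(q, p) * x(q); only the 4 integer neighbours of p contribute."""
    py, px = p
    y0, x0 = math.floor(py), math.floor(px)
    total = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < len(x) and 0 <= qx < len(x[0]):  # zero outside the map
                total += g(qy, py) * g(qx, px) * x[qy][qx]
    return total


def deformable_conv_at(x, w, p0, offsets):
    """y(p0) = sum_n w[n] * x(p0 + p_n + dp_n) over the 3x3 grid R."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for n, (dy, dx) in enumerate(grid):
        oy, ox = offsets[n]
        out += w[n] * bilinear_sample(x, (p0[0] + dy + oy, p0[1] + dx + ox))
    return out


# A 2x2 map sampled at its centre gives the mean of the four values.
print(bilinear_sample([[0.0, 1.0], [2.0, 3.0]], (0.5, 0.5)))

# With all-zero offsets the result reduces to an ordinary 3x3 convolution.
ones = [[1.0] * 3 for _ in range(3)]
print(deformable_conv_at(ones, [1.0] * 9, (1, 1), [(0.0, 0.0)] * 9))
```

Restricting the sum to the four neighbouring integer positions is exactly why x(p) is cheap: G(q,p) vanishes for every other q.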

The offsets are obtained by applying a convolutional layer to the same input feature map; its kernel has the same spatial resolution and dilation as the current convolutional layer. The output offset field has the same spatial resolution as the input feature map, and during training the convolution kernels that generate the output features and the offsets are learned simultaneously. A weight parameter is introduced for each offset: during feature extraction, each extracted RoI (region of interest) is compared with the ground-truth box, and the probability that the RoI represents the ground-truth box is used as the weight coefficient ω_n of Δp_n, so the expression for the feature map after convolution becomes:

After the backbone network processes the chip image, feature maps of different sizes are obtained; the neck network fuses these feature maps into a fused feature map that is rich in both positional and semantic information.

The head network processes the fused feature map to obtain the prediction result, which consists of information on multiple prediction boxes, including each box's coordinates, length, width, and confidence. In this part, the total loss function is used to adjust the parameters of the improved YOLOv5 model. The total loss has three parts: the classification loss, which measures whether an anchor box and its corresponding ground-truth box are classified correctly; the localization loss, which measures the error between the prediction box and the ground-truth box; and the confidence loss, which measures the confidence of the network. The classification and confidence losses are computed with the cross-entropy loss function, and the localization loss with the IOU (Intersection over Union) loss function. The loss value is obtained from the loss function and the model parameters are updated by backpropagation, yielding the trained improved YOLOv5 model, i.e. the object detection model; training on different data yields multiple different object detection models. The present invention improves the classification loss and the confidence loss; the improved classification loss is:

where loss_cls denotes the classification loss, loss_obj denotes the confidence loss, n denotes the total number of samples, and x denotes a sample, one sample being one input; y_gt denotes the label, y_p denotes the predicted output, q_o denotes the confidence score of the predicted output, and p_iou denotes the iou (overlap) value between the prediction box and its corresponding target box. ω₁ is the first adjustment weight and ω₂ the second; both adjust the weights of positive and negative samples and so balance the two, where a positive sample is a prediction box identified as a defect and a negative sample is a prediction box identified as a non-defect. sample() denotes summation, so sample(y_p) is the sum over all predicted outputs y_p.

The localization loss is:

loss_reg = 1 - IOU

where IOU denotes the intersection over union of two boxes (here the prediction box and its ground-truth box), i.e. the ratio of the area of their intersection to the area of their union.
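The IoU-based localization loss can be sketched as below. The corner format `(x1, y1, x2, y2)` and the function names are assumptions chosen for brevity; the patent stores boxes as coordinates plus length and width.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """loss_reg = 1 - IOU: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - box_iou(pred, target)


# Two 2x2 boxes shifted by one unit share one third of their union.
print(iou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0)))
```

The loss is bounded in [0, 1], which keeps it on a scale comparable to the cross-entropy terms it is summed with.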

The total loss function is:

Loss = w₁·loss_cls + w₂·loss_reg + w₃·loss_obj

The original cross-entropy loss treats positive and negative samples equally. When the two are imbalanced, for example when there are fewer negative samples than positive samples, the total loss of the positive samples far exceeds that of the negative samples, so learning gradually drifts toward the positive samples and neglects the influence of the negative ones. The present invention modifies the loss function by adding weight parameters that compensate for this imbalance: ω₁ follows the trend of (1-a) and ω₂ follows the trend of a, where a is the predicted label or the confidence score of the predicted output. For example, when the output is a positive sample and a is large, the sample is an easy positive; because ω₁ follows (1-a), the loss of easy positive samples is reduced, which curbs the influence of the large number of positive samples and makes the model's predictions more accurate.
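The patent's exact classification loss formula is not reproduced in the text, so the following is only an illustrative stand-in for the weighting idea described above: a binary cross-entropy in which the positive term is scaled by ω₁ = 1 - a and the negative term by ω₂ = a, with a taken to be the predicted probability. These specific choices, and the function names, are assumptions and should not be read as the patented formula.

```python
import math


def weighted_bce(y_gt, y_p, eps=1e-7):
    """Mean cross-entropy with w1 = (1 - p) on positives and w2 = p on negatives."""
    total = 0.0
    for t, p in zip(y_gt, y_p):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        w1 = 1.0 - p  # small for confident (easy) positives
        w2 = p        # small for confident (easy) negatives
        total += -(w1 * t * math.log(p) + w2 * (1 - t) * math.log(1.0 - p))
    return total / len(y_gt)


def plain_bce(y_gt, y_p, eps=1e-7):
    """Unweighted cross-entropy, for comparison."""
    total = 0.0
    for t, p in zip(y_gt, y_p):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_gt)


# An easy positive (p = 0.9) contributes far less once weighted.
print(plain_bce([1], [0.9]), weighted_bce([1], [0.9]))
```

With this weighting, a hard positive (low predicted probability) keeps most of its loss while an easy positive is scaled down, which matches the qualitative behaviour the paragraph describes.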

S3: Convert all object detection models into TensorRT models and splice the TensorRT models.

The object detection models in .pt format are converted into TensorRT models in .engine format; layer fusion and precision calibration, both part of the optimization process, are carried out during conversion. In the layer fusion stage, the tensors of each layer under the PyTorch framework are taken as input and layers are merged horizontally or vertically. Horizontal merging combines convolution, bias, and activation into a single CBR structure that occupies only one CUDA core; vertical merging combines layers with the same structure but different weights into one wider layer, which likewise occupies only one CUDA core. Layer fusion reduces the number of layers. During inference the GPU executes each layer's computation by invoking different CUDA cores; although CUDA computes tensors quickly, time is wasted launching CUDA cores and reading and writing each layer's tensors, and memory can run short, so layer fusion is needed to raise computation speed. Precision calibration is performed automatically inside TensorRT. Many frameworks use 32-bit floating-point tensors, but once network training is complete, inference involves no backpropagation, so the present invention sets the precision to 16 bits; lowering the tensor precision uses less memory and makes the model more lightweight.
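The memory and precision trade-off of moving from 32-bit to 16-bit floats can be seen with Python's standard `struct` module, which supports IEEE 754 half precision through the `e` format. This sketch only illustrates the number format; it does not call TensorRT, which performs the conversion internally when building the engine. The sample weight values are arbitrary.

```python
import struct


def to_fp16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


weights = [0.1, 1.0, 3.14159, 1000.5]  # arbitrary example values

fp32_bytes = len(weights) * struct.calcsize("<f")
fp16_bytes = len(weights) * struct.calcsize("<e")
print(fp32_bytes, fp16_bytes)

# Rounding error introduced by storing each value in 16 bits.
for w in weights:
    print(w, to_fp16(w), abs(w - to_fp16(w)))
```

Storage halves, while values that are not exactly representable in half precision pick up a small rounding error, which is the accuracy cost that has to be weighed against the speed and memory gains.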

After the object detection models trained in step S2 have all been converted into TensorRT models, the TensorRT models are spliced as follows:

1. Read the different .engine files in turn and deserialize each one;

2. Create a two-dimensional engine object and store the deserialized files in it in order. When a single model is loaded, the data structure is a one-dimensional pointer; here it is replaced by a two-dimensional dynamic array, which implements the model splicing;

3. Create a context, which provides the interface for the model's input;

4. Create buffers, allocating space on the GPU for the input and output of the data to be inspected.
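Steps 1 to 3 above could be sketched with the TensorRT Python API as follows. The helper `load_engines` and the list-of-pairs container are assumptions (the patent describes a two-dimensional dynamic array, presumably in C++), and GPU buffer allocation from step 4 is left out because it depends on the CUDA binding in use. The runtime object is passed in, so the function relies only on the `deserialize_cuda_engine` and `create_execution_context` methods.

```python
def load_engines(engine_paths, runtime):
    """Deserialize each .engine file and pair it with an execution context."""
    engines = []
    for path in engine_paths:
        with open(path, "rb") as f:
            serialized = f.read()
        # Step 1: deserialize the engine from its serialized bytes.
        engine = runtime.deserialize_cuda_engine(serialized)
        if engine is None:
            raise RuntimeError(f"failed to deserialize {path}")
        # Step 3: one execution context per engine.
        context = engine.create_execution_context()
        # Step 2: keep all models side by side in one container.
        engines.append((engine, context))
    return engines


# Usage with TensorRT installed (file names are hypothetical):
#   import tensorrt as trt
#   runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
#   models = load_engines(["model_a.engine", "model_b.engine"], runtime)
```

Holding every engine and its context in one container lets the inference loop iterate over the models in a fixed order, which is what makes concatenating their outputs straightforward.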

S4: Acquire the image of the chip to be inspected and process it with the spliced TensorRT model to obtain the inference result.

The image of the chip to be inspected is fed into the spliced TensorRT model, which outputs the inference result. The inference result comprises information on multiple prediction boxes; the information for each prediction box includes its coordinates, length, width, and confidence.

The inference results are stored in a two-dimensional array. The results are then unrolled: the number of inference results of each TensorRT model is counted, and the next model's results are concatenated after them.

The inference results are then reprocessed to remove redundant boxes. First, dimensionality reduction is applied; the length of the reduced array is given by:

length = box_num * box_{pram_num} * memory_size + 1

The one-dimensional array obtained after dimensionality reduction is passed to the NMS (Non-Maximum Suppression) algorithm, where the array length has to be recalculated. The prior technique uses single-model inference with a fixed output length, so the array includes memory that holds no inference results, which lengthens inference and wastes space. Because the present invention infers with multiple models, the storage structure changes, with model splicing and dimension unrolling, and the NMS operation computes only over the detection results actually present, avoiding wasted space and saving time.
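The unrolling of per-model results into one flat array, together with the length formula, might be sketched like this. The helper names, the five-value box layout, and the sample numbers are assumptions; the patent does not say what the extra "+1" slot holds, so the sketch computes the length exactly as the formula states without assigning a meaning to it.

```python
def flat_length(box_num, box_pram_num, memory_size):
    """length = box_num * box_pram_num * memory_size + 1, as given in the text."""
    return box_num * box_pram_num * memory_size + 1


def flatten_results(per_model_boxes):
    """Concatenate every model's boxes into one 1-D list, tracking per-model counts."""
    flat = []
    counts = []
    for boxes in per_model_boxes:
        counts.append(len(boxes))
        for box in boxes:
            flat.extend(box)
    return flat, counts


# Hypothetical outputs of two models; each box is (x, y, length, width, confidence).
results = [
    [(10.0, 10.0, 4.0, 2.0, 0.9)],
    [(30.0, 12.0, 6.0, 3.0, 0.8), (50.0, 40.0, 5.0, 5.0, 0.7)],
]
flat, counts = flatten_results(results)
print(len(flat), counts)
print(flat_length(box_num=sum(counts), box_pram_num=5, memory_size=4))
```

Because only boxes that were actually produced are copied, the array passed to NMS contains no unused slots, which is the saving the paragraph above describes.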

具体的，采用改进NMS算法对降维处理后的推理结果进行进一步处理的过程包括：Specifically, the process of using the improved NMS algorithm to further process the reasoning results after dimensionality reduction processing includes:

设置交叠率阈值，计算最大置信度的预测框与其他预测框的交叠率iou即两个预测框的交集与并集的比值，然后去除其他框中交叠率大于交叠率阈值的预测框；Set the overlap rate threshold, calculate the overlap rate iou between the prediction frame with the maximum confidence and other prediction frames, that is, the ratio of the intersection and union of the two prediction frames, and then remove the predictions whose overlap rate is greater than the overlap rate threshold in other frames frame;

After conventional NMS processing, the configured IoU threshold cannot remove any further redundant boxes. To address this, the improved NMS algorithm of the present invention changes the size of the prediction boxes and recomputes the IoU, which removes boxes that the conventional algorithm retained because their IoU was slightly below the threshold, thereby removing more redundant boxes. The improved NMS processing yields the defect detection result for the chip under inspection. This result is then analyzed against the chip qualification standard to determine whether the chip is defective, the size and location of any defect, and whether the chip is qualified.
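The text does not state how the box size is changed. One possible reading, shown here purely as an assumption, is to enlarge every surviving box about its centre by a fixed factor and repeat the suppression, since enlarging two overlapping boxes of similar size raises their IoU. The names `scale_box` and `second_pass` and the factor 1.5 are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def scale_box(box, factor):
    """Resize a box about its centre by `factor` (assumed resizing rule)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * factor / 2
    half_h = (box[3] - box[1]) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def second_pass(boxes, scores, keep, iou_threshold=0.5, factor=1.5):
    """Repeat suppression on the boxes in `keep`, using resized boxes."""
    resized = {i: scale_box(boxes[i], factor) for i in keep}
    order = sorted(keep, key=lambda i: scores[i], reverse=True)
    final = []
    while order:
        best = order.pop(0)
        final.append(best)
        order = [i for i in order
                 if iou(resized[best], resized[i]) <= iou_threshold]
    return final
```

As a worked case, boxes `(0, 0, 10, 10)` and `(4, 0, 14, 10)` have an IoU of 60/140, about 0.43, so a threshold of 0.5 retains both. After enlarging each by 1.5 the IoU becomes 165/285, about 0.58, and the lower-confidence box is removed.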

The present invention also provides a chip defect detection system based on TensorRT-accelerated inference. The system is used to execute the chip defect detection method described above and comprises an image processing module, a target detection module, an inference result reprocessing module, and a detection result display module;

The process by which the system executes the chip defect detection method based on TensorRT-accelerated inference is similar to the implementation of the method described above and is not repeated here.

The embodiments above describe the purpose, technical solutions, and advantages of the present invention in further detail. It should be understood that these embodiments are only preferred implementations of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A chip defect detection method based on TensorRT-accelerated inference, characterized by comprising:

2. The chip defect detection method based on TensorRT-accelerated inference according to claim 1, characterized in that the process of training the improved YOLOv5 model comprises: the improved YOLOv5 model includes a backbone network, a neck network, and a head network;

3. The chip defect detection method based on TensorRT-accelerated inference according to claim 2, characterized in that the formula for extracting features with deformable convolution is:

4. The chip defect detection method based on TensorRT-accelerated inference according to claim 2, characterized in that the total loss function is:

where loss_cls denotes the classification loss, loss_reg denotes the localization loss, loss_obj denotes the confidence loss, and w₁, w₂, and w₃ are the weights of the three losses, respectively.

5. The chip defect detection method based on TensorRT-accelerated inference according to claim 4, characterized in that the classification loss is:

6. The chip defect detection method based on TensorRT-accelerated inference according to claim 1, characterized in that the formula for the dimensionality reduction of the inference results is:

7. The chip defect detection method based on TensorRT-accelerated inference according to claim 1, characterized in that the inference result includes information on a plurality of prediction boxes, and the information for each prediction box includes its coordinates, length, width, and confidence.

8. The chip defect detection method based on TensorRT-accelerated inference according to claim 1, characterized in that the process of removing redundancy from the dimensionality-reduced inference results with the improved NMS algorithm comprises:

9. A chip defect detection system based on TensorRT-accelerated inference, the system being used to execute the chip defect detection method based on TensorRT-accelerated inference according to any one of claims 1 to 8, characterized by comprising: an image processing module, a target detection module, an inference result reprocessing module, and a detection result display module;