CN116703819B - A damage detection method for railway freight car steel floor based on knowledge distillation - Google Patents

A damage detection method for railway freight car steel floor based on knowledge distillation

Info

Publication number
CN116703819B
CN116703819B (application CN202310399454.XA)
Authority
CN
China
Prior art keywords
model
layer
convolution layers
group
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310399454.XA
Other languages
Chinese (zh)
Other versions
CN116703819A (en)
Inventor
杨绿溪
谢昂
郑志刚
李春国
黄永明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202310399454.XA
Publication of CN116703819A
Application granted
Publication of CN116703819B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a knowledge-distillation-based method for detecting damage to the steel floor of railway freight cars, comprising the following steps: acquiring images of the steel-floor region of a railway freight car and constructing a training set; constructing a teacher network and a student network for steel-floor damage detection and training them; distilling the student network with the teacher network and tuning parameters to obtain the final fault-detection model; and acquiring an image to be detected, processing it, and feeding it to the fault-detection model to obtain the steel-floor damage detection result. Built on deep convolutional neural networks and knowledge distillation, the method adopts an encoder-decoder structure and establishes prediction matching between queries through progressive multi-stage knowledge distillation, so that useful knowledge is gradually transferred to the student model. This automatic, high-accuracy, high-precision detection method addresses the missed and false detections caused by visual fatigue when faults can currently only be judged by the naked eye of train inspectors.

Description

Rail wagon steel floor damage detection method based on knowledge distillation
Technical Field
The invention belongs to the field of target detection in computer vision, and particularly relates to a rail wagon steel floor breakage detection method based on knowledge distillation.
Background
In recent years, deep convolutional neural network models have grown steadily more complex, and their parameter counts and computational costs have expanded accordingly, placing great demands on computing and storage resources. To make deployment feasible on resource-limited embedded devices, neural network model compression techniques are used to reduce model size and computation, so that deep learning models can be deployed efficiently in resource-constrained environments.
Knowledge distillation is a commonly used model compression method, widely applied in image classification. Typically, a complex, high-performing model is first trained as the teacher; the knowledge it has learned is then used to guide the training of a simpler student model, so that the student's performance approaches the teacher's while its parameter count and complexity are greatly reduced, thereby compressing and accelerating the model.
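The teacher-to-student transfer described above is usually implemented by softening both models' output distributions with a temperature and penalizing their divergence. A minimal plain-Python sketch of this standard soft-label loss (the temperature value and logits below are illustrative assumptions, not values from the patent):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Soft-label loss: KL between temperature-softened teacher and student
    distributions, scaled by T^2 as in standard knowledge distillation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(p_t, p_s)

teacher = [4.0, 1.0, 0.2]   # illustrative class logits
student = [3.0, 1.5, 0.1]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student exactly matches the teacher and grows as the two distributions diverge; in training it is weighted against the student's ordinary hard-label loss.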
Although knowledge distillation achieves good results in conventional convolutional-neural-network-based object detection, such algorithms are difficult to apply directly to Transformer-based detectors, because the convolutional and Transformer architectures differ fundamentally. In CNN-based detection, target information is carried by image feature maps, whereas in Transformer-based detection it is mainly encoded in query vectors; this difference leads to significantly different feature distributions of target information in the two families of methods.
Disclosure of Invention
The invention aims to solve the problem that Transformer-based object detection models are too large to deploy on terminals with limited storage and computing resources. It introduces knowledge distillation into Transformer-based detection and provides a knowledge-distillation-based method for detecting steel-floor damage on railway freight cars, comprising the following steps:
Step 1, acquiring images of a plurality of angles at the bottom of a train;
Step 2, selecting pictures containing steel-floor damage and pictures without damage, annotating the damaged regions, and distinguishing the fault-free samples from the small number of faulty samples;
Step 3, constructing a fault detection model, including a teacher model and a student model;
Step 4, training the teacher model and the student model in parallel using the train-bottom images obtained in step 1; transferring the teacher's knowledge to the student via progressive multi-stage knowledge distillation and teacher feature distillation, passing the dark knowledge of the teacher's decoder layers to the student layer by layer; training the student model with the knowledge-distillation loss function to realize distillation; and finally obtaining the trained fault-detection model;
Step 5, verifying the model's detection results: obtaining an image to be detected, inputting it to the fault-detection model, and computing an anomaly score to obtain the steel-floor damage detection result.
Furthermore, the teacher model and the student model share the same structure, each comprising a backbone network module, an encoder module, a decoder module, and a prediction output module, but they use backbone network modules of different sizes. An image passes through the backbone network module, which extracts high-dimensional vector information and sends it to the encoder module; the encoder module semantically encodes the features and passes them to the decoder module; the decoder computes cross-attention between the feature-map key values and the corresponding region features; finally the prediction module outputs the detection result.
The backbone network module comprises, connected in sequence: an input layer, a first group of convolution layers, a maximum pooling layer, and a second through fifth group of convolution layers. The input layer takes image data of size 513x513. The first group of convolution layers consists of one 7x7 convolution operation followed by one nonlinear activation function. The second group comprises 9 convolution layers, one nonlinear activation layer, and an average pooling layer; the third group comprises 12 convolution layers, one nonlinear activation layer, and an average pooling layer; the fourth group comprises 69 convolution layers, one nonlinear activation layer, and an average pooling layer; the fifth group comprises 9 convolution layers, one nonlinear activation layer, and an average pooling layer. Each convolution layer in the second through fifth groups applies, in sequence, a 1x1 convolution, a 3x3 convolution, and a 1x1 convolution.
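The 1x1 / 3x3 / 1x1 pattern in each convolution group is the standard bottleneck arrangement: the first 1x1 convolution reduces the channel depth, the 3x3 convolution operates on the reduced depth, and the final 1x1 restores it. A small sketch of the parameter arithmetic that motivates this design (the channel widths below are illustrative assumptions, not the patent's):

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution; biases ignored for simplicity."""
    return in_ch * out_ch * k * k

def bottleneck_params(in_ch, mid_ch, out_ch):
    """1x1 reduce -> 3x3 -> 1x1 expand, as in the convolution groups above."""
    return (conv_params(in_ch, mid_ch, 1)
            + conv_params(mid_ch, mid_ch, 3)
            + conv_params(mid_ch, out_ch, 1))

# Illustrative: a 256 -> 64 -> 256 bottleneck vs a plain 3x3 at full width.
plain = conv_params(256, 256, 3)              # 589,824 weights
bottleneck = bottleneck_params(256, 64, 256)  # 69,632 weights
```

The bottleneck reaches the same input/output width with roughly an eighth of the weights, which is why deep ResNet-style backbones stack this block.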
The encoder module is a stack of 6 identical Transformer encoders. Each encoder has two sublayers: the first is a multi-head self-attention layer and the second is a position-wise feed-forward network layer; each sublayer uses a residual connection. The encoder adds the serialized feature map to the positional encoding to obtain the query Q and key value K; after the multi-head self-attention layer, the result is added to the feature map and normalized, then passed through the feed-forward network to obtain the output of a single encoder, which serves as the input to the next. After 6 identical encoder structures, the output of the encoder part is obtained.
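The self-attention computation inside each encoder can be sketched in plain Python for a single head (the token count, feature width, and values below are illustrative assumptions, not the patent's dimensions):

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x d) by b (d x m), both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax_row(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]              # transpose of K
    scores = matmul(q, k_t)
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax_row(row) for row in scores]    # rows sum to 1
    return matmul(weights, v)

# Two tokens with 2-dimensional features (illustrative values).
q = k = v = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(q, k, v)   # each row is a convex mix of the value rows
```

Each output row attends most strongly to the key it aligns with; the multi-head version simply runs several such maps in parallel on projected subspaces and concatenates the results.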
The input of the decoder consists of three parts: the encoder output, the positional encoding, and the queries. The queries have dimensions (300, 4); the first dimension is the number of predefined target queries and the second is the query's hidden dimension. The decoder first computes multi-head self-attention in the same way as the encoder, then computes cross-attention with the encoder output; after 6 identical decoder structures, the output of the decoder part is obtained.
The prediction output module comprises a feed-forward neural network and fully connected layers. The feed-forward network is divided into two parts: one predicts the class and the other the position. The class branch consists of a linear layer with hidden dimension 512; since there is a background (empty) class, the output dimension is the number of classes plus 1. The position branch consists of 3 linear layers with hidden dimension 512. Both branches pass through a sigmoid activation function.
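The output dimensions of the two branches follow directly from this description. A sketch of the bookkeeping (the class count is an illustrative assumption; the patent's task would have its damage classes plus background):

```python
def head_output_dims(num_classes, num_queries=300, hidden=512):
    """Per-query output shapes of the two prediction branches.
    Class branch: one linear layer to num_classes + 1 (background class).
    Box branch: 3 linear layers ending in 4 box coordinates."""
    class_out = num_classes + 1          # +1 for the background (empty) class
    box_out = 4                          # box coordinates, sigmoid-normalized
    return {
        "class_logits": (num_queries, class_out),
        "boxes": (num_queries, box_out),
        "hidden": hidden,
    }

dims = head_output_dims(num_classes=1)   # e.g. a single "damage" class
```

With one foreground class the class head therefore outputs 2 logits per query (damage vs background) and the box head 4 normalized coordinates per query.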
Further, the backbone network of the teacher model is ResNet-101, and the backbone network of the student model is a smaller ResNet.
Further, the loss function of the student model in step 4 is a composite loss consisting of a teacher soft-label term and a student hard-label term:

L = α·L_soft + β·L_hard

where α and β are weighting hyperparameters, L_soft is the teacher soft-label loss, and L_hard is the student hard-label loss.
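The composite loss is a weighted sum of the two terms. A one-line numeric sketch (the weights and the loss values fed in are illustrative assumptions, not values from the patent):

```python
def composite_loss(soft_loss, hard_loss, alpha=0.5, beta=0.5):
    """Composite student loss: alpha * L_soft + beta * L_hard."""
    return alpha * soft_loss + beta * hard_loss

# Illustrative values: equal weighting of the two terms.
loss = composite_loss(soft_loss=0.8, hard_loss=1.2)  # 0.5*0.8 + 0.5*1.2 = 1.0
```

Setting beta to 0 recovers pure distillation against the teacher's soft labels; setting alpha to 0 recovers ordinary supervised training on the hard labels.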
Further, positive- and negative-sample anchor boxes are obtained by query sampling, and the selected distillation anchor boxes do not participate in the back-propagation of the student model during distillation training. The losses of the positive-sample, negative-sample, and random-sampling anchor boxes combine as

L_distill = L_pos + L_neg + L_rand

where L_pos is the positive-sample anchor-box distillation loss, L_neg is the negative-sample anchor-box distillation loss, and L_rand is the random-sampling anchor-box distillation loss.
The knowledge-distillation-based steel-floor damage detection method provided by the invention mainly comprises a teacher network model and a student network model; the backbone of the teacher network is ResNet-101, and the backbone of the student network is a smaller ResNet. The method establishes prediction matching between queries through progressive multi-stage knowledge distillation to gradually transfer useful knowledge to the student model, and simultaneously adopts teacher feature distillation to make full use of the teacher's intermediate features, providing additional information for the one-to-one assignment strategy in the student model.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting the damage of the steel floor of the railway wagon based on knowledge distillation.
Fig. 2 is a network configuration diagram and a detailed block diagram of the present invention.
Fig. 3 is a block diagram of a teacher model feature distillation.
FIG. 4 is a comparison of the mAP metric between the improved algorithm of the present invention and the student model algorithm.
Detailed Description
As shown in fig. 1, the method for detecting the damage of the steel floor of the railway wagon based on knowledge distillation comprises the following steps:
and step 1, acquiring images of a plurality of angles at the bottom of the train.
First, whole-car images of the railway freight car in operation are captured by a high-speed camera, covering the side frames, the middle section, and the coupler-buffer region. From these, images containing the steel-floor bottom are selected, and the captured bottom images are inspected.
Step 2, screening the obtained pictures and keeping the images containing the target parts, with a 1:1 ratio of pictures containing steel-floor damage to pictures without damage; annotating the damaged regions, and distinguishing the fault-free samples from the small number of faulty samples;
Step 3, constructing a teacher model and a student model;
The encoder modules of the teacher model and the student model have the same structure: each is an encoder module formed by 6 layers of Transformer encoders.
As shown in fig. 2, the teacher model and the student model each comprise a backbone network module, an encoder module, a decoder module, and a prediction output module. The image passes through the backbone network module, which extracts high-dimensional vector information and sends it to the encoder module; the encoder module semantically encodes the features and passes them to the decoder module; the decoder computes cross-attention between the feature-map key values and the corresponding region features; finally the prediction module outputs the detection result.
The backbone network module comprises an input layer, a first group of convolution layers, a maximum pooling layer, a second group of convolution layers, a third group of convolution layers, a fourth group of convolution layers and a fifth group of convolution layers which are connected in sequence.
The input layer takes image data of size 513x513. The first group of convolution layers consists of one 7x7 convolution operation followed by one nonlinear activation function. The second group comprises 9 convolution layers, one nonlinear activation layer, and an average pooling layer; the third group comprises 12 convolution layers, one nonlinear activation layer, and an average pooling layer; the fourth group comprises 69 convolution layers, one nonlinear activation layer, and an average pooling layer; the fifth group comprises 9 convolution layers, one nonlinear activation layer, and an average pooling layer. Each convolution layer in the second through fifth groups applies, in sequence, a 1x1 convolution, a 3x3 convolution, and a 1x1 convolution.
The encoder module is a stack of 6 identical Transformer encoders. Each encoder has two sublayers: the first is a multi-head self-attention layer and the second is a position-wise feed-forward network layer; each sublayer uses a residual connection. The encoder adds the serialized feature map to the positional encoding to obtain the query Q and key value K; after the multi-head self-attention layer, the result is added to the feature map and normalized, then passed through the feed-forward network to obtain the output of a single encoder, which serves as the input to the next. After 6 identical encoder structures, the output of the encoder part is obtained.
The input of the decoder consists of three parts: the encoder output, the positional encoding, and the queries. The queries have dimensions (300, 4); the first dimension is the number of predefined target queries and the second is the query's hidden dimension. That is, based on the encoder's encoded features, the decoder translates the 300 queries into 300 targets. The decoder first computes multi-head self-attention in the same way as the encoder, then computes cross-attention with the encoder output; after 6 identical decoder structures, the output of the decoder part is obtained.
The prediction output module continues the computation from the features output by the decoder and mainly comprises a feed-forward neural network and a fully connected layer. The feed-forward network is divided into two parts: one predicts the class and the other the position. The class branch consists of a linear layer with hidden dimension 512; since there is a background (empty) class, the output dimension is the number of classes plus 1. The position branch consists of 3 linear layers with hidden dimension 512. Both branches pass through a sigmoid activation function.
Step 4, as shown in fig. 3: train the teacher model and the student model in parallel using the train-bottom images obtained in step 1, and transfer the teacher's knowledge to the student via progressive multi-stage knowledge distillation and teacher feature distillation. The forward-inference results of the teacher and student models are matched one-to-one with the Hungarian algorithm, the dark knowledge of the decoder layers in the teacher model is transmitted to the student model layer by layer, and teacher feature distillation simultaneously provides the student model with the teacher's un-normalized probability information.
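The one-to-one matching between teacher and student predictions can be illustrated with a brute-force version of the assignment problem that the Hungarian algorithm solves; for the handful of queries shown here, enumerating permutations suffices (the cost matrix below is an illustrative assumption, standing in for real pairwise matching costs):

```python
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost one-to-one assignment of rows to columns.
    Brute force over permutations: fine for tiny matrices, whereas the
    Hungarian algorithm solves the same problem in polynomial time."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

# Rows: teacher queries; columns: student queries; entries: matching cost.
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.7, 0.3],
]
perm, total = best_assignment(cost)   # diagonal match, total cost 0.6
```

In practice the matching cost combines classification and box terms, and the resulting pairs define which teacher query distills into which student query.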
Input data are fed to the teacher model, the dark knowledge of the teacher's decoder layers is transferred to the student model layer by layer, and the student model is trained with the knowledge-distillation loss function to realize distillation. In this implementation, the teacher backbone is ResNet-101 and the student backbone is a smaller ResNet; both are formed by residually connected five-stage convolution layers. The difference is that the student's convolution layers perform only downsampling, use 3x3 convolution kernels, and output a fifth-stage feature depth of 512, whereas the convolution layers of ResNet-101 perform both upsampling and downsampling and output a fifth-stage feature depth of 2048. Both the teacher and the student output the feature maps of the third through fifth stages, reduced to a hidden dimension of 256 before being fed to the encoder. The encoder structures of the teacher and student models are identical: both are Transformer encoder structures.
The query distillation anchor boxes of the decoder section combine the image and the queries. Because a query serves to detect and aggregate the features of particular instances, and its distribution may differ across backbone models, the same number of similar positive-sample (foreground) and negative-sample (background) anchor boxes are selected as distillation anchor boxes. The decoder also has a multi-stage structure, on which progressive multi-stage knowledge distillation better captures the teacher's hidden knowledge. At each decoder layer, the cross-attention is computed and the teacher's attention-weight matrix guides the student model; the attention weights corresponding to the distillation points are fused by weighting, so that the student obtains target features with richer semantic information.
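The balanced selection of foreground, background, and random distillation anchors can be sketched as follows (the confidence scores and the ranking-by-confidence criterion are illustrative assumptions, not the patent's exact selection rule):

```python
import random

def sample_distill_anchors(fg_scores, k, seed=0):
    """Pick k foreground (highest teacher confidence), k background
    (lowest confidence), and k random query indices as distillation anchors."""
    rng = random.Random(seed)
    order = sorted(range(len(fg_scores)), key=lambda i: fg_scores[i])
    negatives = order[:k]        # least confident queries: background anchors
    positives = order[-k:]       # most confident queries: foreground anchors
    remaining = order[k:-k]      # everything else is eligible for random picks
    randoms = rng.sample(remaining, k)
    return positives, negatives, randoms

# Illustrative teacher foreground confidences for 8 queries.
scores = [0.05, 0.9, 0.1, 0.8, 0.2, 0.7, 0.02, 0.6]
pos, neg, rnd = sample_distill_anchors(scores, k=2)
```

The foreground/background pairs steer the student toward the regions the teacher attends to, while the random picks convey the teacher's view of the remaining features; as the text notes, these anchors are excluded from the student's back-propagation.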
The teacher network with the ResNet-101 backbone is trained first, and soft labels are generated at a high temperature on the basis of the trained teacher network. The loss function of the student model is then no longer the hard-label loss alone but a composite loss of teacher soft labels and student hard labels. The teacher soft-label term pushes the student's class probability distribution as close as possible to the teacher's, so that the feature responses of the two models are as close as possible under the squared-error loss; the student hard-label term is computed on the student model's own predictions. The composite loss is the weighted sum of the distillation loss (teacher soft-label part) and the student model loss (student hard-label part): L = α·L_soft + β·L_hard, with α and β as weighting hyperparameters.
Through positive- and negative-sample anchor-box query sampling, the student model can focus on the regions to which the teacher pays most attention, while random sampling conveys the teacher model's overall view of the features. During distillation training, these selected distillation anchor boxes do not participate in the back-propagation of the student model. The losses of the positive-sample, negative-sample, and random-sampling anchor boxes combine as L_distill = L_pos + L_neg + L_rand.
Step 5, acquiring an image to be detected, inputting it to the fault detection model after processing, and computing an anomaly score to obtain the steel-floor damage detection result.
The test results of the invention are shown in fig. 4.
The foregoing describes only specific embodiments of the invention. Any feature disclosed in this specification may be replaced by an alternative or equivalent feature serving a similar purpose, and all disclosed features, or all steps of a method or process, may be combined in any manner except where mutually exclusive.

Claims (6)

The input layer takes image data of size 513x513. The first group of convolution layers consists of one 7x7 convolution operation followed by one nonlinear activation function. The second group comprises 9 convolution layers, one nonlinear activation layer, and an average pooling layer; the third group comprises 12 convolution layers, one nonlinear activation layer, and an average pooling layer; the fourth group comprises 69 convolution layers, one nonlinear activation layer, and an average pooling layer; the fifth group comprises 9 convolution layers, one nonlinear activation layer, and an average pooling layer. Each convolution layer in the second through fifth groups applies, in sequence, a 1x1 convolution, a 3x3 convolution, and a 1x1 convolution.
The encoder module is a stack of 6 identical Transformer encoders, each with two sublayers: the first is a multi-head self-attention layer and the second is a position-wise feed-forward network layer, and each sublayer uses a residual connection. The encoder adds the serialized feature map to the positional encoding to obtain the query Q and key value K; after the multi-head self-attention layer, the result is added to the feature map and normalized, then passed through the feed-forward network to obtain the output of a single encoder, which serves as the input to the next; after 6 identical encoder structures, the output of the encoder part is obtained;
CN202310399454.XA | 2023-04-14 | A damage detection method for railway freight car steel floor based on knowledge distillation | Active | CN116703819B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310399454.XA | 2023-04-14 | 2023-04-14 | A damage detection method for railway freight car steel floor based on knowledge distillation


Publications (2)

Publication Number | Publication Date
CN116703819A | 2023-09-05
CN116703819B | 2025-07-15

Family

ID=87828241

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202310399454.XA | Active | CN116703819B (en) | 2023-04-14 | 2023-04-14 | A damage detection method for railway freight car steel floor based on knowledge distillation

Country Status (1)

Country | Link
CN | CN116703819B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN117315392A * | 2023-09-26 | 2023-12-29 | 中国科学技术大学 | Knowledge distillation method universal for DETR detector
CN118072227B * | 2024-04-17 | 2024-07-05 | 西北工业大学太仓长三角研究院 | Rail transit train speed measuring method based on knowledge distillation
CN119251705B * | 2024-12-06 | 2025-03-04 | 厦门理工学院 | Remote sensing image road extraction method and device based on knowledge distillation and readable storage medium

Citations (2)

Publication Number | Priority Date | Publication Date | Assignee | Title
CN111652227A * | 2020-05-21 | 2020-09-11 | 哈尔滨市科佳通用机电股份有限公司 | Fault detection method for broken floor at the bottom of railway freight cars
CN111767711A * | 2020-09-02 | 2020-10-13 | 之江实验室 | Compression method and platform of pre-trained language model based on knowledge distillation

Family Cites Families (3)

Publication Number | Priority Date | Publication Date | Assignee | Title
US20220076136A1 * | 2020-09-09 | 2022-03-10 | Peyman PASSBAN | Method and system for training a neural network model using knowledge distillation
CN114936605A * | 2022-06-09 | 2022-08-23 | 五邑大学 | A neural network training method, equipment and storage medium based on knowledge distillation
CN115861736B * | 2022-12-14 | 2024-04-26 | 广州科盛隆纸箱包装机械有限公司 | High-speed corrugated case printing defect detection method, system and storage medium based on knowledge distillation


Also Published As

Publication Number | Publication Date
CN116703819A | 2023-09-05


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
