Engine defect detection algorithm based on RNN voiceprint recognition

Technical Field
The invention relates to the technical field of intelligent quality inspection, in particular to an engine defect detection algorithm based on RNN voiceprint recognition.
Background
At present, algorithms based on deep neural networks are increasingly applied in the field of intelligent quality inspection. However, most existing algorithms inspect the appearance of a part: the object processed is an image, and the network structure used is a convolutional neural network. For products whose defects cannot be recognized visually, such as engines, these image-based methods cannot be used. In actual production, a worker can perform defect detection by recognizing abnormal sounds while an engine is running, but in computing, technology for detecting product defects from audio remains largely absent.
Most existing abnormal-sound detection algorithms identify the characteristics of abnormal sounds manually and summarize a fixed algorithmic procedure for judging unknown sounds. Such algorithms cannot automatically learn the characteristics of abnormal sounds, so their range of application is narrow and they cannot be reused for different types of sound detection. There is also a sound classification algorithm based on a convolutional neural network: first, each audio clip is normalized into windows of equal length (20 frames); then 12-dimensional Mel cepstral coefficients and their first- and second-order differences are extracted from each window, giving 36-dimensional feature vectors; next, the feature vectors of all windows of each clip are treated as 720 × 1 or 36 × 20 images, and one- and two-dimensional convolutional neural networks, respectively, are trained on sounds of known classes; finally, the trained network predicts sounds of unknown class, yielding the class of the sound. However, convolutional neural networks are not fully suited to audio: because audio length is usually not fixed, preprocessing operations such as cropping must be performed.
In summary, the prior art has the following disadvantages:
firstly, most of the sound features used by existing algorithms are Mel cepstral coefficients, which process actual sound according to the perceptual characteristics of the human ear; they are therefore well suited to speech, but for non-speech sounds such as engine noise these features cannot fully reflect the characteristics of the sound;
secondly, most abnormal-sound detection algorithms find the characteristics of abnormal sounds manually and design a series of fixed steps for judgment rather than automatically learning those characteristics, so the algorithms lack generality and different algorithms must be designed for different practical problems;
thirdly, some algorithms process audio with a convolutional neural network; although the characteristics of abnormal sounds can then be learned and detected automatically, a convolutional neural network requires all inputs to have the same size, which audio generally cannot satisfy, and if the audio is preprocessed by cropping or the like, part of the information may be lost.
Based on this, the engine defect detection algorithm based on RNN voiceprint recognition is designed to solve the problem that some faults in an assembled engine cannot be recognized through visual observation.
Disclosure of Invention
The invention aims to provide an engine defect detection algorithm based on RNN voiceprint recognition, which is combined with an engine sound key segment extraction algorithm, a deep learning abnormal sound detection algorithm and a recurrent neural network model capable of processing variable length sequence input so as to solve the problems in the background art.
In order to achieve the purpose, the invention provides the following technical scheme: an engine defect detection algorithm based on RNN voiceprint recognition comprises the following specific steps:
s1: segmenting all recorded audio, and extracting the key segments recorded during throttle-up and high-speed operation of the engine;
s2: marking the extracted key segments to construct a training set;
s3: building a deep learning network model and training;
s4: detecting the sound of an engine of unknown condition by using the trained network model.
Preferably, the specific steps of step S1 are as follows:
s1.1: generating a two-dimensional spectrogram from the recorded original audio;
s1.2: locating the high-frequency part in the two-dimensional spectrogram of the sound; if the high-frequency content at a certain time is below a threshold, that portion is segmented out and removed; the remaining segments are those containing high-frequency sound, i.e., the key segments of the sound while the engine runs at high speed.
Preferably, the specific steps of step S1.1 are: the one-dimensional amplitude information of the recorded original audio is acquired and divided into frames using a window of length N = 2048 with a sliding distance (hop) of 512; a discrete Fourier transform is then performed on each 2048-length frame:

X(k) = Σ_{n=0}^{N−1} x(n)·w(n)·e^{−j2πkn/N},  k = 0, 1, …, N − 1,

where x(n) are the samples of the frame and w(n) is the window function; this yields a 2048-dimensional feature vector for each frame, and stacking the feature vectors of all frames gives a two-dimensional spectrogram of size 2048 × n, where n is the number of frames.
Preferably, the specific steps of step S1.2 are: the first 800 dimensions of each 2048-dimensional frequency feature vector are taken as the low-frequency signal and the remaining dimensions as the high-frequency signal; the energy of the high-frequency signal of each frame is then

E = Σ_{k=800}^{2047} |X(k)|²,

where X(k) is the k-th component of the frame's frequency feature vector; consecutive frames whose high-frequency energy exceeds the threshold are merged into audio segments, segments of short duration are removed, and the remaining segments are used in the next operation.
Preferably, in step S2, the key segments are labeled one by one.
Preferably, in step S3, the deep learning network model adopts a long short-term memory network (LSTM), and the specific implementation steps are as follows:
s3.1: a spectrogram of indefinite length and height 2048 is taken as input, and the temporal relations within the sequence are discovered and learned;
s3.2: a two-layer LSTM network extracts features from the input sequence, and two fully connected (DENSE) layers perform the classification;
s3.3: the classification of the audio is output as a vector of length M + 1, where M is the number of anomaly types; each value in the vector lies between 0 and 1 and represents the probability that the audio is normal or exhibits a particular anomaly, i.e., whether an anomaly exists and its type.
Preferably, in step S4, if all segments are identified as normal after detection, the engine is considered to have no defect; if any segment is identified as abnormal, the engine is considered defective.
Compared with the prior art, the invention has the beneficial effects that:
1. an engine sound key segment extraction algorithm is adopted: by utilizing the characteristics of the engine sound, the key segments in high-speed operation are segmented, so that the data volume to be processed is greatly reduced, the interference of a large number of useless segments is avoided, and the accuracy of the whole method is improved;
2. adopting an abnormal sound detection algorithm based on deep learning: the deep learning algorithm is adopted, the characteristics of normal and various abnormal sounds can be automatically learned through a large number of samples of known labels without manually analyzing the characteristics of the sounds, and meanwhile, the method can be conveniently used for processing other similar problems without largely modifying;
3. adopting a recurrent neural network model capable of processing variable-length sequence input: a recurrent neural network that accepts variable-length sequences as input is used as the input layer, and a deep neural network model is built on top of it, successfully solving the problem that the traditional approach of processing audio with convolutional neural networks requires all inputs to have the same scale while audio length is not fixed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic block diagram of the present invention;
FIG. 2 is a schematic diagram of a key segment extraction algorithm of the present invention;
FIG. 3 is a diagram of a recurrent neural network of the present invention;
FIG. 4 is a schematic diagram of a detection algorithm according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-3, the present invention provides a technical solution: an engine defect detection algorithm based on RNN voiceprint recognition comprises the following specific steps:
s1: segmenting all recorded audio, and extracting the key segments recorded during throttle-up and high-speed operation of the engine;
the method comprises the following specific steps:
s1.1: generating a two-dimensional spectrogram from the recorded original audio: the one-dimensional amplitude information of the recorded original audio is acquired and divided into frames using a window of length N = 2048 with a sliding distance (hop) of 512, and a discrete Fourier transform is performed on each 2048-length frame:

X(k) = Σ_{n=0}^{N−1} x(n)·w(n)·e^{−j2πkn/N},  k = 0, 1, …, N − 1,

where x(n) are the samples of the frame and w(n) is the window function; this yields a 2048-dimensional feature vector for each frame, and stacking the feature vectors of all frames gives a two-dimensional spectrogram of size 2048 × n, where n is the number of frames;
s1.2: locating the high-frequency part in the two-dimensional spectrogram of the sound; if the high-frequency content at a certain time is below a threshold, that portion is segmented out and removed, and the remaining segments are those containing high-frequency sound, i.e., the key segments of the sound while the engine runs at high speed. In the spectrogram of fig. 2, the horizontal axis represents time, the vertical axis represents frequency, and the value at each point represents the intensity of that frequency at that time.

The specific steps are as follows: the first 800 dimensions of each 2048-dimensional frequency feature vector are taken as the low-frequency signal and the remaining dimensions as the high-frequency signal; the energy of the high-frequency signal of each frame is then

E = Σ_{k=800}^{2047} |X(k)|²,

where X(k) is the k-th component of the frame's frequency feature vector; consecutive frames whose high-frequency energy exceeds the threshold of 0.05 are merged into audio segments, segments shorter than 20 frames are removed, and the remaining segments are used in the next operation.
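The key-segment extraction of step S1 can be sketched in NumPy as follows. The function names, the use of a Hann window, the magnitude spectrum as the per-frame feature vector, and the normalization that makes the 0.05 threshold scale-independent are assumptions of this sketch, not details fixed by the method above.

```python
import numpy as np

def make_spectrogram(signal, frame_len=2048, hop=512):
    """Frame a 1-D amplitude signal and stack per-frame DFT magnitudes."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)             # window choice is an assumption
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.fft(frames, axis=0))  # shape (2048, n_frames)

def key_segments(spec, low_bins=800, threshold=0.05, min_frames=20):
    """Return [start, end) frame ranges with high high-frequency energy."""
    energy = np.sum(spec[low_bins:, :] ** 2, axis=0)   # per-frame HF energy
    energy = energy / (energy.max() + 1e-12)           # assumed normalization
    active = energy > threshold
    segments, start = [], None
    for i, on in enumerate(np.append(active, False)):  # sentinel closes runs
        if on and start is None:
            start = i
        elif not on and start is not None:
            if i - start >= min_frames:                # drop short segments
                segments.append((start, i))
            start = None
    return segments
```

For example, `key_segments(make_spectrogram(audio))` returns the frame ranges of the candidate high-speed segments of a recording `audio`.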
S2: the extracted key segments are labeled one by one, ensuring that the training set contains enough normal and abnormal sound segments. If an audio clip is normal, all of its segments can be marked normal; if a clip is abnormal, not all of its segments are necessarily abnormal, so the segments must be labeled one by one to construct the training set;
s3: a deep learning network model is built and trained; the model adopts a long short-term memory network (LSTM), a type of recurrent neural network (RNN). The specific steps are as follows:
s3.1: the spectrogram obtained above, of indefinite length and height 2048, is taken as input, and the temporal relations within the sequence are discovered and learned;
s3.2: a two-layer LSTM network extracts features from the input sequence, and two fully connected (DENSE) layers perform the classification;
s3.3: the classification of the audio is output as a vector of length M + 1, where M is the number of anomaly types; each value in the vector lies between 0 and 1 and represents the probability that the audio is normal or exhibits a particular anomaly, i.e., whether an anomaly exists and its type;
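A minimal NumPy forward pass can illustrate the architecture of steps s3.1 to s3.3: two LSTM layers over a variable-length sequence of 2048-dimensional frames, followed by two dense layers and a softmax over M + 1 classes. The weights here are random and untrained, and the hidden size of 64 is an assumption; this is a sketch of the data flow, not the trained model.

```python
import numpy as np

def lstm_layer(x, params):
    """Run one LSTM layer over a (T, d_in) sequence; returns (T, H) outputs."""
    Wx, Wh, b = params                         # gates stacked as i, f, g, o
    H = Wh.shape[0]
    h, c, out = np.zeros(H), np.zeros(H), []
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for t in range(x.shape[0]):                # one step per frame, any length T
        z = x[t] @ Wx + h @ Wh + b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g                      # cell state update
        h = o * np.tanh(c)                     # hidden state
        out.append(h)
    return np.stack(out)

def init_lstm(d_in, H, rng):
    return (rng.standard_normal((d_in, 4 * H)) * 0.01,
            rng.standard_normal((H, 4 * H)) * 0.01,
            np.zeros(4 * H))

def classify(spec, M=3, H=64, seed=0):
    """Map a (2048, T) spectrogram of any length T to M + 1 probabilities."""
    rng = np.random.default_rng(seed)
    x = spec.T                                    # (T, 2048) frame sequence
    x = lstm_layer(x, init_lstm(2048, H, rng))    # first LSTM layer
    x = lstm_layer(x, init_lstm(H, H, rng))       # second LSTM layer
    h = np.maximum(0.0, x[-1] @ (rng.standard_normal((H, H)) * 0.01))  # DENSE 1
    logits = h @ (rng.standard_normal((H, M + 1)) * 0.01)              # DENSE 2
    e = np.exp(logits - logits.max())
    return e / e.sum()                            # softmax over M + 1 classes
```

Because the LSTM loops over frames one step at a time, spectrograms of different lengths T need no cropping or padding, which is the point of step s3.1.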
s4: detecting the sound of an engine of unknown condition with the trained network model: key segments are likewise extracted from the audio to be inspected, and the trained model then performs anomaly detection on those segments; if all segments are identified as normal after detection, the engine is considered to have no defect, and if any segment is identified as abnormal, the engine is considered defective.
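The decision rule of step s4 reduces to flagging the engine as defective when any key segment is predicted abnormal. A sketch, assuming label 0 denotes the normal class of the M + 1-way output and labels 1 to M denote anomaly types:

```python
def engine_is_defective(segment_labels):
    """True if any key segment was classified as an anomaly (label != 0)."""
    return any(label != 0 for label in segment_labels)
```

For instance, with 11 segments of which one is classified as anomaly type 2, `engine_is_defective([0] * 10 + [2])` returns `True`.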
The specific working principle is as follows:
All audio segments of the engine running at high speed are first extracted by the key-segment extraction algorithm; then, during training, the training samples are fed into the neural network, and the network's estimates together with the true labels are used jointly to optimize the network; once training is complete, the network can predict whether an audio clip is abnormal, and the predicted label is obtained simply by feeding the segments into the trained network.
1. Advantage of engine sound key segment extraction algorithm
Abnormal sounds are usually noticeable only when the engine runs at high speed, so abnormal-sound detection is generally performed only on these segments while the others are ignored. The algorithm exploits the characteristics of engine sound to segment out the key segments of high-speed operation, greatly reducing the volume of data to be processed, avoiding interference from a large number of useless segments, and improving the accuracy of the whole method.
2. Abnormal sound detection algorithm based on deep learning
The method adopts a deep learning algorithm, and can automatically learn the characteristics of normal and various abnormal sounds through a large number of samples of known labels without manually analyzing the characteristics of the sounds. At the same time, the method can be conveniently used for processing other similar problems without a great deal of modification.
3. Recurrent neural network model capable of processing variable-length sequence input
Although audio can also be processed with convolutional neural networks, they require all inputs to have the same scale, while the length of audio is usually not fixed, making convolutional networks inconvenient. In this method, a recurrent neural network that accepts variable-length sequences as input is used as the input layer, and a deep neural network model is built on top of it, successfully avoiding these problems.
Example:
As shown in fig. 4, 11 key segments can be extracted from the section of engine audio by the key-segment extraction step; the segments are then predicted by the trained neural network, one segment is judged to have a howling abnormality while the others are judged normal, and the engine is therefore finally judged to have a howling abnormality.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.