Background
The feature extraction and data mining of echo measurement data belong to the technical field of signal processing. Traditional signal processing methods, however, rely on a physical model of the signal to be processed, and when deep learning is used the network must be designed around the characteristics of the signal. At present, there is almost no published literature on highly feasible deep-learning-based echo feature extraction methods.
The problem addressed by the present invention, separating the signal reflected by a single source from the mixed echoes reflected by multiple sources, is known as blind source separation (Blind Source Separation): useful information about the system characteristics of the mixed source signals is lacking, and the single desired signal must be recovered through suitable methods and transformations [1].
The adaptive algorithm originally proposed in 1986 by Herault and Jutten achieved blind separation of two signals using a simple neural network [2]. In 1994, Comon proposed an ICA method based on minimal mutual information, systematically elaborated the concept of independent components, and defined the basic assumptions of the blind source separation problem [3]. In 1995, Bell and Sejnowski published an ICA algorithm based on the information-maximization criterion, which achieved adaptive blind separation and blind deconvolution by maximizing the entropy of nonlinear output nodes and thereby making full use of the information transmitted through a nonlinear network [4]. Blind source separation theory has since developed and been widely applied in image processing, speech processing, signal processing, and other fields.
In the era of wide application of deep learning, many blind source separation problems can be solved with deep learning methods. In speech processing, for example, the cocktail party problem is one of the most studied areas [5]; it attempts to separate each person's voice from a waveform in which the voices of several people are mixed. Similar tasks arise in music processing, where the different instruments that make up a song are separated from the vocal part; all of these belong to the multi-channel blind deconvolution problem [6]. Many deep learning approaches have successfully addressed such problems. Jansson et al., drawing on the U-Net model structure [7] from the field of image segmentation, applied that network structure to music spectrograms and successfully isolated the human voice and different instrument parts [8]. Stoller et al. proposed the Wave-U-Net structure on this basis, which operates directly on the waveform and thus avoids the phase information lost when the music is processed in the frequency domain via the Fourier transform [9]. Li et al. proposed TF-Attention-Net [10], applying the attention mechanism to this type of problem.
[1] Chabriel G, Kleinsteuber M, Moreau E, et al. Joint matrices decompositions and blind source separation: A survey of methods, identification, and applications[J]. IEEE Signal Processing Magazine, 2014, 31(3): 34-43.
[2] Jutten C, Herault J. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture[J]. Signal Processing, 1991, 24(1): 1-10.
[3] Comon P. Separation of stochastic processes[C]//Workshop on Higher-Order Spectral Analysis. IEEE, 1989: 174-179.
[4] Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution[J]. Neural Computation, 1995, 7(6): 1129-1159.
[5] Haykin S, Chen Z. The cocktail party problem[J]. Neural Computation, 2005, 17(9): 1875-1902.
[6] Cardoso J F. Blind signal separation: statistical principles[J]. Proceedings of the IEEE, 1998, 86(10): 2009-2025.
[7] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015: 234-241.
[8] Jansson A, Humphrey E, Montecchio N, et al. Singing voice separation with deep U-Net convolutional networks[J]. 2017.
[9] Stoller D, Ewert S, Dixon S. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation[J]. arXiv preprint arXiv:1806.03185, 2018.
[10] Li T, Chen J, Hou H, et al. TF-Attention-Net: An end-to-end neural network for singing voice separation[J]. arXiv preprint arXiv:1909.05746, 2019.
Disclosure of Invention
The invention aims to provide a method for accurate and fast feature extraction and data mining of a signal whose physical characteristics are unknown, so that an echo signal generated jointly by multiple targets can be separated into the echo signals generated by each single target.
The invention provides a method for feature extraction and data mining of signals that is based on echo measurement data and adopts deep learning information processing techniques. The specific steps are as follows:
(1) on the basis of the echoes originally generated jointly by multiple targets, additionally generating the echo of each individual target as the label for echo separation;
(2) constructing a basic residual block comprising three one-dimensional convolutions, two ReLU activation functions, two batch regularization functions, and one maximum pooling layer, as shown in figure 1;
(3) linking 12 basic residual blocks each into an encoder and a decoder, connecting the two with a one-dimensional convolution, and finally adding cross-layer links between the same abstraction levels of the encoder and the decoder to form the complete deep learning network;
(4) dividing the complete data set into a training set, a verification set, and a test set in the ratio 8:1:1, over the different relative poses at the same target position;
(5) training the deep learning network on the training set for a number of iterations, then reducing the learning rate and continuing to train on the training set while verifying performance on the verification set, stopping training once a stopping condition is met, and outputting the network model that performs best on the verification set.
In step (1) of the present invention, the process of additionally generating a single-target echo is as follows: for a given set of target positions and poses, say k targets, to generate the echo of the first target alone, the position and relative pose of the first target are kept while the other targets are removed; the resulting echo is the label corresponding to that target. This process is repeated k times to obtain all k labels.
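The label-generation loop of step (1) can be sketched as follows. The echo simulator itself is not specified by the invention, so `simulate_echo` below is a hypothetical toy stand-in (it superimposes one triangular pulse per target) included only so the loop structure is runnable:

```python
# Sketch of step (1): from k targets, generate k single-target echoes
# (labels) by keeping one target at a time and removing the others.
# `simulate_echo` is a hypothetical placeholder, NOT the real simulator.

def simulate_echo(targets, n=32):
    """Toy stand-in: superimpose one pulse per (position, pose) target."""
    echo = [0.0] * n
    for pos, pose in targets:
        center = pos % n
        for j in range(n):
            echo[j] += max(0.0, 1.0 - abs(j - center) / 3.0) * (1.0 + 0.1 * pose)
    return echo

def make_labels(targets):
    """For k targets, simulate k single-target echoes to use as labels."""
    labels = []
    for i in range(len(targets)):
        # keep only the i-th target, remove the others, re-run the simulation
        labels.append(simulate_echo([targets[i]]))
    return labels

targets = [(5, 0), (14, 1), (25, 2)]   # k = 3 (position, pose) pairs
mixed = simulate_echo(targets)         # the jointly generated echo
labels = make_labels(targets)          # k single-target echoes (labels)
```

In this linear toy the mixed echo is exactly the sum of the k labels; real echo data need not be so simple, which is what makes the separation problem nontrivial.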
In step (2) of the present invention, because the amplitude distribution of the echo signals to be processed is less uniform than that of common sound signals, the network needs more layers and more neurons to obtain sufficient nonlinear expressive power. Deeper networks, however, are prone to exploding or vanishing gradients, so each layer uses a residual form. Let x denote the input, of shape N × C × L, denoting respectively the number of samples (the N samples together form a batch), the number of channels, and the length. Let f denote a layer of the network; using a residual changes the layer's output from f(x) to x + f(x). Doing so, however, means the number of output channels cannot change: if more neurons are used, the channel count of f(x) exceeds that of x, and the two cannot be added directly.
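The channel-count obstacle can be made concrete with a small shape check on (N, C, L) tensors, here modelled as nested Python lists (the channel-duplication mapping at the end is only a stand-in for the learned conv_f described below):

```python
# Why x + f(x) fails when f changes the channel count.

def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))   # (N, C, L)

def add(a, b):
    if shape(a) != shape(b):
        raise ValueError("cannot add %r and %r" % (shape(a), shape(b)))
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

N, L = 2, 8
x  = [[[0.0] * L for _ in range(2)] for _ in range(N)]   # C = 2 channels
fx = [[[1.0] * L for _ in range(4)] for _ in range(N)]   # f widened to C = 4

try:
    add(x, fx)                 # direct residual addition is impossible
except ValueError as e:
    mismatch = str(e)

# A mapping path first lifts x to 4 channels (here: duplicate each
# channel, as a toy stand-in for conv_f); then the addition is defined.
x_mapped = [[ch[:] for ch in sample for _ in range(2)] for sample in x]
y = add(x_mapped, fx)
```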
The basic residual block therefore contains two paths. One, called the mapping path, maps the input signal directly into a higher-dimensional space with more channels, aiming at a higher-dimensional representation that is easier to process. The other, called the residual path, learns the residual between the representation produced by the mapping path and a better one.
In the mapping path, because echo signals have strong temporal correlation, a one-dimensional convolution, denoted conv_f, is used to capture timing information and change the number of channels. To make the residual addition possible, the stride is 1 and the padding is chosen from the kernel size (the invention uses kernel size 3 with padding 1), so that the input length is unchanged and the output can be added directly to the output of the residual path. Let conv_f(x) denote the resulting mapping.
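A minimal pure-Python sketch of such a "same"-length one-dimensional convolution (kernel size 3, zero padding 1) shows that the output keeps the input length and can therefore be added elementwise to a same-length residual output:

```python
# 'Same' 1-D convolution: kernel size 3, padding 1 => output length
# equals input length, so residual addition is well defined.

def conv1d_same(x, kernel):
    k = len(kernel)                 # assumed odd; here k = 3
    pad = (k - 1) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k))
            for i in range(len(x))]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = conv1d_same(x, [0.0, 1.0, 0.0])   # identity kernel: y == x
z = conv1d_same(x, [1.0, 1.0, 1.0])   # moving sum with zero padding
```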
In the residual path, the same structure is used twice in succession to increase expressive capacity; it consists of the following:
(a) similarly, to obtain timing information, the input is first subjected to a one-dimensional convolution, denoted by conv _ 1.
(b) a linear rectification function (Rectified Linear Unit) is used as the activation function; like the thresholding of a neuron, it forces the part of the result smaller than 0 to 0, and that part takes no gradient in back-propagation; it is denoted relu_1.
(c) to ease training, batch regularization (Batch Normalization) is used, denoted bn_1. Because the inputs and outputs are handled per batch, batch regularization maps all samples in a batch to mean 0 and variance 1, which speeds up network training.
The input passes through (a), (b), and (c) twice; the one-dimensional convolution, activation function, and batch regularization of the second pass are denoted conv_2, relu_2, and bn_2, respectively, and the resulting residual is denoted res(x).
Finally, a maximum pooling layer, denoted MaxPool, produces the final result: a sliding window moves over the input and the maximum value in each window is output. The input of the pooling layer is conv_f(x) + res(x). The role of the pooling layer is to reduce the number of parameters while preserving salient features.
The whole basic residual block is schematically shown in the attached figure 1, and the algorithm pseudo code of the basic residual block is shown in the attached appendix 1.
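The data flow of the basic residual block can be checked with a runnable pure-Python sketch for one sample of shape (C, L). The weights here are fixed identity kernels chosen only so the script executes; they are stand-ins for the learned parameters, and the single-sample "batch" statistics stand in for true batch normalization:

```python
# Pure-Python sketch of the basic residual block (cf. Appendix 1),
# for one sample of shape (C, L). Toy fixed weights, not learned ones.

def conv1d(x, weights):
    """Multi-channel 'same' convolution: weights[c_out][c_in] is a
    length-3 kernel; zero padding 1 keeps the signal length unchanged."""
    n = len(x[0])
    out = []
    for w_out in weights:
        ch = [0.0] * n
        for xin, ker in zip(x, w_out):
            xp = [0.0] + list(xin) + [0.0]
            for i in range(n):
                ch[i] += sum(ker[j] * xp[i + j] for j in range(3))
        out.append(ch)
    return out

def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]

def batchnorm(x, eps=1e-5):
    """Per-channel normalisation to mean 0, variance 1 (a single-sample
    stand-in for the batch statistics used in training)."""
    out = []
    for ch in x:
        m = sum(ch) / len(ch)
        var = sum((v - m) ** 2 for v in ch) / len(ch)
        out.append([(v - m) / (var + eps) ** 0.5 for v in ch])
    return out

def maxpool(x, size=2):
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]

def residual_block(x, c_out):
    ident = [0.0, 1.0, 0.0]                      # toy identity kernel
    w_f = [[ident for _ in x] for _ in range(c_out)]
    w_1 = [[ident for _ in x] for _ in range(c_out)]
    w_2 = [[ident for _ in range(c_out)] for _ in range(c_out)]
    f = conv1d(x, w_f)                           # mapping path: conv_f
    t = batchnorm(relu(conv1d(x, w_1)))          # conv_1, relu_1, bn_1
    res = batchnorm(relu(conv1d(t, w_2)))        # conv_2, relu_2, bn_2
    s = [[a + b for a, b in zip(cf, cr)] for cf, cr in zip(f, res)]
    return maxpool(s)                            # MaxPool(conv_f(x) + res(x))

x = [[float(i) for i in range(8)], [1.0] * 8]    # one sample, C = 2, L = 8
y = residual_block(x, c_out=4)                   # -> C = 4, length halved to 4
```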
In step (3), the complete network is formed as follows. First, L basic residual blocks (L may in particular be taken as 10) are connected end to end to form the encoder; numbering them in the order the input passes through them, the first block is numbered 1 (denoted f_1, and so on) and the last is numbered L.
The encoder output then passes through a one-dimensional convolution, denoted conv_med, which processes the encoded features for the decoder that follows. Next, L basic residual blocks are connected end to end to form the decoder, numbered in the reverse order of the encoder: the first block passed is numbered L (denoted h_L, and so on) and the last is numbered 1. Note that the decoder's input is the feature produced by the aforementioned one-dimensional convolution.
Finally, the cross-layer links between encoder and decoder are established. For the decoder block numbered i, if i < L, i.e., it is not the first decoder block passed, its input is the output of the encoder block numbered i concatenated along the channel axis with the output of the decoder block numbered i + 1 (i.e., the previously passed decoder block); if i = L, the output of encoder block L is concatenated with the output of conv_med to form the input.
The schematic diagram of the whole deep learning network is shown in the attached figure 2, and the pseudo code of the algorithm of the deep learning network is shown in the attached appendix 2.
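The wiring of step (3) can be traced symbolically: blocks are stubs that merely record their name, so the sketch checks only the data flow (which encoder output is concatenated into which decoder block), not any learning:

```python
# Data-flow sketch of step (3): L encoder blocks f_i, a middle
# convolution conv_med, and L decoder blocks h_i whose inputs are the
# matching encoder outputs concatenated along the channel axis.

def encoder_block(i, x):
    return "f%d(%s)" % (i, x)

def conv_med(x):
    return "conv_med(%s)" % x

def decoder_block(i, x):
    return "h%d(%s)" % (i, x)

def concat(a, b):
    return "[%s | %s]" % (a, b)    # concatenation along the channel axis

def forward(x, L=3):
    enc = {}
    for i in range(1, L + 1):      # encoder: blocks 1 .. L, end to end
        x = encoder_block(i, x)
        enc[i] = x
    x = conv_med(x)
    for i in range(L, 0, -1):      # decoder: blocks L .. 1
        # block L sees conv_med's output, block i < L the previous
        # decoder output; both are concatenated with encoder output i
        x = decoder_block(i, concat(enc[i], x))
    return x

trace = forward("x", L=2)
```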
In step (4) of the invention, each file in the data set corresponds to one multi-target situation: with three targets, for example, a file holds the echo collected for one set of relative angles between the targets and the radar, together with the single-target echoes generated in step (1) as labels. To cover a variety of situations, each file is internally divided into training, verification, and test subsets in the ratio 8:1:1 when the sets are divided.
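The per-file 8:1:1 split can be sketched as follows; splitting each file separately ensures every multi-target situation appears in training, verification, and test alike (the file contents and fixed seed here are illustrative only):

```python
# Per-file 8:1:1 split from step (4): each file's samples are split
# separately so every situation contributes to all three subsets.
import random

def split_file(samples, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)      # deterministic for this sketch
    n = len(samples)
    n_train = round(0.8 * n)
    n_val = round(0.1 * n)
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

files = {"pos_a": list(range(20)), "pos_b": list(range(40))}  # toy files
train, val, test = [], [], []
for samples in files.values():
    tr, va, te = split_file(samples)
    train += tr
    val += va
    test += te
```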
In step (5), the network is trained by randomly dividing the training set into batches of 64 echo signals. In general the final network model is the one that performs best on the verification set, but there is no single optimal rule for when to stop training. The method of the invention therefore trains in two stages. The first stage uses a relatively high learning rate, both to estimate a reasonable number of training iterations from the loss curve and to reduce the probability of being trapped at a saddle point. Specifically, an Adam optimizer is used with learning rate 0.001, beta1 = 0.9, and beta2 = 0.999. A fixed number of iterations (e.g., 1000) is run on the training set, with gradient back-propagation using MSE (Mean Square Error) as the loss function, whose expression is as follows:
MSE(Z_i, Y_i) = ||Z_i - Y_i||^2
where Z_i and Y_i represent the ith network output and the corresponding label;
the second phase uses a lower learning rate, which is to fine tune the model. The difference from the first stage is that the learning rate is reduced to 0.0001, and furthermore, the iteration is run on the training set (e.g. 1000 times), the same loss function is still used, then the performance is evaluated on the verification set every 5 times, if the performance does not improve within 50 iterations, the model is considered to have converged, the training is stopped and the historical best model is output.
The invention uses two indexes to evaluate network performance: the relative error of the peak magnitude and the relative error of the peak position. If both relative errors are within fifteen percent, the result is considered qualified.
An accurate definition of these two performance indicators is given below. First, the peak of a signal X (here and below, X is assumed to be a tensor with only a length dimension) is defined as:
φ(X)=max(X)
the peak position of the signal X is then defined as:
P(X)=argmax(X)
then the relative error of the peak magnitude is:
e_φ(Z, Y) = |φ(Z) - φ(Y)| / |φ(Y)|
and the relative error of the peak position is:
e_P(Z, Y) = |P(Z) - P(Y)| / W(Y)
Note that the denominator W(Y) is the length of occurrence of the peak, i.e., the number of points in the whole wave band greater than a certain proportion of the peak value; this proportion is denoted k, and the specific implementation uses 0.01.
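The two indexes can be computed directly from these definitions; the sketch below uses `phi` for φ, `pos` for P, and a width function for the denominator of the position error, with the 15% acceptance check applied to a toy pair of signals:

```python
# The two evaluation indexes: peak-magnitude relative error and
# peak-position relative error, the latter normalised by the peak width
# W(Y) = number of points above k * phi(Y), with k = 0.01.

def phi(x):
    return max(x)                    # peak value

def pos(x):
    return x.index(max(x))           # peak position (argmax)

def peak_width(y, k=0.01):
    return sum(1 for v in y if v > k * phi(y))

def e_phi(z, y):
    return abs(phi(z) - phi(y)) / abs(phi(y))

def e_pos(z, y, k=0.01):
    return abs(pos(z) - pos(y)) / peak_width(y, k)

y = [0.0, 0.1, 1.0, 0.2, 0.0, 0.0]   # label: one clear peak at index 2
z = [0.0, 0.2, 0.9, 0.3, 0.0, 0.0]   # separated signal: peak also at 2

ok = e_phi(z, y) <= 0.15 and e_pos(z, y) <= 0.15   # 15% acceptance check
```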
Since the evaluation indexes depend only on a single maximum point, their input signals must have only one distinct peak; otherwise the indexes are meaningless. This is also why the echo of a single target is extracted, since such an echo necessarily has only one distinct peak.
This also means that back-propagating gradients directly through the evaluation indexes tends to distort the separated signal at other points, which is why MSE (Mean Square Error) is used as the loss function for gradient back-propagation in the first stage. In practice, the gradient can be smoothed by averaging over the batch of samples or over the signal length. The main reason for using this loss function is to bring every point of the output signal as close to the label as possible, making the output more reliable.
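The averaging just mentioned amounts to dividing the summed squared error by the number of contributing points; a minimal sketch, for a batch given as nested lists:

```python
# MSE averaged over both the batch and the signal length, which keeps
# the loss (and hence the gradient) scale independent of batch size.

def mse(z, y):
    n = len(z) * len(z[0])           # batch size * signal length
    return sum((a - b) ** 2
               for zr, yr in zip(z, y)
               for a, b in zip(zr, yr)) / n

loss = mse([[1.0, 2.0]], [[0.0, 0.0]])   # (1 + 4) / 2
```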
The pseudo code of the algorithm for training the deep learning network is shown in appendix 3.
The method can effectively separate the echoes generated by different targets from the echo jointly generated by multiple targets; the model learns the characteristics of target-generated echoes well, thereby extracting several relatively simple echoes each generated by a single target.
Detailed Description
Having introduced the algorithmic principles and the specific steps of the invention, the results of a signal separation test on simulated data are shown below.
The data set used in the experiment was derived from 802 files of echo data generated in a specific frequency band for three targets. There were 428 target positions and 24 relative poses, for a total of 10272 samples.
In the test, two indexes, the peak amplitude relative error (e_φ) and the peak position relative error (e_P), are used to measure the experimental effect; relative errors within 15% are generally considered of practical application value.
Experimental example 1: 20 random samples
I denotes the sample number, H the peak amplitude relative error, and P the peak position relative error; the trailing digit indicates the target concerned. For example, H1 is the relative error between the amplitudes of the predicted and the actual echo signal of the first target. Numbers marked in bold indicate that the index is not qualified.
Table 1: performance of the algorithm over 20 random samples
From the results of the random extraction it can be seen that:
1. The largest of the mean peak amplitude errors produced by the model is 8.455%.
2. The largest of the mean peak position errors produced by the model is only 5.761%.
3. In some cases the model can exhibit an overfitting problem, since its robustness decreases when the data is highly noisy; in most cases, however, the experimental results show good predictive performance. We will address this in follow-up work by augmenting the data and improving the robustness of the model.
Experimental example 2: all samples
Table 2: performance of the algorithm on all samples
|                    | H1    | P1    | H2    | P2    | H3    | P3    |
| Mean value (%)     | 8.380 | 8.396 | 6.739 | 7.064 | 7.684 | 6.775 |
| Standard deviation | 0.104 | 0.163 | 0.099 | 0.210 | 0.111 | 0.146 |
In absolute terms, the average peak difference is 0.0004 and the average position difference is 1.436, where the peak difference is in units of signal strength and the position difference in steps (0.01 m). Moreover, for random sample inputs, an output qualified on the peak relative error is obtained with probability 87.208%, and an output qualified on the peak position relative error with probability 84.820%, comfortably meeting the 15% evaluation criterion.
Appendix 1: algorithm pseudo code for basic residual block
Input: X
f=conv_f(X)
t=conv_1(X)
t=relu_1(t)
t=bn_1(t)
t=conv_2(t)
t=relu_2(t)
res=bn_2(t)
t=f+res
output=MaxPool(t)
return output
Appendix 2: algorithmic pseudo code for deep learning networks
Appendix 3: algorithm pseudo code for training deep learning networks