Background
The feature extraction and data mining of echo measurement data belong to the technical field of signal processing. Traditional signal processing methods, however, rely on a physical model of the signal to be processed, and when deep learning is used the network must be designed around the characteristics of the signal. At present, there is almost no published literature on highly feasible deep-learning-based echo feature extraction methods.
The problem addressed by the present invention, separating the signal reflected by a single source from the mixed echoes reflected by multiple sources, is known as blind source separation (Blind Source Separation): useful information about the system characteristics of the mixed source signals is lacking, and the single desired signal must be recovered through suitable methods and transformations [1].
The adaptive algorithm originally proposed in 1986 by Herault and Jutten achieved blind separation of two signals using a simple neural network [2]. In 1994, Comon proposed an ICA method based on minimal mutual information, systematically elaborated the concept of independent components, and defined the basic assumptions of the blind source separation problem [3]. In 1995, Bell and Sejnowski published an ICA algorithm based on the information-maximization criterion, which achieved adaptive blind separation and blind deconvolution by maximizing the entropy of nonlinear output nodes and thereby making full use of the information transmitted through a nonlinear network [4]. Blind source separation theory has since developed and been widely applied in image processing, speech processing, signal processing, and other fields.
In the era of wide application of deep learning, many blind source separation problems can be solved with deep learning methods. In speech processing, for example, the cocktail party problem is one of the most studied areas [5]; it attempts to separate each person's voice from a waveform in which the voices of several people are mixed. Similar tasks arise in music processing, where the different instruments that make up a song are separated from the vocal part; all of these belong to the multi-channel blind deconvolution problem [6]. Many deep learning approaches have successfully addressed such problems. Jansson et al., drawing on the U-Net model structure [7] from the field of image segmentation, applied that network structure to music spectrograms and successfully isolated the human voice and different instrument parts [8]. Stoller et al. proposed the Wave-U-Net structure on this basis, which operates directly on the waveform and thus avoids the phase information lost when the music is processed in the frequency domain via the Fourier transform [9]. Li et al. proposed TF-Attention-Net [10], applying the attention mechanism to this type of problem.
[1] Chabriel G, Kleinsteuber M, Moreau E, et al. Joint matrices decompositions and blind source separation: A survey of methods, identification, and applications[J]. IEEE Signal Processing Magazine, 2014, 31(3): 34-43.
[2] Jutten C, Herault J. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture[J]. Signal Processing, 1991, 24(1): 1-10.
[3] Comon P. Separation of stochastic processes[C]//Workshop on Higher-Order Spectral Analysis. IEEE, 1989: 174-179.
[4] Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution[J]. Neural Computation, 1995, 7(6): 1129-1159.
[5] Haykin S, Chen Z. The cocktail party problem[J]. Neural Computation, 2005, 17(9): 1875-1902.
[6] Cardoso J F. Blind signal separation: statistical principles[J]. Proceedings of the IEEE, 1998, 86(10): 2009-2025.
[7] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015: 234-241.
[8] Jansson A, Humphrey E, Montecchio N, et al. Singing voice separation with deep U-Net convolutional networks[J]. 2017.
[9] Stoller D, Ewert S, Dixon S. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation[J]. arXiv preprint arXiv:1806.03185, 2018.
[10] Li T, Chen J, Hou H, et al. TF-Attention-Net: An end-to-end neural network for singing voice separation[J]. arXiv preprint arXiv:1909.05746, 2019.
Disclosure of Invention
The invention aims to provide a method for accurate and fast feature extraction and data mining of a signal whose physical characteristics are unknown, so that an echo signal generated jointly by multiple targets can be separated into the echo signals generated by each single target.
The invention provides a method for feature extraction and data mining of signals that is based on echo measurement data and adopts deep learning information processing techniques. The specific steps are as follows:
(1) on the basis of the echoes originally generated jointly by multiple targets, additionally generating the echo of each individual target as the label for echo separation;
(2) constructing a basic residual block comprising three one-dimensional convolutions, two ReLU activation functions, two batch regularization functions, and one maximum pooling layer, as shown in figure 1;
(3) linking 12 basic residual blocks each into an encoder and a decoder, connecting the two with a one-dimensional convolution, and finally adding cross-layer links between the same abstraction levels of the encoder and the decoder to form the complete deep learning network;
(4) dividing the complete data set into a training set, a verification set, and a test set in the ratio 8:1:1, over the different relative poses at the same target position;
(5) training the deep learning network on the training set for a number of iterations, then reducing the learning rate and continuing to train on the training set while verifying performance on the verification set, stopping training once a stopping condition is met, and outputting the network model that performs best on the verification set.
In step (1) of the present invention, the process of additionally generating a single-target echo is as follows: for a given set of target positions and poses, say k targets, to generate the echo of the first target alone, the position and relative pose of the first target are kept while the other targets are removed; the resulting echo is the label corresponding to that target. This process is repeated k times to obtain all k labels.
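The label-generation loop of step (1) can be sketched as follows. The echo simulator itself is not specified by the invention, so `simulate_echo` below is a hypothetical toy stand-in (it superimposes one triangular pulse per target) included only so the loop structure is runnable:

```python
# Sketch of step (1): from k targets, generate k single-target echoes
# (labels) by keeping one target at a time and removing the others.
# `simulate_echo` is a hypothetical placeholder, NOT the real simulator.

def simulate_echo(targets, n=32):
    """Toy stand-in: superimpose one pulse per (position, pose) target."""
    echo = [0.0] * n
    for pos, pose in targets:
        center = pos % n
        for j in range(n):
            echo[j] += max(0.0, 1.0 - abs(j - center) / 3.0) * (1.0 + 0.1 * pose)
    return echo

def make_labels(targets):
    """For k targets, simulate k single-target echoes to use as labels."""
    labels = []
    for i in range(len(targets)):
        # keep only the i-th target, remove the others, re-run the simulation
        labels.append(simulate_echo([targets[i]]))
    return labels

targets = [(5, 0), (14, 1), (25, 2)]   # k = 3 (position, pose) pairs
mixed = simulate_echo(targets)         # the jointly generated echo
labels = make_labels(targets)          # k single-target echoes (labels)
```

In this linear toy the mixed echo is exactly the sum of the k labels; real echo data need not be so simple, which is what makes the separation problem nontrivial.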
In step (2) of the present invention, because the amplitude distribution of the echo signals to be processed is less uniform than that of common sound signals, the network needs more layers and more neurons to obtain sufficient nonlinear expressive power. Deeper networks, however, are prone to exploding or vanishing gradients, so each layer uses a residual form. Let x denote the input, of shape N × C × L, denoting respectively the number of samples (the N samples together form a batch), the number of channels, and the length. Let f denote a layer of the network; using a residual changes the layer's output from f(x) to x + f(x). Doing so, however, means the number of output channels cannot change: if more neurons are used, the channel count of f(x) exceeds that of x, and the two cannot be added directly.
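The channel-count obstacle can be made concrete with a small shape check on (N, C, L) tensors, here modelled as nested Python lists (the channel-duplication mapping at the end is only a stand-in for the learned conv_f described below):

```python
# Why x + f(x) fails when f changes the channel count.

def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))   # (N, C, L)

def add(a, b):
    if shape(a) != shape(b):
        raise ValueError("cannot add %r and %r" % (shape(a), shape(b)))
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

N, L = 2, 8
x  = [[[0.0] * L for _ in range(2)] for _ in range(N)]   # C = 2 channels
fx = [[[1.0] * L for _ in range(4)] for _ in range(N)]   # f widened to C = 4

try:
    add(x, fx)                 # direct residual addition is impossible
except ValueError as e:
    mismatch = str(e)

# A mapping path first lifts x to 4 channels (here: duplicate each
# channel, as a toy stand-in for conv_f); then the addition is defined.
x_mapped = [[ch[:] for ch in sample for _ in range(2)] for sample in x]
y = add(x_mapped, fx)
```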
The basic residual block therefore contains two paths. One, called the mapping path, maps the input signal directly into a higher-dimensional space with more channels, aiming at a higher-dimensional representation that is easier to process. The other, called the residual path, learns the residual between the representation produced by the mapping path and a better one.
In the mapping path, because echo signals have strong temporal correlation, a one-dimensional convolution, denoted conv_f, is used to capture timing information and change the number of channels. To make the residual addition possible, the stride is 1 and the padding is chosen from the kernel size (the invention uses kernel size 3 with padding 1), so that the input length is unchanged and the output can be added directly to the output of the residual path. Let conv_f(x) denote the resulting mapping.
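A minimal pure-Python sketch of such a "same"-length one-dimensional convolution (kernel size 3, zero padding 1) shows that the output keeps the input length and can therefore be added elementwise to a same-length residual output:

```python
# 'Same' 1-D convolution: kernel size 3, padding 1 => output length
# equals input length, so residual addition is well defined.

def conv1d_same(x, kernel):
    k = len(kernel)                 # assumed odd; here k = 3
    pad = (k - 1) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k))
            for i in range(len(x))]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = conv1d_same(x, [0.0, 1.0, 0.0])   # identity kernel: y == x
z = conv1d_same(x, [1.0, 1.0, 1.0])   # moving sum with zero padding
```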
In the residual path, the same structure is used twice in succession to increase expressive capacity; it consists of the following:
(a) similarly, to obtain timing information, the input is first subjected to a one-dimensional convolution, denoted by conv _ 1.
(b) a linear rectification function (Rectified Linear Unit) is used as the activation function; like the thresholding of a neuron, it forces the part of the result smaller than 0 to 0, and that part takes no gradient in back-propagation; it is denoted relu_1.
(c) to ease training, batch regularization (Batch Normalization) is used, denoted bn_1. Because the inputs and outputs are handled per batch, batch regularization maps all samples in a batch to mean 0 and variance 1, which speeds up network training.
The input passes through (a), (b), and (c) twice; the one-dimensional convolution, activation function, and batch regularization of the second pass are denoted conv_2, relu_2, and bn_2, respectively, and the resulting residual is denoted res(x).
Finally, a maximum pooling layer, denoted MaxPool, produces the final result: a sliding window moves over the input and the maximum value in each window is output. The input of the pooling layer is conv_f(x) + res(x). The role of the pooling layer is to reduce the number of parameters while preserving salient features.
The whole basic residual block is schematically shown in the attached figure 1, and the algorithm pseudo code of the basic residual block is shown in the attached appendix 1.
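The data flow of the basic residual block can be checked with a runnable pure-Python sketch for one sample of shape (C, L). The weights here are fixed identity kernels chosen only so the script executes; they are stand-ins for the learned parameters, and the single-sample "batch" statistics stand in for true batch normalization:

```python
# Pure-Python sketch of the basic residual block (cf. Appendix 1),
# for one sample of shape (C, L). Toy fixed weights, not learned ones.

def conv1d(x, weights):
    """Multi-channel 'same' convolution: weights[c_out][c_in] is a
    length-3 kernel; zero padding 1 keeps the signal length unchanged."""
    n = len(x[0])
    out = []
    for w_out in weights:
        ch = [0.0] * n
        for xin, ker in zip(x, w_out):
            xp = [0.0] + list(xin) + [0.0]
            for i in range(n):
                ch[i] += sum(ker[j] * xp[i + j] for j in range(3))
        out.append(ch)
    return out

def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]

def batchnorm(x, eps=1e-5):
    """Per-channel normalisation to mean 0, variance 1 (a single-sample
    stand-in for the batch statistics used in training)."""
    out = []
    for ch in x:
        m = sum(ch) / len(ch)
        var = sum((v - m) ** 2 for v in ch) / len(ch)
        out.append([(v - m) / (var + eps) ** 0.5 for v in ch])
    return out

def maxpool(x, size=2):
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]

def residual_block(x, c_out):
    ident = [0.0, 1.0, 0.0]                      # toy identity kernel
    w_f = [[ident for _ in x] for _ in range(c_out)]
    w_1 = [[ident for _ in x] for _ in range(c_out)]
    w_2 = [[ident for _ in range(c_out)] for _ in range(c_out)]
    f = conv1d(x, w_f)                           # mapping path: conv_f
    t = batchnorm(relu(conv1d(x, w_1)))          # conv_1, relu_1, bn_1
    res = batchnorm(relu(conv1d(t, w_2)))        # conv_2, relu_2, bn_2
    s = [[a + b for a, b in zip(cf, cr)] for cf, cr in zip(f, res)]
    return maxpool(s)                            # MaxPool(conv_f(x) + res(x))

x = [[float(i) for i in range(8)], [1.0] * 8]    # one sample, C = 2, L = 8
y = residual_block(x, c_out=4)                   # -> C = 4, length halved to 4
```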
In step (3), the complete network is formed as follows. First, L basic residual blocks (L may in particular be taken as 10) are connected end to end to form the encoder; numbering them in the order the input passes through them, the first block is numbered 1 (denoted f_1, and so on) and the last is numbered L.
The encoder output then passes through a one-dimensional convolution, denoted conv_med, which processes the encoded features for the decoder that follows. Next, L basic residual blocks are connected end to end to form the decoder, numbered in the reverse order of the encoder: the first block passed is numbered L (denoted h_L, and so on) and the last is numbered 1. Note that the decoder's input is the feature produced by the aforementioned one-dimensional convolution.
Finally, the cross-layer links between encoder and decoder are established. For the decoder block numbered i, if i < L, i.e., it is not the first decoder block passed, its input is the output of the encoder block numbered i concatenated along the channel axis with the output of the decoder block numbered i + 1 (i.e., the previously passed decoder block); if i = L, the output of encoder block L is concatenated with the output of conv_med to form the input.
The schematic diagram of the whole deep learning network is shown in the attached figure 2, and the pseudo code of the algorithm of the deep learning network is shown in the attached appendix 2.
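The wiring of step (3) can be traced symbolically: blocks are stubs that merely record their name, so the sketch checks only the data flow (which encoder output is concatenated into which decoder block), not any learning:

```python
# Data-flow sketch of step (3): L encoder blocks f_i, a middle
# convolution conv_med, and L decoder blocks h_i whose inputs are the
# matching encoder outputs concatenated along the channel axis.

def encoder_block(i, x):
    return "f%d(%s)" % (i, x)

def conv_med(x):
    return "conv_med(%s)" % x

def decoder_block(i, x):
    return "h%d(%s)" % (i, x)

def concat(a, b):
    return "[%s | %s]" % (a, b)    # concatenation along the channel axis

def forward(x, L=3):
    enc = {}
    for i in range(1, L + 1):      # encoder: blocks 1 .. L, end to end
        x = encoder_block(i, x)
        enc[i] = x
    x = conv_med(x)
    for i in range(L, 0, -1):      # decoder: blocks L .. 1
        # block L sees conv_med's output, block i < L the previous
        # decoder output; both are concatenated with encoder output i
        x = decoder_block(i, concat(enc[i], x))
    return x

trace = forward("x", L=2)
```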
In step (4) of the invention, each file in the data set corresponds to one multi-target situation: with three targets, for example, a file holds the echo collected for one set of relative angles between the targets and the radar, together with the single-target echoes generated in step (1) as labels. To cover a variety of situations, each file is internally divided into training, verification, and test subsets in the ratio 8:1:1 when the sets are divided.
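The per-file 8:1:1 split can be sketched as follows; splitting each file separately ensures every multi-target situation appears in training, verification, and test alike (the file contents and fixed seed here are illustrative only):

```python
# Per-file 8:1:1 split from step (4): each file's samples are split
# separately so every situation contributes to all three subsets.
import random

def split_file(samples, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)      # deterministic for this sketch
    n = len(samples)
    n_train = round(0.8 * n)
    n_val = round(0.1 * n)
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

files = {"pos_a": list(range(20)), "pos_b": list(range(40))}  # toy files
train, val, test = [], [], []
for samples in files.values():
    tr, va, te = split_file(samples)
    train += tr
    val += va
    test += te
```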
In step (5), the network is trained by randomly dividing the training set into batches of 64 echo signals. In general the final network model is the one that performs best on the verification set, but there is no single optimal rule for when to stop training. The method of the invention therefore trains in two stages. The first stage uses a relatively high learning rate, both to estimate a reasonable number of training iterations from the loss curve and to reduce the probability of being trapped at a saddle point. Specifically, an Adam optimizer is used with learning rate 0.001, beta1 = 0.9, and beta2 = 0.999. A fixed number of iterations (e.g., 1000) is run on the training set, with gradient back-propagation using MSE (Mean Square Error) as the loss function, whose expression is as follows:
MSE(Z_i, Y_i) = ||Z_i - Y_i||^2
where Z_i and Y_i represent the ith network output and the corresponding label;
the second phase uses a lower learning rate, which is to fine tune the model. The difference from the first stage is that the learning rate is reduced to 0.0001, and furthermore, the iteration is run on the training set (e.g. 1000 times), the same loss function is still used, then the performance is evaluated on the verification set every 5 times, if the performance does not improve within 50 iterations, the model is considered to have converged, the training is stopped and the historical best model is output.
The invention uses two indexes to evaluate network performance: the relative error of the peak magnitude and the relative error of the peak position. If both relative errors are within fifteen percent, the result is considered qualified.
An accurate definition of these two performance indicators is given below. First, the peak of a signal X (here and below, X is assumed to be a tensor with only a length dimension) is defined as:
φ(X)=max(X)
the peak position of the signal X is then defined as:
P(X)=argmax(X)
then the relative error of the peak magnitude is:
e_φ(Z, Y) = |φ(Z) - φ(Y)| / |φ(Y)|
and the relative error of the peak position is:
e_P(Z, Y) = |P(Z) - P(Y)| / W(Y)
Note that the denominator W(Y) is the length of occurrence of the peak, i.e., the number of points in the whole wave band greater than a certain proportion of the peak value; this proportion is denoted k, and the specific implementation uses 0.01.
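The two indexes can be computed directly from these definitions; the sketch below uses `phi` for φ, `pos` for P, and a width function for the denominator of the position error, with the 15% acceptance check applied to a toy pair of signals:

```python
# The two evaluation indexes: peak-magnitude relative error and
# peak-position relative error, the latter normalised by the peak width
# W(Y) = number of points above k * phi(Y), with k = 0.01.

def phi(x):
    return max(x)                    # peak value

def pos(x):
    return x.index(max(x))           # peak position (argmax)

def peak_width(y, k=0.01):
    return sum(1 for v in y if v > k * phi(y))

def e_phi(z, y):
    return abs(phi(z) - phi(y)) / abs(phi(y))

def e_pos(z, y, k=0.01):
    return abs(pos(z) - pos(y)) / peak_width(y, k)

y = [0.0, 0.1, 1.0, 0.2, 0.0, 0.0]   # label: one clear peak at index 2
z = [0.0, 0.2, 0.9, 0.3, 0.0, 0.0]   # separated signal: peak also at 2

ok = e_phi(z, y) <= 0.15 and e_pos(z, y) <= 0.15   # 15% acceptance check
```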
Since the evaluation indexes depend only on a single maximum point, their input signals must have only one distinct peak; otherwise the indexes are meaningless. This is also why the echo of a single target is extracted, since such an echo necessarily has only one distinct peak.
This also means that back-propagating gradients directly through the evaluation indexes tends to distort the separated signal at other points, which is why MSE (Mean Square Error) is used as the loss function for gradient back-propagation in the first stage. In practice, the gradient can be smoothed by averaging over the batch of samples or over the signal length. The main reason for using this loss function is to bring every point of the output signal as close to the label as possible, making the output more reliable.
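The averaging just mentioned amounts to dividing the summed squared error by the number of contributing points; a minimal sketch, for a batch given as nested lists:

```python
# MSE averaged over both the batch and the signal length, which keeps
# the loss (and hence the gradient) scale independent of batch size.

def mse(z, y):
    n = len(z) * len(z[0])           # batch size * signal length
    return sum((a - b) ** 2
               for zr, yr in zip(z, y)
               for a, b in zip(zr, yr)) / n

loss = mse([[1.0, 2.0]], [[0.0, 0.0]])   # (1 + 4) / 2
```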
The pseudo code of the algorithm for training the deep learning network is shown in appendix 3.
The method can effectively separate the echoes generated by different targets from the echo jointly generated by multiple targets; the model learns the characteristics of target-generated echoes well, thereby extracting several relatively simple echoes each generated by a single target.
Detailed Description
Having introduced the algorithmic principles and the specific steps of the invention, the results of a signal separation test on simulated data are shown below.
The data set used in the experiment was derived from 802 files of echo data generated in a specific frequency band for three targets. There were 428 target positions and 24 relative poses, for a total of 10272 samples.
In the test, two indexes, the peak amplitude relative error (e_φ) and the peak position relative error (e_P), are used to measure the experimental effect; relative errors within 15% are generally considered of practical application value.
Experimental example 1: 20 random samples
I denotes the sample number, H the peak amplitude relative error, and P the peak position relative error; the trailing digit indicates the target concerned. For example, H1 is the relative error between the amplitudes of the predicted and the actual echo signal of the first target. Numbers marked in bold indicate that the index is not qualified.
Table 1: performance of the algorithm over 20 random samples
From the results of the random extraction it can be seen that:
1. The largest of the mean peak amplitude errors produced by the model is 8.455%.
2. The largest of the mean peak position errors produced by the model is only 5.761%.
3. In some cases the model can exhibit an overfitting problem, since its robustness decreases when the data is highly noisy; in most cases, however, the experimental results show good predictive performance. We will address this in follow-up work by augmenting the data and improving the robustness of the model.
Experimental example 2: all samples
Table 2: performance of the algorithm on all samples
|                    | H1    | P1    | H2    | P2    | H3    | P3    |
| Mean value (%)     | 8.380 | 8.396 | 6.739 | 7.064 | 7.684 | 6.775 |
| Standard deviation | 0.104 | 0.163 | 0.099 | 0.210 | 0.111 | 0.146 |
In absolute terms, the average peak difference is 0.0004 and the average position difference is 1.436, where the peak difference is in units of signal strength and the position difference in steps (0.01 m). Moreover, for random sample inputs, an output qualified on the peak relative error is obtained with probability 87.208%, and an output qualified on the peak position relative error with probability 84.820%, comfortably meeting the 15% evaluation criterion.
Appendix 1: algorithm pseudo code for basic residual block
Input: X
f=conv_f(X)
t=conv_1(X)
t=relu_1(t)
t=bn_1(t)
t=conv_2(t)
t=relu_2(t)
res=bn_2(t)
t=f+res
output=MaxPool(t)
return output
Appendix 2: algorithmic pseudo code for deep learning networks
Appendix 3: algorithm pseudo code for training deep learning networks