Disclosure of Invention
In view of the above, the present invention aims to provide a depressive disorder recognition system based on multi-modal domain adaptation. By collecting multi-modal data including electroencephalogram (EEG), electrodermal (skin conductance) and electrocardiogram (ECG) signals and combining them with multi-modal domain adaptation techniques, the system improves both the fusion of different data modalities and the alignment of features across different domains, thereby improving the accuracy and robustness of depressive disorder recognition. The method uses a deep transfer learning algorithm to optimize the multi-modal feature space, enhances the recognition of depressive disorder, adapts to the physiological differences between individuals, and effectively supports real-time analysis and diagnosis.
A depressive disorder recognition system based on multi-modal domain adaptation comprises a data acquisition module, a feature extraction module, a multi-modal domain adaptation module, a neural network classification module and a model training and depression detection module;
The data acquisition module is used for acquiring signals of three modalities, namely electroencephalogram, electrodermal and electrocardiogram signals, in the resting state and under audio stimulation;
the feature extraction module is used for extracting features from the electroencephalogram, electrodermal and electrocardiogram signals;
The multi-modal domain adaptation module is configured to:
dividing the signal data of the three modalities into source domain data and target domain data;
for the source domain data and the target domain data respectively, composing the electroencephalogram signal features, electrodermal signal features and electrocardiogram signal features into a tensor and performing CP decomposition on it, specifically expressed as:

$$S = \sum_{r=1}^{R} \lambda_r^{S}\, a_r^{S} \circ b_r^{S} \circ c_r^{S}, \qquad T = \sum_{r=1}^{R} \lambda_r^{T}\, a_r^{T} \circ b_r^{T} \circ c_r^{T}$$

where S denotes the source domain feature tensor, T denotes the target domain feature tensor, $R$ is the decomposition rank, i.e. the number of components used in the decomposition, and $\circ$ denotes the outer product; $\lambda_r^{S}$ are the scalar weights of the source domain decomposition, and $a_r^{S}$, $b_r^{S}$ and $c_r^{S}$ respectively represent the interaction features of the electroencephalogram, electrodermal and electrocardiogram signals at rank $r$ of the source domain decomposition; correspondingly, $\lambda_r^{T}$ are the scalar weights of the target domain decomposition, and $a_r^{T}$, $b_r^{T}$ and $c_r^{T}$ respectively represent the interaction features of the three signals at rank $r$ of the target domain decomposition. All of these are parameters to be learned;
after the tensor decomposition, the first alignment loss is calculated:

$$L_{mmd1} = \left\| \frac{1}{n_S}\sum_{i=1}^{n_S}\phi(S_i) - \frac{1}{n_T}\sum_{j=1}^{n_T}\phi(T_j) \right\|_{\mathcal{H}}^{2}$$

where $\phi(\cdot)$ is a feature mapping function that maps the original feature space into a higher-dimensional space, $S_i$ denotes the $i$-th sample feature in the source domain features S, $T_j$ denotes the $j$-th sample feature in the target domain features T, and $n_S$ and $n_T$ are the numbers of source domain and target domain samples, respectively;
relevant information between a source domain and a target domain is extracted through an attention mechanism, and the method comprises the following specific steps:
using the attention mechanism, information in the source domain S that is relevant to the target domain T is extracted:

$$A = \mathrm{Attention}(Q_T, K_S, V_S) = \mathrm{softmax}\!\left(\frac{Q_T K_S^{\top}}{\sqrt{d_k}}\right) V_S$$

and a new target domain feature $T' = T + A$ containing the commonality information is constructed;

where Q is the query vector used to look up relevant information, K is the key vector that is matched against the query vector to find the information related to it, V is the value vector carrying the matched information, which is the information of ultimate interest, and $d_k$ is the dimension of the keys;
For the input source domain features S and target domain features T, the corresponding query, key and value vectors are obtained through linear transformation:

$$Q_S = S W_Q,\quad K_S = S W_K,\quad V_S = S W_V;$$
$$Q_T = T W_Q,\quad K_T = T W_K,\quad V_T = T W_V;$$

where $W_Q$, $W_K$ and $W_V$ are weight matrices to be learned;
using the attention mechanism, information in the target domain T that is relevant to the source domain S is extracted:

$$B = \mathrm{Attention}(Q_S, K_T, V_T) = \mathrm{softmax}\!\left(\frac{Q_S K_T^{\top}}{\sqrt{d_k}}\right) V_T$$

and a new source domain feature $S' = S + B$ containing the commonality information is constructed;
The second alignment loss is calculated and expressed as:

$$L_{mmd2} = \left\| \frac{1}{n_{S'}}\sum_{i=1}^{n_{S'}}\phi(S'_i) - \frac{1}{n_{T'}}\sum_{j=1}^{n_{T'}}\phi(T'_j) \right\|_{\mathcal{H}}^{2}$$

where $S'_i$ denotes the $i$-th sample feature in the new source domain features $S'$, $T'_j$ denotes the $j$-th sample feature in the new target domain features $T'$, and $n_{S'}$ and $n_{T'}$ are the numbers of new source domain and target domain samples, respectively;
the neural network classification module is used for classifying the input characteristic data to obtain a classification result;
The model training and depression detection module is used for:
1) Model training:
the source domain data is input to the neural network classification module, and the classification cross-entropy loss is calculated from the obtained classification results:

$$L_{cls} = -\frac{1}{n_S}\sum_{i=1}^{n_S}\Big[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\Big]$$

where $y_i$ is the actual label of the feature data and $\hat{y}_i$ is the predicted label;
finally, a total loss function combining dynamic fusion and cross-domain alignment is calculated to optimize the whole model:

$$L = \lambda_1 L_{mmd1} + \lambda_2 L_{mmd2} + L_{cls}$$

where $\lambda_1$ and $\lambda_2$ are hyper-parameters used to balance the contributions of the different loss terms;
Training a neural network classification module and parameters to be learned based on the loss function L combining dynamic fusion and cross-domain alignment, and updating model weights through a back propagation algorithm to finally obtain an optimized depression detection system.
2) Depression detection:
in practical application, the newly acquired multi-modal signals of a subject are input into the feature extraction module for preprocessing and feature extraction, and then into the trained neural network classification module to generate the classification result.
Further, the model training and depression detection module also includes testing of the neural network classification module:
the classification performance of the neural network classification module is evaluated using the source domain data and the corresponding labels.
Further, the system also comprises a data preprocessing module, which performs power-frequency denoising, band-pass filtering, artifact removal and downsampling on the electroencephalogram signals, low-pass filtering and smoothing on the electrodermal signals, and power-frequency denoising, band-pass filtering and smoothing on the electrocardiogram signals, so as to improve signal quality.
Preferably, the feature extraction module first performs manual feature extraction based on prior knowledge on the electroencephalogram, electrodermal and electrocardiogram signals respectively, and then extracts depth features through a Transformer encoder.
Further, the feature extraction module is further configured to:
Three types of linear features and three types of nonlinear features are extracted from the electroencephalogram signals: the linear features comprise the mean value, power spectral density and center frequency, and the nonlinear features comprise sample entropy, approximate entropy and the Lyapunov exponent.
Further, the feature extraction module is further configured to:
Heart rate variability, frequency ratio and fractal dimension features are extracted from the electrocardiogram signals.
Further, the feature extraction module is further configured to:
Skin conductance level, peak count and variance features are extracted from the electrodermal signals.
Preferably, the neural network classification module is composed of a plurality of fully connected layers, each of which is followed by an activation function.
Furthermore, the neural network classification module also introduces batch normalization and Dropout technologies to accelerate convergence and stabilize the training process.
The invention has the following beneficial effects:
The invention aims to address two challenges in the detection of depressive disorder: the low recognition rate achieved with single-modality physiological signals, and individual differences. These challenges are particularly pronounced in depressive disorder recognition because physiological signal characteristics differ significantly between individuals. A depressive disorder recognition method based on multi-modal domain adaptation is therefore proposed. In the present invention, a domain refers to the physiological signal feature space of an individual. By aligning the feature spaces of different individuals, multi-modal domain adaptation overcomes the influence of individual differences on the recognition result, thereby improving the accuracy and robustness of depressive disorder recognition (see fig. 1). By integrating multi-modal data such as electroencephalogram, electrodermal and electrocardiogram signals, effective fusion between signals is achieved, and the complementary information of the different modalities is exploited to improve recognition accuracy. In addition, domain adaptation techniques are introduced to address the data distribution shift caused by individual differences, thereby enhancing the generalization ability and stability of the model. The method provides more efficient and accurate technical support for clinical application.
In the depressive disorder recognition method based on multi-modal domain adaptation, the interaction relations among the electroencephalogram, electrodermal and electrocardiogram signals are modeled inside a deep learning framework, realizing dynamic fusion and joint parameter optimization of the signals. The method makes full use of the complementary information of the multi-modal signals, so that the model can extract more discriminative features from complex physiological data, effectively improving the accuracy and robustness of depressive disorder recognition. In addition, cross-domain commonality features are mined by drawing the source domain and target domain data distributions together twice: the first alignment uses the maximum mean discrepancy to ensure the consistency of the source and target domains in the feature space, and the second alignment further mines and fuses information features related to the target through an attention mechanism to construct a more accurate feature representation. This cross-domain alignment strategy effectively improves the adaptability of the model across different data domains and significantly reduces the negative influence of domain shift. Overall, the proposed recognition method has clear advantages in improving the detection accuracy of depressive disorder and enhancing the generalization ability of the model. It not only demonstrates the effectiveness of multi-modal fusion and domain adaptation, but also provides an innovative solution for the automated detection of depressive disorder in practice.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
The invention provides a depressive disorder recognition system based on multi-modal domain adaptation. First, the multi-modal electrophysiological signals recorded in the resting state and under audio stimulation are preprocessed and the corresponding features are extracted; the extracted features are the manual features commonly used for each modality together with depth features. Then, the dynamic interaction relations of the electroencephalogram, electrodermal and electrocardiogram signals are modeled through CP tensor synthesis and decomposition. Next, cross-domain commonality features are mined through two maximum mean discrepancy alignments and an attention mechanism to train the model. Finally, a test set is used for depression detection to obtain the expected depression detection recognition rate. As shown in fig. 1, the system includes the following modules:
(1) The data acquisition module collects the subject's information such as ID number, name, sex and age, and sequentially acquires the subject's electroencephalogram, electrodermal and electrocardiogram signals in the resting state and under audio stimulation. The acquisition positions of the electroencephalogram, electrodermal and electrocardiogram electrodes are shown in fig. 2.
In this embodiment, the electroencephalogram signals are collected by a self-developed three-lead electroencephalogram device (see the left of fig. 2) and its software, and the electrodermal and electrocardiogram signals are collected by secondarily developed software; the two pieces of software record timestamps simultaneously to ensure that the data are aligned in time. All subjects were university students with normal hearing and intelligence and no previous psychiatric history. No psychotropic drugs were taken before data acquisition, and all recordings were conducted under the same laboratory conditions. The sampling rate of the electroencephalogram device is 250 Hz, and the sampling rate of the electrodermal and electrocardiogram devices is 50 Hz.
(2) The data preprocessing module performs power-frequency denoising, band-pass filtering, artifact removal and downsampling on the electroencephalogram signals, low-pass filtering and smoothing on the electrodermal signals, and power-frequency denoising, band-pass filtering and smoothing on the electrocardiogram signals, so as to improve signal quality.
In this embodiment, a notch filter is first used to remove the 50 Hz power-line interference and thus eliminate power-frequency noise. Then, the electroencephalogram, electrodermal and electrocardiogram signals are band-pass filtered with pass-bands of 0.5-50 Hz, 0.5-2 Hz and 0.5-40 Hz respectively, to retain the effective frequency bands. Next, artifact removal is performed to detect and discard abnormal data caused by motion or ocular artifacts, improving signal quality. To reduce random fluctuations in the signals, moving-average smoothing is applied to enhance data stability. Finally, the electroencephalogram sampling rate is reduced from 250 Hz to 100 Hz by downsampling, reducing the data volume while preserving the key information.
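As a non-limiting illustration of this preprocessing chain, the following Python sketch (using NumPy and SciPy) shows one possible realization of the notch filtering, band-pass filtering, moving-average smoothing and downsampling described above. The filter orders, the notch quality factor, the smoothing window width and the dummy demonstration signals are assumptions of the sketch, and the artifact-removal step is omitted for brevity.

```python
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt, resample_poly

def notch_50hz(x, fs):
    """Suppress 50 Hz power-line interference (requires fs > 100 Hz)."""
    b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)
    return filtfilt(b, a, x)

def bandpass(x, low, high, fs, order=4):
    """Zero-phase Butterworth band-pass; band edges must lie below fs / 2."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)

def smooth(x, width=5):
    """Moving-average smoothing to reduce random fluctuations."""
    return np.convolve(x, np.ones(width) / width, mode="same")

# EEG: 50 Hz notch, 0.5-50 Hz band-pass, downsample 250 Hz -> 100 Hz.
eeg = np.random.randn(250 * 60)                       # one minute of dummy EEG
eeg = bandpass(notch_50hz(eeg, fs=250), 0.5, 50.0, fs=250)
eeg = resample_poly(eeg, up=2, down=5)                # 250 Hz * 2/5 = 100 Hz

# Electrodermal signal: 0.5-2 Hz band-pass and smoothing (fs = 50 Hz here).
gsr = np.random.randn(50 * 60)
gsr = smooth(bandpass(gsr, 0.5, 2.0, fs=50))

# ECG: the same notch / band-pass / smoothing chain applies; note that the
# 0.5-40 Hz band requires a sampling rate above 80 Hz, so fs is kept generic.
```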
(3) The feature extraction module performs manual feature extraction based on prior knowledge on the preprocessed electroencephalogram, electrodermal and electrocardiogram signals respectively, and extracts depth features through a Transformer encoder. From the electroencephalogram signals, three types of linear features and three types of nonlinear features are extracted: the linear features include the mean value, power spectral density and center frequency, and the nonlinear features include sample entropy, approximate entropy and the Lyapunov exponent. From the electrocardiogram signals, features such as heart rate variability, frequency ratio and fractal dimension are extracted. From the electrodermal signals, skin conductance level, peak count and variance features are extracted.
(4) The multi-modal domain adaptation module models the dynamic interaction between the multi-modal physiological signals through tensor synthesis and decomposition, and embeds this dynamic interaction into the deep learning training process to optimize the model parameters. It then mines the commonality information of the source and target domains and performs cross-domain alignment using the maximum mean discrepancy, so that a model that performs well on the target domain is trained and the source domain features are effectively transferred to the target domain.
(5) The neural network classification module consists of a plurality of fully connected layers, each of which is immediately followed by an activation function used to introduce nonlinearity and avoid the vanishing-gradient problem. In addition, to enhance the generalization ability of the network and prevent overfitting, regularization techniques such as batch normalization and Dropout are also introduced to accelerate convergence and stabilize the training process.
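For illustration only, the classification module described above may be sketched in PyTorch as follows; the number of layers, hidden widths, dropout rate and choice of ReLU are assumptions of the sketch rather than requirements of the invention.

```python
import torch.nn as nn

class DepressionClassifier(nn.Module):
    """Stack of fully connected layers, each followed by batch normalization,
    an activation function and Dropout, ending in two class logits."""
    def __init__(self, in_dim, hidden=(256, 64), num_classes=2, p_drop=0.5):
        super().__init__()
        layers, prev = [], in_dim
        for width in hidden:
            layers += [nn.Linear(prev, width), nn.BatchNorm1d(width),
                       nn.ReLU(), nn.Dropout(p_drop)]
            prev = width
        layers.append(nn.Linear(prev, num_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):              # x: (batch, in_dim) feature vectors
        return self.net(x)             # (batch, num_classes) logits
```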
(6) The model training and depression detection module (see figure 3) uses the trained model and the neural network classification module to detect depression of the newly acquired data so as to realize efficient classification and identification.
The data acquisition module comprises the following steps:
1) Experimental design: first, the subject's multi-modal physiological signals are collected in the eyes-closed resting state. An auditory stimulation experiment is then carried out using auditory stimuli of several different emotional attributes (e.g., 2 positive, 2 neutral and 2 negative stimuli), each auditory stimulus lasting for a set period of time. After each auditory stimulus, the subject takes a brief rest. Throughout the procedure, the subject's electroencephalogram, electrodermal and electrocardiogram signals are continuously collected in the eyes-closed state, finally yielding the complete data.
2) Electroencephalogram acquisition: electroencephalogram signals are acquired with a device conforming to the internationally used 10-20 system electrode placement standard.
3) Multi-modal electrophysiological acquisition: the other physiological signals are acquired synchronously and stably through the multi-modal acquisition device, whose optimized design ensures the accuracy and stability of signal acquisition.
In the data preprocessing stage, power-frequency denoising is implemented with a notch filter that removes the 50 Hz power-line interference.
The band-pass filtering ranges of the electroencephalogram signal, the skin electric signal and the electrocardiosignal in the data preprocessing stage are respectively 0.5-50Hz, 0.5-2Hz and 0.5-40Hz.
And artifact removal in the data preprocessing stage is to detect and remove data anomalies caused by motion artifacts or eye electrical artifacts so as to improve the quality of signals.
The smoothing in the data preprocessing stage is to reduce random fluctuation of signals by a moving average technology so as to improve the stability of data.
The downsampling in the data preprocessing stage is to reduce the sampling rate from 250Hz to 100Hz.
The feature extraction module performs manual feature extraction based on prior knowledge on the preprocessed electroencephalogram, electrodermal and electrocardiogram signals respectively, and extracts depth features through a Transformer encoder. For the electroencephalogram signals, three types of linear features (mean value, power spectral density, center frequency) and three types of nonlinear features (sample entropy, approximate entropy, Lyapunov exponent) are extracted. For the electrocardiogram signals, features such as heart rate variability, frequency ratio and fractal dimension are extracted. For the electrodermal signals, features such as skin conductance level, peak count and variance are extracted. Depth feature extraction is then performed by the Transformer encoder, whose self-attention mechanism automatically learns the complex temporal patterns and global dependencies in the signals.
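A minimal PyTorch sketch of such a depth-feature extractor is given below; the embedding size, the number of attention heads and layers, and the mean-pooling readout are assumptions of the sketch. The hand-crafted features named above are defined by the formulas that follow.

```python
import torch
import torch.nn as nn

class DepthFeatureEncoder(nn.Module):
    """Transformer encoder that maps a windowed signal sequence of shape
    (batch, seq_len, feat_dim) to one depth-feature vector per window."""
    def __init__(self, feat_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                       # x: (batch, seq_len, feat_dim)
        h = self.encoder(self.proj(x))          # self-attention over time
        return h.mean(dim=1)                    # mean-pool into a depth feature
```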
Power spectral density:

$$PSD(f) = \lim_{T\to\infty} \frac{1}{T}\left| \int_{0}^{T} x(t)\, e^{-j 2\pi f t}\, dt \right|^{2}$$

where $x(t)$ is the time-domain signal and $f$ is the frequency.
Center frequency:

$$f_c = \frac{\int_{f_1}^{f_2} f\, P(f)\, df}{\int_{f_1}^{f_2} P(f)\, df}$$

where $P(f)$ is the power spectral density, $f_1$ is the lower frequency bound and $f_2$ is the upper frequency bound.
Sample entropy:

$$SampEn(m, r, N) = -\ln\!\left(\frac{\sum_i A_i}{\sum_i B_i}\right)$$

where $A_i$ is the number of sequence pairs matching at length $m+1$, $B_i$ is the number of sequence pairs matching at length $m$, $r$ is the tolerance, and $N$ is the signal length.
Approximate entropy:

$$ApEn(m, r, N) = \phi^{m}(r) - \phi^{m+1}(r), \qquad \phi^{m}(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1}\ln C_i^{m}(r)$$

where $C_i^{m}(r)$ is the fraction of length-$m$ sequences that are similar to the $i$-th template sequence within tolerance $r$.
Lyapunov exponent:

$$\lambda = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\ln\left|\frac{\delta x_{i+1}}{\delta x_i}\right|$$

where $\delta x_i$ is a small perturbation at time $i$.
Heart rate variability:

$$HRV = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(RR_i - \overline{RR}\right)^{2}}$$

where $RR_i$ is the RR interval between adjacent heartbeats, $\overline{RR}$ is the mean of the RR intervals, and $N$ is the number of RR intervals.
Frequency ratio:

$$FR = \frac{\int_{LFB} P(f)\, df}{\int_{HFB} P(f)\, df}$$

where $P(f)$ is the power spectral density of the RR interval series, $LFB$ denotes the low-frequency band range and $HFB$ denotes the high-frequency band range.
Fractal dimension:

$$D = \lim_{\varepsilon\to 0}\frac{\ln N(\varepsilon)}{\ln(1/\varepsilon)}$$

where $\varepsilon$ is the scale parameter of the fractal structure and $N(\varepsilon)$ is the number of elements of size $\varepsilon$ needed to cover the signal.
Skin conductance level:

$$SCL = \frac{1}{N}\sum_{i=1}^{N} SC_i$$

where $SC_i$ is the skin conductance value at the $i$-th time point and $N$ is the number of time points.
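By way of non-limiting illustration, several of the above hand-crafted features can be computed as in the following Python sketch; the Welch segment length, the sample-entropy tolerance factor and the use of SDNN as the heart rate variability index are assumptions of the sketch, and the remaining features follow their formulas analogously.

```python
import numpy as np
from scipy.signal import welch

def center_frequency(x, fs, f1=0.5, f2=50.0):
    """Spectral centroid of the band [f1, f2] from a Welch PSD estimate."""
    f, p = welch(x, fs=fs, nperseg=min(len(x), 256))
    band = (f >= f1) & (f <= f2)
    return np.sum(f[band] * p[band]) / np.sum(p[band])

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = -ln(A / B), with tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)

    def match_count(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        count = 0
        for i in range(len(templates)):
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(dist <= r) - 1      # exclude the self-match
        return count

    a, b = match_count(m + 1), match_count(m)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def hrv_sdnn(rr_intervals):
    """Standard deviation of RR intervals, a common HRV index."""
    rr = np.asarray(rr_intervals, dtype=float)
    return float(np.sqrt(np.sum((rr - rr.mean()) ** 2) / (len(rr) - 1)))

def skin_conductance_level(sc):
    """Mean skin conductance over the analysis window."""
    return float(np.mean(sc))
```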
The multi-modal domain adaptation module performs the following method steps:
1) Dynamic fusion:
First, a tensor $X \in \mathbb{R}^{I\times J\times K}$ is composed from the electroencephalogram features (feature dimension $I$), the electrodermal features (feature dimension $J$) and the electrocardiogram features (feature dimension $K$). The composed tensor $X$ captures the joint action of the three signals across their respective feature dimensions. In order to extract meaningful mutual information from this composed tensor, CP decomposition is used, which decomposes the high-dimensional tensor into a combination of multiple low-dimensional factor vectors that express the intrinsic structure of the tensor, specifically expressed as:

$$X \approx \sum_{r=1}^{R} \lambda_r\, a_r \circ b_r \circ c_r$$

where $R$ is the decomposition rank, i.e. the number of components used in the decomposition, $\lambda_r$ is a scalar weight representing the importance of each component, and $a_r$, $b_r$ and $c_r$ are the corresponding factor vectors, taken from the factor matrices of the electroencephalogram, electrodermal and electrocardiogram features, respectively. $\circ$ denotes the outer product, and $a_r$, $b_r$ and $c_r$ represent the interaction characteristics of the electroencephalogram, electrodermal and electrocardiogram signals at rank $r$. $\lambda_r$, $a_r$, $b_r$ and $c_r$ are parameters that are updated within the deep learning training.
In this embodiment, the subjects' multi-modal data are divided into 10 parts using ten-fold cross-validation, where 9 folds constitute the source domain data and the remaining fold constitutes the target domain data. During the training phase, only the source domain data with their labels and the unlabeled target domain data are used. The held-out fold, including the target domain data and its labels, is used during the test phase.
The source domain features S and the target domain features T are constructed as follows:

$$S = \sum_{r=1}^{R} \lambda_r^{S}\, a_r^{S} \circ b_r^{S} \circ c_r^{S}, \qquad T = \sum_{r=1}^{R} \lambda_r^{T}\, a_r^{T} \circ b_r^{T} \circ c_r^{T}$$

where $\lambda_r^{S}$ are the scalar weights of the source domain decomposition, and $a_r^{S}$, $b_r^{S}$ and $c_r^{S}$ respectively represent the interaction characteristics of the electroencephalogram, electrodermal and electrocardiogram signals at rank $r$ of the source domain decomposition and are parameters to be learned; correspondingly, $\lambda_r^{T}$ are the scalar weights of the target domain decomposition, and $a_r^{T}$, $b_r^{T}$ and $c_r^{T}$ respectively represent the interaction characteristics of the three signals at rank $r$ of the target domain decomposition, likewise parameters to be learned.
The decomposition step extracts a low-dimensional representation of the multi-modal data that captures the complex interactions between the signals, and the synthesis step reassembles the low-dimensional factors into a new feature representation that is more semantic and easier to align. X denotes multi-modal data: both the source domain and the target domain consist of multi-modal data and both require these two steps, so the source domain features S are the decomposed and re-synthesized source domain data, and the target domain data are treated in the same way.
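A minimal PyTorch sketch of this dynamic-fusion step is given below. It treats the CP factors λ_r, a_r, b_r and c_r of one domain as learnable parameters embedded in the deep learning model and re-synthesizes the rank-R tensor; the rank value, the initialization, the example feature dimensions and the way the re-synthesized tensor is subsequently turned into per-sample features are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class CPFusion(nn.Module):
    """Learnable rank-R CP factors for one domain.  The re-synthesized tensor
    is sum_r lambda_r * a_r (outer) b_r (outer) c_r, where a_r, b_r, c_r come
    from the EEG, electrodermal and ECG factor matrices respectively."""
    def __init__(self, I, J, K, rank=8):
        super().__init__()
        self.lam = nn.Parameter(torch.ones(rank))          # lambda_r weights
        self.A = nn.Parameter(0.1 * torch.randn(I, rank))  # EEG factors a_r
        self.B = nn.Parameter(0.1 * torch.randn(J, rank))  # electrodermal b_r
        self.C = nn.Parameter(0.1 * torch.randn(K, rank))  # ECG factors c_r

    def forward(self):
        # einsum realizes sum_r lam_r * A[:, r] o B[:, r] o C[:, r].
        return torch.einsum("r,ir,jr,kr->ijk", self.lam, self.A, self.B, self.C)

# One CPFusion instance per domain; its factors can be fitted to the observed
# multi-modal tensor X while being trained jointly with the alignment and
# classification losses described below.
source_fusion = CPFusion(I=6, J=3, K=3)     # feature dimensions are examples
S_tensor = source_fusion()                  # (I, J, K) source-domain tensor
```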
2) Cross-domain alignment:
The source domain and the target domain both contain electroencephalogram, electrodermal and electrocardiogram signals. The present invention uses the maximum mean discrepancy for alignment. After the tensor decomposition, the source domain features are denoted S and the target domain features are denoted T. The first alignment loss can be expressed as:

$$L_{mmd1} = \left\| \frac{1}{n_S}\sum_{i=1}^{n_S}\phi(S_i) - \frac{1}{n_T}\sum_{j=1}^{n_T}\phi(T_j) \right\|_{\mathcal{H}}^{2}$$

where $\phi(\cdot)$ is a feature mapping function that maps the original feature space into a higher-dimensional space (a reproducing kernel Hilbert space), $S_i$ denotes the $i$-th sample feature in the source domain features S, $T_j$ denotes the $j$-th sample feature in the target domain features T, and $n_S$ and $n_T$ are the numbers of source domain and target domain samples, respectively.
In practice, the feature mapping is realized through a kernel function, for example a Gaussian (RBF) kernel:

$$k(s_i, t_j) = \exp\!\left(-\frac{\| s_i - t_j \|^{2}}{2\sigma^{2}}\right)$$

where $\sigma$ is the bandwidth hyper-parameter of the kernel function, $s_i \in S$ and $t_j \in T$.
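The alignment losses L_mmd1 and L_mmd2 with this Gaussian kernel can be computed as in the following PyTorch sketch; the single fixed bandwidth and the biased empirical estimator are simplifying assumptions of the sketch.

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dist = torch.cdist(x, y, p=2) ** 2
    return torch.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_loss(source, target, sigma=1.0):
    """Squared MMD between source (n_S, d) and target (n_T, d) feature matrices."""
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```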
Relevant information between a source domain and a target domain is extracted through an attention mechanism, and the method comprises the following specific steps:
Using the attention mechanism, information in the source domain S that is relevant to the target domain T is extracted:

$$A = \mathrm{Attention}(Q_T, K_S, V_S) = \mathrm{softmax}\!\left(\frac{Q_T K_S^{\top}}{\sqrt{d_k}}\right) V_S$$

and a new target domain feature $T' = T + A$ containing the commonality information is constructed. Q is the query vector used to look up relevant information, K is the key vector that is matched against the query vector to find the information related to it, and V is the value vector carrying the matched information, which is the information of ultimate interest. For the source domain, the target domain features serve as queries (Q) that are matched against the source domain features (K), thereby extracting the information in the source domain that is related to the target domain, and vice versa. For the input source and target domain features (S, T), Q, K and V are obtained by linear transformation:
$$Q_S = S W_Q,\quad K_S = S W_K,\quad V_S = S W_V$$

where $W_Q$, $W_K$ and $W_V$ are weight matrices to be learned and $d_k$ is the dimension of the keys; likewise for the target domain, namely:

$$Q_T = T W_Q,\quad K_T = T W_K,\quad V_T = T W_V;$$
Using the attention mechanism, information in the target domain T that is relevant to the source domain S is extracted:

$$B = \mathrm{Attention}(Q_S, K_T, V_T) = \mathrm{softmax}\!\left(\frac{Q_S K_T^{\top}}{\sqrt{d_k}}\right) V_T$$

and a new source domain feature $S' = S + B$ containing the commonality information is constructed.
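The bidirectional commonality extraction described above may be sketched in PyTorch as follows; sharing the projection matrices W_Q, W_K and W_V between the two domains and using a single attention head are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossDomainAttention(nn.Module):
    """Computes A = softmax(Q_T K_S^T / sqrt(d_k)) V_S and
    B = softmax(Q_S K_T^T / sqrt(d_k)) V_T, then returns
    S' = S + B and T' = T + A."""
    def __init__(self, feat_dim, d_k=64):
        super().__init__()
        self.W_Q = nn.Linear(feat_dim, d_k, bias=False)
        self.W_K = nn.Linear(feat_dim, d_k, bias=False)
        self.W_V = nn.Linear(feat_dim, feat_dim, bias=False)
        self.d_k = d_k

    def attend(self, queries, keys, values):
        scores = queries @ keys.transpose(-2, -1) / self.d_k ** 0.5
        return F.softmax(scores, dim=-1) @ values

    def forward(self, S, T):                    # S: (n_S, d), T: (n_T, d)
        A = self.attend(self.W_Q(T), self.W_K(S), self.W_V(S))
        B = self.attend(self.W_Q(S), self.W_K(T), self.W_V(T))
        return S + B, T + A                     # new features S', T'
```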
The second alignment loss is expressed as:

$$L_{mmd2} = \left\| \frac{1}{n_{S'}}\sum_{i=1}^{n_{S'}}\phi(S'_i) - \frac{1}{n_{T'}}\sum_{j=1}^{n_{T'}}\phi(T'_j) \right\|_{\mathcal{H}}^{2}$$

where $S'_i$ denotes the $i$-th sample feature in the new source domain features $S'$, $T'_j$ denotes the $j$-th sample feature in the new target domain features $T'$, and $n_{S'}$ and $n_{T'}$ are the numbers of new source domain and target domain samples, respectively.
The neural network classification module classifies the input new source domain features S' and new target domain features T' to obtain classification results;
The model training and depression detection module is used for:
1) Model training and testing:
The model is trained with the source domain data, and the source domain is classified using a binary cross-entropy loss:

$$L_{cls} = -\frac{1}{n_S}\sum_{i=1}^{n_S}\Big[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\Big]$$

where $y_i$ is the actual label and $\hat{y}_i$ is the predicted label.
Finally, the dynamic-fusion and cross-domain-alignment loss functions are combined to optimize the whole model:

$$L = \lambda_1 L_{mmd1} + \lambda_2 L_{mmd2} + L_{cls}$$

where $\lambda_1$ and $\lambda_2$ are hyper-parameters used to balance the contributions of the different loss terms.
Training a neural network classification module and parameters to be learned based on the loss function L combining dynamic fusion and cross-domain alignment, and updating model weights through a back propagation algorithm to finally obtain an optimized depression detection system.
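For illustration, one optimization step over this combined loss may look like the following sketch, which reuses the hypothetical mmd_loss, CrossDomainAttention and DepressionClassifier components sketched earlier; the values of λ1 and λ2, the optimizer and the learning rate are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(S, T, source_labels, cross_attn, classifier, optimizer,
               lambda1=0.5, lambda2=0.5):
    """One update of L = lambda1 * L_mmd1 + lambda2 * L_mmd2 + L_cls."""
    loss_mmd1 = mmd_loss(S, T)                    # first alignment (CP features)
    S_new, T_new = cross_attn(S, T)               # commonality extraction
    loss_mmd2 = mmd_loss(S_new, T_new)            # second alignment
    loss_cls = F.cross_entropy(classifier(S_new), source_labels)
    loss = lambda1 * loss_mmd1 + lambda2 * loss_mmd2 + loss_cls
    optimizer.zero_grad()
    loss.backward()                               # back propagation
    optimizer.step()                              # update model weights
    return loss.item()

# Typical wiring (hyper-parameters are assumptions):
#   optimizer = torch.optim.Adam(list(cross_attn.parameters())
#                                + list(classifier.parameters()), lr=1e-3)
#   for epoch in range(num_epochs):
#       train_step(S, T, source_labels, cross_attn, classifier, optimizer)
```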
At this stage, ten-fold cross-validation is used to train the model and improve its generalization ability. The data set is divided into ten equal parts; each time, one part is selected as the test set and the remaining nine parts are used as the training set, and the validation is repeated several times. In each round, the currently held-out data are input into the neural network classifier to obtain the model's predictions. The model output is compared with the true labels, and classification performance indicators including accuracy, sensitivity and specificity are calculated to comprehensively evaluate the detection performance of the model. The entire procedure is carried out over multiple subjects.
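A sketch of the ten-fold, subject-wise split described here is shown below; the number of subjects and the random seed are assumptions of the sketch.

```python
import numpy as np
from sklearn.model_selection import KFold

subject_ids = np.arange(30)                    # assumed number of subjects
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

for fold, (source_idx, target_idx) in enumerate(kfold.split(subject_ids)):
    source_subjects = subject_ids[source_idx]  # 9 folds: labelled source domain
    target_subjects = subject_ids[target_idx]  # held-out fold: target domain
    # Train on source_subjects (labels available) plus unlabeled target data,
    # then compute accuracy, sensitivity and specificity on target_subjects.
```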
2) Depression detection:
In practical application, the newly acquired multi-modal signals of a subject (e.g., electroencephalogram, electrodermal and electrocardiogram signals) are input into the feature extraction module for preprocessing and feature extraction, and then into the trained neural network model to generate a prediction result. By analyzing the model output, efficient depression classification and recognition is performed to assist clinical diagnosis and treatment decisions.
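At the detection stage, a trained classifier (here the hypothetical DepressionClassifier sketched above) could be applied to a new subject's fused features as follows; averaging the window-level probabilities into a single subject-level decision is an assumption of the sketch.

```python
import torch

@torch.no_grad()
def detect_depression(features, classifier):
    """Classify the preprocessed, feature-extracted multi-modal data of a new
    subject; `features` is an (n_windows, d) tensor of fused feature vectors."""
    classifier.eval()
    probs = torch.softmax(classifier(features), dim=-1)
    return int(probs.mean(dim=0).argmax())     # subject-level class decision
```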
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.