Deep learning decoding method and system for hand motion driven by neural characterization

Technical Field
The invention belongs to the technical field of neuroscience and deep learning, and particularly relates to a deep learning decoding method and system for hand motions driven by neural characterization.
Background
The brain-computer interface (BCI) has been a research hotspot because it translates brain signals directly, establishing a communication pathway between the brain and peripheral devices. Brain signals may be recorded invasively or non-invasively. Electroencephalography (EEG) is one of the main non-invasive recording methods. Because EEG is low-cost, portable, and non-invasive, EEG-based brain-computer interfaces are widely applied and form an important branch of brain-computer interface research.
Typical electroencephalogram-based brain-computer interfaces include visual, auditory, and motor brain-computer interfaces. The motor brain-computer interface reflects voluntary motor intent and is thus more natural and intuitive than visual and auditory brain-computer interfaces, which rely on passively evoked responses to external stimuli. The motor brain-computer interface is intended to restore or compensate for central nervous system function. Its applications include neurological rehabilitation and assistance in the daily life of patients with motor loss. For example, combining a motor brain-computer interface with functional electrical stimulation can help a patient actively mobilize the impaired limb according to his or her motor intent, further promoting neuroplasticity and helping to re-establish neuromuscular circuitry.
Common motor brain-computer interface paradigms are Motor Imagery (MI) and Motor Execution (ME). MI can be seen as the mental rehearsal of a specific movement without any overt motor output, relying on repeated imagination of the motor pattern. Studies have shown that MI of a movement can produce reproducible brain activation patterns over the supplementary motor area (SMA) and the primary motor cortex (M1). The corresponding neuromodulation is based on sensorimotor rhythms (SMRs): event-related, frequency-band-specific power decreases and increases known as event-related desynchronization (ERD) and event-related synchronization (ERS), respectively. Although the MI task has some value in assisting and restoring impaired motor function, it is limited by a small number of decodable instructions, low decoding accuracy, and a heavy mental burden. Furthermore, MI tasks often require imagining movements of different body parts, such as the right hand, left hand, foot, and tongue, so in some cases the MI task is inconsistent with the actual output command. The ME paradigm is more natural than the MI paradigm because it is goal-directed, with motor tasks closely matching the participants' natural behavior. Furthermore, ME has the advantage of more pronounced brain activity in the time and spectral domains and better decoding performance for movements, especially multi-class movements. Unlike other brain-computer interface paradigms (including MI, P300, and SSVEP), neural activity in the ME paradigm contains both event-related potentials and oscillatory components. The evoked potentials can be captured from the low-frequency bands of the electroencephalogram signal and are known as motion-related cortical potentials (MRCPs), which are induced during movement planning, preparation, and execution. The oscillatory component of ME has a pattern similar to MI, manifesting as early ERD of the mu rhythm before motion onset and late ERS in the 20-30 Hz band after the motion is performed. Both MRCPs and ERS/ERD oscillations reflect sensorimotor cortical processes and carry complementary movement-related information.
Many studies have explored how to use MRCPs or ERD to decode hand movement intent under the ME paradigm. Decoding targets include motion onset detection, motion direction classification, torque level, speed, motion type, and continuous motion reconstruction. These studies demonstrate the feasibility of decoding upper-limb or hand movement intent from electroencephalogram signals. However, existing hand motion decoding studies based on the ME paradigm are mostly limited to decoding single-hand motion, requiring the non-dominant hand to remain stationary during the experiment in order to reduce motion interference when decoding the dominant hand.
However, in daily life it is very common to coordinate both hands to accomplish a task, and in neurological rehabilitation bilateral training can promote recovery after stroke. Decoding two-handed motion is therefore valuable, but it faces at least two problems. First, brain activity during two-handed motion is more complex than during single-handed motion. Second, there are more motion modes for two hands than for one. These two problems pose significant challenges to decoding performance, so two-handed motion decoding needs further exploration.
Disclosure of Invention
The invention aims to provide a deep learning decoding method and system for hand motions driven by neural characterization, so as to improve the decoding performance of multi-class single-hand and double-hand motions.
In order to achieve the above object, the present invention provides a neural-characterization-driven deep learning decoding method for hand motion, comprising:
Acquiring an electroencephalogram signal, and performing feature extraction on the electroencephalogram signal to acquire a motion-related cortex potential feature and an event-related synchronization/desynchronization oscillation feature;
Acquiring a time-frequency spectrum-space characteristic map based on the motion-related cortex potential characteristic and the event-related synchronization/desynchronization oscillation characteristic;
And performing single-hand and double-hand motion intention decoding on the time-frequency spectrum-space feature map to obtain a decoding result, thereby completing the neural-characterization-driven deep learning decoding of hand motion.
Optionally, extracting features of the electroencephalogram signal, and obtaining the motion-related cortex potential feature and the event-related synchronization/desynchronization oscillation feature includes:
filtering the electroencephalogram signals by utilizing discrete wavelet transformation to obtain a plurality of frequency bands;
acquiring an optimal motion-related cortex potential frequency band from the plurality of frequency bands based on mutual information;
performing discrete wavelet transformation processing on the optimal motion-related cortex potential frequency band to obtain the motion-related cortex potential features;
filtering the electroencephalogram signals by utilizing continuous wavelet transformation to obtain spectrum power;
based on the mutual information and the spectrum power, acquiring an optimal event related synchronization/desynchronization frequency band;
and acquiring the event-related synchronization/desynchronization oscillation features based on the optimal event-related synchronization/desynchronization frequency band.
Optionally, obtaining the time-frequency spectrum-space feature map based on the motion-related cortex potential features and the event-related synchronization/desynchronization oscillation features comprises:
Fusing the motion-related cortex potential characteristics and the event-related synchronization/desynchronization oscillation characteristics to obtain time-frequency spectrum characteristics;
and weighting the channels of the time-frequency spectrum features based on an attention mechanism to acquire the time-frequency spectrum-space feature map.
Optionally, weighting the channels of the time-frequency spectrum features based on an attention mechanism and acquiring the time-frequency spectrum-space feature map includes:
constructing a channel attention model, inputting the time-frequency spectrum features into the channel attention model, and obtaining channel scores;
scaling the dot-product channel scores and weighting them with a normalized exponential function (Softmax) to obtain attention scores;
performing dot-product processing on the attention scores and the time-frequency spectrum features to acquire channel-weighted data;
and superposing and projecting the channel-weighted data to obtain the time-frequency spectrum-space feature map.
Optionally, performing single-hand and double-hand motion intention decoding on the time-frequency spectrum-space feature map and obtaining the decoding result includes:
performing first convolution processing on the time-frequency spectrum-space feature map to obtain a plurality of convolved feature maps;
performing second convolution processing on the plurality of convolved feature maps to obtain a plurality of depthwise-convolved feature maps;
pooling the plurality of depthwise-convolved feature maps to obtain a plurality of one-dimensional feature maps;
and performing full-connection processing on the plurality of one-dimensional feature maps in a concatenated manner to obtain the decoding result for single-hand and double-hand motion.
In order to achieve the above purpose, the present invention further provides a neural-characterization-driven deep learning decoding system for hand motion, which includes a feature representation module, an attention-based channel weighting module, and a shallow convolutional neural network module, the three modules being connected in sequence;
the feature representation module is used for acquiring the motion-related cortex potential features and the event-related synchronization/desynchronization oscillation features;
the attention-based channel weighting module is used for acquiring the time-frequency spectrum-space feature map;
the shallow convolutional neural network module is used for extracting features from the time-frequency spectrum-space feature map.
Optionally, the attention-based channel weighting module includes query and key sub-modules for weighting the channel scores.
The invention has the following beneficial effects:
The deep learning decoding method and system for hand motions driven by neural representation design a deep learning model driven by neurophysiological characteristics and improve the decoding performance of multi-class single-hand and double-hand motions. The shallow model is better suited to the EEG data of the invention, improving decoding performance while shortening computation time. The invention is the first attempt to improve motion decoding performance by fusing the neural representations of motion-related cortical potentials and event-related synchronization/desynchronization in deep learning.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a schematic diagram of a neural-characterization-driven deep learning decoding method for hand motions in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a channel attention module according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in FIG. 1, this embodiment provides a neural-characterization-driven deep learning decoding system for hand motion, which comprises a feature representation module, an attention-based channel weighting module, and a shallow convolutional neural network module. The system takes an electroencephalogram signal as input and outputs a decoding result for single-hand and double-hand movement.
1. Neural characterization module for MRCP and ERS/D activities
The original input EEG signals are represented as X = {x_i}_{i=1..n}, x_i ∈ R^{C×T}, where i indexes the i-th trial, x_i is an input electroencephalogram sample, n is the total number of trials, C is the number of electrodes, T is the number of time sampling points, and R denotes the real space. The corresponding class labels are expressed as Y = {y_i}_{i=1..n}, y_i ∈ {1, ..., 6}, representing the six types of single-hand and two-hand movement combinations.
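For concreteness, this data layout can be sketched as follows (the trial, electrode, and sample counts are illustrative assumptions, not values taken from the invention):

```python
import numpy as np

# Illustrative sizes only (assumed): trials, electrodes, time sampling points.
n, C, T = 240, 62, 1000
X = np.zeros((n, C, T))          # EEG samples: each x_i is a C x T matrix
y = np.random.randint(1, 7, n)   # labels y_i in {1,...,6}: six movement classes
```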
To obtain the MRCP features, the original EEG signals Xtrain of the training dataset are first decomposed into different frequency bands using the Discrete Wavelet Transform (DWT), and an optimal frequency band is then selected using mutual information (M-Info). Wavelet-based methods provide good characterization in both the time and spectral domains. In the wavelet transform, the inner product of the original electroencephalogram signal and the wavelet basis functions is computed discretely, the basis to be analyzed is then located in a specific band, and finally a series of signal bases are reconstructed and weighted to obtain the filtered signal. The Continuous Wavelet Transform (CWT) is defined as:
CWT(a, b) = (1/√|a|) ∫ x_i^c(t) ψ*((t − b)/a) dt (1)

where x_i^c(t) is the electroencephalogram signal of electrode c at time t in trial i, a and b are respectively the scaling and shifting parameters of the wavelet basis function, ψ is the wavelet function, ψ* denotes its complex conjugate, a, b ∈ R, and a ≠ 0. The wavelet transform is discretized over the scale and translation parameters a and b; the present invention uses the dyadic (binary) scale and translation:
a_j = 2^j, b_{j,k} = k·2^j, k, j ∈ Z (2)
where j and k are the discrete scale and translation indices.
In this case, the wavelet set ψ_{j,k}(t) = 2^{−j/2} ψ(2^{−j} t − k) lies in the square-integrable space L²(R). Through multi-resolution decomposition, L²(R) can be decomposed into multiple subspaces W_j, each obtained by dilation and translation of a single basis function, and a sequence of nested closed subspaces V_j can then be found. Every signal contained in subspace V_j is a signal in V_{j+1} plus an additional higher-resolution detail component:
V_j = V_{j+1} ⊕ W_{j+1} (3)
where ⊕ indicates that V_{j+1} and W_{j+1} are both subspaces of V_j and are orthogonal to each other. Accumulating the signals at all resolutions in subspace V_j recovers the original signal f(t) as follows:

f(t) = Σ_k c_k φ(t − k) + Σ_{j=1..L} Σ_k d_{j,k} ψ_{j,k}(t) (4)
where φ(·) is the scaling function, c_k are the scaling coefficients, d_{j,k} are the wavelet coefficients, and L is the total number of decomposition levels. The wavelet functions ψ_{j,k}(t) correspond to a high-pass filter that retains the signal details, while the scaling function φ(·) corresponds to a low-pass filter that retains an approximation of the signal. Convolving the original signal with the high-pass and low-pass filters yields detail coefficients and approximation coefficients, from which the filtered signal within a particular frequency band can be reconstructed. In the present invention, the wavelet family "sym5" is employed, and the maximum decomposition level is set to 6.
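A minimal sketch of this band filtering, assuming the PyWavelets library and a single-electrode signal, might look as follows (the sub-band frequency ranges depend on the sampling rate):

```python
import numpy as np
import pywt

def dwt_band_filter(x, keep, wavelet="sym5", level=6):
    """Keep a single DWT sub-band of a 1-D EEG signal and reconstruct it.

    keep = 0 selects the approximation cA6; keep = 1..level selects the
    detail bands cD6 .. cD1 (larger index = higher frequency band).
    """
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Zero out every sub-band except the one to keep, then reconstruct.
    coeffs = [c if i == keep else np.zeros_like(c) for i, c in enumerate(coeffs)]
    return pywt.waverec(coeffs, wavelet)

# Example: at fs = 1 kHz, keep = 2 (cD5) retains roughly 15.6-31.2 Hz.
fs = 1000.0
t = np.arange(0, 2, 1 / fs)
eeg = np.sin(2 * np.pi * 20 * t) + 0.3 * np.random.randn(t.size)
beta_band = dwt_band_filter(eeg, keep=2)
```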
After the electroencephalogram signals are filtered into different frequency bands, the M-Info method is applied on the training dataset to screen out an optimal frequency band. From an information-theoretic perspective, M-Info measures the statistical dependency between the EEG samples Xtrain and the class labels Ytrain, and may be defined as:
I(Xtrain, Ytrain) = ∫∫ p(x, y) log [ p(x, y) / (p(x) p(y)) ] dx dy (5)

where p(x, y) is the joint probability density of the continuous random variables, and p(x) and p(y) are the respective marginal probability densities. The Shannon entropies H(Xtrain) and H(Ytrain) measure the information obtained from the variables as follows:

H(Xtrain) = −∫ p(x) log p(x) dx (6)

H(Ytrain) = −∫ p(y) log p(y) dy (7)
The joint entropy is defined as:

H(Xtrain, Ytrain) = −∫∫ p(x, y) log p(x, y) dx dy (8)
The conditional entropy of Xtrain given Ytrain is defined as:

H(Xtrain | Ytrain) = −∫∫ p(x, y) log p(x | y) dx dy (9)
Note that H(Xtrain, Ytrain) = H(Xtrain | Ytrain) + H(Ytrain). The entropy H(Ytrain) measures the amount of uncertainty about Ytrain, while H(Xtrain | Ytrain) measures the remaining uncertainty about Xtrain once Ytrain is known. Thus, the mutual information I(Xtrain, Ytrain) in equation (5) can equivalently be written as:

I(Xtrain, Ytrain) = H(Xtrain) − H(Xtrain | Ytrain) = H(Xtrain) + H(Ytrain) − H(Xtrain, Ytrain) (10)
The invention computes the sum of the mutual information over all variables in each frequency band, retains the frequency band with the maximum mutual information, and then performs DWT processing on that band to obtain the MRCP features.
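The band-selection step could be sketched as follows, assuming scikit-learn's mutual-information estimator as a stand-in for the M-Info computation described above:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_band_by_minfo(band_features, y_train):
    """Return the name of the band with the largest summed mutual information.

    band_features : dict mapping band name -> array (n_trials, n_variables),
                    i.e. the band-filtered EEG flattened per trial
    y_train       : class labels, shape (n_trials,)
    """
    mi_sum = {
        band: mutual_info_classif(feats, y_train, random_state=0).sum()
        for band, feats in band_features.items()
    }
    return max(mi_sum, key=mi_sum.get)
```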
To obtain the ERS/D oscillation features, the CWT is first performed to obtain the spectral power, and M-Info is then applied to select an optimal frequency band. The definition of the CWT is given in equation (1); the CWT is computed by evaluating the wavelet transform over every possible scale and translation. For each EEG electrode in each trial, the computed spectral power is denoted Power(c, t, f), where c is the electrode index, t is the time sampling point, and f is the frequency sampling point. The power sum over each frequency band is then calculated as:
Sum_Power(c, t) = Σ_f Power(c, t, f) (11)
Mutual-information feature selection is then performed on the training dataset to select the optimal ERS/D frequency band. The MRCP and ERS/D oscillation features are normalized and spliced to form a feature array of size n × 2C × T.
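A sketch of the spectral-power computation of equation (11) and the subsequent feature splicing, assuming PyWavelets with a Morlet mother wavelet (the wavelet choice for the ERS/D branch is not specified above) and z-score normalization:

```python
import numpy as np
import pywt

def sum_power(x, fs, freqs, wavelet="morl"):
    """Compute Sum_Power(c, t) of eq. (11) for one electrode's signal x.

    freqs : frequency sampling points (Hz) of the candidate ERS/D band.
    """
    fc = pywt.central_frequency(wavelet)           # center frequency of the wavelet
    scales = fc * fs / np.asarray(freqs, float)    # scales matching the target freqs
    coef, _ = pywt.cwt(x, scales, wavelet, sampling_period=1 / fs)
    return (np.abs(coef) ** 2).sum(axis=0)         # sum Power(c, t, f) over f

def fuse(mrcp, ersd):
    """Normalize both neural representations and splice them channel-wise."""
    z = lambda a: (a - a.mean(-1, keepdims=True)) / a.std(-1, keepdims=True)
    return np.concatenate([z(mrcp), z(ersd)], axis=1)   # shape (n, 2C, T)
```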
2. Channel space weight calculation module based on attention mechanism
Different brain regions contribute differently to different brain activities, so an attention-based spatial channel weighting algorithm is employed. The invention further improves the attention mechanism on the basis of scaled dot-product attention. Specifically, query (Q) and key (K) modules are used to weight the channel scores by dot product, each module consisting of a linear layer, a LayerNorm layer, and a Dropout layer. The channel attention scores are then scaled by √d_k, where d_k is the dimension over which the dot product is taken, and weighted using the Softmax function. This process can be expressed as:

Attention(Q, K) = Softmax(Q K^T / √d_k) (12)

The attention scores are applied to the time-spectral features by dot product to obtain the channel-weighted data.
Finally, the channel-weighted data undergo superposition and projection through a linear layer, a LayerNorm layer, and a Dropout layer. The linear layers are of size C×C, and the dropout rate is set to 0.3. The principal procedure of the channel attention module is shown in FIG. 2.
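A PyTorch sketch of this channel attention module might read as follows; the scaling constant, the exact axis the linear maps act on, and the reading of "superposition" as a residual connection are assumptions, not details confirmed by the text:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of the attention-based channel weighting module.

    Q/K branches are Linear + LayerNorm + Dropout acting on the channel
    axis (linear maps of size channels x channels, dropout rate 0.3).
    """
    def __init__(self, n_channels, p_drop=0.3):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Linear(n_channels, n_channels),
                                 nn.LayerNorm(n_channels),
                                 nn.Dropout(p_drop))
        self.query, self.key, self.proj = branch(), branch(), branch()

    def forward(self, x):                       # x: (batch, channels, time)
        xt = x.transpose(1, 2)                  # put channels last for Linear
        q, k = self.query(xt), self.key(xt)     # (batch, time, channels)
        d = q.size(1)                           # dot product runs over time
        score = torch.softmax(q.transpose(1, 2) @ k / d ** 0.5, dim=-1)
        weighted = score @ x                    # attention scores (dot) features
        out = self.proj(weighted.transpose(1, 2)).transpose(1, 2)
        return out + x                          # "superposition" read as residual add
```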
3. Shallow convolutional neural network module
In this module, a shallow convolutional neural network is constructed to process the 3-D time-spectral-spatial feature map. For a small-data classification problem, such as the EEG data in the invention, a shallower network structure is more suitable: it can improve decoding performance while shortening computation time. The invention therefore constructs a shallow convolutional neural network with only two convolutional layers and small kernel sizes; the network structure and parameters are detailed in Table 1. When the feature map is input to the network, a convolution layer with a temporal kernel size of 5 and 4 convolution filters first captures the temporal information. A depthwise convolution layer with a spatial kernel size of 20 and 8 convolution filters then aggregates across electrodes, summarizing each feature map across the feature channels, after which an average pooling layer (kernel size 16) reduces the temporal feature dimension. Notably, the proposed network uses a larger pooling kernel rather than a larger convolution kernel to reduce the time dimension, because a large convolution kernel would destroy the low-frequency information of the electroencephalogram signal, which mainly encodes the performed actions. Batch normalization is added after every convolution layer; after the depthwise convolution layer, an Exponential Linear Unit (ELU) is used as the activation function, and a dropout layer with a probability of 50% is used to prevent overfitting. Finally, the output feature map is flattened to one dimension and connected to the fully connected layer.
TABLE 1
Layer 1: temporal convolution, kernel size 5, 4 filters, batch normalization
Layer 2: depthwise spatial convolution, kernel size 20, 8 filters, batch normalization, ELU activation
Layer 3: average pooling, kernel size 16, dropout probability 0.5
Layer 4: flatten and fully connected layer, six-class decoding output
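Under stated assumptions, a PyTorch rendering of this shallow network could look like the following; the kernel sizes, filter counts, pooling, and dropout follow the text, while the input layout (batch, 1, 2C, T), the depthwise grouping, and the absence of padding are assumptions:

```python
import torch
import torch.nn as nn

class ShallowDecoder(nn.Module):
    """Sketch of the two-convolution shallow network described above."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=(1, 5), bias=False),  # temporal conv, 4 filters
            nn.BatchNorm2d(4),
            nn.Conv2d(4, 8, kernel_size=(20, 1), groups=4,    # depthwise spatial conv,
                      bias=False),                            # kernel 20, 8 filters
            nn.BatchNorm2d(8),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 16)),                # shrink the time axis
            nn.Dropout(0.5),
            nn.Flatten(),
        )
        self.classify = nn.LazyLinear(n_classes)              # fully connected output

    def forward(self, x):       # x: (batch, 1, 2C, T) time-spectral-spatial map
        return self.classify(self.features(x))

# Example forward pass with an assumed 62-electrode montage (2C = 124).
logits = ShallowDecoder()(torch.randn(2, 1, 124, 500))        # -> shape (2, 6)
```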
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.