Disclosure of Invention
The present invention provides a method, apparatus, device and medium for decoding EEG signals based on an aggregate perception enhanced convolutional Transformer network, so as to alleviate at least one of the above-mentioned technical problems.
In a first aspect, the present invention provides an EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network, comprising steps S1 to S9.
S1, acquiring off-line data of EEG signals.
S2, extracting multi-scale shallow local features through a space-time convolution module according to the offline data.
S3, performing interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
S4, fusing the plurality of recalibrated features to obtain a first fusion feature.
S5, inputting the first fusion feature into a position perception enhancement module, extracting deep fine-grained features through parallel enhanced convolution, and adaptively encoding the deep fine-grained features to obtain position perception enhancement features.
S6, extracting long-range dependencies and local associations through a sparse information aggregation Transformer module to obtain global refined features.
S7, inputting the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
S8, acquiring real-time data of the EEG signals.
S9, inputting the real-time data into the pre-trained model to obtain the decoding result of the real-time EEG signal.
As a preferred aspect of the present invention, step S1 specifically includes steps S11 to S15.
S11, acquiring stored EEG signal data, eliminating 50 Hz power-line and environmental noise interference, and screening out the EEG components in the key frequency range of 0.5-30 Hz.
S12, segmenting the continuous EEG signals, removing rest-segment data, and keeping only MI segment data.
S13, taking the mean of the rest-segment data in the 300 milliseconds before each MI segment as a baseline, and performing independent baseline correction on each MI segment to eliminate the potential influence of baseline drift.
S14, normalizing the data and then removing artifacts mixed into the EEG signals, obtaining the preprocessed EEG data.
S15, segmenting and recombining the training data by label, additionally generating simulated batch data by adding random values that follow a Gaussian distribution, shuffling the newly generated data together with the original batch data to obtain augmented EEG data, and then inputting the augmented data into the model for feature learning.
As a preferred aspect of the present invention, step S8 specifically includes steps S81 to S83.
S81, storing the acquired real-time EEG data stream in a buffer.
S82, extracting real-time segment data from the buffer through a sliding time window.
S83, preprocessing the real-time segment data, wherein the preprocessing steps for the real-time data are fewer than those for the offline data: the data augmentation is omitted, and the baseline correction is based on the mean of the whole segment.
As a preferred aspect of the invention, the space-time convolution module is provided with three branches. Each branch is sequentially provided with a temporal convolution layer, a spatial convolution layer, a batch normalization layer, an ELU activation layer and an average pooling layer. The temporal convolution layer of each branch uses 16 convolution kernels; the three branches use three different kernel sizes (1, k1), (1, k2) and (1, k3), whose lengths k1, k2 and k3 are defined in terms of the sampling frequency Fs. The spatial convolution layer uses 32 convolution kernels of size (C, 1), C being the number of channels. During model training, a dropout layer is also provided.
As a preferred aspect of the present invention, step S2 specifically includes steps S21 to S22.
S21, inputting the EEG signals into three branches of a space-time convolution module respectively to acquire multi-scale shallow local features.
S22, in each of the three branches of the space-time convolution module, features are extracted by convolution operations through the temporal convolution layer and the spatial convolution layer in sequence, and the features extracted by the convolution operations are input into the batch normalization layer, the ELU activation layer and the average pooling layer, which are connected in sequence.
As a preferred aspect of the present invention, step S3 specifically includes step S31 and step S32.
S31, superimposing the features extracted by each branch of the space-time convolution module through an interactive connection structure, obtaining a number of interactively superimposed features equal to the number of branches of the space-time convolution module.
S32, computing attention over the plurality of interactively superimposed features along the channel and spatial dimensions through a convolutional attention mechanism, obtaining the plurality of recalibrated features.
As a preferred aspect of the present invention, step S5 specifically includes steps S51 to S54.
S51, according to the first fusion feature, further extracting features in the time direction through 1×3 and 1×7 convolutions arranged in parallel, batch-normalizing the extracted features and then fusing them.
S52, processing the fused features with an ELU activation function and an average pooling layer, a dropout layer also being provided in the training stage.
S53, adding the features processed by the average pooling layer to the first fusion feature through a skip connection.
S54, performing dimension conversion on the added features and then adaptive encoding to obtain the position perception enhancement features.
As a preferred aspect of the present invention, step S6 specifically includes steps S61 to S64.
S61, dividing the position perception enhancement features into blocks through a sliding window, obtaining a plurality of blocks.
S62, averaging within each block: the consecutive Tokens in a block are averaged and aggregated into a single block representation, obtaining aggregated blocks.
S63, calculating the attention score of each aggregated block through a highest-attention mechanism, screening out k important blocks, and recovering the original Tokens of the important blocks to obtain the highest-attention blocks.
S64, combining the aggregated blocks and the highest-attention blocks through a gating mechanism to obtain the global refined features.
As a preferred aspect of the present invention, step S7 specifically includes steps S71 to S73.
S71, flattening the features with a flattening layer to reduce dimensionality, converting the multi-dimensional features into one-dimensional features for feature integration.
S72, processing the flattened features through two fully connected layers, a dropout layer being inserted between the two fully connected layers in the training stage.
S73, calculating the prediction probability of each class from the features processed by the fully connected layers through a softmax function, so as to complete the training of the model and obtain the pre-trained model for real-time prediction.
As a preferred aspect of the present invention, step S9 specifically includes steps S91 to S93.
S91, at the first prediction, after the buffer has accumulated one complete window of data, predicting with the pre-trained model.
S92, each time the amount of data corresponding to one sliding distance is newly acquired, predicting the data in the window with the pre-trained model.
S93, mapping each prediction result into a control instruction.
In a second aspect, the invention provides an EEG signal decoding apparatus based on an aggregate perception enhanced convolutional Transformer network, comprising an offline data acquisition module, a space-time convolution module, a recalibration module, a fusion module, a position perception module, an information aggregation module, a classifier module, a real-time data acquisition module and a real-time decoding module.
The offline data acquisition module is configured to acquire offline data of EEG signals.
The space-time convolution module is configured to extract multi-scale shallow local features according to the offline data.
The recalibration module is configured to perform interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
The fusion module is configured to fuse the plurality of recalibrated features to obtain a first fusion feature.
The position perception module is configured to input the first fusion feature into a position perception enhancement module, extract deep fine-grained features through parallel enhanced convolution, and adaptively encode the deep fine-grained features to obtain position perception enhancement features.
The information aggregation module is configured to extract long-range dependencies and local associations through a sparse information aggregation Transformer module to obtain global refined features.
The classifier module is configured to input the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
The real-time data acquisition module is configured to acquire real-time data of the EEG signals.
The real-time decoding module is configured to input the real-time data into the pre-trained model to obtain the decoding result of the real-time EEG signal.
In a third aspect, the invention provides an EEG signal decoding device based on an aggregate perception enhanced convolutional Transformer network, comprising a processor, a memory, and a computer program stored in the memory. The computer program can be executed by the processor to implement the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network described in any one of the first aspects.
In a fourth aspect, the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium comprises a stored computer program, and when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network according to any one of the first aspects.
By adopting the technical scheme, the invention can obtain the following technical effects:
The EEG signal decoding method based on the aggregate perception enhanced convolutional Transformer network performs excellently in EEG classification and recognition tasks, and effectively remedies some of the shortcomings of traditional networks in feature extraction. The network is a multi-scale feature interactive fusion network structure that refines features progressively and hierarchically. The space-time convolution and adaptive feature recalibration modules capture coarse-grained features for the network; the position perception enhancement module refines the features, injects position encoding information and strengthens the internal connections between features. The sparse information aggregation Transformer module strengthens long-range dependencies and local associations, balances local and global correlation, and improves the depth and breadth of EEG data analysis in an all-round way.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to figs. 1 to 8, a first embodiment of the present invention provides an EEG signal decoding method (APCformer) based on an aggregate perception enhanced convolutional Transformer network, which can be performed by an EEG signal decoding device (hereinafter referred to as a decoding device), in particular by one or more processors in the decoding device. It is understood that the decoding device may be an electronic device with computing capability, such as a portable notebook computer, a desktop computer, a server, a smartphone, or a tablet computer.
EEG denotes the electroencephalogram, and EEG signals are brain electrical signals. Deep learning methods for EEG data analysis face the challenge of effectively decoding spatio-temporal dynamic information, so the analysis model needs accurate multi-domain joint feature learning capability and strong sequence processing capability. To this end, the present invention proposes a new EEG decoding network, APCformer. The overall architecture of the network is shown in fig. 2. The network mainly comprises five parts: a space-time convolution module, an adaptive feature recalibration module (abbreviated as AFR), a position perception enhancement module (abbreviated as PAE), a sparse information aggregation Transformer module (abbreviated as SAT) and a classifier module.
A brief description of the EEG signal decoding method based on the aggregate perception enhanced convolutional Transformer network is as follows. Let the preprocessed EEG data be $X \in \mathbb{R}^{C \times T}$, where $\mathbb{R}$ denotes the real numbers, C denotes the number of EEG channels, and T denotes the number of sampling points along the time dimension. The method inputs the data in batches of N = 32 samples into the space-time convolution module to extract multi-scale shallow local features, focuses key spatio-temporal features through the AFR module, inputs the fused features into the PAE module to extract deep fine-grained features and adaptively encode them, and further extracts long-range dependencies and local associations through the SAT module to refine the features, realizing effective aggregation of local and global features; finally, the classification results are output through the classifier. The model is trained through the preceding steps, obtaining a pre-trained model that can be used for real-time prediction. A sliding window then extracts the real-time data stream for real-time preprocessing, the data are predicted through the pre-trained model, and the final prediction result is obtained and mapped into an instruction.
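For orientation, the following is a minimal PyTorch sketch of how the five stages could be chained end to end. Every layer here is a simplified stand-in (the AFR, PAE and SAT stages are reduced to identities), and the class name, channel counts and kernel sizes are illustrative assumptions rather than the parameters of Table 1.

```python
import torch
import torch.nn as nn

class APCformerSketch(nn.Module):
    # Hypothetical skeleton: each stage is a simple stand-in so the tensor
    # flow (N, 1, C, T) -> class logits can be traced.
    def __init__(self, n_channels=22, n_samples=1000, n_classes=4, d_model=32):
        super().__init__()
        # stage 1: space-time convolution (stand-in: one temporal + spatial conv)
        self.stconv = nn.Sequential(
            nn.Conv2d(1, 16, (1, 64), padding=(0, 32)),
            nn.Conv2d(16, d_model, (n_channels, 1)),
            nn.BatchNorm2d(d_model), nn.ELU(), nn.AvgPool2d((1, 75)),
        )
        # stages 2-4: AFR / PAE / SAT reduced to identity placeholders here
        self.afr, self.pae, self.sat = nn.Identity(), nn.Identity(), nn.Identity()
        # stage 5: classifier
        self.classifier = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_classes))

    def forward(self, x):                    # x: (N, 1, C, T)
        f = self.sat(self.pae(self.afr(self.stconv(x))))
        return self.classifier(f)            # logits; softmax applied in the loss

model = APCformerSketch()
logits = model(torch.randn(32, 1, 22, 1000))  # batch size N = 32 as in the text
print(logits.shape)                            # torch.Size([32, 4])
```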
Table 1 APCformer network architecture parameters
In Table 1, N is the number of batch samples, C is the number of channels, and T is the number of sampling points. F1 and F2 denote the numbers of filters of the corresponding convolution layers, with F1 = 16 and F2 = 32. Fs is the sampling frequency. axis is the dimension to which the normalization operation applies. ELU denotes the activation function used for convolution; Softmax denotes the activation function used for classification.
S1, acquiring off-line data of EEG signals.
In this embodiment, step S1 is the preprocessing of offline data prior to the training stage. It specifically comprises data filtering, segmentation cutting, independent baseline correction, data normalization, artifact removal and data augmentation, where the data augmentation includes segment recombination and the addition of Gaussian noise.
Specifically, interference from 50 Hz power-line and environmental noise is eliminated first, and the EEG components in the key frequency range of 0.5-30 Hz are screened out as the basis for subsequent analysis. The continuous EEG signal is then cut into segments, keeping only the MI segment data closely related to the task, and the segments are spliced together. Here MI stands for motor imagery, and MI segment data are the motor-imagery segments obtained after the EEG signal is segmented.
To further improve signal quality, according to the temporal characteristics of the human response to stimuli, the mean of the rest-segment data in the 300 milliseconds before the start of each MI segment is taken as a baseline, and independent baseline correction is then performed on each MI segment to eliminate the potential influence of baseline drift. Then, after Z-score normalization of the data, artifacts that are often mixed into EEG signals are removed.
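A minimal sketch of these steps for a single trial follows, assuming a NumPy array of shape (channels, samples), a sampling rate fs, and SciPy's standard filters. The filter order, notch quality factor, and the helper name preprocess_trial are assumptions, and artifact removal (which is device- and method-specific) is omitted.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def preprocess_trial(raw, fs, baseline_ms=300):
    # raw: (channels, samples) for one trial, with rest data before MI onset.
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=fs)    # suppress 50 Hz line noise
    x = filtfilt(b_n, a_n, raw, axis=-1)
    b_bp, a_bp = butter(4, [0.5, 30.0], btype="bandpass", fs=fs)
    x = filtfilt(b_bp, a_bp, x, axis=-1)           # keep the 0.5-30 Hz band
    n_base = int(fs * baseline_ms / 1000)          # rest samples before MI onset
    baseline = x[:, :n_base].mean(axis=-1, keepdims=True)
    mi = x[:, n_base:] - baseline                  # independent baseline correction
    return (mi - mi.mean()) / (mi.std() + 1e-8)    # Z-score normalization
```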
Because small-sample EEG data easily causes model overfitting, the invention applies data augmentation to the training data. Specifically, during the training of each batch, samples are randomly selected by class, each sample is divided evenly into several segments, and one segment is randomly selected from each sample to be combined into new data, with the temporal order preserved during combination. Random values following a Gaussian distribution are then added to the new data to complete the augmentation, improving the robustness and generalization ability of the model. Finally, the new data are added to the original training data and the order is shuffled before they are trained on together.
Data augmentation is applied before the model trains on each batch of data. The batch size of the model is set to 32; after the data augmentation technique is introduced, the system augments each batch of input data before the model begins to learn its features. The system segments and recombines the batch data according to the data labels, and generates simulated data by adding Gaussian noise. The newly generated data are mixed with the original batch data and randomly shuffled, doubling the amount of data. Finally, the processed batch data are input together into the model for feature learning.
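A sketch of this per-batch augmentation, under stated assumptions: segments are drawn class-by-class from randomly chosen same-class trials in temporal order, Gaussian noise is added, and the doubled batch is shuffled. The function name, the number of segments and the noise scale are hypothetical choices.

```python
import numpy as np

def augment_batch(x, y, n_segments=8, noise_std=0.05, rng=None):
    # x: (N, C, T) batch, y: (N,) labels; returns a doubled, shuffled batch.
    rng = rng if rng is not None else np.random.default_rng()
    seg_len = x.shape[-1] // n_segments
    new_x, new_y = [], []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        for _ in range(len(idx)):
            # segment i comes from a random same-class trial, order preserved
            pieces = [x[rng.choice(idx), :, i * seg_len:(i + 1) * seg_len]
                      for i in range(n_segments)]
            trial = np.concatenate(pieces, axis=-1)
            trial = trial + rng.normal(0.0, noise_std, size=trial.shape)
            new_x.append(trial)
            new_y.append(label)
    x_aug = np.concatenate([x[..., :seg_len * n_segments], np.stack(new_x)])
    y_aug = np.concatenate([y, np.array(new_y)])
    perm = rng.permutation(len(y_aug))             # shuffle old and new together
    return x_aug[perm], y_aug[perm]
```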
S2, extracting multi-scale shallow layer local features through a space-time convolution module according to the offline data.
To ensure that the model is no longer limited to a single convolutional receptive field but can flexibly process feature information at different scales and levels, the invention adopts multi-scale space-time convolution to extract the local features of EEG data, enhancing the feature discrimination ability of the model without increasing the network depth.
The first core component of the network structure of the EEG signal decoding method based on the aggregate perception enhanced convolutional Transformer network is a space-time convolution module based on a CNN architecture. As shown in fig. 2, the space-time convolution module is provided with three branches. The structure of each branch is shown in fig. 4: a temporal convolution layer, a spatial convolution layer, a batch normalization layer, an ELU activation layer, an average pooling layer and a random inactivation (Dropout) layer are arranged in sequence.
Specifically, in the space-time convolution module, the structure of each branch is roughly the same, with a double-layer convolution adopted as a shallow feature encoder. The branches differ in the design of the first-layer temporal convolution: each provides 16 large convolution kernels, but at three different scales, with kernel sizes (1, k1), (1, k2) and (1, k3) defined in terms of the sampling frequency Fs, the purpose being to obtain richer time-dimension features through convolution kernels with different receptive fields. The second-layer spatial convolution uses 32 convolution kernels of size (C, 1), consistent with the number of acquisition channels of the device, to process the spatial features of the data. After the convolution operations, batch normalization is applied to alleviate the covariate shift problem. The nonlinear expression capacity of the model is then enhanced by the ELU activation function, after which the feature dimension is reduced by an average pooling layer of size (1, 75), reducing redundant features. During model training, random inactivation (Dropout) is applied to accelerate convergence and reduce the risk of overfitting, finally yielding more representative shallow spatio-temporal features.
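One branch could look like the sketch below. Since the exact temporal kernel lengths are not reproduced in this text, lengths tied to the sampling frequency Fs (e.g., Fs/2, Fs/4, Fs/8) are assumed purely for illustration.

```python
import torch.nn as nn

def st_branch(n_channels, t_kernel, f1=16, f2=32, pool=75, p_drop=0.5):
    # one branch: temporal conv -> spatial conv -> BN -> ELU -> avg pool -> dropout
    return nn.Sequential(
        nn.Conv2d(1, f1, (1, t_kernel), padding=(0, t_kernel // 2), bias=False),
        nn.Conv2d(f1, f2, (n_channels, 1), bias=False),  # spatial conv over all C
        nn.BatchNorm2d(f2),
        nn.ELU(),
        nn.AvgPool2d((1, pool)),     # (1, 75) average pooling as in the text
        nn.Dropout(p_drop),          # active only in training mode
    )

# three parallel branches with different temporal receptive fields
fs = 250                             # assumed sampling frequency
branches = nn.ModuleList(st_branch(22, k) for k in (fs // 2, fs // 4, fs // 8))
```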
S3, performing interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
Step S3 specifically includes step S31 and step S32.
S31, superimposing the features extracted by each branch of the space-time convolution module through an interactive connection structure, obtaining a number of interactively superimposed features equal to the number of branches of the space-time convolution module.
S32, computing attention over the plurality of interactively superimposed features along the channel and spatial dimensions through a convolutional attention mechanism, obtaining the plurality of recalibrated features.
Specifically, the invention creates an adaptive feature recalibration (AFR) module, which recalibrates the learned spatio-temporal features to emphasize key local features, improving the sensitivity of the model to important information and the accuracy of decoding tasks on small datasets.
The adaptive feature recalibration module first realizes feature information sharing among branches through an interactive connection structure. After the features of the branches interact, a convolutional attention mechanism, with its excellent capability of precise focusing, is introduced to reinforce the learning of key features.
S4, the features of each branch are effectively fused through the adaptive feature recalibration module to obtain a more comprehensive feature expression.
The fusion model is as follows:

$X_{\mathrm{fuse}} = \mathrm{Concat}(X_1, X_2, X_3)$

where $X_{\mathrm{fuse}}$ is the fused feature of the branch networks, $\mathrm{Concat}(\cdot)$ denotes the feature concatenation operation, and $X_1$, $X_2$ and $X_3$ denote the features output by the first, second and third branches, respectively.
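A sketch of the AFR stage under stated assumptions: the interactive connection is modeled as a cyclic pairwise superposition of branch outputs (the text does not pin down the exact topology), and the convolutional attention is modeled CBAM-style, channel attention followed by spatial attention. The final line performs the Concat fusion of step S4.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    # CBAM-style recalibration sketch: channel attention then spatial attention.
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ELU(),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                          # recalibrate along channel
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)  # avg and max maps
        return x * self.spatial(s)                       # recalibrate along space

def afr(branch_feats, attn_modules):
    # assumed interactive connection: each branch is superimposed with its
    # neighbouring branch (cyclically) before convolutional attention
    n = len(branch_feats)
    return [attn(branch_feats[i] + branch_feats[(i + 1) % n])
            for i, attn in enumerate(attn_modules)]

feats = [torch.randn(32, 32, 1, 13) for _ in range(3)]   # three branch outputs
attns = [ConvAttention(32) for _ in range(3)]
x_fuse = torch.cat(afr(feats, attns), dim=1)             # Concat fusion of step S4
```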
S5, inputting the first fusion feature into a position perception enhancement module (PAE, Position-Aware Enhancement), extracting deep fine-grained features through parallel enhanced convolution, and adaptively encoding the deep fine-grained features to obtain the position perception enhancement features.
The prior art suffers from the loss of fine-grained information in the spatial and temporal dimensions. The invention provides the position perception enhancement (PAE) module, which takes both fine-grained and coarse-grained information into account, enhances the ability of the APCformer network to learn features, and enables the subsequent SAT module to optimize global and local features on the basis of richer and more relevant features. The PAE module adopts a structure of parallel small convolution kernels; reducing the receptive field of the convolution kernels helps capture fine-grained local features. Its structure is shown in fig. 5. The first fusion feature is passed to the PAE module to extract deep fine-grained EEG features.
The position perception enhancement module is shown in fig. 5. After the parallel enhanced convolution, the features undergo a dimension transformation, and position encoding information is added to the sequence.
Specifically, step S5 specifically includes steps S51 to S54.
S51, according to the first fusion features, features are further extracted in the time direction through 1x3 convolution and 1x7 convolution which are arranged in parallel, and the extracted features are subjected to batch normalization processing and then feature fusion.
S52, processing the features after feature fusion by using an ELU activation function and an average pooling layer. Wherein, the training stage is also provided with a random inactivation layer.
And S53, adding the features processed by the average pooling layer and the first fusion features through jump connection.
S54, performing dimension conversion on the added features, and then performing adaptive coding to obtain the position perception enhancement features.
Specifically, the input first fusion feature is further processed in the time direction by 32 convolution kernels of sizes (1, 3) and (1, 7) arranged in parallel; the features obtained from the two convolutions are batch-normalized and then fused to enrich the extracted information.
The model is then processed through the ELU activation function as the activation layer, an average pooling layer of size (1, 3) and a random inactivation (Dropout) layer. To address the insufficient adaptability of EEG signals in deep networks, a skip connection is added, which reduces the risk of model overfitting to some extent and improves the stability of the model. The feature dimensions are then linearly transformed, and a learnable positional encoder (PE, Positional Encoding) is added: a trainable matrix with the same size as the input feature dimension is set up and randomly initialized, its parameters are continuously updated during network training, and it independently mines the association patterns between different positions, so that the network can perceive the positional information of each feature.
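A sketch of the PAE module, assuming addition as the fusion of the two parallel paths and a 1×1 projection plus pooling to align the skip connection; the channel counts and sequence length are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAE(nn.Module):
    # Sketch of the position perception enhancement module: parallel (1,3)/(1,7)
    # convolutions, additive fusion (assumed), ELU + (1,3) average pooling +
    # dropout, a skip connection, then a learnable positional encoding.
    def __init__(self, ch=96, seq_len=4, p_drop=0.5):
        super().__init__()
        self.conv3 = nn.Sequential(nn.Conv2d(ch, 32, (1, 3), padding=(0, 1)),
                                   nn.BatchNorm2d(32))
        self.conv7 = nn.Sequential(nn.Conv2d(ch, 32, (1, 7), padding=(0, 3)),
                                   nn.BatchNorm2d(32))
        self.post = nn.Sequential(nn.ELU(), nn.AvgPool2d((1, 3)), nn.Dropout(p_drop))
        self.proj = nn.Conv2d(ch, 32, 1)          # assumed 1x1 projection for skip
        self.pos = nn.Parameter(torch.randn(1, seq_len, 32))  # trainable PE matrix

    def forward(self, x):                          # x: (N, ch, 1, T')
        f = self.post(self.conv3(x) + self.conv7(x))    # fuse the parallel paths
        f = f + F.avg_pool2d(self.proj(x), (1, 3))      # skip connection
        seq = f.flatten(2).transpose(1, 2)         # (N, T'', d) token sequence
        return seq + self.pos[:, :seq.size(1)]     # add learnable position code
```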
S6, refining the position perception enhancement features through a sparse information aggregation Transformer module, further extracting long-range dependencies and local associations, realizing effective aggregation of local and global features, and obtaining the global refined features. Preferably, step S6 specifically includes steps S61 to S64.
It will be appreciated that EEG signals tend to have cross-period correlations in the time series, making it difficult to effectively capture the long-range dependencies in the EEG signal that are critical for accurate decoding. The Transformer exhibits excellent properties when processing sequence data: each element in the sequence can attend to all other elements, thereby capturing long-range dependencies accurately.
Convolution, while having unique advantages, focuses primarily on local information and has obvious shortcomings in dealing with global dependencies. Although a simply fused Transformer architecture can acquire long-range dependencies, it ignores the local fine-grained features in the time sequence.
Therefore, the invention designs a sparse information aggregation Transformer (SAT) module to break the bottleneck whereby traditional networks struggle to take both global and local features into account when processing EEG signals. Meanwhile, the idea of sparse attention is introduced, improving the efficiency of processing large-scale, high-dimensional EEG data and efficiently achieving comprehensive capture of intra-sequence relationships. The complex relationships and dependencies among the features are fully mined through SAT, further improving the performance of the model.
The SAT structure shown in FIG. 6 includes a sliding window, aggregate attention, highest attention, and gating mechanisms.
S61, performing block division on the position perception enhancement features through a sliding window.
Specifically, let $X = [x_1, x_2, \dots, x_n]$ be the feature sequence input to the SAT module, where $x_1$, $x_2$ and $x_n$ are the first, second and $n$-th features, respectively.
The sliding window performs block division on the input sequence, and then processes the blocks, and each block can be regarded as a local area, and finally integrates the local information into global features through an attention mechanism.
Specifically, the invention divides the input feature sequence of length $n$ into $m$ blocks using a fixed block length $s$, with a sliding interval $\delta$ set between adjacent blocks, giving $m = \lfloor (n - s)/\delta \rfloor + 1$ blocks. Each block thus contains $s$ consecutive Tokens, and a total of $m \cdot s$ Tokens are generated, with $m \cdot s$ greater than $n$.
Because the invention sets a sliding interval instead of directly cutting the whole sequence into disjoint blocks, the computation increases compared with division without overlap, but this supplements the connections between blocks to a certain extent; the degree of information overlap between blocks can be adjusted by controlling the sliding interval, i.e., by controlling the sparsity ratio $r$, enhancing global context association while maintaining local feature independence and making the overall network more flexible.
S62, averaging within each block: the consecutive Tokens in a block are averaged and aggregated into a single block representation, obtaining the aggregated features.
In the initial stage of aggregate attention, the idea of sparse attention is introduced out of consideration for the global view, to quickly capture the global pattern of the sequence. It is not necessary for every position in the sequence to compute attention with all other positions; instead, the $m$ divided blocks are each averaged, the $s$ consecutive Tokens within a block being averaged and aggregated into a single block representation, realizing feature capture of the global pattern.
The Tokens in all blocks are arranged in sequence. The position of the first Token of any block $j$ can be expressed as $(j-1)\delta + 1$, and the position of the $i$-th Token within block $j$ can be expressed as $(j-1)\delta + i$. Let $k_{j,i}$ and $v_{j,i}$ denote the key and value of the $i$-th Token of the $j$-th block; the corresponding block key $\bar{K}_j$ and block value $\bar{V}_j$ are defined as:

$\bar{K}_j = \frac{1}{s}\sum_{i=1}^{s} k_{j,i}$

$\bar{V}_j = \frac{1}{s}\sum_{i=1}^{s} v_{j,i}$

where $j$ denotes the $j$-th block, $i$ denotes the $i$-th Token within the block, and $s$ is the block length.
Each query $q_t$ in the sequence performs a dot-product operation with all aggregated block keys $\bar{K}$ to obtain the attention scores, and a weighted sum is then taken over the block values $\bar{V}$. Here $W_q$ is the query weight, $W_k$ is the block-key weight, and $W_v$ is the block-value weight.
The output of the aggregate attention path may be expressed as:

$O^{\mathrm{agg}}_t = \mathrm{softmax}\!\left(\frac{q_t \bar{K}^{\top}}{\sqrt{d}}\right)\bar{V}$

where $t$ denotes the position of any Token, $\bar{K}$ and $\bar{V}$ are the sets of block keys $\bar{K}_j$ and block values $\bar{V}_j$, $\top$ denotes transposition, and $d$ denotes the feature vector dimension.
Global features of the whole sequence can be obtained through the aggregated blocks, and the computational complexity is reduced from the original $O(n^2)$ to $O(n \cdot m)$, significantly improving the processing efficiency for long sequences.
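A sketch of the aggregate-attention path: tensor.unfold cuts overlapping blocks of length s at stride δ, block keys and values are mean-pooled, and each query attends to the m block representations. The score matrix is returned so that the highest-attention step can reuse it, as the text prescribes.

```python
import math
import torch

def aggregate_attention(q, k, v, s=8, delta=4):
    # q, k, v: (N, n, d) token sequences. unfold cuts blocks of length s at
    # stride delta (overlapping when delta < s); block keys/values are the
    # mean of their s tokens, and every query attends to the m blocks.
    k_blocks = k.unfold(1, s, delta).mean(-1)        # (N, m, d) block keys
    v_blocks = v.unfold(1, s, delta).mean(-1)        # (N, m, d) block values
    scores = q @ k_blocks.transpose(-2, -1) / math.sqrt(q.size(-1))  # (N, n, m)
    return scores.softmax(dim=-1) @ v_blocks, scores # (N, n, d) output and scores

q = k = v = torch.randn(2, 64, 32)
out, scores = aggregate_attention(q, k, v)           # m = (64 - 8)//4 + 1 = 15
```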
S63, calculating the attention score of each aggregated block through a highest-attention mechanism, screening out k important blocks, and recovering the original Tokens of the important blocks to obtain the highest-attention blocks.
Block-based aggregate attention reduces the computational complexity and achieves efficient long-range dependency modeling, but relying solely on block representations makes it difficult to recover the fine-grained information of the original sequence, and this operation inevitably causes a loss of feature information. To balance efficiency and accuracy, the invention screens and selects the block information, retaining key and representative local features, thereby achieving efficient long-range dependency modeling without losing accuracy. To this end, the invention integrates a highest-attention mechanism on top of the aggregate attention, selecting the blocks with significant contributions and constructing a hierarchical attention structure.
Specifically, the invention evaluates the importance of the $m$ blocks divided in the aggregate attention; the evaluation criterion is the attention score already obtained by each block, so no recalculation is required and no new computational cost is incurred. The attention scores of all blocks are compared, the $k$ positions with the largest scores are selected and marked as important while the remaining positions are marked as unimportant, yielding the $k$ largest indices and generating a mask matrix $M$ over the attention score matrix $S$. In $M$, each row corresponds to a query and each column to a block; the $k$ selected index positions are filled with 1 and the remaining positions with $-\infty$, so that the weights of the non-Top-$k$ positions approach 0.
The attention score of the important blocks, $S^{\mathrm{top}}$, is expressed as:

$S^{\mathrm{top}}_{t,j} = \begin{cases} S_{t,j}, & j \in \mathrm{Top\text{-}}k(S_t) \\ -\infty, & \text{otherwise} \end{cases}$

where $k$ denotes the first $k$ blocks of highest importance.
For the $k$ important blocks screened out by the mask matrix $M$, the $s$ original Tokens in each block are recovered and rearranged according to their positions in the original sequence. For the $j$-th important block, the corresponding Token positions are $(j-1)\delta + 1, \dots, (j-1)\delta + s$. After the original Tokens of all $k$ blocks are expanded in sequence, the total number of Tokens is $k \cdot s$. At this point, the original key and value pairs are restored as $\hat{K}$ and $\hat{V}$.
The recovered key-value pairs are used in the attention calculation with the query vector $q_t$, expressed as:

$O^{\mathrm{top}}_t = \mathrm{softmax}\!\left(\frac{q_t \hat{K}^{\top}}{\sqrt{d}}\right)\hat{V}$

where $\hat{K}$ and $\hat{V}$ are the sets of recovered keys $\hat{k}_{j,i}$ and recovered values $\hat{v}_{j,i}$, $\hat{k}_{j,i}$ being the recovered key of the $i$-th Token in the $j$-th important block and $\hat{v}_{j,i}$ the recovered value of the $i$-th Token in the $j$-th important block.
The highest attention focuses on the important blocks and selects the strongly correlated features to complete the feature refinement; since the refinement is performed only on the few Tokens within the important blocks, the computation of this step is far smaller than over the original input sequence. On the basis of the $O(n \cdot m)$ computational complexity, the sparse selection with $k \ll m$ further optimizes the complexity to $O(n \cdot k \cdot s)$, greatly reducing computational redundancy while retaining the key fine-grained features.
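A sketch of the highest-attention path, under one simplifying assumption: block importance is taken as the block's attention score averaged over queries, so the text's per-query mask matrix M is collapsed into one Top-k selection per batch element. The recovered tokens of the k selected blocks are then attended to directly.

```python
import math
import torch

def top_k_attention(q, k, v, scores, s=8, delta=4, top_k=4):
    # scores: the (N, n, m) block scores reused from aggregate attention.
    d = q.size(-1)
    idx = scores.mean(dim=1).topk(top_k, dim=-1).indices      # (N, top_k) blocks
    # the tokens of block j sit at positions j*delta .. j*delta + s - 1
    token_idx = (idx.unsqueeze(-1) * delta + torch.arange(s)).flatten(1)
    gather = token_idx.unsqueeze(-1).expand(-1, -1, d)        # (N, top_k*s, d)
    k_sel, v_sel = k.gather(1, gather), v.gather(1, gather)   # recovered tokens
    att = (q @ k_sel.transpose(-2, -1) / math.sqrt(d)).softmax(dim=-1)
    return att @ v_sel                                        # (N, n, d), ~O(n*k*s)
```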
S64, combining the aggregated blocks and the highest-attention blocks through a gating mechanism to obtain the global refined features.
Specifically, the invention creates a learnable gating mechanism on the paths of the aggregate attention and the highest attention, combining them and dynamically adjusting the information flow through an information gate with a sigmoid activation function. This mechanism can learn and identify the importance of each path in the data, mapping values into the range 0 to 1 and adaptively determining the throughput of information.
The output of the final SAT is composed of the results of the aggregate attention and the highest attention, expressed as:

$g = \sigma\!\left(W_g\left[O^{\mathrm{agg}}; O^{\mathrm{top}}\right]\right)$

$O = g \odot O^{\mathrm{agg}} + (1 - g) \odot O^{\mathrm{top}}$

where $O$ is the output aggregated feature, $\sigma$ is the sigmoid activation function, $W_g$ is the learnable gate weight, $O^{\mathrm{agg}}$ denotes the output of the aggregate attention, and $O^{\mathrm{top}}$ is the output of the highest attention.
Local and global perception of EEG information is achieved through the computation of the two paths' weights. The hierarchical attention mechanism balances computational efficiency and feature integrity: aggregate attention captures global patterns while the highest attention restores key local details. By adjusting the sparsity ratio $r$ and the selection coefficient $k$, the computation and the feature granularity of the model can be flexibly controlled, which is particularly suitable for processing high-dimensional time-series EEG signals.
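A sketch of the gating combination: a sigmoid information gate maps each position into (0, 1) and blends the two attention outputs. The linear parameterization of the gate is an assumption consistent with the sigmoid information gate described above.

```python
import torch
import torch.nn as nn

class PathGate(nn.Module):
    # Learnable information gate: a sigmoid maps each position into (0, 1)
    # and blends the aggregate-attention and highest-attention outputs.
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())

    def forward(self, o_agg, o_top):                  # both (N, n, d)
        g = self.gate(torch.cat([o_agg, o_top], dim=-1))
        return g * o_agg + (1.0 - g) * o_top          # adaptive path throughput
```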
S7, inputting the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
The classifier module receives the features processed by SAT, flattens them with a flattening layer to reduce dimensionality, and converts the multi-dimensional features into one-dimensional features for feature integration. The features are then processed through two fully connected layers, between which a dropout layer is inserted to reduce the risk of overfitting, and the prediction probability of each class is calculated through a softmax function, completing the training of the model and obtaining the pre-trained model for real-time prediction.
The model is trained through steps S1 to S7, obtaining a pre-trained model that can be used for real-time prediction. In this embodiment, the classifier is trained with a cross-entropy loss function, which quantifies the difference between the model's predicted probability distribution and the true labels and minimizes this difference through the optimization process.
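A minimal training-loop sketch with the cross-entropy objective; model, loader and optimizer are assumed to be any compatible PyTorch objects, and the softmax of step S73 is folded into nn.CrossEntropyLoss.

```python
import torch.nn as nn

def train_epoch(model, loader, optimizer, device="cpu"):
    # model: any nn.Module (e.g. the APCformerSketch above); loader yields (x, y).
    model.train()
    criterion = nn.CrossEntropyLoss()   # measures predicted-vs-true divergence
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)   # softmax is applied inside the loss
        loss.backward()
        optimizer.step()
```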
S8, acquiring real-time data of the EEG signals. In this embodiment, step S8 is the preprocessing of real-time data prior to the prediction stage. Step S8 specifically includes steps S81 to S83.
S81, storing the acquired real-time EEG data stream in a buffer.
S82, extracting real-time segment data from the buffer through a sliding time window.
S83, preprocessing the real-time segment data, wherein the preprocessing steps for the real-time data are fewer than those for the offline data: the data augmentation is omitted, and the baseline correction is based on the mean of the whole segment.
Specifically, the real-time EEG signal is obtained from a real-time data stream, and data are extracted through a sliding time window for preprocessing, obtaining the preprocessed EEG signal. The purpose of real-time EEG signal processing is to achieve accurate control of external devices through a brain-computer interface. Unlike the preprocessing of the offline training stage, the real-time process relies entirely on the active imagination of the brain, without visual or auditory cues, and imagination is performed continuously throughout the process without rest-segment data. Therefore, real-time preprocessing requires no segmentation of excitation intervals, and the baseline correction is adjusted from the original baseline based on the mean of the 300 ms of rest-segment data before each segment to a baseline based on the mean of the whole segment. In addition, the real-time preprocessing stage does not involve data augmentation; the order of the remaining steps is consistent with the preprocessing steps of the offline training stage.
Specifically, as shown in figs. 7 and 8, the invention provides a buffer for the real-time data stream, storing all of the acquired real-time data. The sampling frequency of the device is 256 Hz; a sliding time window of 1024 sampling points is set in the buffer, and the sliding distance of the window is 256 sampling points. Segment data are extracted from the buffer through the sliding time window for real-time preprocessing. The buffer ensures the integrity and continuity of the window data in each prediction and provides connection support for the access of subsequent data.
To counteract the influence of device and data-transmission delays, a fixed number of sampling points related to the sampling frequency, rather than absolute time, is adopted as the window positioning mark. The sliding-time-window approach to data selection has three advantages. First, using a fixed number of sampling points as the basis for window division effectively avoids prediction errors caused by device delay or time drift. Second, the design of the sliding time window makes full use of the continuity of the data, improving data utilization and reducing system latency. Finally, the step-by-step update mechanism of the sliding time window smooths the prediction results and enhances the real-time performance and stability of the system, thereby ensuring accurate control of external devices.
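A sketch of the buffer with a fixed-sampling-point sliding window (1024-point window, 256-point slide at 256 Hz, per the text); the class name and the deque-based storage are implementation assumptions.

```python
from collections import deque
import numpy as np

class StreamWindow:
    # Sliding window keyed to sampling points: 1024-point windows advanced by
    # 256 points, matching the 256 Hz device described above.
    def __init__(self, window=1024, stride=256):
        self.window, self.stride = window, stride
        self.buf = deque(maxlen=window)      # always holds at most one window
        self.n_new = 0

    def push(self, chunk):                   # chunk: (n_channels, n_samples)
        for t in range(chunk.shape[1]):
            self.buf.append(chunk[:, t])
            self.n_new += 1

    def ready(self):
        # first prediction needs a full window; afterwards every `stride` samples
        return len(self.buf) == self.window and self.n_new >= self.stride

    def take(self):
        self.n_new = 0
        return np.stack(self.buf, axis=1)    # (n_channels, window) for the model
```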
S9, inputting the real-time preprocessed data into the pre-trained model to obtain the decoding result of the real-time EEG signal. Preferably, step S9 specifically includes steps S91 to S93.
S91, at the first prediction, after the buffer has accumulated one complete window of data, predicting with the pre-trained model.
S92, each time the amount of data corresponding to one sliding distance is newly acquired, predicting the data in the window with the pre-trained model.
S93, mapping each prediction result into a control instruction in real time.
Specifically, as shown in figs. 7 and 8, the sliding time window is the window described in step S8. At the first prediction, when one complete window of data has accumulated in the buffer, the system performs real-time preprocessing on the window data and predicts it through the pre-trained model obtained in the offline training stage, while the window data used for prediction are retained in the buffer. Thereafter, whenever the amount of data corresponding to one sliding distance is newly acquired, the sliding time window intercepts the new data and combines it with the data retained in the buffer after the previous prediction to form a new window of data; the system processes the new window immediately, and the preprocessed data in the window are predicted through the pre-trained model. Finally, every prediction result is immediately mapped into a control instruction and sent to the target device.
The invention provides a novel EEG signal decoding network, APCformer, which performs excellently in EEG classification and recognition tasks and effectively remedies some of the shortcomings of traditional networks in feature extraction. The network is a multi-scale feature interactive fusion network structure that refines features progressively and hierarchically. The space-time convolution and adaptive feature recalibration modules capture coarse-grained features for the network; the position perception enhancement module refines the features, injects position encoding information and strengthens the internal connections between features. The sparse information aggregation Transformer module strengthens long-range dependencies and local associations, balances local and global correlation, and improves the depth and breadth of EEG data analysis in an all-round way.
A second embodiment of the invention provides an EEG signal decoding apparatus based on an aggregate perception enhanced convolutional Transformer network, comprising an offline data acquisition module, a space-time convolution module, a recalibration module, a fusion module, a position perception module, an information aggregation module, a classifier module, a real-time data acquisition module and a real-time decoding module.
The offline data acquisition module is configured to acquire offline data of EEG signals.
The space-time convolution module is configured to extract multi-scale shallow local features according to the offline data.
The recalibration module is configured to perform interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
The fusion module is configured to fuse the plurality of recalibrated features to obtain a first fusion feature.
The position perception module is configured to input the first fusion feature into a position perception enhancement module, extract deep fine-grained features through parallel enhanced convolution, and adaptively encode the deep fine-grained features to obtain position perception enhancement features.
The information aggregation module is configured to extract long-range dependencies and local associations through a sparse information aggregation Transformer module to obtain global refined features.
The classifier module is configured to input the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
The real-time data acquisition module is configured to acquire real-time data of the EEG signals.
The real-time decoding module is configured to input the real-time data into the pre-trained model to obtain the decoding result of the real-time EEG signal.
A third embodiment of the invention provides an EEG signal decoding device based on an aggregate perception enhanced convolutional Transformer network, comprising a processor, a memory and a computer program stored in the memory. The computer program can be executed by the processor to implement the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network described in any of the embodiments above.
A fourth embodiment of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium comprises a stored computer program; when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network according to the first embodiment of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk. It should be noted that, in the present invention, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used in the present invention merely describes an association relation between associated objects, indicating that three kinds of relations may exist; for example, A and/or B may represent three cases: A exists alone, A and B exist simultaneously, and B exists alone. In the present invention, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The term "if" as used herein may be interpreted as "at" or "when" depending on the context "or" in response to a determination "or" in response to a detection. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
References to "first\second" in the embodiments are merely to distinguish similar objects and do not represent a particular ordering for the objects, it being understood that "first\second" may interchange a particular order or precedence where allowed. It is to be understood that the "first\second" distinguishing aspects may be interchanged where appropriate, such that the embodiments described herein may be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention; various modifications and variations of the present invention may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.