Disclosure of Invention
The present invention provides a method, apparatus, device and medium for decoding EEG signals based on an aggregate perception enhanced convolutional Transformer network, so as to alleviate at least one of the above-mentioned technical problems.
In a first aspect, the present invention provides an EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network, comprising steps S1 to S9.
S1, acquiring off-line data of EEG signals.
S2, extracting multi-scale shallow local features through a space-time convolution module according to the offline data.
S3, performing interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
S4, fusing the plurality of recalibrated features to obtain a first fusion feature.
S5, inputting the first fusion feature into a position perception enhancement module, extracting deep fine-grained features through parallel enhanced convolution, and adaptively encoding the deep fine-grained features to obtain position perception enhancement features.
S6, extracting long-range dependencies and local associations through a sparse information aggregation Transformer module to obtain global refined features.
S7, inputting the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
S8, acquiring real-time data of the EEG signals.
S9, inputting the real-time data into the pre-trained model to obtain the decoding result of the real-time EEG signal.
As a preferred aspect of the present invention, step S1 specifically includes steps S11 to S15.
S11, acquiring stored EEG signal data, eliminating 50 Hz power-line and environmental noise interference, and screening out the EEG components in the key frequency range of 0.5-30 Hz.
S12, segmenting the continuous EEG signals, removing rest-segment data, and keeping only MI segment data.
S13, taking the mean of the rest-segment data in the 300 milliseconds before each MI segment as a baseline, and performing independent baseline correction on each MI segment to eliminate the potential influence of baseline drift.
S14, normalizing the data and then removing artifacts mixed into the EEG signals, obtaining the preprocessed EEG data.
S15, segmenting and recombining the training data by label, additionally generating simulated batch data by adding random values that follow a Gaussian distribution, shuffling the newly generated data together with the original batch data to obtain augmented EEG data, and then inputting the augmented data into the model for feature learning.
As a preferred aspect of the present invention, step S8 specifically includes steps S81 to S83.
S81, storing the acquired real-time EEG data stream in a buffer.
S82, extracting real-time segment data from the buffer through a sliding time window.
S83, preprocessing the real-time segment data, wherein the preprocessing steps for the real-time data are fewer than those for the offline data: the data augmentation is omitted, and the baseline correction is based on the mean of the whole segment.
As a preferred aspect of the invention, the space-time convolution module is provided with three branches. Each branch is sequentially provided with a temporal convolution layer, a spatial convolution layer, a batch normalization layer, an ELU activation layer and an average pooling layer. The temporal convolution layer of each branch uses 16 convolution kernels; the three branches use three different kernel sizes (1, k1), (1, k2) and (1, k3), whose lengths k1, k2 and k3 are defined in terms of the sampling frequency Fs. The spatial convolution layer uses 32 convolution kernels of size (C, 1), C being the number of channels. During model training, a dropout layer is also provided.
As a preferred aspect of the present invention, step S2 specifically includes steps S21 to S22.
S21, inputting the EEG signals into three branches of a space-time convolution module respectively to acquire multi-scale shallow local features.
S22, in each of the three branches of the space-time convolution module, features are extracted by convolution operations through the temporal convolution layer and the spatial convolution layer in sequence, and the features extracted by the convolution operations are input into the batch normalization layer, the ELU activation layer and the average pooling layer, which are connected in sequence.
As a preferred aspect of the present invention, step S3 specifically includes step S31 and step S32.
S31, superimposing the features extracted by each branch of the space-time convolution module through an interactive connection structure, obtaining a number of interactively superimposed features equal to the number of branches of the space-time convolution module.
S32, computing attention over the plurality of interactively superimposed features along the channel and spatial dimensions through a convolutional attention mechanism, obtaining the plurality of recalibrated features.
As a preferred aspect of the present invention, step S5 specifically includes steps S51 to S54.
S51, according to the first fusion feature, further extracting features in the time direction through 1×3 and 1×7 convolutions arranged in parallel, batch-normalizing the extracted features and then fusing them.
S52, processing the fused features with an ELU activation function and an average pooling layer, a dropout layer also being provided in the training stage.
S53, adding the features processed by the average pooling layer to the first fusion feature through a skip connection.
S54, performing dimension conversion on the added features and then adaptive encoding to obtain the position perception enhancement features.
As a preferred aspect of the present invention, step S6 specifically includes steps S61 to S64.
S61, dividing the position perception enhancement features into blocks through a sliding window, obtaining a plurality of blocks.
S62, averaging within each block: the consecutive Tokens in a block are averaged and aggregated into a single block representation, obtaining aggregated blocks.
S63, calculating the attention score of each aggregated block through a highest-attention mechanism, screening out k important blocks, and recovering the original Tokens of the important blocks to obtain the highest-attention blocks.
S64, combining the aggregated blocks and the highest-attention blocks through a gating mechanism to obtain the global refined features.
As a preferred aspect of the present invention, step S7 specifically includes steps S71 to S73.
S71, flattening the features with a flattening layer to reduce dimensionality, converting the multi-dimensional features into one-dimensional features for feature integration.
S72, processing the flattened features through two fully connected layers, a dropout layer being inserted between the two fully connected layers in the training stage.
S73, calculating the prediction probability of each class from the features processed by the fully connected layers through a softmax function, so as to complete the training of the model and obtain the pre-trained model for real-time prediction.
As a preferred aspect of the present invention, step S9 specifically includes steps S91 to S93.
S91, at the first prediction, after the buffer has accumulated one complete window of data, predicting with the pre-trained model.
S92, each time the amount of data corresponding to one sliding distance is newly acquired, predicting the data in the window with the pre-trained model.
S93, mapping each prediction result into a control instruction.
In a second aspect, the invention provides an EEG signal decoding apparatus based on an aggregate perception enhanced convolutional Transformer network, comprising an offline data acquisition module, a space-time convolution module, a recalibration module, a fusion module, a position perception module, an information aggregation module, a classifier module, a real-time data acquisition module and a real-time decoding module.
The offline data acquisition module is configured to acquire offline data of EEG signals.
The space-time convolution module is configured to extract multi-scale shallow local features according to the offline data.
The recalibration module is configured to perform interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
The fusion module is configured to fuse the plurality of recalibrated features to obtain a first fusion feature.
The position perception module is configured to input the first fusion feature into a position perception enhancement module, extract deep fine-grained features through parallel enhanced convolution, and adaptively encode the deep fine-grained features to obtain position perception enhancement features.
The information aggregation module is configured to extract long-range dependencies and local associations through a sparse information aggregation Transformer module to obtain global refined features.
The classifier module is configured to input the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
The real-time data acquisition module is configured to acquire real-time data of the EEG signals.
The real-time decoding module is configured to input the real-time data into the pre-trained model to obtain the decoding result of the real-time EEG signal.
In a third aspect, the invention provides an EEG signal decoding device based on an aggregate perception enhanced convolutional Transformer network, comprising a processor, a memory, and a computer program stored in the memory. The computer program can be executed by the processor to implement the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network described in any one of the first aspects.
In a fourth aspect, the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium comprises a stored computer program, and when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network according to any one of the first aspects.
By adopting the technical scheme, the invention can obtain the following technical effects:
The EEG signal decoding method based on the aggregate perception enhanced convolutional Transformer network performs excellently in EEG classification and recognition tasks, and effectively remedies some of the shortcomings of traditional networks in feature extraction. The network is a multi-scale feature interactive fusion network structure that refines features progressively and hierarchically. The space-time convolution and adaptive feature recalibration modules capture coarse-grained features for the network; the position perception enhancement module refines the features, injects position encoding information and strengthens the internal connections between features. The sparse information aggregation Transformer module strengthens long-range dependencies and local associations, balances local and global correlation, and improves the depth and breadth of EEG data analysis in an all-round way.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to figs. 1 to 8, a first embodiment of the present invention provides an EEG signal decoding method (APCformer) based on an aggregate perception enhanced convolutional Transformer network, which can be performed by an EEG signal decoding device (hereinafter referred to as a decoding device), in particular by one or more processors in the decoding device. It is understood that the decoding device may be an electronic device with computing capability, such as a portable notebook computer, a desktop computer, a server, a smartphone, or a tablet computer.
EEG denotes the electroencephalogram, and EEG signals are brain electrical signals. Deep learning methods for EEG data analysis face the challenge of effectively decoding spatio-temporal dynamic information, so the analysis model needs accurate multi-domain joint feature learning capability and strong sequence processing capability. To this end, the present invention proposes a new EEG decoding network, APCformer. The overall architecture of the network is shown in fig. 2. The network mainly comprises five parts: a space-time convolution module, an adaptive feature recalibration module (abbreviated as AFR), a position perception enhancement module (abbreviated as PAE), a sparse information aggregation Transformer module (abbreviated as SAT) and a classifier module.
A brief description of the EEG signal decoding method based on the aggregate perception enhanced convolutional Transformer network is as follows. Let the preprocessed EEG data be $X \in \mathbb{R}^{C \times T}$, where $\mathbb{R}$ denotes the real numbers, C denotes the number of EEG channels, and T denotes the number of sampling points along the time dimension. The method inputs the data in batches of N = 32 samples into the space-time convolution module to extract multi-scale shallow local features, focuses key spatio-temporal features through the AFR module, inputs the fused features into the PAE module to extract deep fine-grained features and adaptively encode them, and further extracts long-range dependencies and local associations through the SAT module to refine the features, realizing effective aggregation of local and global features; finally, the classification results are output through the classifier. The model is trained through the preceding steps, obtaining a pre-trained model that can be used for real-time prediction. A sliding window then extracts the real-time data stream for real-time preprocessing, the data are predicted through the pre-trained model, and the final prediction result is obtained and mapped into an instruction.
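For orientation, the following is a minimal PyTorch sketch of how the five stages could be chained end to end. Every layer here is a simplified stand-in (the AFR, PAE and SAT stages are reduced to identities), and the class name, channel counts and kernel sizes are illustrative assumptions rather than the parameters of Table 1.

```python
import torch
import torch.nn as nn

class APCformerSketch(nn.Module):
    # Hypothetical skeleton: each stage is a simple stand-in so the tensor
    # flow (N, 1, C, T) -> class logits can be traced.
    def __init__(self, n_channels=22, n_samples=1000, n_classes=4, d_model=32):
        super().__init__()
        # stage 1: space-time convolution (stand-in: one temporal + spatial conv)
        self.stconv = nn.Sequential(
            nn.Conv2d(1, 16, (1, 64), padding=(0, 32)),
            nn.Conv2d(16, d_model, (n_channels, 1)),
            nn.BatchNorm2d(d_model), nn.ELU(), nn.AvgPool2d((1, 75)),
        )
        # stages 2-4: AFR / PAE / SAT reduced to identity placeholders here
        self.afr, self.pae, self.sat = nn.Identity(), nn.Identity(), nn.Identity()
        # stage 5: classifier
        self.classifier = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_classes))

    def forward(self, x):                    # x: (N, 1, C, T)
        f = self.sat(self.pae(self.afr(self.stconv(x))))
        return self.classifier(f)            # logits; softmax applied in the loss

model = APCformerSketch()
logits = model(torch.randn(32, 1, 22, 1000))  # batch size N = 32 as in the text
print(logits.shape)                            # torch.Size([32, 4])
```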
Table 1 APCformer network architecture parameters
In Table 1, N is the number of batch samples, C is the number of channels, and T is the number of sampling points. F1 and F2 denote the numbers of filters of the corresponding convolution layers, with F1 = 16 and F2 = 32. Fs is the sampling frequency. axis is the dimension to which the normalization operation applies. ELU denotes the activation function used for convolution; Softmax denotes the activation function used for classification.
S1, acquiring off-line data of EEG signals.
In this embodiment, step S1 is the preprocessing of offline data prior to the training stage. It specifically comprises data filtering, segmentation cutting, independent baseline correction, data normalization, artifact removal and data augmentation, where the data augmentation includes segment recombination and the addition of Gaussian noise.
Specifically, interference from 50 Hz power-line and environmental noise is eliminated first, and the EEG components in the key frequency range of 0.5-30 Hz are screened out as the basis for subsequent analysis. The continuous EEG signal is then cut into segments, keeping only the MI segment data closely related to the task, and the segments are spliced together. Here MI stands for motor imagery, and MI segment data are the motor-imagery segments obtained after the EEG signal is segmented.
To further improve signal quality, according to the temporal characteristics of the human response to stimuli, the mean of the rest-segment data in the 300 milliseconds before the start of each MI segment is taken as a baseline, and independent baseline correction is then performed on each MI segment to eliminate the potential influence of baseline drift. Then, after Z-score normalization of the data, artifacts that are often mixed into EEG signals are removed.
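A minimal sketch of these steps for a single trial follows, assuming a NumPy array of shape (channels, samples), a sampling rate fs, and SciPy's standard filters. The filter order, notch quality factor, and the helper name preprocess_trial are assumptions, and artifact removal (which is device- and method-specific) is omitted.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def preprocess_trial(raw, fs, baseline_ms=300):
    # raw: (channels, samples) for one trial, with rest data before MI onset.
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=fs)    # suppress 50 Hz line noise
    x = filtfilt(b_n, a_n, raw, axis=-1)
    b_bp, a_bp = butter(4, [0.5, 30.0], btype="bandpass", fs=fs)
    x = filtfilt(b_bp, a_bp, x, axis=-1)           # keep the 0.5-30 Hz band
    n_base = int(fs * baseline_ms / 1000)          # rest samples before MI onset
    baseline = x[:, :n_base].mean(axis=-1, keepdims=True)
    mi = x[:, n_base:] - baseline                  # independent baseline correction
    return (mi - mi.mean()) / (mi.std() + 1e-8)    # Z-score normalization
```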
Because small-sample EEG data easily causes model overfitting, the invention applies data augmentation to the training data. Specifically, during the training of each batch, samples are randomly selected by class, each sample is divided evenly into several segments, and one segment is randomly selected from each sample to be combined into new data, with the temporal order preserved during combination. Random values following a Gaussian distribution are then added to the new data to complete the augmentation, improving the robustness and generalization ability of the model. Finally, the new data are added to the original training data and the order is shuffled before they are trained on together.
Data augmentation is applied before the model trains on each batch of data. The batch size of the model is set to 32; after the data augmentation technique is introduced, the system augments each batch of input data before the model begins to learn its features. The system segments and recombines the batch data according to the data labels, and generates simulated data by adding Gaussian noise. The newly generated data are mixed with the original batch data and randomly shuffled, doubling the amount of data. Finally, the processed batch data are input together into the model for feature learning.
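A sketch of this per-batch augmentation, under stated assumptions: segments are drawn class-by-class from randomly chosen same-class trials in temporal order, Gaussian noise is added, and the doubled batch is shuffled. The function name, the number of segments and the noise scale are hypothetical choices.

```python
import numpy as np

def augment_batch(x, y, n_segments=8, noise_std=0.05, rng=None):
    # x: (N, C, T) batch, y: (N,) labels; returns a doubled, shuffled batch.
    rng = rng if rng is not None else np.random.default_rng()
    seg_len = x.shape[-1] // n_segments
    new_x, new_y = [], []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        for _ in range(len(idx)):
            # segment i comes from a random same-class trial, order preserved
            pieces = [x[rng.choice(idx), :, i * seg_len:(i + 1) * seg_len]
                      for i in range(n_segments)]
            trial = np.concatenate(pieces, axis=-1)
            trial = trial + rng.normal(0.0, noise_std, size=trial.shape)
            new_x.append(trial)
            new_y.append(label)
    x_aug = np.concatenate([x[..., :seg_len * n_segments], np.stack(new_x)])
    y_aug = np.concatenate([y, np.array(new_y)])
    perm = rng.permutation(len(y_aug))             # shuffle old and new together
    return x_aug[perm], y_aug[perm]
```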
S2, extracting multi-scale shallow layer local features through a space-time convolution module according to the offline data.
To ensure that the model is no longer limited to a single convolutional receptive field but can flexibly process feature information at different scales and levels, the invention adopts multi-scale space-time convolution to extract the local features of EEG data, enhancing the feature discrimination ability of the model without increasing the network depth.
The first core component of the network structure of the EEG signal decoding method based on the aggregate perception enhanced convolutional Transformer network is a space-time convolution module based on a CNN architecture. As shown in fig. 2, the space-time convolution module is provided with three branches. The structure of each branch is shown in fig. 4: a temporal convolution layer, a spatial convolution layer, a batch normalization layer, an ELU activation layer, an average pooling layer and a random inactivation (Dropout) layer are arranged in sequence.
Specifically, in the space-time convolution module, the structure of each branch is roughly the same, with a double-layer convolution adopted as a shallow feature encoder. The branches differ in the design of the first-layer temporal convolution: each provides 16 large convolution kernels, but at three different scales, with kernel sizes (1, k1), (1, k2) and (1, k3) defined in terms of the sampling frequency Fs, the purpose being to obtain richer time-dimension features through convolution kernels with different receptive fields. The second-layer spatial convolution uses 32 convolution kernels of size (C, 1), consistent with the number of acquisition channels of the device, to process the spatial features of the data. After the convolution operations, batch normalization is applied to alleviate the covariate shift problem. The nonlinear expression capacity of the model is then enhanced by the ELU activation function, after which the feature dimension is reduced by an average pooling layer of size (1, 75), reducing redundant features. During model training, random inactivation (Dropout) is applied to accelerate convergence and reduce the risk of overfitting, finally yielding more representative shallow spatio-temporal features.
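One branch could look like the sketch below. Since the exact temporal kernel lengths are not reproduced in this text, lengths tied to the sampling frequency Fs (e.g., Fs/2, Fs/4, Fs/8) are assumed purely for illustration.

```python
import torch.nn as nn

def st_branch(n_channels, t_kernel, f1=16, f2=32, pool=75, p_drop=0.5):
    # one branch: temporal conv -> spatial conv -> BN -> ELU -> avg pool -> dropout
    return nn.Sequential(
        nn.Conv2d(1, f1, (1, t_kernel), padding=(0, t_kernel // 2), bias=False),
        nn.Conv2d(f1, f2, (n_channels, 1), bias=False),  # spatial conv over all C
        nn.BatchNorm2d(f2),
        nn.ELU(),
        nn.AvgPool2d((1, pool)),     # (1, 75) average pooling as in the text
        nn.Dropout(p_drop),          # active only in training mode
    )

# three parallel branches with different temporal receptive fields
fs = 250                             # assumed sampling frequency
branches = nn.ModuleList(st_branch(22, k) for k in (fs // 2, fs // 4, fs // 8))
```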
S3, performing interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
Step S3 specifically includes step S31 and step S32.
S31, superimposing the features extracted by each branch of the space-time convolution module through an interactive connection structure, obtaining a number of interactively superimposed features equal to the number of branches of the space-time convolution module.
S32, computing attention over the plurality of interactively superimposed features along the channel and spatial dimensions through a convolutional attention mechanism, obtaining the plurality of recalibrated features.
Specifically, the invention creates an adaptive feature recalibration (AFR) module, which recalibrates the learned spatio-temporal features to emphasize key local features, improving the sensitivity of the model to important information and the accuracy of decoding tasks on small datasets.
The adaptive feature recalibration module first realizes feature information sharing among branches through an interactive connection structure. After the features of the branches interact, a convolutional attention mechanism, with its excellent capability of precise focusing, is introduced to reinforce the learning of key features.
S4, the features of each branch are effectively fused through the adaptive feature recalibration module to obtain a more comprehensive feature expression.
The fusion model is as follows:

$X_{\mathrm{fuse}} = \mathrm{Concat}(X_1, X_2, X_3)$

where $X_{\mathrm{fuse}}$ is the fused feature of the branch networks, $\mathrm{Concat}(\cdot)$ denotes the feature concatenation operation, and $X_1$, $X_2$ and $X_3$ denote the features output by the first, second and third branches, respectively.
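A sketch of the AFR stage under stated assumptions: the interactive connection is modeled as a cyclic pairwise superposition of branch outputs (the text does not pin down the exact topology), and the convolutional attention is modeled CBAM-style, channel attention followed by spatial attention. The final line performs the Concat fusion of step S4.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    # CBAM-style recalibration sketch: channel attention then spatial attention.
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ELU(),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                          # recalibrate along channel
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)  # avg and max maps
        return x * self.spatial(s)                       # recalibrate along space

def afr(branch_feats, attn_modules):
    # assumed interactive connection: each branch is superimposed with its
    # neighbouring branch (cyclically) before convolutional attention
    n = len(branch_feats)
    return [attn(branch_feats[i] + branch_feats[(i + 1) % n])
            for i, attn in enumerate(attn_modules)]

feats = [torch.randn(32, 32, 1, 13) for _ in range(3)]   # three branch outputs
attns = [ConvAttention(32) for _ in range(3)]
x_fuse = torch.cat(afr(feats, attns), dim=1)             # Concat fusion of step S4
```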
S5, inputting the first fusion feature into a position perception enhancement module (PAE, Position-Aware Enhancement), extracting deep fine-grained features through parallel enhanced convolution, and adaptively encoding the deep fine-grained features to obtain the position perception enhancement features.
The prior art suffers from the loss of fine-grained information in the spatial and temporal dimensions. The invention provides the position perception enhancement (PAE) module, which takes both fine-grained and coarse-grained information into account, enhances the ability of the APCformer network to learn features, and enables the subsequent SAT module to optimize global and local features on the basis of richer and more relevant features. The PAE module adopts a structure of parallel small convolution kernels; reducing the receptive field of the convolution kernels helps capture fine-grained local features. Its structure is shown in fig. 5. The first fusion feature is passed to the PAE module to extract deep fine-grained EEG features.
The position perception enhancement module is shown in fig. 5. After the parallel enhanced convolution, the features undergo a dimension transformation, and position encoding information is added to the sequence.
Specifically, step S5 specifically includes steps S51 to S54.
S51, according to the first fusion features, features are further extracted in the time direction through 1x3 convolution and 1x7 convolution which are arranged in parallel, and the extracted features are subjected to batch normalization processing and then feature fusion.
S52, processing the features after feature fusion by using an ELU activation function and an average pooling layer. Wherein, the training stage is also provided with a random inactivation layer.
And S53, adding the features processed by the average pooling layer and the first fusion features through jump connection.
S54, performing dimension conversion on the added features, and then performing adaptive coding to obtain the position perception enhancement features.
Specifically, the input first fusion feature is further processed in the time direction by 32 convolution kernels of sizes (1, 3) and (1, 7) arranged in parallel; the features obtained from the two convolutions are batch-normalized and then fused to enrich the extracted information.
The model is then processed through the ELU activation function as the activation layer, an average pooling layer of size (1, 3) and a random inactivation (Dropout) layer. To address the insufficient adaptability of EEG signals in deep networks, a skip connection is added, which reduces the risk of model overfitting to some extent and improves the stability of the model. The feature dimensions are then linearly transformed, and a learnable positional encoder (PE, Positional Encoding) is added: a trainable matrix with the same size as the input feature dimension is set up and randomly initialized, its parameters are continuously updated during network training, and it independently mines the association patterns between different positions, so that the network can perceive the positional information of each feature.
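A sketch of the PAE module, assuming addition as the fusion of the two parallel paths and a 1×1 projection plus pooling to align the skip connection; the channel counts and sequence length are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAE(nn.Module):
    # Sketch of the position perception enhancement module: parallel (1,3)/(1,7)
    # convolutions, additive fusion (assumed), ELU + (1,3) average pooling +
    # dropout, a skip connection, then a learnable positional encoding.
    def __init__(self, ch=96, seq_len=4, p_drop=0.5):
        super().__init__()
        self.conv3 = nn.Sequential(nn.Conv2d(ch, 32, (1, 3), padding=(0, 1)),
                                   nn.BatchNorm2d(32))
        self.conv7 = nn.Sequential(nn.Conv2d(ch, 32, (1, 7), padding=(0, 3)),
                                   nn.BatchNorm2d(32))
        self.post = nn.Sequential(nn.ELU(), nn.AvgPool2d((1, 3)), nn.Dropout(p_drop))
        self.proj = nn.Conv2d(ch, 32, 1)          # assumed 1x1 projection for skip
        self.pos = nn.Parameter(torch.randn(1, seq_len, 32))  # trainable PE matrix

    def forward(self, x):                          # x: (N, ch, 1, T')
        f = self.post(self.conv3(x) + self.conv7(x))    # fuse the parallel paths
        f = f + F.avg_pool2d(self.proj(x), (1, 3))      # skip connection
        seq = f.flatten(2).transpose(1, 2)         # (N, T'', d) token sequence
        return seq + self.pos[:, :seq.size(1)]     # add learnable position code
```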
S6, refining the position perception enhancement features through a sparse information aggregation Transformer module, further extracting long-range dependencies and local associations, realizing effective aggregation of local and global features, and obtaining the global refined features. Preferably, step S6 specifically includes steps S61 to S64.
It will be appreciated that EEG signals tend to have cross-period correlations in the time series, making it difficult to effectively capture the long-range dependencies in the EEG signal that are critical for accurate decoding. The Transformer exhibits excellent properties when processing sequence data: each element in the sequence can attend to all other elements, thereby capturing long-range dependencies accurately.
Convolution, while having unique advantages, focuses primarily on local information and has obvious shortcomings in dealing with global dependencies. Although a simply fused Transformer architecture can acquire long-range dependencies, it ignores the local fine-grained features in the time sequence.
Therefore, the invention designs a sparse information aggregation Transformer (SAT) module to break the bottleneck whereby traditional networks struggle to take both global and local features into account when processing EEG signals. Meanwhile, the idea of sparse attention is introduced, improving the efficiency of processing large-scale, high-dimensional EEG data and efficiently achieving comprehensive capture of intra-sequence relationships. The complex relationships and dependencies among the features are fully mined through SAT, further improving the performance of the model.
The SAT structure shown in FIG. 6 includes a sliding window, aggregate attention, highest attention, and gating mechanisms.
S61, performing block division on the position perception enhancement features through a sliding window.
Specifically, let $X = [x_1, x_2, \dots, x_n]$ be the feature sequence input to the SAT module, where $x_1$, $x_2$ and $x_n$ are the first, second and $n$-th features, respectively.
The sliding window performs block division on the input sequence, and then processes the blocks, and each block can be regarded as a local area, and finally integrates the local information into global features through an attention mechanism.
Specifically, the invention divides the input feature sequence of length $n$ into $m$ blocks using a fixed block length $s$, with a sliding interval $\delta$ set between adjacent blocks, giving $m = \lfloor (n - s)/\delta \rfloor + 1$ blocks. Each block thus contains $s$ consecutive Tokens, and a total of $m \cdot s$ Tokens are generated, with $m \cdot s$ greater than $n$.
Because the invention sets a sliding interval instead of directly cutting the whole sequence into disjoint blocks, the computation increases compared with division without overlap, but this supplements the connections between blocks to a certain extent; the degree of information overlap between blocks can be adjusted by controlling the sliding interval, i.e., by controlling the sparsity ratio $r$, enhancing global context association while maintaining local feature independence and making the overall network more flexible.
S62, averaging within each block: the consecutive Tokens in a block are averaged and aggregated into a single block representation, obtaining the aggregated features.
In the initial stage of aggregate attention, the idea of sparse attention is introduced out of consideration for the global view, to quickly capture the global pattern of the sequence. It is not necessary for every position in the sequence to compute attention with all other positions; instead, the $m$ divided blocks are each averaged, the $s$ consecutive Tokens within a block being averaged and aggregated into a single block representation, realizing feature capture of the global pattern.
The Tokens in all blocks are arranged in sequence. The position of the first Token of any block $j$ can be expressed as $(j-1)\delta + 1$, and the position of the $i$-th Token within block $j$ can be expressed as $(j-1)\delta + i$. Let $k_{j,i}$ and $v_{j,i}$ denote the key and value of the $i$-th Token of the $j$-th block; the corresponding block key $\bar{K}_j$ and block value $\bar{V}_j$ are defined as:

$\bar{K}_j = \frac{1}{s}\sum_{i=1}^{s} k_{j,i}$

$\bar{V}_j = \frac{1}{s}\sum_{i=1}^{s} v_{j,i}$

where $j$ denotes the $j$-th block, $i$ denotes the $i$-th Token within the block, and $s$ is the block length.
Each query $q_t$ in the sequence performs a dot-product operation with all aggregated block keys $\bar{K}$ to obtain the attention scores, and a weighted sum is then taken over the block values $\bar{V}$. Here $W_q$ is the query weight, $W_k$ is the block-key weight, and $W_v$ is the block-value weight.
The output of the aggregate attention path may be expressed as:

$O^{\mathrm{agg}}_t = \mathrm{softmax}\!\left(\frac{q_t \bar{K}^{\top}}{\sqrt{d}}\right)\bar{V}$

where $t$ denotes the position of any Token, $\bar{K}$ and $\bar{V}$ are the sets of block keys $\bar{K}_j$ and block values $\bar{V}_j$, $\top$ denotes transposition, and $d$ denotes the feature vector dimension.
Global features of the whole sequence can be obtained through the aggregated blocks, and the computational complexity is reduced from the original $O(n^2)$ to $O(n \cdot m)$, significantly improving the processing efficiency for long sequences.
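A sketch of the aggregate-attention path: tensor.unfold cuts overlapping blocks of length s at stride δ, block keys and values are mean-pooled, and each query attends to the m block representations. The score matrix is returned so that the highest-attention step can reuse it, as the text prescribes.

```python
import math
import torch

def aggregate_attention(q, k, v, s=8, delta=4):
    # q, k, v: (N, n, d) token sequences. unfold cuts blocks of length s at
    # stride delta (overlapping when delta < s); block keys/values are the
    # mean of their s tokens, and every query attends to the m blocks.
    k_blocks = k.unfold(1, s, delta).mean(-1)        # (N, m, d) block keys
    v_blocks = v.unfold(1, s, delta).mean(-1)        # (N, m, d) block values
    scores = q @ k_blocks.transpose(-2, -1) / math.sqrt(q.size(-1))  # (N, n, m)
    return scores.softmax(dim=-1) @ v_blocks, scores # (N, n, d) output and scores

q = k = v = torch.randn(2, 64, 32)
out, scores = aggregate_attention(q, k, v)           # m = (64 - 8)//4 + 1 = 15
```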
S63, calculating the attention score of each aggregated block through a highest-attention mechanism, screening out k important blocks, and recovering the original Tokens of the important blocks to obtain the highest-attention blocks.
Block-based aggregate attention reduces the computational complexity and achieves efficient long-range dependency modeling, but relying solely on block representations makes it difficult to recover the fine-grained information of the original sequence, and this operation inevitably causes a loss of feature information. To balance efficiency and accuracy, the invention screens and selects the block information, retaining key and representative local features, thereby achieving efficient long-range dependency modeling without losing accuracy. To this end, the invention integrates a highest-attention mechanism on top of the aggregate attention, selecting the blocks with significant contributions and constructing a hierarchical attention structure.
Specifically, the invention evaluates the importance of the $m$ blocks divided in the aggregate attention; the evaluation criterion is the attention score already obtained by each block, so no recalculation is required and no new computational cost is incurred. The attention scores of all blocks are compared, the $k$ positions with the largest scores are selected and marked as important while the remaining positions are marked as unimportant, yielding the $k$ largest indices and generating a mask matrix $M$ over the attention score matrix $S$. In $M$, each row corresponds to a query and each column to a block; the $k$ selected index positions are filled with 1 and the remaining positions with $-\infty$, so that the weights of the non-Top-$k$ positions approach 0.
The attention score of the important blocks, $S^{\mathrm{top}}$, is expressed as:

$S^{\mathrm{top}}_{t,j} = \begin{cases} S_{t,j}, & j \in \mathrm{Top\text{-}}k(S_t) \\ -\infty, & \text{otherwise} \end{cases}$

where $k$ denotes the first $k$ blocks of highest importance.
For the $k$ important blocks screened out by the mask matrix $M$, the $s$ original Tokens in each block are recovered and rearranged according to their positions in the original sequence. For the $j$-th important block, the corresponding Token positions are $(j-1)\delta + 1, \dots, (j-1)\delta + s$. After the original Tokens of all $k$ blocks are expanded in sequence, the total number of Tokens is $k \cdot s$. At this point, the original key and value pairs are restored as $\hat{K}$ and $\hat{V}$.
The recovered key-value pairs are used in the attention calculation with the query vector $q_t$, expressed as:

$O^{\mathrm{top}}_t = \mathrm{softmax}\!\left(\frac{q_t \hat{K}^{\top}}{\sqrt{d}}\right)\hat{V}$

where $\hat{K}$ and $\hat{V}$ are the sets of recovered keys $\hat{k}_{j,i}$ and recovered values $\hat{v}_{j,i}$, $\hat{k}_{j,i}$ being the recovered key of the $i$-th Token in the $j$-th important block and $\hat{v}_{j,i}$ the recovered value of the $i$-th Token in the $j$-th important block.
The highest attention focuses on the important blocks and selects the strongly correlated features to complete the feature refinement; since the refinement is performed only on the few Tokens within the important blocks, the computation of this step is far smaller than over the original input sequence. On the basis of the $O(n \cdot m)$ computational complexity, the sparse selection with $k \ll m$ further optimizes the complexity to $O(n \cdot k \cdot s)$, greatly reducing computational redundancy while retaining the key fine-grained features.
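A sketch of the highest-attention path, under one simplifying assumption: block importance is taken as the block's attention score averaged over queries, so the text's per-query mask matrix M is collapsed into one Top-k selection per batch element. The recovered tokens of the k selected blocks are then attended to directly.

```python
import math
import torch

def top_k_attention(q, k, v, scores, s=8, delta=4, top_k=4):
    # scores: the (N, n, m) block scores reused from aggregate attention.
    d = q.size(-1)
    idx = scores.mean(dim=1).topk(top_k, dim=-1).indices      # (N, top_k) blocks
    # the tokens of block j sit at positions j*delta .. j*delta + s - 1
    token_idx = (idx.unsqueeze(-1) * delta + torch.arange(s)).flatten(1)
    gather = token_idx.unsqueeze(-1).expand(-1, -1, d)        # (N, top_k*s, d)
    k_sel, v_sel = k.gather(1, gather), v.gather(1, gather)   # recovered tokens
    att = (q @ k_sel.transpose(-2, -1) / math.sqrt(d)).softmax(dim=-1)
    return att @ v_sel                                        # (N, n, d), ~O(n*k*s)
```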
S64, combining the aggregated blocks and the highest-attention blocks through a gating mechanism to obtain the global refined features.
Specifically, the invention creates a learnable gating mechanism on the paths of the aggregate attention and the highest attention, combining them and dynamically adjusting the information flow through an information gate with a sigmoid activation function. This mechanism can learn and identify the importance of each path in the data, mapping values into the range 0 to 1 and adaptively determining the throughput of information.
The output of the final SAT is composed of the results of the aggregate attention and the highest attention, expressed as:

$g = \sigma\!\left(W_g\left[O^{\mathrm{agg}}; O^{\mathrm{top}}\right]\right)$

$O = g \odot O^{\mathrm{agg}} + (1 - g) \odot O^{\mathrm{top}}$

where $O$ is the output aggregated feature, $\sigma$ is the sigmoid activation function, $W_g$ is the learnable gate weight, $O^{\mathrm{agg}}$ denotes the output of the aggregate attention, and $O^{\mathrm{top}}$ is the output of the highest attention.
Local and global perception of EEG information is achieved through the computation of the two paths' weights. The hierarchical attention mechanism balances computational efficiency and feature integrity: aggregate attention captures global patterns while the highest attention restores key local details. By adjusting the sparsity ratio $r$ and the selection coefficient $k$, the computation and the feature granularity of the model can be flexibly controlled, which is particularly suitable for processing high-dimensional time-series EEG signals.
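A sketch of the gating combination: a sigmoid information gate maps each position into (0, 1) and blends the two attention outputs. The linear parameterization of the gate is an assumption consistent with the sigmoid information gate described above.

```python
import torch
import torch.nn as nn

class PathGate(nn.Module):
    # Learnable information gate: a sigmoid maps each position into (0, 1)
    # and blends the aggregate-attention and highest-attention outputs.
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())

    def forward(self, o_agg, o_top):                  # both (N, n, d)
        g = self.gate(torch.cat([o_agg, o_top], dim=-1))
        return g * o_agg + (1.0 - g) * o_top          # adaptive path throughput
```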
S7, inputting the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
The classifier module receives the features processed by SAT, flattens them with a flattening layer to reduce dimensionality, and converts the multi-dimensional features into one-dimensional features for feature integration. The features are then processed through two fully connected layers, between which a dropout layer is inserted to reduce the risk of overfitting, and the prediction probability of each class is calculated through a softmax function, completing the training of the model and obtaining the pre-trained model for real-time prediction.
The model is trained through steps S1 to S7, obtaining a pre-trained model that can be used for real-time prediction. In this embodiment, the classifier is trained with a cross-entropy loss function, which quantifies the difference between the model's predicted probability distribution and the true labels and minimizes this difference through the optimization process.
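A minimal training-loop sketch with the cross-entropy objective; model, loader and optimizer are assumed to be any compatible PyTorch objects, and the softmax of step S73 is folded into nn.CrossEntropyLoss.

```python
import torch.nn as nn

def train_epoch(model, loader, optimizer, device="cpu"):
    # model: any nn.Module (e.g. the APCformerSketch above); loader yields (x, y).
    model.train()
    criterion = nn.CrossEntropyLoss()   # measures predicted-vs-true divergence
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)   # softmax is applied inside the loss
        loss.backward()
        optimizer.step()
```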
S8, acquiring real-time data of the EEG signals. In this embodiment, step S8 is the preprocessing of real-time data prior to the prediction stage. Step S8 specifically includes steps S81 to S83.
S81, storing the acquired real-time EEG data stream in a buffer.
S82, extracting real-time segment data from the buffer through a sliding time window.
S83, preprocessing the real-time segment data, wherein the preprocessing steps for the real-time data are fewer than those for the offline data: the data augmentation is omitted, and the baseline correction is based on the mean of the whole segment.
Specifically, the real-time EEG signal is obtained from a real-time data stream, and data are extracted through a sliding time window for preprocessing, obtaining the preprocessed EEG signal. The purpose of real-time EEG signal processing is to achieve accurate control of external devices through a brain-computer interface. Unlike the preprocessing of the offline training stage, the real-time process relies entirely on the active imagination of the brain, without visual or auditory cues, and imagination is performed continuously throughout the process without rest-segment data. Therefore, real-time preprocessing requires no segmentation of excitation intervals, and the baseline correction is adjusted from the original baseline based on the mean of the 300 ms of rest-segment data before each segment to a baseline based on the mean of the whole segment. In addition, the real-time preprocessing stage does not involve data augmentation; the order of the remaining steps is consistent with the preprocessing steps of the offline training stage.
Specifically, as shown in figs. 7 and 8, the invention provides a buffer for the real-time data stream, storing all of the acquired real-time data. The sampling frequency of the device is 256 Hz; a sliding time window of 1024 sampling points is set in the buffer, and the sliding distance of the window is 256 sampling points. Segment data are extracted from the buffer through the sliding time window for real-time preprocessing. The buffer ensures the integrity and continuity of the window data in each prediction and provides connection support for the access of subsequent data.
To counteract the influence of device and data-transmission delays, a fixed number of sampling points related to the sampling frequency, rather than absolute time, is adopted as the window positioning mark. The sliding-time-window approach to data selection has three advantages. First, using a fixed number of sampling points as the basis for window division effectively avoids prediction errors caused by device delay or time drift. Second, the design of the sliding time window makes full use of the continuity of the data, improving data utilization and reducing system latency. Finally, the step-by-step update mechanism of the sliding time window smooths the prediction results and enhances the real-time performance and stability of the system, thereby ensuring accurate control of external devices.
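A sketch of the buffer with a fixed-sampling-point sliding window (1024-point window, 256-point slide at 256 Hz, per the text); the class name and the deque-based storage are implementation assumptions.

```python
from collections import deque
import numpy as np

class StreamWindow:
    # Sliding window keyed to sampling points: 1024-point windows advanced by
    # 256 points, matching the 256 Hz device described above.
    def __init__(self, window=1024, stride=256):
        self.window, self.stride = window, stride
        self.buf = deque(maxlen=window)      # always holds at most one window
        self.n_new = 0

    def push(self, chunk):                   # chunk: (n_channels, n_samples)
        for t in range(chunk.shape[1]):
            self.buf.append(chunk[:, t])
            self.n_new += 1

    def ready(self):
        # first prediction needs a full window; afterwards every `stride` samples
        return len(self.buf) == self.window and self.n_new >= self.stride

    def take(self):
        self.n_new = 0
        return np.stack(self.buf, axis=1)    # (n_channels, window) for the model
```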
S9, inputting the real-time preprocessed data into the pre-trained model to obtain the decoding result of the real-time EEG signal. Preferably, step S9 specifically includes steps S91 to S93.
S91, at the first prediction, after the buffer has accumulated one complete window of data, predicting with the pre-trained model.
S92, each time the amount of data corresponding to one sliding distance is newly acquired, predicting the data in the window with the pre-trained model.
S93, mapping each prediction result into a control instruction in real time.
Specifically, as shown in figs. 7 and 8, the sliding time window is the window described in step S8. At the first prediction, when one complete window of data has accumulated in the buffer, the system performs real-time preprocessing on the window data and predicts it through the pre-trained model obtained in the offline training stage, while the window data used for prediction are retained in the buffer. Thereafter, whenever the amount of data corresponding to one sliding distance is newly acquired, the sliding time window intercepts the new data and combines it with the data retained in the buffer after the previous prediction to form a new window of data; the system processes the new window immediately, and the preprocessed data in the window are predicted through the pre-trained model. Finally, every prediction result is immediately mapped into a control instruction and sent to the target device.
The invention provides a novel EEG signal decoding network, APCformer, which performs excellently in EEG classification and recognition tasks and effectively remedies some of the shortcomings of traditional networks in feature extraction. The network is a multi-scale feature interactive fusion network structure that refines features progressively and hierarchically. The space-time convolution and adaptive feature recalibration modules capture coarse-grained features for the network; the position perception enhancement module refines the features, injects position encoding information and strengthens the internal connections between features. The sparse information aggregation Transformer module strengthens long-range dependencies and local associations, balances local and global correlation, and improves the depth and breadth of EEG data analysis in an all-round way.
A second embodiment of the invention provides an EEG signal decoding apparatus based on an aggregate perception enhanced convolutional Transformer network, comprising an offline data acquisition module, a space-time convolution module, a recalibration module, a fusion module, a position perception module, an information aggregation module, a classifier module, a real-time data acquisition module and a real-time decoding module.
The offline data acquisition module is configured to acquire offline data of EEG signals.
The space-time convolution module is configured to extract multi-scale shallow local features according to the offline data.
The recalibration module is configured to perform interactive sharing and key-feature reinforcement on the multi-scale shallow local features through an adaptive feature recalibration module to obtain a plurality of recalibrated features.
The fusion module is configured to fuse the plurality of recalibrated features to obtain a first fusion feature.
The position perception module is configured to input the first fusion feature into a position perception enhancement module, extract deep fine-grained features through parallel enhanced convolution, and adaptively encode the deep fine-grained features to obtain position perception enhancement features.
The information aggregation module is configured to extract long-range dependencies and local associations through a sparse information aggregation Transformer module to obtain global refined features.
The classifier module is configured to input the global refined features into a classifier to complete the training of the model and obtain a pre-trained model that can be used for real-time prediction.
The real-time data acquisition module is configured to acquire real-time data of the EEG signals.
The real-time decoding module is configured to input the real-time data into the pre-trained model to obtain the decoding result of the real-time EEG signal.
A third embodiment of the invention provides an EEG signal decoding device based on an aggregate perception enhanced convolutional Transformer network, comprising a processor, a memory and a computer program stored in the memory. The computer program can be executed by the processor to implement the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network described in any of the embodiments above.
A fourth embodiment of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium comprises a stored computer program; when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the EEG signal decoding method based on an aggregate perception enhanced convolutional Transformer network according to the first embodiment of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk. It should be noted that, in the present invention, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used in the present invention merely describes an association relation between associated objects, indicating that three kinds of relations may exist; for example, A and/or B may represent three cases: A exists alone, A and B exist simultaneously, and B exists alone. In the present invention, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The term "if" as used herein may be interpreted as "at" or "when" depending on the context "or" in response to a determination "or" in response to a detection. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
References to "first\second" in the embodiments are merely to distinguish similar objects and do not represent a particular ordering for the objects, it being understood that "first\second" may interchange a particular order or precedence where allowed. It is to be understood that the "first\second" distinguishing aspects may be interchanged where appropriate, such that the embodiments described herein may be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention; various modifications and variations of the present invention may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.