CN110991226A


Info

Publication number: CN110991226A
Application number: CN201911006001.6A
Authority: CN
Inventors: 梁伟; 鲁明丽; 杨晨婷; 邱佳华; 张哲�
Original assignee: Changshu Institute of Technology
Current assignee: Changshu Institute of Technology
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2020-04-10
Anticipated expiration: 2040-01-16
Also published as: CN110991226B

Abstract


The invention discloses a method for detecting human motion intention based on a reconstruction model, carried out in the following steps. In the training stage, a reconstruction model is trained using electroencephalogram (EEG) signals recorded while a specific intention task is performed; this stage includes a feature extraction step and a classification step. The feature extraction step uses the combined algorithm filter bank common spatial pattern (FBCSP); the classification step uses a classifier to identify whether the user is performing a certain motion intention task. In the detection stage, the EEG period to be determined is input into the reconstruction model and its reconstruction error is calculated; the smaller the reconstruction error, the more likely it is that a motion intention task is present in the observation period. The invention uses the reconstruction model to represent a high-level abstraction of the motion intention and uses the reconstruction error to determine whether a motion intention exists; it is theoretically flexible and reliable for any complex real-world situation.

Description

Human motion intention detection method based on reconstruction model

Technical Field

The present invention relates to human motion intention detection, and more particularly to a reconstruction-based human intention detection (RID) method and applications thereof.

Background

Electroencephalography (EEG) signals are voltages collected from different parts of the user's scalp to measure brain activity. EEG has been widely used in brain-computer interface (BCI) systems because it poses no clinical risk and can be recorded with portable acquisition equipment. A BCI system provides a potential bridge from the brain to peripheral devices for able-bodied as well as disabled users. With BCI techniques, signals of brain activity (e.g., motor intent) can be used as control commands. One classic application is BCI-assisted stroke rehabilitation. The study of EEG signals is receiving increasing attention from researchers because of its potential medical and industrial applications.

Since BCI systems underwent a paradigm shift, machine learning has been recognized as a key tool for EEG analysis. Although previous studies have demonstrated the effectiveness of decoding brain signals, well-known challenges still prevent the widespread use of EEG-based BCI systems. One key obstacle is that most EEG decoding studies focus on answering questions such as "which hand is the subject imagining moving, left or right?" Such a question can be modeled as a classification problem and solved by supervised learning. However, another question necessarily precedes it: "is the subject imagining moving a hand at all?" If the BCI system answers the "which" question without first answering this "whether" question, it may take unpredictable actions. For example, BCI-controlled wheelchairs driven by motor intent have shown promise in several prototype forms. Assuming the core algorithm is a four-class classification model that moves the wheelchair in four directions, the subject must maintain a fixed and accurate motor intention to control the direction of movement. However, real-world situations are more complex and unpredictable, since the user's true intent typically occurs at varying and infrequent moments. For example, if the user is interrupted by an event (e.g., a telephone call) or the user's mind has simply wandered, current solutions cannot handle the situation: they still assign the brain signals to one of the four classes and drive the wheelchair in the corresponding direction. Such systems are therefore neither reliable nor practical and may lead to serious accidents.

In addition to this uncertainty, collecting intent data can be cumbersome because it requires the subject to stay focused, which is particularly impractical for children, people with special needs (e.g., in BCI-assisted autism therapy), the elderly, or the disabled. The scarcity of intent occurrences and the difficulty of obtaining accurate motor intent result in insufficient training data.

It would be desirable for an intent detection method to first determine whether the user is performing a motor intent at all, and only then determine the direction in which the user wishes to move. The whole system would then be more robust and better able to cope with complex situations. The present invention is therefore directed to answering this "yes or no" question, which can be formulated as an intent detection problem on EEG signals. The objective is to determine accurately whether a subject has a certain motor intent during the observation period.

However, challenges remain for the following reasons. First, the EEG signal itself exhibits strong variability: it varies from subject to subject and also between different segments recorded from the same user, even within the same experiment. Second, although the problem appears to be a binary classification problem, the "no" class is unknown. In previous EEG studies a subject is directed to perform certain mental tasks, such as imagining movement of the left or right hand; here, by contrast, the user's mental state cannot be defined merely by stating that a certain mental task is not being performed. Apart from the specific intent, the possible mental states of the user are unlimited: the user may be thinking of a movie, of food, or of something the user is not even aware of. Conventional supervised classification methods from machine learning therefore cannot solve this problem.

Disclosure of Invention

1. Objects of the invention

A brain-computer interface (BCI) enables a person to communicate with and intuitively control external devices through brain signals. Successful detection of motor intent paves the way for developing BCI applications. Current studies mainly focus on answering questions such as "which hand is the subject imagining moving, right or left?" The question "is the subject imagining moving a hand?" also needs to be answered, because a user cannot continuously perform intention tasks while using an EEG device, and the intervals in which no motor intention is performed may trigger unexpected operations and cause the BCI system to malfunction. This intent detection task is more difficult, however, because it is hard to know what the "no" case looks like and therefore hard to obtain training samples for it. Furthermore, infrequent motor intentions or unexpected events make the task harder still. To solve this problem, a human motion intention detection method based on a reconstruction model is provided.

2. The technical scheme adopted by the invention

The invention discloses a human motion intention detection method based on a reconstructed model, which is carried out according to the following steps:

step 1, in the training stage, a reconstruction model is trained using EEG signals recorded while a specific intention task is performed, the training stage comprising a feature extraction step and a classification step; the feature extraction step uses the combined algorithm filter bank common spatial pattern (FBCSP); the classification step uses a classifier to identify whether the user is performing a certain motion intention task;

step 2, in the detection stage, the EEG period to be determined is input into the reconstruction model and its reconstruction error is calculated; the smaller the reconstruction error, the higher the possibility that a certain motion intention task is present in the observation period;

step 2.1, the reconstruction model is trained using subspace projection; samples with larger reconstruction errors are treated as outliers; the reconstruction model may also be used as a classifier, in which case each sample is assigned to the class whose reconstruction model fits it best;

step 2.2, the reconstruction model is implemented with an autoencoder; according to the three query strategies for motion intention detection, namely trial-based, recording-based and segment-based, two types of autoencoder are applied: a fully connected autoencoder and a CNN autoencoder;

step 2.3, sparse dictionary learning is used as a reconstruction model for realizing the RID scheme; a sparse representation of the input data is learned through a dictionary whose atoms may be redundant, which allows a single sample to have multiple representations and improves the sparsity and flexibility of the representation; normal segments are more numerous than intent-execution segments, are highly varied, and may be correlated with the intent-execution segments; the latent representation consists of a dictionary of atoms and a sparse code, and during reconstruction the original input is approximated by a linear combination of the atoms weighted by the sparse code.

Further, in step 2.1, subspace projection defines a set of m-dimensional vectors by mapping a sample $x \in \mathbb{R}^d$ to a sample $y \in \mathbb{R}^m$ ($m \le d$).

The principal components are obtained from m variables and have three properties, as follows:

1) the principal components are orthogonal;

2) the variance of the first principal component is the largest, and the variance of each subsequent component gradually decreases;

3) the sum of the variances of all the principal components is equal to the sum of the variances of the original variables;

suppose that $R \in \mathbb{R}^{d \times d}$ is the correlation matrix of the d variables $v_1, v_2, \ldots, v_d$ computed from the target reconstruction training set; the d eigenvalue-eigenvector pairs of R are computed and sorted by eigenvalue to obtain $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_d, e_d)$, where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$ and $e_i \in \mathbb{R}^d$;

the i-th principal component of a sample $x = (x_1, x_2, \ldots, x_d)^T$ can be calculated as:

$$y_i = e_i^T x = e_{i1} x_1 + e_{i2} x_2 + \ldots + e_{id} x_d$$

selecting the first m eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_m, e_m)$, the resulting projection matrix P is:

$$P = (e_1, e_2, \ldots, e_m)^T \tag{3}$$

where $P \in \mathbb{R}^{m \times d}$. Any observation x can be converted into:

$$y = Px \tag{4}$$

where $y \in \mathbb{R}^m$. Because the eigenvectors are orthonormal, the reconstruction process is simple:

$$\hat{x} = P^T y = P^T P x$$

it should be noted that only when m = d does $P^{-1} = P^T$ hold and the reconstruction become perfect; otherwise, due to the compression, $\hat{x}$ and x are different; m is set smaller than d, considering that an overly perfect reconstruction may limit the effectiveness of identifying data patterns from reconstruction losses;

thus, a subspace projection based reconstruction model may be defined as:

$$f(x; \theta) = P^T P x$$

where $\theta = P$ can be calculated by acquiring the projection matrix of the PCA; after training, the reconstruction error of the EEG query segment q can be calculated as:

$$e(q) = \lVert q - P^T P q \rVert_2^2$$
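The subspace-projection model can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names `fit_projection` and `pca_reconstruction_error`, the per-variable standardization before the correlation matrix is computed, and the squared Euclidean error are assumptions made for illustration.

```python
import numpy as np

def fit_projection(X_train, m):
    """Fit the projection matrix P (m x d) from intent-execution training samples.

    X_train has shape (n_samples, d). Variables are standardized so that
    Z.T @ Z / n is the correlation matrix R (an assumption of this sketch).
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant variables
    Z = (X_train - mean) / std
    R = (Z.T @ Z) / len(Z)                   # d x d correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]    # indices of the m largest eigenvalues
    P = eigvecs[:, order].T                  # rows are e_1, ..., e_m
    return mean, std, P

def pca_reconstruction_error(q, mean, std, P):
    """Squared error between a standardized query and its reconstruction P^T P z."""
    z = (q - mean) / std
    z_hat = P.T @ (P @ z)
    return float(np.sum((z - z_hat) ** 2))
```

With m = d the matrix P is orthogonal and the error is zero up to rounding, which is why the sketch is only informative for m < d.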

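further, in step 2.2, an autoencoder is used as the reconstruction model; according to the three query strategies for motion intention detection, namely trial-based, recording-based and segment-based, two types of autoencoder are applied: a fully connected autoencoder and a CNN autoencoder, specifically as follows:

step 2.2.1, fully connected autoencoder

The fully connected autoencoder consists of an encoder $\Phi: \mathbb{R}^d \to \mathbb{R}^m$, where m is the dimension of the hidden layer, and a decoder $\Psi: \mathbb{R}^m \to \mathbb{R}^d$. Each layer is defined by a corresponding weight W, bias b and activation function:

$$\Phi = f_\Phi(W_\Phi x + b_\Phi) \tag{7}$$

$$\Psi = f_\Psi(W_\Psi x + b_\Psi) \tag{8}$$

where $W_\Phi \in \mathbb{R}^{m \times d}$, $b_\Phi \in \mathbb{R}^m$, $W_\Psi \in \mathbb{R}^{d \times m}$ and $b_\Psi \in \mathbb{R}^d$; $f_\Phi$ and $f_\Psi$ denote the encoding activation function and the decoding activation function, respectively; the reconstruction model is defined as:

$$f(x; \theta) = \Psi(\Phi(x)), \quad \theta = \{W_\Phi, b_\Phi, W_\Psi, b_\Psi\}$$

the parameter θ can be obtained by minimizing the reconstruction loss over the N training segments $x_1, \ldots, x_N$:

$$\min_\theta \frac{1}{N} \sum_{j=1}^{N} \lVert x_j - f(x_j; \theta) \rVert_2^2$$

after training, the reconstruction error of the EEG query segment q can be calculated as:

$$e(q) = \lVert q - f(q; \theta) \rVert_2^2$$

it should be noted that with a linear activation function the autoencoder learns the same subspace as PCA; therefore, the nonlinear rectified linear unit (ReLU) is taken as the activation function;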
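A fully connected autoencoder with one hidden layer can be sketched in plain NumPy as follows. This is a hedged illustration only: the linear decoder, the mean squared reconstruction loss, plain gradient descent, the row-vector convention (`X @ W` instead of `W x`) and all names (`train_autoencoder`, `ae_reconstruction_error`) are assumptions, not details given in the source.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train_autoencoder(X, m, lr=0.01, epochs=500, seed=0):
    """Train a d -> m -> d autoencoder on rows of X with full-batch gradient descent.

    Encoder: ReLU(X @ W_enc + b_enc); decoder: linear (assumed for simplicity).
    Returns the parameters and the loss value recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, m)); b_enc = np.zeros(m)
    W_dec = rng.normal(0.0, 0.1, (m, d)); b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc + b_enc                 # pre-activation of the hidden layer
        H = relu(Z)                           # hidden code
        resid = H @ W_dec + b_dec - X         # reconstruction minus input
        losses.append(float(np.mean(np.sum(resid ** 2, axis=1))))
        G = 2.0 * resid / n                   # gradient of the loss w.r.t. the output
        G_Z = (G @ W_dec.T) * (Z > 0)         # back-propagate through the ReLU
        W_dec -= lr * (H.T @ G); b_dec -= lr * G.sum(axis=0)
        W_enc -= lr * (X.T @ G_Z); b_enc -= lr * G_Z.sum(axis=0)
    return (W_enc, b_enc, W_dec, b_dec), losses

def ae_reconstruction_error(q, params):
    """Squared reconstruction error of a single query vector q."""
    W_enc, b_enc, W_dec, b_dec = params
    q_hat = relu(q @ W_enc + b_enc) @ W_dec + b_dec
    return float(np.sum((q - q_hat) ** 2))
```

In practice a deep learning framework would handle the gradients; the manual version is shown only to keep the sketch self-contained.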

step 2.2.2 CNN autoencoder

Since EEG signals are time-series data with multiple channel readings at each point in time, the input data to the autoencoder may be in a two-dimensional (2D) format; therefore, a CNN autoencoder is applied as the reconstruction model to process two-dimensional EEG data; like the fully connected autoencoder, the CNN autoencoder has an encoder part and a decoder part; the main difference is that both parts are constructed mainly from convolutional layers;

specifically, the encoder stage has three 2D-CNN layers, each followed by a max-pooling layer, and the decoder stage has three pairs of upsampling and convolutional layers; to formalize the two-dimensional convolution operation, the value of the neuron at position (x, y) in the k-th feature map of the l-th layer is given by:

$$v_{kl}^{xy} = \mathrm{relu}\Big(b_{kl} + \sum_{n} \sum_{p=0}^{P_k - 1} \sum_{q=0}^{Q_l - 1} w_{kln}^{pq} \, v_{n(l-1)}^{(x+p)(y+q)}\Big)$$

where relu(x) = max(0, x) is the activation function; in this equation, $b_{kl}$ is the bias of the k-th feature map in the l-th layer, n runs over the feature maps in the previous layer, $w_{kln}^{pq}$ is the weight at position (p, q) of the kernel connected to this feature map, and $P_k$ and $Q_l$ represent the size of the kernel;
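The value of one convolutional feature map can be computed directly from this description. The following NumPy sketch assumes a "valid" convolution without padding, stride 1, and one kernel slice per input feature map; the function name `conv2d_relu` and the array layout are illustrative choices.

```python
import numpy as np

def conv2d_relu(prev_maps, kernels, bias):
    """Compute one output feature map of a 2D convolutional layer with ReLU.

    prev_maps: (N, H, W) feature maps of the previous layer.
    kernels:   (N, P, Q) kernel slices, one per previous feature map.
    bias:      scalar bias of this feature map.
    Returns an array of shape (H - P + 1, W - Q + 1).
    """
    _, height, width = prev_maps.shape
    _, P, Q = kernels.shape
    out = np.empty((height - P + 1, width - Q + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            # Sum over all previous maps n and kernel offsets (p, q).
            patch = prev_maps[:, x:x + P, y:y + Q]
            out[x, y] = bias + np.sum(kernels * patch)
    return np.maximum(out, 0.0)               # relu(x) = max(0, x)
```

The explicit loops mirror the triple sum over feature maps and kernel offsets; a framework convolution layer would compute the same values far more efficiently.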

after each convolutional layer, a max-pooling operation with a stride of [2 × 2] is applied, which halves each data dimension, so at the decoder stage each data dimension must be doubled in every decoder layer; there are two methods of expanding the data dimension in the decoder stage: the transposed convolution operation and upsampling interpolation; nearest-neighbor upsampling interpolation followed by a convolutional layer is used as the basic component of the decoder stage; after the CNN decoder, the input data is reconstructed by a final output layer; the final output layer is a convolutional layer whose output has the same size as the input data;

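the network is trained by stochastic gradient descent with the Adam update rule to minimize the reconstruction loss over the N training segments $x_1, \ldots, x_N$:

$$\min_{\zeta_j} \frac{1}{N} \sum_{n=1}^{N} \lVert x_n - f(x_n; \zeta_j) \rVert_2^2$$

where $\zeta_j$ is the set of parameters of the neural network; the network parameters are optimized with a learning rate of $10^{-4}$;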
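For reference, a single Adam update can be sketched as below. Only the learning rate of 10^-4 comes from the text; the decay rates `beta1`, `beta2` and the constant `eps` are the commonly used defaults and are assumptions here, as are the function name and the dictionary-based optimizer state.

```python
import numpy as np

def adam_step(param, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to `param` given its gradient.

    `state` is a dict with keys "t" (step count), "m" (first-moment estimate)
    and "v" (second-moment estimate); it is updated in place.
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```

On the first step the bias correction makes the update magnitude approximately equal to the learning rate, regardless of the gradient scale.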

Further, in step 2.3, sparse dictionary learning is taken as a reconstruction model for realizing the RID scheme. A sparse representation of the input data is learned through a dictionary whose atoms may be redundant, which allows a single sample to have multiple representations and improves the sparsity and flexibility of the representation; normal segments are more numerous than intent-execution segments, are highly varied, and may be correlated with the intent-execution segments.

The latent representation consists of a dictionary of atoms and a sparse code; during reconstruction, the original input is approximated by a linear combination of the atoms weighted by the sparse code; formally, the reconstruction model is defined as:

$$f(x; \theta_i) = \theta_i \, c_i(x)$$

where $\theta_i = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{d \times m}$ is a dictionary containing m atoms and $c_i(x) \in \mathbb{R}^m$ is the sparse code of the input vector $x \in \mathbb{R}^d$, in which most coefficients are zero or close to zero; to construct an overcomplete dictionary, the dictionary size m is set larger than the input dimension d; an overcomplete dictionary does not require the atoms to be orthogonal, which allows a more flexible dictionary and a richer data representation; the dictionary $\theta_i$ and the sparse code $c_i$ can be learned in the training phase by solving the following optimization problem over the N training segments $x_1, \ldots, x_N$:

$$\min_{\theta_i, c_i} \sum_{j=1}^{N} \lVert x_j - \theta_i c_i(x_j) \rVert_2^2 + \lambda \lVert c_i(x_j) \rVert_0 \tag{15}$$

subject to $\lVert d_k \rVert_2 = 1$ for all $1 \le k \le m$; the first term is the data fitting term and the second term is the sparsity-inducing regularization; this minimization problem is NP-hard because of the $\ell_0$ norm $\lVert \cdot \rVert_0$, and a solution can be approximated by replacing it with its convex relaxation, i.e. the $\ell_1$ norm $\lVert \cdot \rVert_1$; if either $\theta_i$ or $c_i$ is fixed, the problem becomes convex in the other; the result is approximate because the problem is not jointly convex in $(\theta_i, c_i)$; the sparse code and the dictionary can be obtained through an alternating update scheme:

1) sparse code approximation: the sparse code $c_i$ is updated by solving equation (15) with the dictionary fixed at its value from the previous iteration;

2) dictionary refinement: the dictionary $\theta_i$ is updated by solving equation (15) with the sparse code fixed at its value from the previous iteration;

in the detection phase, given a query segment q, its sparse code is computed as $c_i(q)$ and its reconstruction is $\theta_i c_i(q)$; thus the reconstruction error is

$$e(q) = \lVert q - \theta_i c_i(q) \rVert_2^2$$

After the reconstruction error is obtained, an intent-specific threshold is applied to determine whether the query segment is an intent-execution segment or a normal segment.
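The alternating scheme of sparse code approximation and dictionary refinement can be sketched with NumPy as follows. The sketch makes several assumptions that the source does not specify: the sparse codes are computed with ISTA on the L1-relaxed objective (with a factor of 0.5 on the data term), the dictionary is refined by least squares followed by renormalizing each atom to unit norm, and unused atoms are re-drawn at random. Samples are stored as columns, and all function names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, X, lam, n_iter=100):
    """ISTA for min_C 0.5 * ||X - D C||_F^2 + lam * ||C||_1 with D fixed."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)      # 1 / Lipschitz constant
    C = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        C = soft_threshold(C - step * (D.T @ (D @ C - X)), step * lam)
    return C

def learn_dictionary(X, m, lam=0.1, n_outer=20, seed=0):
    """Alternate between sparse coding and dictionary refinement.

    X has shape (d, N); the returned dictionary has shape (d, m) with
    unit-norm columns (atoms).
    """
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(X.shape[0], m))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        C = sparse_code(D, X, lam)                # 1) sparse code approximation
        D = X @ np.linalg.pinv(C)                 # 2) dictionary refinement
        dead = np.linalg.norm(D, axis=0) < 1e-12  # atoms no sample uses
        D[:, dead] = rng.normal(size=(D.shape[0], int(dead.sum())))
        D /= np.linalg.norm(D, axis=0, keepdims=True)
    return D

def dict_reconstruction_error(q, D, lam=0.1):
    """Squared error between q and its sparse reconstruction D c(q)."""
    c = sparse_code(D, q[:, None], lam)
    return float(np.sum((q[:, None] - D @ c) ** 2))
```

A query segment would then be labeled an intent-execution segment when `dict_reconstruction_error` falls below the intent-specific threshold.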

Furthermore, a given observation period may be divided into a series of short segments, so that the intent detection task becomes determining whether the subject is performing a particular intent within a short segment; the object of study thus shifts from a long period that may contain both intent-execution and non-execution portions to short segments that either do or do not contain the intent. An intent-execution segment is a time segment during which the subject performs a particular intent task; a normal segment is a time segment during which the subject does not perform that task; a query segment is a time segment for which it must be determined whether the subject performs the task; a training segment is a time segment used for training the reconstruction model and must be an intent-execution segment;
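Dividing an observation period into short segments can be sketched as a sliding window over the recording. The window length, the stride and the (samples × channels) array layout are assumptions chosen for illustration; the source does not fix them.

```python
import numpy as np

def split_into_segments(recording, segment_len, stride):
    """Cut a recording of shape (n_samples, n_channels) into short segments.

    Returns an array of shape (n_segments, segment_len, n_channels).
    Trailing samples that do not fill a whole segment are dropped.
    """
    n_samples = recording.shape[0]
    if n_samples < segment_len:
        raise ValueError("recording is shorter than one segment")
    starts = range(0, n_samples - segment_len + 1, stride)
    return np.stack([recording[s:s + segment_len] for s in starts])
```

Each returned segment can then be flattened or kept two-dimensional, depending on which reconstruction model receives it.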

given a query segment q of an EEG recording, the task is to determine whether the subject is performing a certain intent task $T_i$ in that segment; formally, given a set of N EEG segments $X = \{x_j\}_{j=1}^{N}$, $x_j \in \mathbb{R}^d$, each recorded while the same intent task $T_i$ is performed, the reconstruction model is expressed as

$$f(\cdot\,; \theta_i): \mathbb{R}^d \to \mathbb{R}^d$$

where d is the input dimension of the reconstruction model and $\theta_i$ is the parameter of the reconstruction model, which is obtained in the training phase by minimizing a loss function:

$$\theta_i = \arg\min_{\theta} \frac{1}{N} \sum_{j=1}^{N} \lVert x_j - f(x_j; \theta) \rVert_2^2$$

after the reconstruction model is established, for each EEG query segment q the reconstruction error

$$e_i(q) = \lVert q - f(q; \theta_i) \rVert_2^2$$

is calculated and used as a similarity measure that determines the correlation between the EEG query segment and the mental intent; the smaller the reconstruction error $e_i(q)$, the greater the correlation between the EEG query segment q and the mental intent $T_i$.
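Turning reconstruction errors into yes-or-no decisions and scoring them against known labels can be sketched as follows. The comparison `error <= threshold`, the function names and the zero-division conventions are assumptions of this sketch; the threshold itself would have to be chosen per intent task.

```python
import numpy as np

def detect_intent(errors, threshold):
    """Label a query segment as intent-execution when its error is small enough."""
    return np.asarray(errors, dtype=float) <= threshold

def precision_recall_f1(predicted, actual):
    """Precision, recall and F1 score for boolean predictions and labels."""
    predicted = np.asarray(predicted, dtype=bool)
    actual = np.asarray(actual, dtype=bool)
    tp = int(np.sum(predicted & actual))      # intent segments correctly detected
    fp = int(np.sum(predicted & ~actual))     # normal segments flagged as intent
    fn = int(np.sum(~predicted & actual))     # intent segments missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Sweeping the threshold over a range of values and recomputing these scores gives precision, recall and F1 curves as functions of the threshold.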

Still further, when performing yes-or-no motion intention detection, the performance of the RID scheme implemented with subspace projection, autoencoders and sparse dictionary learning is compared using relative average errors that are normalized by dividing by the average reconstruction error of the intent-execution query segments;

the normalized relative mean error (NRME) $\phi_{relative}$ is defined as:

$$\phi_{relative} = \frac{\phi^{o} - \phi^{i}}{\phi^{i}}$$

where $\phi^{o}$ and $\phi^{i}$ are the average reconstruction errors of the normal and the intent-execution query segments, respectively.

Further, the normalized relative reconstruction error is calculated as follows:

$$\hat{e}_k = \frac{e_k - \bar{e}^{\,i}}{\bar{e}^{\,i}}$$

where $e_k$ is the reconstruction error of the k-th instance and $\bar{e}^{\,i}$ is the average reconstruction error of the intent-execution query segments.
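As a hedged illustration, the two normalized quantities can be computed as below. The exact formulas are not legible in this text, so the sketch assumes that "relative" means the difference from the mean error of the intent-execution segments and "normalized" means division by that same mean; the function names are hypothetical.

```python
import numpy as np

def nrme(errors_normal, errors_intent):
    """Normalized relative mean error, assumed to be (phi_o - phi_i) / phi_i."""
    phi_o = float(np.mean(errors_normal))     # mean error of normal segments
    phi_i = float(np.mean(errors_intent))     # mean error of intent segments
    return (phi_o - phi_i) / phi_i

def normalized_relative_errors(errors, errors_intent):
    """Per-instance errors, assumed to be (e_k - mean_intent) / mean_intent."""
    mean_intent = float(np.mean(errors_intent))
    return (np.asarray(errors, dtype=float) - mean_intent) / mean_intent
```

Under these assumptions a larger NRME means that normal segments are reconstructed much worse than intent-execution segments, which is the separation the scheme relies on.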

3. Advantageous effects adopted by the present invention

(1) The present invention utilizes a reconstruction model to represent a high level abstraction of the movement intent and utilizes reconstruction errors to determine whether a movement intent is present.

(2) The invention carries out comprehensive detection experiments on two motion intention tasks using different reconstruction models, demonstrating that the proposed RID scheme performs well. It is theoretically flexible and reliable for any complex real-world situation, and it requires neither hand-crafted processing nor deep domain expertise.

(3) The invention performs well on synthetic query segments and also obtains competitive results on "baseline" query segments, even though the baseline data were acquired in the same environment as the intent-task data.

(4) The invention discusses three query strategies for motion intention detection, namely trial-based, recording-based and segment-based, and constructs three different reconstruction models to realize the scheme, showing its flexibility with respect to the choice of reconstruction model.

Drawings

FIG. 1 shows the reconstruction-based intent detection scheme.

FIG. 2 is a flow chart of reconstruction-based intent detection.

FIG. 3 shows, for the left fist, the normalized relative reconstruction error distributions for different query strategies and different reconstruction models (relevant queries are intent-execution query segments).

FIG. 4 shows, for the left fist, the average precision and recall at different thresholds for different query strategies and different reconstruction models (relevant queries are intent-execution query segments).

FIG. 5 shows, for the right fist, the normalized relative reconstruction error distributions for different query strategies and different reconstruction models (relevant queries are intent-execution query segments).

FIG. 6 shows, for the right fist, the average precision and recall at different thresholds for different query strategies and different reconstruction models (relevant queries are intent-execution query segments).

FIG. 7 shows the motion intention detection F1 scores at the optimal decision thresholds for different query strategies and reconstruction models.

FIG. 8 shows intent detection on synthetic query segments and left-fist motion intention segments: F1 score as a function of the threshold.

FIG. 9 shows intent detection on synthetic query segments and right-fist motion intention segments: F1 score as a function of the threshold.

Detailed Description

The technical solutions in the examples of the present invention are clearly and completely described below with reference to the drawings in the examples of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without inventive step, are within the scope of the present invention.

The present invention will be described in further detail with reference to the accompanying drawings.

The invention discloses a reconstruction-based human intention detection (RID) method that can identify whether a subject is performing a certain motor intention within a given observation period. The reconstruction error is used as the criterion for identifying intent execution. In the training stage, EEG signals recorded while a specific intention task is performed are used to train a reconstruction model; in the detection stage, the EEG period to be determined is input into the reconstruction model and its reconstruction error is calculated. The smaller the reconstruction error, the more likely it is that a motor intention task is present in the observation period. The proposed solution extends to any real-life scenario without requiring advance knowledge of what the non-intent states look like. Furthermore, the solution does not require hand-crafted processing or deep domain expertise, which are often essential in traditional EEG research. Systematic experiments on a large EEG data set with two motor intention tasks were carried out to study the effectiveness of the RID scheme. Three different reconstruction models, namely the autoencoder, subspace projection and sparse dictionary learning, are used to realize the RID scheme. In addition, three strategies for intention detection from EEG signals are studied. The experimental results show that the RID scheme provided by the invention works well for detecting the motor intentions of the left fist and the right fist. The scheme lays a foundation for developing more reliable and practical BCI systems.

A reconstruction-based human intention detection (RID) method is performed as follows:

Step 1. In the field of EEG analysis, the invention provides an intention detection scheme for a single person. Compared with traditional intent recognition schemes, the goal is to determine whether a user is performing an intent at all, which is more critical for building a reliable, flexible BCI system. EEG is one of the most common methods for recording the electrical activity of the human brain, and it has potential applications in areas such as mind-controlled wheelchairs and disease diagnosis.

Most existing EEG analysis work focuses on recognition tasks that classify an instance into a predefined class. The usual method consists of two parts: feature extraction and classification.

Step 1.1. Traditional EEG feature extraction comprises band-pass filtering and spatial filtering. Band-pass filtering retains the significant information in the active frequency bands and filters out inactive bands that may contain noise. Spatial filtering usually employs the common spatial pattern (CSP) and its variants; in addition, a combined algorithm, the filter bank CSP (FBCSP), has been developed and has proven competitive.

Step 1.2. For the classifier component, many machine learning methods, such as linear discriminant analysis (LDA), support vector machines (SVM) and random forests (RF), have been applied to EEG-based motor imagery classification. Recently, some studies have investigated deep learning methods for EEG analysis. The most common deep models are convolutional neural networks (CNN) and recurrent neural networks (RNN). However, these deep models only consider data collected during the target intent task; they ignore the EEG signals generated when the user is not performing a particular intent, and hence the practical use of EEG-based BCI systems. Thus, in this scheme, rather than classifying which motor intention task the user is performing, the goal is to identify whether the user is performing a motor intention task at all.

Step 2. A reconstruction-based EEG intention detection scheme is provided. This scheme is entirely different from classification-based machine learning approaches: it targets a detection task rather than the traditional recognition task. The approach can in theory handle any real scenario, and it requires neither the domain knowledge nor the hand-crafted features commonly used in the EEG analysis field. Reconstruction models have been widely used in various areas of data mining, particularly for detecting or removing outliers in computer vision: a reconstruction model is trained, and samples with larger reconstruction errors are regarded as outliers. A reconstruction model may also be used as a classifier, in which case each sample is assigned to the class whose reconstruction model fits it best. These methods fall into three broad categories.

Step 2.1. The first category is subspace projection, which converts the observed values of a set of possibly correlated variables into values of a set of linearly uncorrelated variables. One of the most representative methods is principal component analysis (PCA), which obtains a principal subspace through an orthogonal transformation. Existing work reconstructs the training set using PCA, kernel PCA, robust PCA and robust kernel PCA, and selects or removes outliers that produce high reconstruction losses.

We adopt subspace projection as a reconstruction method for this scheme. Our goal is to define a set of m-dimensional vectors by mapping a sample $x \in \mathbb{R}^d$ to a sample $y \in \mathbb{R}^m$ ($m \le d$). The principal components are obtained from m variables and have three properties:

1) The principal components are orthogonal.

2) The variance of the first principal component is the largest, and the variance of each subsequent component is progressively smaller.

3) The sum of the variances of all principal components is equal to the sum of the variances of the original variables.

Suppose that $R \in \mathbb{R}^{d \times d}$ is the correlation matrix of the d variables $v_1, v_2, \ldots, v_d$ computed from the target reconstruction training set. The d eigenvalue-eigenvector pairs of R are computed and sorted by eigenvalue to obtain $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_d, e_d)$, where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$ and $e_i \in \mathbb{R}^d$.

In general, the first m eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_m, e_m)$ are selected, and the resulting projection matrix P is:

$$P = (e_1, e_2, \ldots, e_m)^T \tag{3}$$

where $P \in \mathbb{R}^{m \times d}$. Any observation x can be converted into:

$$y = Px \tag{4}$$

where $y \in \mathbb{R}^m$. Because the eigenvectors are orthonormal, the reconstruction is $\hat{x} = P^T y = P^T P x$. Only when m = d does $P^{-1} = P^T$ hold and the reconstruction become perfect; otherwise, due to the compression, $\hat{x}$ and x are not the same. We therefore set m < d, considering that an overly perfect reconstruction may limit the effectiveness of identifying data patterns from the reconstruction loss.

Thus, a subspace projection based reconstruction model may be defined as:

$$f(x; \theta) = P^T P x$$

where $\theta = P$ can be calculated by acquiring the projection matrix of the PCA. After training, the reconstruction error of the EEG query segment q can be calculated as:

$$e(q) = \lVert q - P^T P q \rVert_2^2$$

step 2.2, the second category is to learn a compact code to represent the summary training samples. For example, existing work utilizes automated encoders to transform observations from one feature space to a new feature space, where the data separates the varying factors. We deploy the auto-encoder as another reconstruction model in the scheme. The automatic encoder is a reconstruction model based on a neural network, and has good performance on denoising and dimensionality reduction. From three query strategies for motor intention detection, namely trial, recorded and segmented (different EEG data organization strategies for inputting different reconstruction models), we apply two types of automatic encoders: and fully connecting the automatic encoder and the CNN automatic encoder. The method comprises the following specific steps:

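Step 2.2.1: fully connected autoencoder

Like the multilayer perceptron (MLP), the fully connected autoencoder is a feedforward artificial neural network. The autoencoder consists of an encoder $\Phi: \mathbb{R}^d \to \mathbb{R}^m$, where m is the dimension of the hidden layer, and a decoder $\Psi: \mathbb{R}^m \to \mathbb{R}^d$:

$$\Phi = f_\Phi(W_\Phi x + b_\Phi) \quad (7)$$

$$\Psi = f_\Psi(W_\Psi x + b_\Psi) \quad (8)$$

where $W_\Phi \in \mathbb{R}^{m \times d}$, $b_\Phi \in \mathbb{R}^m$ and $W_\Psi \in \mathbb{R}^{d \times m}$, $b_\Psi \in \mathbb{R}^d$ are the weights and biases, x denotes the input of the respective layer, and $f_\Phi$ and $f_\Psi$ represent the encoding activation function and the decoding activation function, respectively. Thus, in our case, the reconstruction model is defined as:

$$\hat{x} = \Psi(\Phi(x))$$

The parameter $\theta = \{W_\Phi, b_\Phi, W_\Psi, b_\Psi\}$ can be obtained by minimizing:

$$\sum_{n=1}^{N} \lVert x_n - \Psi(\Phi(x_n)) \rVert_2^2$$

It should be noted that, with a linear activation function, the autoencoder generates the same subspace as PCA. Therefore, we use the nonlinear rectified linear unit (ReLU) as the activation function.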
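The fully connected autoencoder can be sketched in plain NumPy as follows. This is only an illustration under stated assumptions: the hidden size m = 16, the toy data, the linear decoder output, the mean squared reconstruction loss, and plain gradient descent (instead of an adaptive optimizer) are all choices made here for brevity and are not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, N = 64, 16, 256                  # 64 channels; hidden size m < d (hypothetical)
X = rng.normal(size=(N, 4)) @ rng.normal(size=(4, d))   # toy stand-in for EEG records
X = X / X.std()

W_enc = rng.normal(scale=0.1, size=(m, d)); b_enc = np.zeros(m)
W_dec = rng.normal(scale=0.1, size=(d, m)); b_dec = np.zeros(d)

def forward(x):
    pre = x @ W_enc.T + b_enc          # encoder pre-activation
    h = np.maximum(pre, 0.0)           # ReLU hidden code
    x_hat = h @ W_dec.T + b_dec        # linear decoder (simplifying assumption)
    return pre, h, x_hat

def loss(x):
    return float(np.mean(np.sum((x - forward(x)[2]) ** 2, axis=1)))

initial_loss = loss(X)
lr = 1e-3
for _ in range(500):
    pre, h, x_hat = forward(X)
    g_out = 2.0 * (x_hat - X) / N      # gradient of the loss w.r.t. x_hat
    g_pre = (g_out @ W_dec) * (pre > 0)  # back-propagate through decoder and ReLU
    W_dec -= lr * (g_out.T @ h);  b_dec -= lr * g_out.sum(axis=0)
    W_enc -= lr * (g_pre.T @ X);  b_enc -= lr * g_pre.sum(axis=0)
final_loss = loss(X)

# Reconstruction error of a single query record q (squared Euclidean norm).
q = X[0]
query_error = float(np.sum((q - forward(q[None, :])[2][0]) ** 2))
```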

Step 2.2.2 CNN autoencoder

Since electroencephalography signals are time-series data with multiple channel readings at each point in time, the input data to the auto-encoder may be in a two-dimensional (2D) format. Therefore, we apply a CNN auto-encoder as a reconstruction model to process two-dimensional electroencephalographic data. Similar to fully connected autoencoders, CNN autoencoders also have an encoder portion and a decoder portion. The main difference is that the encoder and decoder sections are constructed primarily with convolutional neural networks.

Specifically, there are three 2D-CNN layers at the encoder stage, each followed by a max-pooling layer, and three pairs of upsampling and convolutional layers at the decoder stage. To formalize the two-dimensional convolution operation, the value of the neuron at position (x, y) in the k-th feature map of the l-th layer is given by:

$$v_{kl}^{xy} = \mathrm{relu}\Big(b_{kl} + \sum_{p=0}^{P_k-1} \sum_{q=0}^{Q_l-1} w_{kl}^{pq}\, v_{(l-1)}^{(x+p)(y+q)}\Big)$$

where relu(x) = max(0, x) is the activation function. In this equation, $b_{kl}$ is the bias of the k-th feature map in the l-th layer, $w_{kl}^{pq}$ is the weight at position (p, q) of the kernel connected to this feature map, applied over the feature map in the previous layer, and $P_k$ and $Q_l$ indicate the size of the kernel.

After each convolutional layer, a max-pooling operation with a stride of [2 × 2] is applied, halving each data dimension. Thus, at the decoder stage, the data dimension needs to be doubled in each decoder layer. There are two methods of extending the data dimension at the decoder stage: the transposed convolution operation and upsampling interpolation. The transposed convolution works on almost the same principle as the convolution operation, but in reverse: a unit in the input layer of a transposed convolutional layer is expanded to a larger data dimension. However, some researchers claim that an interpolation layer followed by a convolutional layer performs better than a transposed convolutional layer, and that nearest-neighbor interpolation performs best for upsampling. Therefore, in this study we use nearest-neighbor upsampling interpolation and convolutional layers as the basic components of the decoder stage. After the CNN decoder, the input data is reconstructed by a final output layer. The final output layer is a convolutional layer, and the size of its output is the same as that of the input data.
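To make the two size-changing operations concrete, the following NumPy sketch shows 2 × 2 max pooling and nearest-neighbor upsampling by a factor of two on a small array. The helper names are hypothetical, and dropping a trailing odd row or column during pooling is an assumption of this sketch.

```python
import numpy as np

def max_pool_2x2(a):
    """2 x 2 max pooling with stride 2; a trailing odd row/column is dropped."""
    h, w = a.shape
    a = a[: h - h % 2, : w - w % 2]
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_nearest_2x(a):
    """Nearest-neighbor upsampling: every value is repeated in a 2 x 2 block."""
    return np.repeat(np.repeat(a, 2, axis=0), 2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(x)               # shape (2, 2)
restored = upsample_nearest_2x(pooled) # shape (4, 4), values are block-constant
```

Upsampling restores the shape but not the values, which is why each upsampling layer is paired with a convolutional layer in the decoder.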

The numbers of feature maps of the three convolutional layers at the encoder stage are 16, 8 and 8, respectively, and in the reverse order at the decoder stage; the final output convolution has 1 feature map. The kernel size of each convolution operation in the encoder and decoder stages is kept at [3 × 3] and the stride at [1 × 1]. The max-pooling kernel size is set to [2 × 2]. Batch normalization is applied to achieve better performance.
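The effect of three pooling stages followed by three upsampling stages on the spatial size can be traced with a few lines of Python. The sketch assumes that the 3 × 3 convolutions preserve the size ("same" padding) and that pooling floors odd sizes; neither detail is stated in the text, and the function name is hypothetical.

```python
def trace_shapes(height, width, n_stages=3):
    """Spatial size after each pooling stage and each upsampling stage."""
    shapes = [(height, width)]
    h, w = height, width
    for _ in range(n_stages):      # encoder: convolution keeps size, pooling halves it
        h, w = h // 2, w // 2
        shapes.append((h, w))
    for _ in range(n_stages):      # decoder: upsampling doubles it, convolution keeps size
        h, w = h * 2, w * 2
        shapes.append((h, w))
    return shapes

slice_shapes = trace_shapes(64, 320)   # ends at (64, 320)
trial_shapes = trace_shapes(64, 497)   # ends at (64, 496)
```

Under these assumptions a 64 × 320 input round-trips exactly, whereas a 64 × 497 input comes back as 64 × 496, so some padding or cropping would be needed for the output to match the input size; the text does not say how this is handled.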

Stochastic gradient descent with the Adam update rule is used to minimize the loss function:

where $\zeta_j$ is the set of parameters of the neural network. The network parameters are optimized with a learning rate of $10^{-4}$.

Step 2.3: the third category learns a sparse representation of the input data through a dictionary. This leads to redundant atoms, allowing multiple representations of a single sample. It includes, in particular, works that study sparse dictionary learning to improve the sparsity and flexibility of the representation. However, most existing studies still follow the basic assumption that outliers are "few and different". Because of the complexity of human intent, our problem is more complex than outlier detection. The first difficulty is that the amount of "normal segment" data is larger than the amount of intent-execution segment data. Furthermore, unlike outliers, "normal segment" data is typically diverse and may have some association with the intent-execution segment data.

Step 3: evaluate the EEG-based "yes or no" motor intention detection task, and systematically study and analyze the different types of EEG reconstruction models and the various EEG data organization strategies.

We also study sparse dictionary learning as a reconstruction model to implement the RID scheme; according to step 2.3, it learns sparse representations of the input data through a dictionary. Sparse dictionary learning is a learning method that constructs compact representations of the input data. The latent representation is composed of a dictionary of several atoms and a sparse code. In the reconstruction process, the original input can be well approximated by a linear combination of the atoms weighted by the sparse code. Formally, the reconstruction model is defined as:

$$\hat{x} = \theta_i c_i$$

where $\theta_i = [d_1, d_2, \ldots, d_m]$ is a dictionary containing m atoms and $c_i \in \mathbb{R}^m$ is the sparse code of an input vector $x \in \mathbb{R}^d$. Most of the coefficients of the sparse code are zero or close to zero. To construct an overcomplete dictionary, the dictionary dimension m is set larger than the input dimension d. The overcomplete dictionary does not require the atoms to be orthogonal, thus allowing a more flexible dictionary and a richer data representation. The dictionary $\theta_i$ and the sparse code $c_i$ can be learned in the training phase by solving the following optimization problem:

$$\min_{\theta_i, c_i} \sum_{n=1}^{N} \lVert x_n - \theta_i c_i(x_n) \rVert_2^2 + \beta \lVert c_i(x_n) \rVert_0$$

subject to $\lVert d_k \rVert_2 = 1$ for all $1 \le k \le m$, where $\beta$ weights the regularization. The first term is the data-fitting term and the second term is the sparsity-inducing regularization. This minimization problem is NP-hard because of the $\ell_0$ norm $\lVert \cdot \rVert_0$, and its solution can be approximated by replacing $\lVert \cdot \rVert_0$ with its convex relaxation, i.e. using the $\ell_1$ norm $\lVert \cdot \rVert_1$:

$$\min_{\theta_i, c_i} \sum_{n=1}^{N} \lVert x_n - \theta_i c_i(x_n) \rVert_2^2 + \beta \lVert c_i(x_n) \rVert_1 \quad (15)$$

If one of $\theta_i$ and $c_i$ is fixed, the problem becomes convex in the other. We obtain only an approximate result because the problem is not jointly convex in $(\theta_i, c_i)$. The optimal sparse code and dictionary can be obtained through an alternating update scheme:

1) Sparse coding: update the sparse code $c_i$ by solving equation (15) with the dictionary fixed from the last iteration.

2) Dictionary refinement: update the dictionary $\theta_i$ by solving equation (15) with the sparse code fixed from the last iteration.

In the detection phase, given a query segment q, its sparse code can be computed as $c_i(q)$ and its reconstruction as $\theta_i c_i(q)$. Thus the reconstruction error is

$$\lVert q - \theta_i c_i(q) \rVert_2^2$$

After the reconstruction error is obtained, an intent-specific threshold needs to be applied to determine whether the query fragment is an intent-to-execute fragment or a normal fragment, as shown in the final stage of FIG. 1.
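The alternating scheme and the threshold decision can be sketched as follows. This is an illustrative approximation only: ISTA is swapped in for the sparse coding step, a least-squares fit followed by rescaling the atoms to unit norm is used for the dictionary step, and the 0.5 factor in the objective, the value of `lam`, the toy data and the threshold are all assumptions that do not come from the source.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, X, lam, n_iter=100):
    """ISTA for min_C 0.5*||X - D C||_F^2 + lam*||C||_1 with D fixed. X: (d, N)."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    C = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        C = soft_threshold(C - D.T @ (D @ C - X) / L, lam / L)
    return C

def update_dictionary(X, C, D_old):
    """Least-squares dictionary for fixed codes, then rescale atoms to unit norm."""
    D = X @ np.linalg.pinv(C)
    norms = np.linalg.norm(D, axis=0)
    dead = norms < 1e-12
    D[:, dead] = D_old[:, dead]                # keep unused atoms unchanged
    norms[dead] = 1.0
    return D / norms

def reconstruction_error(D, q, lam):
    c = sparse_code(D, q[:, None], lam)
    return float(np.sum((q - D @ c[:, 0]) ** 2))

rng = np.random.default_rng(0)
d, m, N, lam = 8, 16, 200, 0.1                 # overcomplete: m > d (toy sizes)
X = rng.normal(size=(d, N))                    # toy training segments as columns
D = rng.normal(size=(d, m))
D /= np.linalg.norm(D, axis=0)
for _ in range(10):                            # alternate the two updates
    C = sparse_code(D, X, lam)
    D = update_dictionary(X, C, D)

threshold = 1.0                                # hypothetical intent-specific threshold
err = reconstruction_error(D, X[:, 0], lam)
is_intent = err < threshold                    # below threshold: judged as intent execution
```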

3. In the reconstruction-based intention detection (RID) solution described in step 1, the purpose of EEG-based intention detection is to determine whether a subject has had some intention within a given observation period by analyzing the EEG recordings of the corresponding period. Without loss of generality and practicality, we assume that the execution of an intent (if any) lasts for a few seconds. A given observation period can thus be divided into a series of short segments. The intent detection task can then be viewed as determining whether the subject is performing a particular intent in a short segment, and the object of study shifts from a long period that may contain both intent-execution and non-execution portions to a short segment that either does or does not contain the particular intent.

For clarity of description, we explain several terms used throughout the remainder of this document:

1) Intent-execution segment: a time segment during which the subject performs a particular intended task;

2) Ordinary segment: a time segment in which the subject does not perform the intended task;

3) Query segment: a time segment for which it is to be determined whether the subject performs the task;

4) Training segment: a time segment used to train the reconstruction model, which must be an intent-execution segment.

Given a query segment q of an EEG recording, our goal is to determine whether the subject is performing some intended task T_i in that segment. To achieve this goal, our reconstruction-based intention detection (RID) scheme consists of two phases, a training phase and a detection phase, as shown in FIG. 1. In the training phase, several sets of EEG recordings obtained while certain intended tasks are performed are used to train class-specific reconstruction models; in the detection phase, the query EEG segment is input into a reconstruction model, and whether a certain motor intention is being executed is determined from its reconstruction error. As shown in the training phase of FIG. 1, each candidate intent requires its own reconstruction model. The reconstruction model is a high-level representation extracted from the EEG recordings of the candidate intended activity. It is compact and requires neither manual processing nor profound expertise.

Formally, given a set of N EEG segments that all correspond to the same target task T_i, a reconstruction model is learned for that task. The reconstruction error then serves as a similarity measure that determines the relevance of an EEG query segment to the intent.

Examples

The present invention describes a new research problem, whether or not a motor intent task is being performed, and proposes a reconstruction-based intent detection (RID) scheme that shows the potential to address it. In the traditional field of human intent analysis, most works attempt to answer the question of which movement the subject intends. In practical applications, however, this may result in erroneous operation. Consider, for example, a BCI system that opens or closes a door according to the user's brain signals. If the algorithm only aims to recognize whether a person wants to open or to close the door, the user must put on the EEG headset when he wants to open or close the door and take it off afterwards; if he wears the headset all the time, the opening and closing of the door becomes unpredictable. Solving such a practical problem requires deciding whether or not the user is trying to control the door at all. This "A or not" problem is much more difficult than the conventional "A or B" problem, because the "not" condition cannot be defined in the way the conditions of an "A or B" problem usually are.

For this "A or not" problem, the present invention proposes a reconstruction-based intent detection scheme that uses a reconstruction model to represent the "A" state and uses the reconstruction error to determine the relevance of the query to the "A" state, rather than defining the "not" state. Systematic experiments were performed on a large EEG dataset of 55 subjects who imagined moving their left or right fist. The feasibility of the proposed RID scheme was verified with three "not" cases as query segments: data synthesized from random values between the maximum and minimum possible EEG readings, the eyes-open baseline task, and the eyes-closed baseline task. The scheme performs well on the synthetic query segments, and it obtains competitive results on the baseline query segments even though the baseline data were acquired in the same environment as the intent-execution data. The invention discusses three query strategies for motor intention detection, namely trial-based, record-based and slice-based, and constructs three different reconstruction models to implement the scheme, showing its flexibility with respect to the choice of reconstruction model. Furthermore, the solution of the invention requires neither manual processing nor extensive expertise.

There are many future directions for investigating such an "A or not" problem. One is to build a two-stage algorithm: the first stage answers "whether" the subject is trying to move a fist, and the second stage answers "which" fist the subject is trying to move. Another direction is to implement a real intent detection BCI system to evaluate the effectiveness of the scheme in real scenarios.

1. The present invention evaluated the effectiveness of the RID scheme on a large-scale EEG motor intention dataset containing 55 subjects who performed left/right fist opening and closing motor intention tasks and two baseline tasks (eyes open and eyes closed). The EEG data were collected using the BCI2000 system with 64 electrode channels and a 160 Hz sampling rate. Each subject performed approximately 45 trials (a trial is a continuous EEG recording session in which only one particular mental task is performed), roughly balanced between left-fist and right-fist motor intent. There were 2347 trials in total, 1179 for the left fist and 1168 for the right fist. Each trial lasted about 4 seconds. Data were extracted from 1 second after the onset of the cue (which instructs the subject to perform a certain task) until the end of the trial, giving 497 time steps per trial. During the baseline trials, the subject kept the eyes open or closed for 1 minute without performing any mental task.

To evaluate the intent detection scheme, two kinds of query segment should be used:

1) Intent-execution query segment: a query segment in which the subject actually performs the intended task;

2) Ordinary query segment: a query segment in which the subject may be in any possible mental state other than the particular intent.

However, the most difficult part of modeling such a "whether or not" problem is that ordinary query segments can take many different forms that may bear no relation to one another. In addition, human activities such as walking or facial movements have a significant impact on EEG readings. It is therefore not possible to assemble a set of ordinary query segments that exhaustively defines and models all "ordinary" mental states. In view of this, and of the fact that EEG signals are always noisy for electrical reasons, the present invention uses synthesized data to simulate EEG readings in which a certain motor intent is not being performed. The synthetic data are constructed from random values between the maximum and minimum possible EEG readings. In addition, to evaluate the effectiveness of the proposed RID solution, the two baseline tasks in the dataset, eyes open and eyes closed, are used as special states in which the motor intent is not performed. Thus, the present invention tests three kinds of ordinary query segment and the intent-execution query segment:

1) Synthetic query segment: a query segment built by randomly selecting values between the maximum and minimum possible EEG readings;

2) Eyes-open query segment: a query segment generated from the eyes-open task in the dataset;

3) Eyes-closed query segment: a query segment generated from the eyes-closed task in the dataset;

4) Intent-execution query segment: an intent-execution segment used as a query segment.
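A synthetic query segment of the kind listed above could be generated along the following lines. The uniform distribution, the bounds `lo` and `hi`, and the function name are assumptions of this sketch; the text only states that random values between the maximum and minimum possible readings are used.

```python
import numpy as np

def synthetic_segments(n, lo, hi, shape=(64, 497), seed=0):
    """Draw n segments with values sampled uniformly between lo and hi."""
    rng = np.random.default_rng(seed)
    return rng.uniform(lo, hi, size=(n, *shape))

# Hypothetical bounds standing in for the minimum and maximum possible readings.
lo, hi = -100.0, 100.0
segments = synthetic_segments(3, lo, hi)   # three trial-shaped synthetic segments
```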

2. The invention evaluates the RID scheme on detecting two motor intention tasks: imagined movement of the left fist and of the right fist. Two RID models are therefore constructed in the training stage, one from the left-fist intent segments and one from the right-fist intent segments. One answers "is the subject imagining moving the left fist?" and the other answers "is the subject imagining moving the right fist?". The left-fist and right-fist intent segments are each divided into two parts: one part is used as training segments and the other as query segments.

Since a single EEG trial is a long recording comprising multiple EEG records (time steps), the present invention employs three EEG data organization strategies (query strategies) to evaluate the RID scheme, namely the trial strategy, the record strategy, and the slicing strategy.

1) Trial strategy

The present invention uses individual EEG trials, each of shape (height 64, length 497), as the examples for training the reconstruction model. The height 64 corresponds to the 64 EEG channels, while the length is the number of recorded time points in a single EEG trial. In the detection phase, a query segment is also an individual EEG trial with the same shape as those used for training. For each evaluation task, 90% of the intent-execution trials are randomly selected to train the reconstruction model, and the remaining 10% are used as query segments to evaluate the RID solution. The recordings of the baseline tasks are cut into pieces of the same shape (height 64, length 497) using a sliding window without overlap, and each piece is tested individually. The number of synthetic segments is the same as the number of intent-execution query segments. In this strategy, a CNN autoencoder is used as the autoencoder reconstruction model.

2) Record strategy

Since a trial contains many records, one per time point, the present invention uses a single record as the input to the reconstruction model. An EEG trial is thus divided into a number of record vectors, each with 64 elements corresponding to the 64 EEG channels. Because records from the same EEG trial may be similar to one another, which could bias the evaluation, all records of a given trial are used either as training segments or as query segments, never both. The reconstruction models are still trained on 90% of the trials, selected at random, and evaluated on the remaining 10%. The ordinary query segments are processed in the same way as the training examples. The fully connected autoencoder serves as the autoencoder reconstruction model.
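The trial-level 90/10 split followed by the division into 64-element records can be sketched as below. The function names and the toy array are hypothetical; the point of the sketch is that the split happens on whole trials before the records are extracted, so records of one trial never appear on both sides.

```python
import numpy as np

def split_trials(n_trials, train_fraction=0.9, seed=0):
    """Randomly split trial indices into training and query indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_trials)
    n_train = int(round(train_fraction * n_trials))
    return order[:n_train], order[n_train:]

def trials_to_records(trials):
    """(n_trials, 64, T) -> (n_trials * T, 64): one 64-channel vector per time step."""
    n, c, t = trials.shape
    return trials.transpose(0, 2, 1).reshape(n * t, c)

trials = np.random.default_rng(1).normal(size=(10, 64, 497))  # toy stand-in for EEG trials
train_idx, query_idx = split_trials(len(trials))
train_records = trials_to_records(trials[train_idx])
query_records = trials_to_records(trials[query_idx])
```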

3) Slicing strategy

Similar to the record strategy, an EEG trial is divided into several time slices, which serve as the segments of interest. Specifically, an EEG trial of shape (height 64, length 497) is cut into 18 slices of shape (height 64, length 320) using a sliding window with a window size of 320 and a step of 10. All slices of the same trial are again used in either the training phase or the detection phase, never both, to avoid similarities within a trial. The baseline EEG recordings are processed using a sliding window with the same settings. As in the trial strategy, a CNN autoencoder is used as the autoencoder reconstruction model.
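The slicing arithmetic can be checked with a short NumPy sketch; the function name and the random toy trial are hypothetical. With a length of 497, a window of 320 and a step of 10, the window start positions are 0, 10, ..., 170, which gives the 18 slices mentioned in the text.

```python
import numpy as np

def slice_trial(trial, window=320, step=10):
    """Cut a (channels, length) trial into overlapping (channels, window) slices."""
    length = trial.shape[1]
    starts = range(0, length - window + 1, step)
    return np.stack([trial[:, s:s + window] for s in starts])

trial = np.random.default_rng(0).normal(size=(64, 497))  # toy stand-in for one EEG trial
slices = slice_trial(trial)
```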

Table 1: Left fist. Normalized relative average reconstruction error of the ordinary query segments with respect to the intent-execution query segments, for different query strategies and reconstruction models.

Table 2: Right fist. Normalized relative average reconstruction error of the ordinary query segments with respect to the intent-execution query segments, for different query strategies and reconstruction models.

3. For the two EEG-based motor intention detection tasks, the present invention evaluates the performance of the RID scheme implemented with subspace projection, autoencoders, and sparse dictionary learning. Systematic experiments with the three query strategies were carried out on the EEG data of 55 subjects. Since the underlying assumption is that the smaller the reconstruction error, the greater the relevance of the query segment to the intent, the present invention uses the relative average reconstruction error of the ordinary query segments with respect to the intent-execution query segments to evaluate the detection ability of the proposed scheme. The relative average error is normalized by dividing it by the average reconstruction error of the intent-execution query segments.

The Normalized Relative Mean Error (NRME) $\phi_{relative}$ is defined as:

$$\phi_{relative} = \frac{\phi^{o} - \phi^{i}}{\phi^{i}}$$

where $\phi^{o}$ and $\phi^{i}$ are the average reconstruction errors of the ordinary and the intent-execution query segments, respectively. The overall results are shown in Tables 1 and 2. The results show that, under all query strategies and reconstruction models and for both intent tasks, the average reconstruction errors of the three kinds of ordinary query segment are always larger than those of the corresponding intent-execution segments. This suggests that the reconstruction-based approach has the potential to distinguish intent-execution query segments from ordinary ones, so that human intent can be detected. The reconstruction error of the synthetic query segments is about 15 to 35 times that of the intent-execution query segments, which means that the RID scheme is very effective at handling noisy ordinary query segments. The difference in reconstruction error between the baseline query segments (eyes closed or eyes open) and the intent-execution query segments is relatively small compared with the synthetic query segments. The reason is that the EEG acquisition conditions for the intent tasks and the baseline tasks are similar: subjects were asked to sit in front of a computer, to perform the intent and baseline tasks within the same time period, and not to make unnecessary physical movements. Furthermore, imagined fist movements primarily affect the three EEG channels C3, C4 and Cz, while the other channels may fluctuate in a manner similar to the baseline tasks. Even under such special query conditions, the RID scheme still achieves a difference of around 30% in reconstruction error.
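As a worked illustration of the normalized relative mean error, the sketch below assumes one reading of the description: the difference between the average error of the ordinary query segments and that of the intent-execution query segments, divided by the latter. This formula, the function name and the numbers are assumptions for illustration, not values from the tables.

```python
import numpy as np

def nrme(ordinary_errors, intent_errors):
    """Assumed definition: (mean ordinary error - mean intent error) / mean intent error."""
    phi_o = float(np.mean(ordinary_errors))
    phi_i = float(np.mean(intent_errors))
    return (phi_o - phi_i) / phi_i

# Made-up errors: ordinary segments reconstruct about 30% worse than intent segments.
example = nrme([1.25, 1.35], [0.9, 1.1])
```

Under this reading, a value of about 0.3 corresponds to the "difference of around 30%" mentioned for the baseline query segments.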

For both the left-fist and the right-fist intention tasks, the subspace projection reconstruction model with the fragment query strategy obtains the best average result. In a detailed analysis, the subspace projection method with the record-based query strategy yields the best results in most cases, and its results in the remaining cases are only slightly below the best. The input dimension of an example under the record-based query strategy is small, 64, whereas the example dimensions under the trial-based and slice-based query strategies are 64 × 497 and 64 × 320, respectively, which are much larger. The larger dimension makes it harder to build a robust reconstruction model. Furthermore, the record-based strategy has more training instances than the other strategies, which also helps to train a more general reconstruction model.

FIGS. 3-6 depict detailed statistical experimental results for two intent detection tasks. The normalized relative reconstruction error is calculated as follows:

wherein e_kIs the reconstruction error for the k-th instance,

As shown in Tables 1 and 2, the reconstruction error of the synthetic query segments is an order of magnitude greater than that of the intent-execution query segments, so the synthetic segments are omitted from the reconstruction error distributions (FIGS. 3 and 5).

The results show that, under all query strategies and all reconstruction models, only intent-execution query segments appear in the leftmost part of the relative reconstruction error distribution. This means that the reconstruction model successfully captures the underlying pattern of the EEG signals it was trained on. However, the reconstruction error distributions of the eyes-open and eyes-closed query segments overlap considerably with that of the intent-execution query segments. This is largely because the EEG data for the different tasks were acquired under similar experimental conditions, so that only a few EEG channels behave differently. In the general case, where an EEG reading may take any possible value (synthetic query segments), Tables 1 and 2 indicate that the relative reconstruction error is much greater than that of the intent-execution query segments.

To make the final decision, a threshold must be used: a query segment whose reconstruction error is smaller than the threshold is judged to be executing the intent task, and a query segment whose reconstruction error is larger than the threshold is judged not to be executing it. The present invention uses precision and recall to describe performance at different thresholds. Precision can be interpreted as "when an EEG period is judged to contain some intent, how often is that judgment correct?". The higher the precision, the fewer unintended operations the system performs and therefore the more reliable it is. Recall can be interpreted as "of the intents that were actually performed, how many were identified?". The higher the recall, the more sensitive the system. Both precision and recall are therefore important for building an efficient and effective system. FIGS. 4 and 6 show the average precision and recall versus threshold curves for the left-fist and right-fist intent tasks, respectively. The results show that the precision of all strategies and reconstruction models reaches around 100% at low thresholds and then decreases to the chance level of 50% as the threshold increases. The peak in recall occurs to the right of the peak in precision. Recall fluctuates within a small range, from 50% to 65%.

The F1 score is an evaluation measure that takes both precision and recall into account; FIG. 7 shows the F1 scores at the optimal decision threshold for the different query strategies and reconstruction models. The results indicate that the slice-based query strategy gives the best trade-off between precision and recall, while the F1 score of the record-based query strategy is significantly lower than those of the other two strategies. In most cases, the subspace-projection-based reconstruction model provides the best F1 score, and in the remaining cases it is still competitive. The best F1 score for left-fist intent detection is 66.64%, obtained with the subspace projection reconstruction model and the slice-based query strategy. The best score for right-fist intent detection is 66.38%, obtained with the subspace projection reconstruction model and the trial-based query strategy. On synthetic query segments built from arbitrarily selected possible EEG values, the proposed RID scheme achieves perfect performance within a certain threshold range, as shown in FIGS. 8 and 9. In summary, the subspace projection model provides a wider threshold range with perfect detection performance.
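The threshold decision and the three measures can be written out in a few lines. The function name, the example errors, the labels and the threshold are hypothetical; a segment is predicted as intent execution when its reconstruction error is below the threshold.

```python
import numpy as np

def precision_recall_f1(errors, is_intent, threshold):
    """Precision, recall and F1 when segments with error < threshold are predicted as intent."""
    errors = np.asarray(errors, dtype=float)
    is_intent = np.asarray(is_intent, dtype=bool)
    predicted = errors < threshold
    tp = int(np.sum(predicted & is_intent))
    fp = int(np.sum(predicted & ~is_intent))
    fn = int(np.sum(~predicted & is_intent))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up reconstruction errors and ground-truth labels (True = intent execution).
errors = [0.1, 0.2, 0.4, 0.5, 0.9]
labels = [True, True, False, True, False]
precision, recall, f1 = precision_recall_f1(errors, labels, threshold=0.45)
```

Sweeping the threshold over a range of values and recording the three measures gives curves of the kind described for FIGS. 4, 6 and 7.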

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims


1. A human body motion intention detection method based on a reconstruction model, characterized in that:

Step 1, training stage: the reconstruction model is trained using EEG signals recorded while a specific manual-processing intent task is performed, including a feature extraction step and a classification step; the feature extraction step uses the filter bank combination algorithm FBCSP for feature extraction; the classification step uses a classifier to identify whether the user is performing a certain motion intent task;

Step 2, detection stage: the EEG period to be determined is input into the reconstruction model and its reconstruction error is calculated; the smaller the reconstruction error, the greater the likelihood that a certain motion intent task is present in the observation period;

Step 2.1: subspace projection is used to train the reconstruction model, and samples with large reconstruction errors are treated as outliers; the reconstruction model can also be used as a classifier, in which case the reconstruction models are trained per class and each sample is assigned to the class whose reconstruction model fits it best;

Step 2.2: an autoencoder is used to train the reconstruction model; according to the three query strategies for motion intent detection, namely trial-based, record-based and slice-based, two types of autoencoder are applied: the fully connected autoencoder and the CNN autoencoder;

Step 2.3: sparse dictionary learning is used as a reconstruction model to implement the RID scheme; a sparse representation of the input data is learned through a dictionary, which leads to redundant atoms and allows multiple representations of a single sample, including in particular works that study sparse dictionary learning to improve the sparsity and flexibility of the representation; the ordinary segment data is larger in quantity than the intent-execution segment data, is diverse, and may have some association with the intent-execution segment data; the latent representation is composed of a dictionary of several atoms and a sparse code, and in the reconstruction process the original input is approximated using the sparse code and a linear combination of the atoms.

2.根据权利要求1所述的基于重构模型的人体运动意向检测方法，其特征在于步骤2.1采用子空间投影，定义一组m维向量，这些向量映射属于R^d的样本x到属于R^m(m≤d)的样本

并且主成分是通过具有三个性质的m变量得到的，性质如下：2. the human body motion intention detection method based on reconstruction model according to claim 1, is characterized in that step 2.1 adopts subspace projection, defines a group of m-dimensional vectors, and these vector maps belong to the sample x of R^d to belong to R^m (m≤d) samples

And the principal components are obtained by m variables with three properties, the properties are as follows:1)主要成分是正交的；1) The main components are orthogonal;

2)第一个主成分的方差最大，每个后续成分的方差逐渐减小；2) The variance of the first principal component is the largest, and the variance of each subsequent component gradually decreases;

3)所有主要成分的变化之和等于原始变量的变化之和；3) The sum of the changes of all principal components is equal to the sum of the changes of the original variables;

Let $R$ be the $d \times d$ correlation matrix computed from the target reconstruction training set of $d$ variables $v_1, v_2, \dots, v_d$. The $d$ eigenvalue-eigenvector pairs of $R$ are computed and sorted by eigenvalue, giving $(\lambda_1, e_1), (\lambda_2, e_2), \dots, (\lambda_d, e_d)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ and $e_i = (e_{i1}, e_{i2}, \dots, e_{id})^T$.

The $i$-th principal component of a sample $x = (x_1, x_2, \dots, x_d)^T$ can be computed as:

$$y_i = e_i^T x = e_{i1} x_1 + e_{i2} x_2 + \dots + e_{id} x_d$$

Selecting the first $m$ eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \dots, (\lambda_m, e_m)$ gives the projection matrix $P$:

$$P = (e_1, e_2, \dots, e_m)^T \qquad (3)$$

where $P \in \mathbb{R}^{m \times d}$, so any observation $x$ can be transformed into:

$$y = Px \qquad (4)$$

where $y \in \mathbb{R}^m$. Owing to the orthonormality of the eigenvectors, the reconstruction is simple:

$$\hat{x} = P^T y = P^T P x$$

It should be noted that the reconstruction is perfect only when $m = d$, in which case $P^{-1} = P^T$; otherwise, because of the compression, $\hat{x}$ differs from $x$. $m < d$ is chosen, since an overly perfect reconstruction may limit the effectiveness of identifying data patterns from the reconstruction loss;

Therefore, the reconstruction model based on subspace projection can be defined as:

$$\hat{x} = P^T P x$$

where $P$ is obtained by computing the PCA projection matrix; after training, the reconstruction error of an EEG query segment $q$ can be calculated as the norm of the residual $q - P^T P q$.
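The subspace-projection model can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the squared Euclidean norm is assumed as the reconstruction error, inputs are standardized so that the correlation matrix applies, and the function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca_projection(X, m):
    """Fit the m x d projection matrix P from training segments X (N x d)."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std                      # standardize each variable
    R = np.corrcoef(Z, rowvar=False)          # d x d correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]     # indices of the m largest eigenvalues
    P = eigvecs[:, order].T                   # rows of P are e_1, ..., e_m
    return mean, std, P

def pca_reconstruction_error(q, mean, std, P):
    """Squared Euclidean norm of z - P^T P z (assumed error measure)."""
    z = (q - mean) / std
    z_hat = P.T @ (P @ z)                     # y = P z, then x_hat = P^T y
    return float(np.sum((z - z_hat) ** 2))

# Synthetic stand-in for EEG feature vectors (assumption, not real data).
rng = np.random.default_rng(0)
d, m, N = 8, 2, 200
mixing = rng.normal(size=(2, d))              # hypothetical 2-D latent structure
train = rng.normal(size=(N, 2)) @ mixing + 0.01 * rng.normal(size=(N, d))
mean, std, P = fit_pca_projection(train, m)

intent_like = rng.normal(size=2) @ mixing     # shares the training structure
ordinary = 3.0 * rng.normal(size=d)           # unrelated segment
err_intent = pca_reconstruction_error(intent_like, mean, std, P)
err_ordinary = pca_reconstruction_error(ordinary, mean, std, P)
```

With m < d, a segment that shares the structure of the training data is reconstructed with a much smaller error than an unrelated one; with m = d the error vanishes for every input, which is why the text sets m < d.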

3. The method for detecting human motion intention based on a reconstruction model according to claim 1, characterized in that in step 2.2 an autoencoder is used as the reconstruction model and, for the three query strategies used in motion intention detection (trial-based, record-based and segment-based), two types of autoencoder are applied, a fully connected autoencoder and a CNN autoencoder, as follows:

Step 2.2.1: fully connected autoencoder

The fully connected autoencoder consists of an encoder $\Phi: \mathbb{R}^d \to \mathbb{R}^m$, where $m$ is the dimension of the hidden layer, and a decoder $\Psi: \mathbb{R}^m \to \mathbb{R}^d$. Each layer is defined by its weights $W$, bias $b$ and activation function:

$$\Phi = f_\Phi(W_\Phi x + b_\Phi) \qquad (7)$$

$$\Psi = f_\Psi(W_\Psi x + b_\Psi) \qquad (8)$$

where $W_\Phi \in \mathbb{R}^{m \times d}$ and $W_\Psi \in \mathbb{R}^{d \times m}$; $f_\Phi$ and $f_\Psi$ denote the encoding and decoding activation functions, respectively. The reconstruction model is defined as:

$$\hat{x} = \Psi(\Phi(x))$$

The parameters $\theta$ are obtained by minimizing the reconstruction loss between the training segments and their reconstructions.

After training, the reconstruction error of an EEG query segment $q$ can be calculated as the norm of the residual $q - \Psi(\Phi(q))$.

It should be noted that, with a linear activation function, the autoencoder generates the same subspace as PCA; therefore, the nonlinear rectified linear unit (ReLU) is used as the activation function;
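A fully connected autoencoder of this kind can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions: one ReLU hidden layer, a linear output layer (an assumption; equation (8) applies an activation to the decoder as well), a mean squared-error loss, plain gradient descent, and synthetic data. All names and hyperparameters are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def init_params(d, m, rng):
    """Small random weights for an encoder (m x d) and a decoder (d x m)."""
    return {"W_enc": rng.normal(scale=0.1, size=(m, d)), "b_enc": np.zeros(m),
            "W_dec": rng.normal(scale=0.1, size=(d, m)), "b_dec": np.zeros(d)}

def forward(params, X):
    H_pre = X @ params["W_enc"].T + params["b_enc"]   # encoder pre-activation, N x m
    H = relu(H_pre)                                   # hidden code
    X_hat = H @ params["W_dec"].T + params["b_dec"]   # linear decoder (assumption)
    return H_pre, H, X_hat

def mean_loss(params, X):
    """Mean squared reconstruction error over the rows of X."""
    return float(np.mean(np.sum((forward(params, X)[2] - X) ** 2, axis=1)))

def train(params, X, lr=0.01, epochs=500):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    N = X.shape[0]
    for _ in range(epochs):
        H_pre, H, X_hat = forward(params, X)
        G = 2.0 * (X_hat - X) / N                     # gradient w.r.t. X_hat
        G_H = (G @ params["W_dec"]) * (H_pre > 0)     # back through the ReLU
        grads = {"W_dec": G.T @ H, "b_dec": G.sum(axis=0),
                 "W_enc": G_H.T @ X, "b_enc": G_H.sum(axis=0)}
        for name in params:
            params[name] -= lr * grads[name]
    return params

def reconstruction_error(params, q):
    """Euclidean norm of q minus its reconstruction (assumed error measure)."""
    return float(np.linalg.norm(forward(params, q[None, :])[2][0] - q))

# Synthetic, standardized stand-in for training segments (assumption).
rng = np.random.default_rng(0)
d, m, N = 8, 4, 200
X = rng.normal(size=(N, 2)) @ rng.normal(size=(2, d))
X = (X - X.mean(axis=0)) / X.std(axis=0)
params = init_params(d, m, rng)
loss_before = mean_loss(params, X)
train(params, X)
loss_after = mean_loss(params, X)
query_error = reconstruction_error(params, X[0])
```

Replacing `relu` with the identity would make the model linear, in which case it can only learn a linear subspace of the kind PCA finds; the nonlinearity is what distinguishes this model from step 2.1.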

Step 2.2.2: CNN autoencoder

Since EEG signals are time series with readings from multiple channels at each time point, the input to the autoencoder can be in two-dimensional (2D) format; a CNN autoencoder is therefore applied as the reconstruction model to process 2D EEG data. Like the fully connected autoencoder, the CNN autoencoder has an encoder part and a decoder part; the main difference is that both parts are built mainly from convolutional neural network layers;

Specifically, the encoder stage has three 2D-CNN layers, each followed by a max-pooling layer, and the decoder stage has three pairs of upsampling and convolutional layers. Formally, the value of the neuron at position $(x, y)$ in the $k$-th feature map of the $l$-th layer is given by:

$$v_{kl}^{xy} = \mathrm{relu}\Big(b_{kl} + \sum_{n} \sum_{p=0}^{P_k - 1} \sum_{q=0}^{Q_l - 1} w_{kln}^{pq}\, v_{n(l-1)}^{(x+p)(y+q)}\Big)$$

where $\mathrm{relu}(x) = \max(0, x)$ is the activation function, $b_{kl}$ is the bias of the $k$-th feature map in the $l$-th layer, $n$ runs over the feature maps of the previous layer, $w_{kln}^{pq}$ is the weight at position $(p, q)$ of the kernel connecting this feature map to the $n$-th feature map of the previous layer, and $P_k$ and $Q_l$ are the kernel sizes;

After each convolutional layer, a max-pooling operation with a stride of [2 × 2] is applied, which halves the data dimensions; in the decoder stage, each decoder layer therefore has to double the data dimensions. There are two ways to expand the data dimensions in the decoder stage: transposed convolution and upsampling interpolation. Nearest-neighbor upsampling followed by a convolutional layer is used as the basic building block of the decoder stage. After the CNN decoder, a final output layer reconstructs the input data; this layer is a convolutional layer whose output size equals the input size;

Stochastic gradient descent with the Adam update rule is used to minimize the reconstruction loss with respect to $\zeta_j$, the parameter set of the neural network; the network parameters are optimized with a learning rate of $10^{-4}$;
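The shape bookkeeping of the encoder and decoder stages can be checked with a small NumPy sketch. It is only an illustration under stated assumptions: the convolutional layers are omitted (they are assumed to preserve the spatial size), the 64 × 32 input size is hypothetical, and both sides are assumed divisible by 8 so that three halvings are exact.

```python
import numpy as np

def max_pool_2x2(a):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_nearest_2x(a):
    """Nearest-neighbor upsampling that doubles both dimensions."""
    return a.repeat(2, axis=0).repeat(2, axis=1)

# Hypothetical 2-D EEG segment: 64 time points x 32 channels.
segment = np.arange(64 * 32, dtype=float).reshape(64, 32)

encoded = segment
for _ in range(3):                 # three pooling steps in the encoder
    encoded = max_pool_2x2(encoded)

decoded = encoded
for _ in range(3):                 # three upsampling steps in the decoder
    decoded = upsample_nearest_2x(decoded)
```

Three poolings reduce each side by a factor of 8, and three nearest-neighbor upsamplings restore the original size, so the final convolutional output layer can match the input shape.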

Step 2.3: a sparse representation of the input data is learned through dictionary learning, which produces redundant atoms and allows a single sample to have multiple representations; this includes, in particular, work that studies sparse dictionary learning to improve the sparsity and flexibility of the representation. Ordinary segments are more numerous than intent-performing segments, are diverse, and may be partly correlated with intent-performing segments.

4. The method for detecting human motion intention based on a reconstruction model according to claim 3, characterized in that in step 3 sparse dictionary learning is used as the reconstruction model that implements the RID scheme:

The latent representation consists of a dictionary of several atoms and a sparse code; during reconstruction, the original input is approximated from the sparse code, using the atoms themselves and linear combinations of the atoms. Formally, the reconstruction model is defined as:

$$\hat{x} = \theta_i c_i$$

where $\theta_i = [d_1, d_2, \dots, d_m]$ is a dictionary containing $m$ atoms and $c_i \in \mathbb{R}^m$ is the sparse code of the input vector $x \in \mathbb{R}^d$, most of whose coefficients are zero or close to zero. To construct an overcomplete dictionary, the dictionary size $m$ is set larger than the input dimension $d$. An overcomplete dictionary does not require the atoms to be orthogonal and therefore allows a more flexible dictionary and a richer data representation. The dictionary $\theta_i$ and the sparse code $c_i$ can be learned in the training stage by solving the following optimization problem:

$$\min_{\theta_i,\, c_i} \sum_{j} \Big( \| x_j - \theta_i c_i(x_j) \|_2^2 + \lambda \| c_i(x_j) \|_0 \Big) \qquad (15)$$

subject to $\|d_k\|_2 = 1$ for all $1 \le k \le m$, where the sum runs over the training segments $x_j$ and $\lambda > 0$ weights the regularization. The first term is the data-fitting term and the second is the sparsity-inducing regularization. This minimization problem is NP-hard because of the $\ell_0$ norm $\|\cdot\|_0$; it can be solved approximately by replacing $\|\cdot\|_0$ with its convex relaxation, the $\ell_1$ norm $\|\cdot\|_1$. When either $\theta_i$ or $c_i$ is fixed, the problem becomes convex in the other; the result is nevertheless only approximate, because the problem is not jointly convex in $(\theta_i, c_i)$. The sparse code and the dictionary are obtained with an alternating update scheme:

1) sparse code approximation: update the sparse code $c_i$ by solving equation (15) with the dictionary fixed to its value from the previous iteration;

2) dictionary refinement: update the dictionary $\theta_i$ by solving equation (15) with the sparse code fixed to its value from the previous iteration;

In the detection stage, given a query segment $q$, its sparse code is computed as $c_i(q)$ and its reconstruction is $\theta_i c_i(q)$; the reconstruction error is therefore the norm of the residual $q - \theta_i c_i(q)$.

After the reconstruction error is obtained, an intent-specific threshold is applied to decide whether the query segment is an intent-performing segment or an ordinary segment.
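The sparse-coding step of the detection stage can be sketched with ISTA (iterative soft thresholding), one common solver for the $\ell_1$-relaxed problem. This is a sketch under stated assumptions, not the method prescribed by the claims: the solver choice, the 0.5 factor in the objective, the Euclidean error norm, the value of `lam`, the random dictionary and the threshold are all hypothetical, and the dictionary-refinement step is omitted.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(D, x, c, lam):
    """0.5 * ||x - D c||_2^2 + lam * ||c||_1 (assumed form of the relaxed problem)."""
    return 0.5 * np.sum((x - D @ c) ** 2) + lam * np.sum(np.abs(c))

def sparse_code_ista(D, x, lam=0.1, n_iter=200):
    """Approximate the sparse code of x for a fixed dictionary D (d x m)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)                # gradient of the data-fitting term
        c = soft_threshold(c - step * grad, step * lam)
    return c

def is_intent_segment(D, q, threshold, lam=0.1):
    """Compare the reconstruction error of q with an intent-specific threshold."""
    c = sparse_code_ista(D, q, lam)
    error = float(np.linalg.norm(q - D @ c))
    return error <= threshold, error

# Hypothetical overcomplete dictionary: m = 16 unit-norm atoms in d = 8 dimensions.
rng = np.random.default_rng(0)
D = rng.normal(size=(8, 16))
D /= np.linalg.norm(D, axis=0)                  # enforce ||d_k||_2 = 1
x = 2.0 * D[:, 3] - 1.5 * D[:, 10]              # segment built from two atoms
c = sparse_code_ista(D, x)
flag, error = is_intent_segment(D, x, threshold=1e9)
```

In a full alternating scheme, this step would be interleaved with a dictionary update in which the codes are held fixed and the atoms are renormalized to unit length.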

5. The method for detecting human motion intention based on a reconstruction model according to claim 1, characterized in that a given observation period can be divided into a series of short segments, so that the intention detection task becomes determining whether a subject is performing a specific intent within a short segment; the object of study thus shifts from a long period, which may contain both intent-performing and non-intent-performing parts, to short segments that either do or do not contain the specific intent. An intent-performing segment is a time segment during which a subject is performing a specific intent task; an ordinary segment is a time segment during which the subject is not performing that intent task; a query segment is a time segment for which it is to be determined whether the subject is performing a task; a training segment is a time segment used to train the reconstruction model and must be an intent-performing segment;

Given a query segment $q$ of an EEG recording, the task is to determine whether the subject is performing a certain intent task $T_i$ in that segment. Formally, given a set of $N$ EEG segments $\{x_1, x_2, \dots, x_N\}$, each corresponding to the same intent task $T_i$, the reconstruction model is a mapping $\mathbb{R}^d \to \mathbb{R}^d$ with parameters $\theta_i$, where $d$ is the input dimension of the reconstruction model; $\theta_i$ is obtained in the training stage by minimizing the reconstruction loss over the $N$ training segments.

After the reconstruction model has been built, the reconstruction error is computed for each EEG query segment $q$ and used as a similarity measure to determine how strongly the query segment is related to the mental intent; the smaller the reconstruction error, the stronger the correlation between the EEG query segment $q$ and the mental intent $T_i$.
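The segment-wise detection procedure can be sketched as follows. This is a minimal sketch under stated assumptions: segments are non-overlapping, the error is the Euclidean norm of the residual, and `toy_reconstruct` is a hypothetical stand-in for a trained reconstruction model (it reproduces only values in [-1, 1]); the threshold is likewise hypothetical.

```python
import numpy as np

def split_into_segments(recording, segment_len):
    """Split a (time, channels) recording into non-overlapping query segments."""
    n_segments = recording.shape[0] // segment_len
    trimmed = recording[: n_segments * segment_len]   # drop the incomplete tail
    return trimmed.reshape(n_segments, segment_len, recording.shape[1])

def detect_intent(segments, reconstruct, threshold):
    """Flag segments whose reconstruction error is at most the threshold."""
    errors = np.array([np.linalg.norm(s - reconstruct(s)) for s in segments])
    return errors <= threshold, errors

def toy_reconstruct(segment):
    """Hypothetical stand-in model: reproduces only values inside [-1, 1]."""
    return np.clip(segment, -1.0, 1.0)

# Toy recording: 103 time points x 4 channels; the first half is "intent-like".
recording = np.zeros((103, 4))
recording[:50] = 0.5
recording[50:] = 3.0

segments = split_into_segments(recording, segment_len=25)
flags, errors = detect_intent(segments, toy_reconstruct, threshold=0.1)
```

Any of the three reconstruction models (subspace projection, autoencoder, sparse dictionary) could be passed in place of `toy_reconstruct`, since detection only needs a function that maps a segment to its reconstruction.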

6. The method for detecting human motion intention based on a reconstruction model according to claim 1, characterized in that, in yes-or-no motion intention detection, the performance of the RID schemes implemented with subspace projection, autoencoders and sparse dictionary learning is evaluated with a relative mean error that is normalized by dividing by the mean reconstruction error of the intent-performing query segments;

The normalized relative mean error (NRME) $\varphi_{relative}$ is defined as:

$$\varphi_{relative} = \frac{\varphi^{o} - \varphi^{i}}{\varphi^{i}}$$

where $\varphi^{o}$ and $\varphi^{i}$ are the mean reconstruction errors of the ordinary and the intent-performing query segments, respectively.

7. The method for detecting human motion intention based on a reconstruction model according to claim 1, characterized in that the normalized relative reconstruction error is calculated as follows:

$$\frac{e_k - \varphi^{i}}{\varphi^{i}}$$

where $e_k$ is the reconstruction error of the $k$-th instance and $\varphi^{i}$ is the mean reconstruction error of the intent-performing query segments.
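The two normalized error measures can be illustrated with a short sketch. The formulas used here are an assumption inferred from the wording of claims 6 and 7 (difference from the mean intent-performing error, divided by that mean), because the original formula images are not legible in this text; the function names and the numbers are hypothetical.

```python
import numpy as np

def normalized_relative_errors(errors, intent_mean):
    """Assumed per-instance form: (e_k - intent_mean) / intent_mean."""
    return (np.asarray(errors, dtype=float) - intent_mean) / intent_mean

def nrme(ordinary_errors, intent_errors):
    """Assumed NRME: (mean ordinary error - mean intent error) / mean intent error."""
    phi_o = float(np.mean(ordinary_errors))
    phi_i = float(np.mean(intent_errors))
    return (phi_o - phi_i) / phi_i

intent_errors = [1.0, 2.0, 3.0]      # hypothetical errors, mean 2.0
ordinary_errors = [5.0, 7.0]         # hypothetical errors, mean 6.0
score = nrme(ordinary_errors, intent_errors)
per_instance = normalized_relative_errors(ordinary_errors, 2.0)
```

Under this assumed definition, a larger NRME means that ordinary segments are reconstructed much worse than intent-performing ones, which makes a threshold easier to place.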