Human motion intention detection method based on reconstruction model
Technical Field
The present invention relates to human motion intention detection, and more particularly, to a reconstruction-based human intention detection (RID) method and applications thereof.
Background
Electroencephalography (EEG) signals are voltages collected from different parts of the user's scalp to measure brain activity. Electroencephalography has been widely used in brain-computer interface (BCI) systems because it carries no clinical risk and its acquisition equipment is portable. A BCI system provides a potential bridge from the brain to peripheral devices for able-bodied as well as disabled users. With BCI techniques, signals of brain activity (e.g., motor intent) can be used as control commands. One classic application is BCI-assisted stroke rehabilitation. The study of electroencephalographic signals is receiving increasing attention from researchers because of its potential medical and industrial applications.
Since BCI systems underwent a paradigm shift, machine learning has been recognized as a key tool for electroencephalographic analysis. Although previous studies have demonstrated the effectiveness of decoding brain signals, well-known challenges still prevent the widespread use of electroencephalogram-based BCI systems. One key obstacle is that most electroencephalogram decoding studies focus on answering questions such as "which hand is the subject imagining moving, left or right?" Such a problem can be modeled as a classification problem and solved by a supervised learning approach. However, another question necessarily precedes it: "is the subject imagining moving a hand at all?" If the BCI system answers the "which" question without first answering this "yes or no" question, it may take unpredictable actions. For example, wheelchairs based on BCI technology have proven promising through several rudimentary prototypes controlled by motor intent. Assuming that the core algorithm of such a system is a four-class classification model that moves the wheelchair in four directions, the subject must maintain a steady and accurate movement intention to control the direction of the wheelchair. However, real-world situations are more complex and unpredictable, since the user's true intent typically occurs at different and infrequent times. For example, if the user is interrupted by an event (e.g., a telephone call) or the user's mind simply wanders, current solutions cannot handle the situation: they still decide into which of the four classes the brain signals fall and then drive the wheelchair in the corresponding direction. Such systems are therefore neither reliable nor practical, and may even lead to serious accidents.
In addition to this uncertainty, collecting intent data can be cumbersome because it requires the subject to stay focused, which is particularly impractical for children and others with special needs (e.g., in BCI-assisted autism therapy), the elderly, or the disabled. The scarcity of intent occurrences and the difficulty of obtaining accurate motor intent result in insufficient training data.
It would therefore be desirable for an intent detection method to first determine whether the user has an intent to move, and only then determine the direction in which the user wishes to move. The whole system would then be more robust and better able to cope with complex situations. Accordingly, the present invention is directed to answering this "yes or no" question, which can be abstracted as an intent detection problem on electroencephalogram signals. The objective is to accurately determine whether a subject has a certain motor intent during the observation period.
However, challenges remain for the following reasons. One reason is that the electroencephalogram signal itself exhibits strong variability from subject to subject, and also varies greatly between different segments for the same user, even in recordings made in the same experiment. Another key challenge is that, although this problem appears to be a binary classification problem, the "no" class is unknown. Unlike previous electroencephalographic studies in which a subject is directed to perform certain mental tasks, such as imagining movement of the left or right hand, a user's mental state cannot be defined merely by stating that he is not performing a certain mental task. Apart from the specific intention, the possible mental states of the user are unlimited, such as thinking of a movie, a food, or even something the user is not aware of. Therefore, conventional supervised classification methods from machine learning cannot solve this problem.
Disclosure of Invention
1. Objects of the invention
A brain-computer interface (BCI) enables a human being to communicate with and intuitively control external devices through brain signals. Successful detection of motor intent paves the way for developing BCI applications. Current research mainly focuses on answering questions such as "which hand is the subject imagining moving, right or left?" The question of "whether the subject is imagining moving a hand" also has to be answered, because a user cannot continuously perform the intention task while using the brain wave detecting apparatus, and the intervals in which no motor intention is performed may trigger unexpected operations, resulting in a BCI system malfunction. However, this intent detection task is more difficult because, for a "yes or no" question, it is hard to know what the "no" case is, and thus hard to obtain training samples for it. Furthermore, infrequent motor intentions or unexpected events make the intent detection task even more difficult. To solve this problem, a human motion intention detection method based on a reconstruction model is provided.
2. The technical scheme adopted by the invention
The invention discloses a human motion intention detection method based on a reconstruction model, which is carried out according to the following steps:
step 1, in a training stage, a reconstruction model is trained using electroencephalogram signals recorded while a specific human intention task is executed, wherein the training stage comprises a feature extraction step and a classification step; in the feature extraction step, the combined filter bank CSP (FBCSP) algorithm is adopted for feature extraction; in the classification step, a classifier identifies whether or not the user is executing a certain movement intention task;
step 2, in the detection stage, the electroencephalogram period to be determined is input into the reconstruction model, and its reconstruction error is calculated; the smaller the reconstruction error is, the higher the possibility that a certain movement intention task exists in the observation period is;
step 2.1, the reconstruction model is trained using subspace projection, samples with larger reconstruction error are regarded as outliers, the reconstruction model is also used as a classifier, classification training is carried out on the reconstruction model, and each sample is classified into the class whose reconstruction model fits it best;
step 2.2, the reconstruction model is trained as an autoencoder, and two types of autoencoders are applied according to three query strategies for movement intention detection, namely trial-based, recording-based and segment-based: a fully connected autoencoder and a CNN autoencoder;
step 2.3, sparse dictionary learning is used as a reconstruction model for realizing the RID scheme; a sparse representation of the input data is learned through a dictionary whose atoms may be redundant, so that a single sample is allowed to have multiple representations, which improves the sparsity and flexibility of the representation; the ordinary segment data is more plentiful than the intent execution segment data, is diverse, and may have some association with the intent execution segment data; the latent representation is composed of a dictionary of atoms and a sparse code, and during reconstruction the original input is approximated by a linear combination of the atoms weighted by the sparse code.
Further, step 2.1 uses subspace projection to define a set of m-dimensional vectors that map each sample $x \in \mathbb{R}^d$ to a sample $y \in \mathbb{R}^m$ ($m \le d$); the principal components are given by m variables with the following three properties:
1) the principal components are orthogonal;
2) the variance of the first principal component is the largest, and the variance of each subsequent component gradually decreases;
3) the sum of the variances of all the principal components is equal to the sum of the variances of the original variables;
suppose that $R \in \mathbb{R}^{d \times d}$ is the correlation matrix of the d variables $v_1, v_2, \ldots, v_d$, calculated from the target reconstruction training set; d eigenvalue-eigenvector pairs are calculated from R and sorted by eigenvalue to obtain $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_d, e_d)$, where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$ and $e_i \in \mathbb{R}^d$; the i-th principal component of a sample $x = (x_1, x_2, \ldots, x_d)^T$ can be calculated as:
$$y_i = e_i^T x$$
selecting the first m eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_m, e_m)$, the resulting projection matrix P is:
$$P = (e_1, e_2, \ldots, e_m)^T \qquad (3)$$
where $P \in \mathbb{R}^{m \times d}$; any observation x can be converted into:
$$y = Px \qquad (4)$$
where $y \in \mathbb{R}^m$; owing to the properties of the eigenvectors, the reconstruction process is simple:
$$\hat{x} = P^T y = P^T P x$$
it should be noted that the reconstruction is perfect only when $m = d$, in which case $P^{-1} = P^T$; otherwise, due to the compression, $\hat{x}$ and $x$ are different; m is set to be less than d, considering that an overly perfect reconstruction may limit the effectiveness of identifying data patterns from reconstruction losses;
thus, a subspace-projection-based reconstruction model may be defined as:
$$f(x; \theta) = P^T P x$$
where the parameter $\theta = P$ can be calculated by obtaining the projection matrix of the PCA; after training, the reconstruction error of the electroencephalographic query segment q can be calculated as:
$$E(q) = \lVert q - P^T P q \rVert_2^2$$
further, in step 2.2, the autoencoder is used as a reconstruction model, and two types of autoencoders are applied according to three query strategies for motion intention detection, namely trial-based, recording-based and segment-based: a fully connected autoencoder and a CNN autoencoder, specifically as follows:
step 2.2.1, fully connected autoencoder
The fully connected autoencoder consists of an encoder $\Phi: \mathbb{R}^d \to \mathbb{R}^m$, where m is the dimension of the hidden layer, and a decoder $\Psi: \mathbb{R}^m \to \mathbb{R}^d$; each layer is defined by a corresponding weight W, bias b and activation function:
$$\Phi = f_\Phi(W_\Phi x + b_\Phi) \qquad (7)$$
$$\Psi = f_\Psi(W_\Psi x + b_\Psi) \qquad (8)$$
where $W_\Phi \in \mathbb{R}^{m \times d}$, $b_\Phi \in \mathbb{R}^m$, $W_\Psi \in \mathbb{R}^{d \times m}$ and $b_\Psi \in \mathbb{R}^d$, and $f_\Phi$ and $f_\Psi$ respectively represent the encoding activation function and the decoding activation function; the reconstruction model is defined as:
$$f(x; \theta) = \Psi(\Phi(x))$$
the parameter $\theta$ can be obtained by minimizing, over the N training segments $x_n$:
$$\min_{\theta} \sum_{n=1}^{N} \lVert x_n - f(x_n; \theta) \rVert_2^2$$
after training, the reconstruction error of the electroencephalographic query segment q can be calculated as:
$$E(q) = \lVert q - \Psi(\Phi(q)) \rVert_2^2$$
it should be noted that with a linear activation function, the autoencoder generates the same subspace as the PCA; therefore, the nonlinear rectified linear unit (ReLU) is taken as the activation function;
step 2.2.2, CNN autoencoder
Since electroencephalography signals are time series data with multiple channel readings at each point in time, the input data to the autoencoder may be in a two-dimensional (2D) format; therefore, a CNN autoencoder is applied as a reconstruction model to process two-dimensional electroencephalographic data; similar to the fully connected autoencoder, the CNN autoencoder also has an encoder portion and a decoder portion; the main difference is that the encoder and decoder portions are constructed mainly with convolutional neural networks;
specifically, there are three 2D-CNN layers at the encoder stage, each followed by a max pooling layer, and three pairs of upsampling and convolutional layers at the decoder stage; to formalize the two-dimensional convolution operation, the value of the neuron at position (x, y) in the k-th feature map of the l-th layer is given by:
$$v_{kl}^{xy} = \mathrm{relu}\Big(b_{kl} + \sum_{c} \sum_{p=0}^{P_k - 1} \sum_{q=0}^{Q_l - 1} w_{klc}^{pq}\, v_{(l-1)c}^{(x+p)(y+q)}\Big)$$
where $\mathrm{relu}(x) = \max(0, x)$ is the activation function; in this equation, $b_{kl}$ is the bias of the k-th feature map in the l-th layer, $w_{klc}^{pq}$ is the weight at position (p, q) of the kernel connected to this feature map, c indexes the feature maps of the previous layer that the kernel covers, and $P_k$ and $Q_l$ represent the size of the kernel;
after each convolutional layer, a max pooling operation with a stride of [2 × 2] is applied to reduce the data dimension by half, so that at the decoder stage the data dimension needs to be doubled in each decoder layer; there are two methods of expanding the data dimension at the decoder stage: the transposed convolution operation and upsampling interpolation; nearest-neighbor upsampling interpolation followed by a convolutional layer is used as the basic component of the decoder stage; after the CNN decoder, the input data is reconstructed by a final output layer; the final output layer is a convolutional layer, and the size of the output data is the same as that of the input data;
stochastic gradient descent with the Adam update rule is used to minimize the loss function:
$$\min_{\zeta_j} \sum_{n=1}^{N} \lVert x_n - f(x_n; \zeta_j) \rVert_2^2$$
where $\zeta_j$ is the set of parameters of the neural network; the network parameters are optimized with a learning rate of $10^{-4}$;
step 2.3, a sparse representation of the input data is learned through a dictionary whose atoms may be redundant, so that a single sample is allowed to have multiple representations, which improves the sparsity and flexibility of the representation; the ordinary segment data is more plentiful than the intent execution segment data, is diverse, and may have some association with the intent execution segment data.
Further, in step 2.3, sparse dictionary learning is taken as a reconstruction model for realizing the RID scheme.
The latent representation is composed of a dictionary of atoms and a sparse code, and during reconstruction the original input is approximated by a linear combination of the atoms weighted by the sparse code; formally, the reconstruction model is defined as:
$$f(x; \theta_i) = \theta_i c_i$$
where $\theta_i = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{d \times m}$ is a dictionary containing m atoms, and $c_i \in \mathbb{R}^m$ is the sparse code of the input vector $x \in \mathbb{R}^d$, in which most of the coefficients are zero or close to zero; to construct an overcomplete dictionary, the number of atoms m is set larger than the input dimension d; an overcomplete dictionary does not require the atoms to be orthogonal, thus allowing for a more flexible dictionary and a richer data representation; the dictionary $\theta_i$ and the sparse codes $c_i$ can be learned in the training phase by solving the following optimization problem over the training segments $x_n$:
$$\min_{\theta_i, c_i} \sum_{n=1}^{N} \lVert x_n - \theta_i c_i(x_n) \rVert_2^2 + \lambda \lVert c_i(x_n) \rVert_0 \qquad (15)$$
subject to $\lVert d_k \rVert_2 = 1$ for all $1 \le k \le m$; the first term is the data fitting term and the second term is the sparsity-inducing regularization; this minimization problem is NP-hard because of the $\ell_0$ norm $\lVert \cdot \rVert_0$, and a solution can be approximated by replacing it with its convex relaxation, i.e. the $\ell_1$ norm $\lVert \cdot \rVert_1$; if one of $\theta_i$ and $c_i$ is fixed, the problem becomes convex in the other; because the problem is not jointly convex in $(\theta_i, c_i)$, only an approximate result is obtained; the sparse code and dictionary can be obtained through an alternating update scheme:
1) sparse code approximation: the sparse code $c_i$ is updated by solving equation (15) with the dictionary fixed as refined in the previous iteration;
2) dictionary refinement: the dictionary $\theta_i$ is updated by solving equation (15) with the sparse code fixed as computed in the previous iteration;
in the detection phase, given a query segment q, its sparse code is computed as $c_i(q)$ and its reconstruction is $\theta_i c_i(q)$; thus the reconstruction error is
$$E(q) = \lVert q - \theta_i c_i(q) \rVert_2^2$$
After the reconstruction error is obtained, an intent-specific threshold is applied to determine whether the query segment is an intent execution segment or an ordinary segment.
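The alternating scheme described above can be sketched as follows. This is a minimal illustration and not the implementation of the invention: it assumes the $\ell_1$ relaxation of the sparsity term, uses ISTA (iterative soft thresholding) for the sparse code step and a least-squares fit followed by column normalization for the dictionary step, and all function names and default values (`lam`, `n_iter`, `n_outer`) are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=100):
    """ISTA for min_c ||x - D c||_2^2 + lam * ||c||_1, one code row per row of X.

    X: (N, d) segments, D: (d, m) dictionary with unit-norm atoms (assumed).
    """
    L = 2.0 * np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    C = np.zeros((X.shape[0], D.shape[1]))     # codes start at zero
    for _ in range(n_iter):
        grad = 2.0 * (C @ D.T - X) @ D         # gradient of the data fitting term
        C = soft_threshold(C - grad / L, lam / L)
    return C

def learn_dictionary(X, m, lam=0.1, n_outer=10, seed=0):
    """Alternate between sparse code approximation and dictionary refinement."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(X.shape[1], m))
    D /= np.linalg.norm(D, axis=0)             # enforce ||d_k||_2 = 1
    for _ in range(n_outer):
        C = sparse_codes(X, D, lam)            # 1) codes with the dictionary fixed
        D_new = np.linalg.lstsq(C, X, rcond=None)[0].T   # 2) dictionary with codes fixed
        norms = np.linalg.norm(D_new, axis=0)
        used = norms > 1e-10                   # atoms never used keep their old value
        D[:, used] = D_new[:, used] / norms[used]
    return D

def reconstruction_error(q, D, lam=0.1):
    """Squared reconstruction error ||q - D c(q)||_2^2 of a query segment q."""
    c = sparse_codes(q[None, :], D, lam)[0]
    return float(np.sum((q - D @ c) ** 2))
```

With an overcomplete dictionary (m > d), `learn_dictionary(X_train, m)` would be fitted on intent execution segments only, and `reconstruction_error(q, D)` compared against the intent-specific threshold.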
Furthermore, a given observation period may be divided into a series of short segments, so that the intent detection task can be viewed as determining whether a subject is performing a particular intent in a short segment; the research objective thus shifts from a long period that may contain both intent execution portions and non-execution portions to short segments that either contain a particular intent or do not; an intent execution segment is a time segment during which the subject performs a particular intent task; an ordinary segment is a time segment during which the subject does not perform the intent task; a query segment is a time segment for which it is to be determined whether the subject performs the intent task; a training segment is a time segment used for training the reconstruction model, and must be an intent execution segment;
given a query segment q of an electroencephalographic recording, it is to be determined whether the subject is performing some intent task $T_i$ in that segment; formally, given a set of N electroencephalographic segments $\{x_1, x_2, \ldots, x_N\}$, each corresponding to the same target task $T_i$, the reconstruction model is expressed as $f(\cdot\,; \theta_i): \mathbb{R}^d \to \mathbb{R}^d$, where d is the input dimension of the reconstruction model and $\theta_i$ is the parameter of the reconstruction model, which is learned in the training phase by minimizing a loss function:
$$\theta_i = \arg\min_{\theta} \sum_{n=1}^{N} \lVert x_n - f(x_n; \theta) \rVert_2^2$$
after the reconstruction model is established, for each electroencephalogram query segment q the reconstruction error
$$E(q) = \lVert q - f(q; \theta_i) \rVert_2^2$$
is calculated and used as a similarity measure to determine the correlation between the electroencephalogram query segment and the mental intent; the smaller the reconstruction error $E(q)$, the greater the correlation between the electroencephalogram query segment q and the mental intent $T_i$.
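The segment-wise detection described above can be illustrated with a short sketch. It assumes non-overlapping segments, a squared-error reconstruction loss, and a trained model exposed as a callable `reconstruct`; these names, the segment length and the threshold value are hypothetical and would have to be chosen for the actual data.

```python
import numpy as np

def split_into_segments(recording, segment_len):
    """Split a (channels, time) recording into non-overlapping short segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    n_seg = recording.shape[1] // segment_len
    return [recording[:, k * segment_len:(k + 1) * segment_len] for k in range(n_seg)]

def detect_intent(segments, reconstruct, threshold):
    """Return the reconstruction error of every query segment and a yes/no flag.

    A segment is flagged as an intent execution segment when its squared
    reconstruction error does not exceed the intent-specific threshold.
    """
    errors = np.array([float(np.sum((s - reconstruct(s)) ** 2)) for s in segments])
    return errors, errors <= threshold
```

Here `reconstruct` stands for any of the reconstruction models (subspace projection, autoencoder or sparse dictionary) trained on intent execution segments only.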
Still further, when performing yes-or-no motion intention detection, the performance of the RID scheme implemented by subspace projection, autoencoder and sparse dictionary learning is measured by the relative mean error, normalized by dividing by the average reconstruction error of the intent execution query segments;
the normalized relative mean error (NRME) $\phi_{relative}$ is defined as:
$$\phi_{relative} = \frac{\phi_o - \phi_i}{\phi_i}$$
where $\phi_o$ and $\phi_i$ are the average reconstruction errors of the ordinary query segments and of the intent execution query segments, respectively.
Further, the normalized relative reconstruction error is calculated as follows:
$$\tilde{e}_k = \frac{e_k - \bar{e}_i}{\bar{e}_i}$$
where $e_k$ is the reconstruction error of the k-th instance, and $\bar{e}_i$ is the average reconstruction error of the intent execution query segments.
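As a small numerical illustration of these two quantities, the sketch below assumes that "relative" means the difference from the average error of the intent execution query segments, divided by that same average; this reading of the definitions, and the function names, are assumptions rather than something the text states explicitly.

```python
import numpy as np

def nrme(ordinary_errors, intent_errors):
    """Normalized relative mean error: (phi_o - phi_i) / phi_i (assumed form)."""
    phi_o = float(np.mean(ordinary_errors))   # average error of ordinary query segments
    phi_i = float(np.mean(intent_errors))     # average error of intent execution segments
    return (phi_o - phi_i) / phi_i

def normalized_relative_errors(errors, intent_errors):
    """Per-instance error relative to the average intent execution error (assumed form)."""
    mean_i = float(np.mean(intent_errors))
    return (np.asarray(errors, dtype=float) - mean_i) / mean_i
```

Under this reading, a larger NRME means that ordinary segments are reconstructed much worse than intent execution segments, which is what makes a decision threshold easy to place.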
3. Advantageous effects adopted by the present invention
(1) The present invention utilizes a reconstruction model to represent a high level abstraction of the movement intent and utilizes reconstruction errors to determine whether a movement intent is present.
(2) The invention carries out comprehensive detection experiments on two movement intention tasks using different reconstruction models, and shows that the proposed RID scheme performs well. The scheme is theoretically flexible and reliable in complex realistic situations, and requires neither manual feature processing nor extensive domain expertise.
(3) The present invention exhibits good performance on the synthetic query fragments, and obtains a "baseline" query fragment when acquiring data intended to perform a task, even in an environment where baseline data is acquired, thereby obtaining competitive results.
(4) The invention discusses three query strategies for movement intention detection, namely trial-based, recording-based and segment-based, and constructs three different reconstruction models to realize the scheme, showing that the scheme is flexible with respect to the choice of reconstruction model. Furthermore, the solution of the invention requires neither manual feature processing nor extensive expertise.
Drawings
FIG. 1 is an intent detection scheme based on reconstruction.
FIG. 2 is a flow chart of intent detection based on reconstruction.
FIG. 3 shows, for the left fist, the normalized relative reconstruction error distributions for different query strategies and different reconstruction models (relevant queries refer to intent execution query segments).
FIG. 4 shows, for the left fist, the average precision and recall of different query strategies and different reconstruction models over different thresholds (relevant queries refer to intent execution query segments).
FIG. 5 shows, for the right fist, the normalized relative reconstruction error distributions for different query strategies and different reconstruction models (relevant queries refer to intent execution query segments).
FIG. 6 shows, for the right fist, the average precision and recall of different query strategies and different reconstruction models over different thresholds (relevant queries refer to intent execution query segments).
FIG. 7 shows the movement intention detection F1 scores at the optimal decision thresholds for different query strategies and reconstruction models.
FIG. 8 shows intent detection on synthetic query segments and left-fist movement intent segments: F1 score as a function of the threshold.
FIG. 9 shows intent detection on synthetic query segments and right-fist movement intent segments: F1 score as a function of the threshold.
Detailed Description
The technical solutions in the examples of the present invention are clearly and completely described below with reference to the drawings in the examples of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without inventive step, are within the scope of the present invention.
The present invention will be described in further detail with reference to the accompanying drawings.
The invention discloses a reconstruction-based human intention detection (RID) method that is capable of identifying whether a subject is performing a certain motor intention within a given observation period. The reconstruction error is used as the criterion for identifying intent execution. In the training stage, electroencephalogram signals recorded while a specific human intention task is executed are used to train a reconstruction model; in the detection stage, the electroencephalogram period to be determined is input into the reconstruction model, and the reconstruction error is calculated. The smaller the reconstruction error, the more likely it is that a certain motor intention task exists in the observation period. The proposed solution is applicable to any real-life scenario without the need to know in advance what the non-intent cases look like. Furthermore, the solution does not require the manual processing and extensive expertise that are usually important in traditional electroencephalographic analysis. Systematic experiments on a large electroencephalogram data set with two motor intention tasks were carried out to study the effectiveness of the RID scheme. Three different reconstruction models, namely an autoencoder, subspace projection and sparse dictionary learning, are established to realize the RID scheme. In addition, three strategies for intention detection from electroencephalogram signals are studied. The experimental results show that the RID scheme provided by the invention is effective in detecting the movement intention of the left fist and the right fist. The scheme lays a foundation for developing more reliable and practical BCI systems.
A reconstruction-based human intention detection (RID) method, performed as follows:
Step 1, in the field of electroencephalogram analysis, the invention provides an intention detection scheme for a person. Compared with traditional intent recognition schemes, the goal is to determine whether a user is performing an intent at all, which is more critical for building a reliable, flexible BCI system. Electroencephalography is one of the most common monitoring methods for recording the electrical activity of the human brain, and has potential applications in fields such as mind-controlled wheelchairs and disease diagnosis.
Most existing electroencephalographic analysis work focuses on recognition tasks that classify an instance into a predefined class. The usual method consists of two parts: feature extraction and classification.
Step 1.1, traditional electroencephalogram feature extraction comprises band-pass filtering and spatial filtering, wherein the band-pass filtering retains significant information in the active frequency bands and filters out inactive frequency bands that may contain noise. Spatial filtering usually employs the common spatial pattern (CSP) and its variants; in addition, a combined algorithm, namely the filter bank CSP (FBCSP), has been developed and has proven competitive.
Step 1.2, regarding the classifier, many machine learning methods, such as linear discriminant analysis (LDA), support vector machines (SVM) and random forests (RF), have been applied to electroencephalogram-based motor imagery classification. Recently, some studies have investigated deep learning methods for electroencephalography analysis. The most common deep models are convolutional neural networks (CNN) and recurrent neural networks (RNN). However, these deep models only consider data collected during the target intent task, ignoring the electroencephalographic signals generated when the user is not performing a particular intent and their effect on practical electroencephalography-based BCI systems. Thus, in this scheme, rather than classifying which motor intent task the user is performing, the goal is to identify whether the user is performing a certain motor intent task.
Step 2, a reconstruction-based electroencephalogram intention detection scheme is provided. This scheme is fundamentally different from classification-based machine learning approaches, because it addresses a detection task rather than the traditional recognition task. The approach is theoretically able to handle any real scenario and requires neither the domain knowledge nor the manual feature processing commonly used in the electroencephalogram analysis field. Reconstruction models have been widely used in various areas of data mining, particularly for the detection or removal of outliers in computer vision: a reconstruction model is trained, and samples with larger reconstruction error are regarded as outliers. Furthermore, a reconstruction model may also be used as a classifier: reconstruction models are trained per class, and each sample is classified into the class whose reconstruction model fits it best. These methods can be summarized in three broad categories.
Step 2.1, the first category is subspace projection, which converts the observed values of a set of possibly correlated variables into values of a set of linearly uncorrelated variables. One of the most representative methods is principal component analysis (PCA), which obtains a principal subspace through an orthogonal transformation. Existing work reconstructs the training set using PCA, kernel PCA, robust PCA and robust kernel PCA, and selects or removes outliers that produce high reconstruction losses.
We adopt subspace projection as a reconstruction method for this scheme. The goal is to define a set of m-dimensional vectors that map each sample $x \in \mathbb{R}^d$ to a sample $y \in \mathbb{R}^m$ ($m \le d$). The principal components are given by m variables with the following three properties:
1) The principal components are orthogonal.
2) The variance of the first principal component is the largest, and the variance of each subsequent component is progressively smaller.
3) The sum of the variances of all principal components is equal to the sum of the variances of the original variables.
Suppose that $R \in \mathbb{R}^{d \times d}$ is the correlation matrix of the d variables $v_1, v_2, \ldots, v_d$, calculated from the target reconstruction training set. d eigenvalue-eigenvector pairs are calculated from R and sorted by eigenvalue to obtain $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_d, e_d)$, where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$ and $e_i \in \mathbb{R}^d$. The i-th principal component of a sample $x = (x_1, x_2, \ldots, x_d)^T$ can be calculated as:
$$y_i = e_i^T x$$
In general, the first m eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_m, e_m)$ are selected, and the resulting projection matrix P is:
$$P = (e_1, e_2, \ldots, e_m)^T \qquad (3)$$
where $P \in \mathbb{R}^{m \times d}$. Any observation x can be converted into:
$$y = Px \qquad (4)$$
where $y \in \mathbb{R}^m$. Owing to the properties of the eigenvectors, the reconstruction process is simple:
$$\hat{x} = P^T y = P^T P x$$
It should be noted that the reconstruction is perfect only when $m = d$, in which case $P^{-1} = P^T$; otherwise, due to the compression, $\hat{x}$ and $x$ are not the same. Therefore, we set m < d, considering that an overly perfect reconstruction may limit the effectiveness of identifying data patterns from the reconstruction loss.
Thus, a subspace-projection-based reconstruction model may be defined as:
$$f(x; \theta) = P^T P x$$
where the parameter $\theta = P$ can be calculated by obtaining the projection matrix of the PCA. After training, the reconstruction error of the electroencephalographic query segment q can be calculated as:
$$E(q) = \lVert q - P^T P q \rVert_2^2$$
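The subspace projection model can be illustrated with a brief NumPy sketch. It assumes that each segment has been flattened into a d-dimensional vector, that the variables are standardized so that the eigen-decomposition is taken of the correlation matrix, and that the error is the squared Euclidean distance; the function names are hypothetical.

```python
import numpy as np

def fit_pca_projection(X, m):
    """Fit the projection matrix P (m x d) on training segments X of shape (N, d)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std > 0, std, 1.0)          # guard against constant variables
    Z = (X - mean) / std                       # standardized variables
    R = Z.T @ Z / X.shape[0]                   # d x d correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]      # indices of the m largest eigenvalues
    P = eigvecs[:, order].T                    # rows of P are e_1, ..., e_m
    return mean, std, P

def reconstruction_error(q, mean, std, P):
    """Squared error between a standardized query and its reconstruction P^T P z."""
    z = (q - mean) / std
    z_hat = P.T @ (P @ z)                      # project (y = P z), then reconstruct
    return float(np.sum((z - z_hat) ** 2))
```

Choosing m < d keeps the reconstruction imperfect on purpose, so that segments unlike the training data produce a visibly larger error than segments resembling it.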
Step 2.2, the second category learns a compact code to represent the training samples. For example, existing work utilizes autoencoders to transform observations from one feature space into a new feature space in which the factors of variation in the data are separated. We deploy the autoencoder as another reconstruction model in the scheme. The autoencoder is a neural-network-based reconstruction model that performs well at denoising and dimensionality reduction. According to three query strategies for motor intention detection, namely trial-based, recording-based and segment-based (different EEG data organization strategies for the inputs of different reconstruction models), we apply two types of autoencoders: a fully connected autoencoder and a CNN autoencoder. The specific steps are as follows:
Step 2.2.1, fully connected autoencoder
Like the multilayer perceptron (MLP), the fully connected autoencoder is a feedforward artificial neural network. The autoencoder consists of an encoder $\Phi: \mathbb{R}^d \to \mathbb{R}^m$, where m is the dimension of the hidden layer, and a decoder $\Psi: \mathbb{R}^m \to \mathbb{R}^d$. Each layer is defined by a corresponding weight W, bias b and activation function:
$$\Phi = f_\Phi(W_\Phi x + b_\Phi) \qquad (7)$$
$$\Psi = f_\Psi(W_\Psi x + b_\Psi) \qquad (8)$$
where $W_\Phi \in \mathbb{R}^{m \times d}$, $b_\Phi \in \mathbb{R}^m$, $W_\Psi \in \mathbb{R}^{d \times m}$ and $b_\Psi \in \mathbb{R}^d$, and $f_\Phi$ and $f_\Psi$ represent the encoding activation function and the decoding activation function, respectively. Thus, in our case, the reconstruction model is defined as:
$$f(x; \theta) = \Psi(\Phi(x))$$
The parameter $\theta$ can be obtained by minimizing, over the N training segments $x_n$:
$$\min_{\theta} \sum_{n=1}^{N} \lVert x_n - f(x_n; \theta) \rVert_2^2$$
After training, the reconstruction error of the electroencephalographic query segment q can be calculated as:
$$E(q) = \lVert q - \Psi(\Phi(q)) \rVert_2^2$$
It should be noted that with a linear activation function, the autoencoder generates the same subspace as the PCA. Therefore, we use the nonlinear rectified linear unit (ReLU) as the activation function.
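A minimal NumPy sketch of a one-hidden-layer fully connected autoencoder is given below for illustration. Several choices are assumptions made to keep the example short: a ReLU encoder with a linear decoder output, a mean squared reconstruction loss, plain full-batch gradient descent instead of Adam, and the class name `TinyAutoencoder` with its default hyperparameters.

```python
import numpy as np

class TinyAutoencoder:
    """Encoder Phi: R^d -> R^m (ReLU) and decoder Psi: R^m -> R^d (linear, assumed)."""

    def __init__(self, d, m, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, np.sqrt(2.0 / d), size=(m, d))
        self.b_enc = np.zeros(m)
        self.W_dec = rng.normal(0.0, np.sqrt(1.0 / m), size=(d, m))
        self.b_dec = np.zeros(d)

    def reconstruct(self, X):
        H = np.maximum(0.0, X @ self.W_enc.T + self.b_enc)   # Phi(x)
        return H @ self.W_dec.T + self.b_dec                 # Psi(Phi(x))

    def fit(self, X, epochs=300, lr=1e-3):
        """Full-batch gradient descent on the mean squared reconstruction error."""
        n = X.shape[0]
        losses = []
        for _ in range(epochs):
            pre = X @ self.W_enc.T + self.b_enc
            H = np.maximum(0.0, pre)
            diff = H @ self.W_dec.T + self.b_dec - X          # (n, d) residuals
            losses.append(float(np.mean(np.sum(diff ** 2, axis=1))))
            g_out = 2.0 * diff / n                            # dLoss / dOutput
            g_pre = (g_out @ self.W_dec) * (pre > 0)          # back through ReLU
            self.W_dec -= lr * (g_out.T @ H)
            self.b_dec -= lr * g_out.sum(axis=0)
            self.W_enc -= lr * (g_pre.T @ X)
            self.b_enc -= lr * g_pre.sum(axis=0)
        return losses

    def reconstruction_error(self, q):
        """Squared reconstruction error of a single flattened query segment q."""
        q = np.atleast_2d(q)
        return float(np.sum((q - self.reconstruct(q)) ** 2))
```

The model would be fitted on intent execution segments only, after which `reconstruction_error(q)` plays the role of the error of the query segment q.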
Step 2.2.2 CNN autoencoder
Since electroencephalography signals are time-series data with multiple channel readings at each point in time, the input data to the auto-encoder may be in a two-dimensional (2D) format. Therefore, we apply a CNN auto-encoder as a reconstruction model to process two-dimensional electroencephalographic data. Similar to fully connected autoencoders, CNN autoencoders also have an encoder portion and a decoder portion. The main difference is that the encoder and decoder sections are constructed primarily with convolutional neural networks.
Specifically, there are three 2D-CNN layers at the encoder stage, each followed by a max-pooling layer, and three pairs of upsampling and convolutional layers at the decoder stage. To formalize the two-dimensional convolution operation, the value of the neuron at position $(x, y)$ in the $l$-th feature map of the $k$-th layer is given by:
$$v_{kl}^{xy} = \mathrm{relu}\Big(b_{kl} + \sum_{p=0}^{P_k-1} \sum_{q=0}^{Q_l-1} w_{kl}^{pq}\, v_{(k-1)}^{(x+p)(y+q)}\Big)$$
where $\mathrm{relu}(x) = \max(0, x)$ is the activation function. In this equation, $b_{kl}$ is the bias of the $l$-th feature map in the $k$-th layer, $w_{kl}^{pq}$ is the weight at position $(p, q)$ of the kernel connected to this feature map, covering the feature map in the previous layer, and $P_k$ and $Q_l$ indicate the size of the kernel.
After each convolutional layer, a max-pooling operation with a stride of [2 × 2] is applied, halving each dimension of the data. Thus, at the decoder stage, the data dimension must be doubled in each decoder layer. There are two ways to expand the data dimension at the decoder stage: transposed convolution and upsampling interpolation. The transposed convolution works on almost the same principle as the convolution, but in reverse: each unit of the input layer is expanded to a larger region in the output of the transposed convolutional layer. However, some researchers report that an interpolation layer followed by a convolutional layer performs better than a transposed convolutional layer, and that nearest-neighbor interpolation performs best for upsampling. Therefore, in this study we use nearest-neighbor upsampling interpolation followed by a convolutional layer as the basic component of the decoder stage. After the CNN decoder, a final output layer reconstructs the input data. The final output layer is a convolutional layer, and its output has the same size as the input data.
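The effect of the two resizing operations on the data dimension can be illustrated with a small sketch in plain Python, assuming a single feature map stored as a list of rows; the function names are hypothetical and the convolutional layers themselves are omitted.

```python
from typing import List

FeatureMap = List[List[float]]


def max_pool_2x2(fm: FeatureMap) -> FeatureMap:
    """2 x 2 max pooling with stride 2; a trailing odd row or column is dropped."""
    rows, cols = len(fm), len(fm[0])
    return [[max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]


def upsample_nearest_2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbor upsampling: every value is repeated in a 2 x 2 block."""
    out: FeatureMap = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate the row
    return out


feature_map = [[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [9.0, 10.0, 11.0, 12.0],
               [13.0, 14.0, 15.0, 16.0]]
pooled = max_pool_2x2(feature_map)       # 4 x 4 -> 2 x 2
restored = upsample_nearest_2x(pooled)   # 2 x 2 -> 4 x 4
```

Pooling followed by upsampling restores the original size but not the original values, which is why each upsampling layer in the decoder is paired with a convolutional layer.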
The numbers of feature maps of the three convolutional layers at the encoder stage are 16, 8 and 8, respectively, and the reverse at the decoder stage; the final output convolutional layer has 1 feature map. The kernel size of every convolution in the encoder and decoder stages is [3 × 3] and the stride is [1 × 1]. The max-pooling kernel size is [2 × 2]. Batch normalization is applied to achieve better performance.
Stochastic gradient descent with the Adam update rule is used to minimize the loss function over $\zeta_j$, the set of parameters of the neural network. The network parameters are optimized with a learning rate of $10^{-4}$.
Step 2.3, the third category is to learn a sparse representation of the input data through a dictionary. The dictionary contains redundant atoms, which allows multiple representations of a single sample. This category includes, in particular, works that study sparse dictionary learning to improve the sparsity and flexibility of the representation. However, most existing studies still follow the basic assumption that outliers are "few and different". Owing to the complexity of human intent, our problem is more complex than outlier detection. The first difficulty is that the amount of ordinary-segment data is larger than the amount of intent-execution segment data. Further, unlike outliers, ordinary-segment data are typically diverse and may be related in some way to the intent-execution segment data.
Step 3, evaluating the electroencephalogram-based "yes or no" motor intention detection task, and systematically studying and analyzing different types of electroencephalogram reconstruction models and various electroencephalogram data organization strategies.
According to step 2.3, we also study sparse dictionary learning as a reconstruction model for implementing the RID scheme, learning a sparse representation of the input data through a dictionary. Sparse dictionary learning is a learning method for constructing compact representations of input data. The latent representation is composed of a dictionary of atoms and a sparse code. In the reconstruction process, the original input can be well approximated by a linear combination of the atoms weighted by the sparse code. Formally, the reconstruction model is defined as:
$$R(x) = \theta_i\, c_i(x)$$
wherein $\theta_i = [d_1, d_2, \ldots, d_m]$ is a dictionary containing $m$ atoms, and $c_i(x) \in \mathbb{R}^m$ is the sparse code of an input vector $x \in \mathbb{R}^d$. Most of the coefficients of the sparse code are zero or close to zero. To construct an overcomplete dictionary, the dictionary size $m$ is set larger than the input dimension $d$. The overcomplete dictionary does not require the atoms to be orthogonal, thus allowing a more flexible dictionary and a richer data representation. The dictionary $\theta_i$ and the sparse code $c_i$ can be learned in the training phase by solving the following optimization problem:
$$\min_{\theta_i,\, c_i} \sum_{j=1}^{N} \lVert x_j - \theta_i\, c_i(x_j) \rVert_2^2 + \lambda \lVert c_i(x_j) \rVert_0 \tag{15}$$
subject to $\lVert d_k \rVert_2 = 1$ for all $1 \le k \le m$. The first term is the data-fitting term and the second term is the sparsity-inducing regularization, weighted by $\lambda$. This minimization problem is NP-hard because of the $\ell_0$ norm $\lVert \cdot \rVert_0$, and it can be approximated by replacing the $\ell_0$ norm with its convex relaxation, i.e. using the $\ell_1$ norm $\lVert \cdot \rVert_1$. If one of $\theta_i$ and $c_i$ is fixed, the problem becomes convex in the other. Because the problem is not jointly convex in $(\theta_i, c_i)$, we obtain an approximate result. The optimal sparse code and dictionary can be obtained through an alternating update scheme:
1) sparse code approximation: update the sparse code $c_i$ by solving equation (15) with the dictionary fixed to the value refined in the last iteration;
2) dictionary refinement: update the dictionary $\theta_i$ by solving equation (15) with the sparse code fixed to the value obtained in the last iteration.
In the detection phase, given a query segment $q$, its sparse code can be computed as $c_i(q)$, and its reconstruction is given by $\theta_i\, c_i(q)$. Thus the reconstruction error is
$$e_i(q) = \lVert q - \theta_i\, c_i(q) \rVert_2^2$$
After the reconstruction error is obtained, an intent-specific threshold is applied to determine whether the query segment is an intent-execution segment or an ordinary segment, as shown in the final stage of FIG. 1.
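The detection phase with a learned dictionary can be sketched as follows, under stated assumptions: the dictionary is taken as given, the sparse code is computed for the convex (l1) relaxation by the iterative shrinkage-thresholding algorithm (ISTA), which is one common solver and not necessarily the one used here, the objective is written with a factor 1/2 on the data-fitting term, and the function names, the regularization weight and the threshold values are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # d rows, m columns; each column is one atom


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(D: Matrix, q: Vector, lam: float, n_iter: int = 200) -> Vector:
    """ISTA for min_c 0.5 * ||q - D c||^2 + lam * ||c||_1 with D fixed."""
    d, m = len(D), len(D[0])
    # 1 / ||D||_F^2 is a safe step size (not larger than 1 / Lipschitz constant).
    step = 1.0 / sum(a * a for row in D for a in row)
    c = [0.0] * m
    for _ in range(n_iter):
        residual = [q[r] - sum(D[r][k] * c[k] for k in range(m)) for r in range(d)]
        moved = [c[k] + step * sum(D[r][k] * residual[r] for r in range(d))
                 for k in range(m)]
        c = [soft_threshold(v, step * lam) for v in moved]
    return c


def reconstruction_error(D: Matrix, q: Vector, c: Vector) -> float:
    """Squared Euclidean distance between q and D c (assumed metric)."""
    m = len(c)
    return sum((q[r] - sum(D[r][k] * c[k] for k in range(m))) ** 2
               for r in range(len(q)))


def is_intent_segment(D: Matrix, q: Vector, lam: float, threshold: float) -> bool:
    """Flag q as intent execution when its reconstruction error is below the threshold."""
    return reconstruction_error(D, q, sparse_code(D, q, lam)) < threshold


# Toy dictionary (hypothetical): two orthonormal unit-norm atoms.
dictionary = [[1.0, 0.0], [0.0, 1.0]]
query = [3.0, 0.5]
code = sparse_code(dictionary, query, lam=1.0)
error = reconstruction_error(dictionary, query, code)
```

In this toy case the small coefficient is shrunk to exactly zero, which shows how the l1 penalty produces a sparse code at the cost of a nonzero reconstruction error.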
3. The reconstruction-based intention detection (RID) solution as described in step 1, wherein the purpose of electroencephalogram-based intention detection is to determine, by analyzing the electroencephalographic recording of a given observation period, whether the subject had a certain intention within that period. Without loss of generality and practicality, we assume that the execution of an intent (if any) lasts for a few seconds. Thus, a given observation period can be divided into a series of short segments. The intent detection task can then be viewed as determining whether a subject is performing a particular intent in a short segment, and the object of study shifts from a long period that may contain both intent-execution and non-execution portions to a short segment with or without a particular intent.
For clarity of description, we explain several terms used throughout the remainder of this document:
1) intent-execution segment: a time segment during which the subject performs a particular intended task;
2) ordinary segment: a time segment during which the subject does not perform the intended task;
3) query segment: a time segment for which it is to be determined whether the subject performs the task;
4) training segment: a time segment used to train the reconstruction model, which must be an intent-execution segment.
Given a query segment q of an electroencephalographic recording, our goal is to determine whether the subject is performing some intended task T_i in that segment. To achieve this goal, our reconstruction-based intention detection (RID) scheme consists of two phases, a training phase and a detection phase, as shown in FIG. 1. In the training phase, several sets of electroencephalographic recordings obtained while performing certain intended tasks are used to train class-specific reconstruction models; in the detection phase, the queried electroencephalogram segment is input into a reconstruction model, and whether a certain motor intention is executed is determined according to its reconstruction error. As shown in the training phase of FIG. 1, each candidate intent requires its own reconstruction model. The reconstruction model is a high-level representation extracted from the electroencephalographic recordings of the candidate intended activity. It is compact and requires neither manual processing nor profound expertise.
Formally, given a set of $N$ electroencephalographic segments $X_i = \{x_1, x_2, \ldots, x_N\}$, $x_j \in \mathbb{R}^d$, each corresponding to the same target task $T_i$, the reconstruction model is expressed as $R(\cdot\,; \theta_i): \mathbb{R}^d \to \mathbb{R}^d$, where $d$ is the input dimension of the reconstruction model and $\theta_i$ is the parameter of the reconstruction model, which can be obtained in the training phase by minimizing a loss function:
$$\theta_i = \arg\min_{\theta} \sum_{j=1}^{N} \lVert x_j - R(x_j; \theta) \rVert_2^2$$
After the reconstruction model is established, for each electroencephalographic query segment $q$ the reconstruction error
$$e_i(q) = \lVert q - R(q; \theta_i) \rVert_2^2$$
is calculated as a similarity measure that determines the relevance of the query segment to the mental intent. The smaller the reconstruction error $e_i(q)$, the greater the correlation between the query segment $q$ and the mental intent $T_i$.
Examples
The present invention describes a new research problem, namely whether or not a motor intention task is being performed, and proposes a reconstruction-based intention detection (RID) scheme that demonstrates the potential to address it. In the traditional field of human intention analysis, most works attempt to answer the question "which movement does the subject intend?". In practical applications, however, this may result in erroneous operation. Consider, for example, a BCI system that opens or closes a door according to the user's intention. If the algorithm only aims at recognizing whether a person wants to open or to close the door, the user must put on the electroencephalogram headset when he wants to open or close the door and take it off afterwards; if he wears the headset all the time, the opening and closing of the door become unpredictable. Solving such a practical problem requires answering whether or not the user is "trying to control the door". Such an "A or not" problem is much more difficult than the conventional "A or B" problem, because the "not" condition cannot be defined in the way the classes of an "A or B" problem are commonly defined.
For this "A or not" problem, the present invention proposes a reconstruction-based intention detection scheme that uses a reconstruction model to represent the "A" state and the reconstruction error to determine the correlation between the query and the "A" state, rather than defining the "not" state. Systematic experiments were performed on a large electroencephalographic dataset containing 55 subjects who imagined moving their left or right fist. The feasibility of the proposed RID scheme was verified using three "not" cases as query segments: data synthesized from random values between the maximum and minimum possible electroencephalogram readings, the eyes-open baseline task and the eyes-closed baseline task. The present invention exhibits good performance on the synthetic query segments, and obtains competitive results on the baseline query segments even though the baseline data were acquired in the same environment as the intent-execution data. The invention discusses three query strategies for motor intention detection, namely the trial, recording and slicing strategies, and constructs three different reconstruction models to implement the scheme, showing the flexibility of the scheme with respect to the reconstruction model. Furthermore, the solution of the invention requires neither manual processing nor extensive expertise.
There are many future directions for investigating such a "yes or no" problem. One is to build a two-stage algorithm: the first stage answers "whether" the subject is trying to move a fist; the second stage answers "which" fist the subject is trying to move. Another direction is to implement a real intention detection BCI system to evaluate the effectiveness of the scheme in real scenarios.
1. The present invention evaluated the effectiveness of the RID scheme on a large-scale electroencephalographic motor intention dataset containing 55 subjects performing left/right fist opening and closing motor intention tasks, and two baseline tasks (eyes open and eyes closed). The electroencephalographic data were collected using the BCI2000 system with 64 electrode channels and a sampling rate of 160 Hz. Each subject performed approximately 45 trials (a trial being a continuous electroencephalographic recording session in which only one particular mental task is performed), with approximately equal numbers of left-fist and right-fist motor intention trials. There were 2347 trials in total, 1179 for the left fist and 1168 for the right fist. Each trial lasted about 4 seconds; data were extracted starting 1 second after the onset of the cue (instructing the subject to perform a certain task) until the end of the trial, and each trial comprises 497 time steps. During a baseline trial, the subject kept his eyes open or closed for 1 minute without performing any mental task.
To evaluate the intent detection scheme, two types of query segment should be used:
1) intent-execution query segment: a query segment in which the subject actually performs the intended task;
2) ordinary query segment: a query segment in which the subject may be in any possible mental state other than performing the particular intent.
However, the most difficult part of modeling such a "whether" problem is that ordinary query segments may take many different forms that need not be related to one another at all. In addition, human activities such as walking or facial movements have a significant impact on electroencephalogram readings. It is therefore impossible to assemble a set of ordinary query segments that exhaustively defines and models all "ordinary" mental states. In view of the above, and of the fact that electroencephalogram signals are always noisy for electrical reasons, the present invention constructs synthetic data to simulate electroencephalogram readings in which a certain motor intent is "not" being performed. The synthetic data are constructed from random values between the maximum and minimum possible electroencephalographic readings. Meanwhile, to evaluate the effectiveness of the proposed RID solution, the two baseline tasks in the dataset, i.e., eyes open and eyes closed, are used as special cases of "not" performing a motor intent. Thus, the present invention tests three kinds of ordinary query segment and one kind of intent-execution query segment:
1) synthetic query segment: a query segment built by randomly selecting values between the maximum and minimum possible electroencephalogram readings;
2) eyes-open query segment: a query segment generated from the eyes-open task in the dataset;
3) eyes-closed query segment: a query segment generated from the eyes-closed task in the dataset;
4) intent-execution query segment: an intent-execution segment used as a query segment.
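The synthetic query segment can be illustrated with a short sketch. It assumes that the random values are drawn independently and uniformly for every channel and time step; the amplitude bounds of plus or minus 100 in the usage line and the function name are hypothetical placeholders, since the actual reading range depends on the acquisition equipment.

```python
import random
from typing import List, Optional


def synthetic_segment(n_channels: int, n_steps: int, low: float, high: float,
                      seed: Optional[int] = None) -> List[List[float]]:
    """Build a channels x time segment of values drawn uniformly from [low, high]."""
    rng = random.Random(seed)  # local generator so the segment is reproducible
    return [[rng.uniform(low, high) for _ in range(n_steps)]
            for _ in range(n_channels)]


# One synthetic segment with the same shape as a trial (64 channels, 497 time steps).
segment = synthetic_segment(64, 497, low=-100.0, high=100.0, seed=0)
```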
2. The invention evaluates the RID scheme on the detection of two motor intention tasks: imagining the movement of the left fist and of the right fist. Therefore, two RID models are built in the training stage, one from the left-fist motor intention segments and one from the right-fist motor intention segments. One answers "is the subject imagining moving the left fist?" and the other answers "is the subject imagining moving the right fist?". The left-fist and the right-fist motor intention segments are each divided into two parts: one part is used as training segments and the other as query segments.
Since a single electroencephalography trial is a long recording comprising multiple records (time steps), the present invention employs three electroencephalography data organization strategies (query strategies) to evaluate the RID scheme, namely the trial strategy, the recording strategy and the slicing strategy.
1) Trial strategy
The present invention uses individual electroencephalography trials, each of shape (height 64, length 497), as instances to train the reconstruction model. The height 64 corresponds to the 64 electroencephalographic channels, while the length is the number of recorded time points in a single trial. In the detection phase, a query segment is likewise an individual trial of the same shape as those used for training. For each evaluation task, 90% of the intent-execution trials are randomly selected to train the reconstruction model, and the remaining 10% are used as query segments to evaluate the RID solution. The recordings of the baseline tasks are cut, using a sliding window without overlap, into pieces of the same shape (height 64, length 497), each treated as an individual trial. The present invention generates the same number of synthetic segments as there are intent-execution query segments. In this strategy, a CNN autoencoder is used as the autoencoder reconstruction model.
2) Recording strategy
Since a trial consists of many records, one per time point, the present invention uses a single record as input to the reconstruction model. Thus, an electroencephalogram trial is divided into a number of record vectors, each with 64 elements corresponding to the 64 electroencephalographic channels. Because records from the same trial may be similar enough to bias the evaluation, all records of a given trial are used either as training segments or as query segments, never both. The present invention still trains on 90% of the trials, randomly selected, and evaluates on the remaining 10%. The ordinary query segments are processed in the same way as the training instances. The fully connected autoencoder serves as the autoencoder reconstruction model.
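A minimal sketch of this organization is given below, assuming each trial is stored as a list of channels (channels x time) keyed by a trial identifier; the function names and the toy data are hypothetical. The split is made over trials first and the record vectors are extracted afterwards, so no trial contributes records to both sides.

```python
import random
from typing import Dict, Hashable, Iterable, List, Tuple

Trial = List[List[float]]  # channels x time steps


def split_trials(trial_ids: Iterable[Hashable], train_fraction: float = 0.9,
                 seed: int = 0) -> Tuple[list, list]:
    """Randomly assign whole trials to the training part or the query part."""
    ids = list(trial_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(train_fraction * len(ids)))
    return ids[:n_train], ids[n_train:]


def records_of(trials: Dict[Hashable, Trial], ids: Iterable[Hashable]) -> List[List[float]]:
    """Flatten the selected trials into one record vector per time step."""
    records = []
    for trial_id in ids:
        trial = trials[trial_id]
        for step in range(len(trial[0])):
            records.append([channel[step] for channel in trial])
    return records


# Toy data (hypothetical): 10 trials, 3 channels, 5 time steps each.
toy_trials = {i: [[float(i)] * 5 for _ in range(3)] for i in range(10)}
train_ids, query_ids = split_trials(toy_trials.keys())
train_records = records_of(toy_trials, train_ids)
query_records = records_of(toy_trials, query_ids)
```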
3) Slicing strategy
Similar to the recording strategy, an electroencephalogram trial is divided into several time slices, each serving as the period of interest. Specifically, a trial of shape (height 64, length 497) is cut into 18 slices of shape (height 64, length 320) using a sliding window with a window size of 320 and a step of 10. As before, all slices of a given trial are used either in the training phase or in the detection phase, to avoid similarity between slices of the same trial. The baseline electroencephalography recordings are processed with the same sliding-window settings. As in the trial strategy, a CNN autoencoder is used as the autoencoder reconstruction model.
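The slicing can be sketched in plain Python as follows, assuming a trial is stored as a list of channels and that only complete windows are kept; the function name is hypothetical. With a length of 497, a window of 320 and a step of 10, the window start positions are 0, 10, ..., 170, which gives the 18 slices stated above.

```python
from typing import List

Trial = List[List[float]]  # channels x time steps


def sliding_windows(trial: Trial, window: int = 320, step: int = 10) -> List[Trial]:
    """Cut a channels x time trial into overlapping slices of a fixed length."""
    n_steps = len(trial[0])
    return [[channel[start:start + window] for channel in trial]
            for start in range(0, n_steps - window + 1, step)]


# Toy trial: 64 channels, each holding its time index as the reading.
toy_trial = [[float(t) for t in range(497)] for _ in range(64)]
slices = sliding_windows(toy_trial)
```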
Table 1: Left fist: normalized relative average reconstruction error of the ordinary query segments with respect to the intent-execution query segments, for different query strategies and reconstruction models.
Table 2: Right fist: normalized relative average reconstruction error of the ordinary query segments with respect to the intent-execution query segments, for different query strategies and reconstruction models.
3. For the two electroencephalogram-based motor intention detection tasks, the present invention evaluates the performance of the RID scheme implemented with subspace projection, autoencoders and sparse dictionary learning. Systematic experiments with the three query strategies were carried out on the electroencephalogram data of 55 subjects. Since it is assumed that the smaller the reconstruction error, the greater the correlation of the query segment with the intent, the present invention uses the relative average reconstruction error of the ordinary query segments with respect to the intent-execution query segments to evaluate the detection ability of the proposed scheme. The relative average error is normalized by dividing by the average reconstruction error of the intent-execution query segments.
The normalized relative mean error (NRME) $\varphi_{relative}$ is defined as:
$$\varphi_{relative} = \frac{\varphi^{o} - \varphi^{i}}{\varphi^{i}}$$
where $\varphi^{o}$ and $\varphi^{i}$ are the average reconstruction errors of the ordinary query segments and of the intent-execution query segments, respectively. The overall results are shown in Tables 1 and 2. The results show that, under all query strategies and reconstruction models and for both intent tasks, the average reconstruction errors of the three types of ordinary query segment are always larger than the average reconstruction error of the corresponding intent-execution query segments. This suggests that the reconstruction-based approach has the potential to distinguish intent-execution query segments from ordinary ones, so that human intent can be detected. The reconstruction error of the synthetic query segments is about 15 to 35 times that of the intent-execution query segments, which means that the RID scheme of the present invention is very effective at handling noisy ordinary query segments. Compared with the synthetic query segments, the difference in reconstruction error between the baseline query segments (eyes closed or eyes open) and the intent-execution query segments is relatively small. The reason is that the electroencephalogram acquisition conditions for the intent task and the baseline tasks are similar: subjects were asked to sit in front of a computer, to perform both the intent and the baseline tasks within the same session, and not to make unnecessary physical movements. Furthermore, imagined fist movements primarily affect the three electroencephalographic channels C3, C4 and Cz, while the other channels may fluctuate in a manner similar to the baseline tasks. Even under such special query conditions, the RID scheme of the present invention still achieves a difference of around 30% in reconstruction error.
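A small numerical sketch of the normalized relative mean error follows, assuming it is computed as the difference between the two average reconstruction errors divided by the average error of the intent-execution query segments; the function name and the error values are made up for illustration and are not experimental results.

```python
from typing import Sequence


def nrme(ordinary_errors: Sequence[float], intent_errors: Sequence[float]) -> float:
    """Normalized relative mean error of ordinary vs. intent-execution query segments."""
    phi_o = sum(ordinary_errors) / len(ordinary_errors)  # mean error, ordinary segments
    phi_i = sum(intent_errors) / len(intent_errors)      # mean error, intent segments
    return (phi_o - phi_i) / phi_i


# Illustrative values only.
baseline_like = nrme([2.5, 2.7], [1.9, 2.1])   # mean errors 2.6 vs 2.0
synthetic_like = nrme([32.0], [2.0])           # mean errors 32.0 vs 2.0
```

Under this definition a value of about 0.3 corresponds to an ordinary-segment error roughly 30% above the intent-execution error, and a value of 15 to an error 16 times as large.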
For both the left-fist and the right-fist intention tasks, the subspace projection reconstruction model with the fragment query strategy obtains the best average result. In a detailed analysis, the subspace projection method with the recording query strategy yields the best results in most cases, and in the remaining cases its results are only slightly below the best. The input dimension of an instance under the recording query strategy is small, namely 64, while the instance dimensions under the trial and slicing query strategies are 64 × 497 and 64 × 320, respectively, which are much larger. The larger dimension increases the difficulty of building a robust reconstruction model. Furthermore, the recording strategy has more training instances than the other strategies, which also helps to train a more general reconstruction model.
FIGS. 3-6 depict detailed statistical experimental results for the two intent detection tasks. The normalized relative reconstruction error is calculated as follows:
$$e_k^{relative} = \frac{e_k - \varphi^{i}}{\varphi^{i}}$$
wherein $e_k$ is the reconstruction error of the $k$-th instance and $\varphi^{i}$ is the average reconstruction error of the intent-execution query segments.
As shown in Tables 1 and 2, the reconstruction error of the synthetic query segments is an order of magnitude greater than that of the intent-execution query segments, so the present invention does not include it in the reconstruction error distributions (FIGS. 3 and 5).
The results show that, under all query strategies and all reconstruction models, only intent-execution query segments appear in the leftmost part of the relative reconstruction error distribution. This means that the reconstruction model successfully captures the underlying pattern of the electroencephalographic signals it was trained on. However, the reconstruction error distributions of the eyes-open and eyes-closed query segments overlap substantially with that of the intent-execution query segments. This is due in large part to the similar conditions under which the electroencephalographic data for the different tasks were recorded, so that only a few electroencephalographic channels behave differently. In the general case, where an electroencephalogram reading may take any possible value (synthetic query segments), Tables 1 and 2 indicate that the relative reconstruction error is much greater than the reconstruction error of the intent-execution query segments.
To make the final decision, the present invention must apply a threshold: a query segment whose reconstruction error is smaller than the threshold is judged to be executing a certain intent task, and a query segment whose reconstruction error is larger than the threshold is judged not to be. The present invention uses precision and recall to describe the performance at different thresholds. Precision can be interpreted as "when an electroencephalogram segment is judged to be executing some intent, how often is that judgment correct?". The higher the precision, the fewer unintended operations the system performs and therefore the more reliable the system. Recall can be interpreted as "of the intents that were executed, how many were identified?". The higher the recall, the more sensitive the system. Both precision and recall are therefore important for constructing an efficient and effective system. FIGS. 4 and 6 show the average precision-recall-threshold curves for the left-fist and right-fist intention tasks, respectively. The results show that the precision of all strategies and reconstruction models reaches around 100% at lower thresholds and then decreases toward the chance level of 50% as the threshold increases. The peak in recall occurs to the right of the peak in precision. The recall fluctuates within a small range, from 50% to 65%.
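The threshold decision and its evaluation can be sketched as follows. Precision here means the fraction of segments judged as intent execution that truly are, and recall the fraction of true intent-execution segments that are found. The function names, the error values and the candidate thresholds are made up for illustration and are not experimental results.

```python
from typing import Sequence, Tuple


def precision_recall_f1(intent_errors: Sequence[float],
                        ordinary_errors: Sequence[float],
                        threshold: float) -> Tuple[float, float, float]:
    """Score the rule 'reconstruction error below threshold means intent execution'."""
    tp = sum(e < threshold for e in intent_errors)    # intent segments detected
    fp = sum(e < threshold for e in ordinary_errors)  # ordinary segments wrongly flagged
    fn = len(intent_errors) - tp                      # intent segments missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(intent_errors: Sequence[float], ordinary_errors: Sequence[float],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest F1 score."""
    return max(candidates,
               key=lambda t: precision_recall_f1(intent_errors, ordinary_errors, t)[2])


# Illustrative reconstruction errors only.
intent = [0.8, 1.0, 1.2, 1.6]
ordinary = [1.1, 1.5, 1.9, 2.4]
low = precision_recall_f1(intent, ordinary, 1.05)   # strict threshold
mid = precision_recall_f1(intent, ordinary, 1.3)    # balanced threshold
chosen = best_threshold(intent, ordinary, [1.05, 1.3, 2.0])
```

In this toy example the strict threshold gives perfect precision at the cost of recall, which mirrors the trade-off discussed above.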
The F1 score is an evaluation measure that takes both precision and recall into account. FIG. 7 shows the F1 scores at the optimal decision threshold for the different query strategies and reconstruction models. The results indicate that the slicing query strategy gives the best trade-off between precision and recall, while the F1 scores of the recording query strategy are significantly lower than those of the other two strategies. In most cases the subspace projection reconstruction model provides the best F1 score, and in the remaining cases it is still competitive. The best F1 score for left-fist intention detection is 66.64%, obtained with the subspace projection reconstruction model and the slicing query strategy. The best F1 score for right-fist intention detection is 66.38%, obtained with the subspace projection reconstruction model and the trial query strategy. In general, on query segments drawn arbitrarily from the possible electroencephalogram values, the proposed RID scheme achieves perfect performance within a certain threshold range, as shown in FIGS. 8 and 9. In summary, the subspace projection model provides a wider threshold range with perfect detection performance.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.