
Introduction

Epilepsy is a chronic, non-infectious but genetic disease that affects all ages and is caused by paroxysmal abnormal hypersynchrony of brain neurons. It is one of the most common neurological diseases globally. Due to the diversity and complexity of the clinical manifestation of epilepsy, it is often misdiagnosed or missed. Repetitive seizures can have a persistent negative impact on the patient’s mental and cognitive functions, even threatening their life. Therefore, the study of epilepsy diagnosis and treatment has important clinical significance. The brain electroencephalogram (EEG) is a microvolt-level electrical signal generated by synchronized neurons in the brain when electrodes are placed on the scalp at specific locations. As the most commonly used and cheapest non-invasive brain wave detection method, EEG has a history of over 70 years of research and is the most effective method for diagnosing epilepsy-related diseases, such as identifying seizures, predicting their occurrence, and localizing the affected areas. With the development of artificial intelligence, machine learning models are extensively used in automatic epilepsy recognition. Feature representation is a crucial step in machine learning. Research has indicated that EEG signals can be represented by both linear and non-linear features. Time-domain features are the fundamental features in EEG signal processing, primarily extracted by directly observing and calculating relevant characteristics from the raw signal. Their advantages lie in their simplicity of computation and ease of interpretation. However, the non-stationarity of EEG signals, individual differences, and external interferences can easily affect time-domain features. Frequency-domain features are based on the significant changes in energy in EEG during epileptic seizures, assuming that the background EEG is approximately stationary. 
Most frequency-domain features are derived from the study of signal power spectra, and various parameter estimation methods can be used for extracting spectral features. The accuracy of these parameters also affects the quality of frequency-domain features. In terms of the amount of information contained in the features, neither pure time-domain features nor pure frequency-domain features can comprehensively characterize an EEG signal. Additionally, EEG analysis based on the assumption of stationarity is not rigorous. Therefore, researchers have turned their attention to time-frequency analysis methods, such as time-frequency transformations, to re-represent non-stationary EEG signals and extract corresponding features. In addition to the aforementioned linear features, many studies also consider the brain as a nonlinear system and extract corresponding nonlinear features from descriptions of complexity, persistence, synchrony, and other changes in the system. These features are not affected by the non-stationarity of EEG signals and offer more flexibility in dealing with issues such as multi-channel correlation and channel loss. Based on the aforementioned linear or nonlinear feature representations, numerous scholars have constructed machine learning models for the automatic diagnosis of epilepsy. For example, Li, Chen & Zhang (2016) employed a dual-tree complex discrete wavelet transform to extract nonlinear features from individual components. They used an ANOVA analysis to select relevant classification features, including the Hurst parameter and fuzzy entropy. 
For the classification task, a support vector machine (SVM) was employed. Reddy & Rao (2017) computed the central correlated entropy of wavelet components obtained from a tunable Q-factor wavelet transform, and utilized models such as RF, LR, and multi-layer perceptron for epileptic signal recognition. Jaiswal & Banka (2017) proposed a feature extraction method called local gradient pattern transformation and applied classification methods such as k-nearest neighbors, SVM, and decision trees for epilepsy detection.

The aforementioned machine learning-based diagnostic models each rely on a single EEG feature representation, which gives them low model complexity and high interpretability. However, these models rely on expert knowledge, and deep features are not easily observed and extracted. As a result, their accuracy is limited. Multi-view learning (Zhao et al., 2017; Jiang et al., 2020; Zhang, Chung & Wang, 2018; Yan et al., 2021) improves the classification accuracy of models by exploiting the differences and similarities between multiple views based on the principles of view consistency and complementarity. For example, Tian et al. (2019) utilized a convolutional neural network (CNN) model to extract deep features from EEG signals in the time domain, frequency domain, and time-frequency domain. These features were constructed as three views, and multi-view learning was conducted using a multi-view Takagi-Sugeno-Kang (TSK) fuzzy system, which improved the classification and detection performance compared to a single view. Yuan et al. (2018) implemented multi-view automatic epilepsy diagnosis by utilizing channel characteristics and intra-channel time-frequency features of multi-channel EEG signals, extracted using an autoencoder (AE) through channel perception technology. Liu & Li (2019) utilized a user-sensitive model for channel selection and extracted time-frequency features from each sub-band of the selected channels, forming multi-view features. They extracted numerical and morphological features using a common spatial projection matrix and utilized a maximum average difference autoencoder to extract inter-channel time-frequency domain features, enabling automatic diagnosis of epilepsy with multiple views. These effective models based on collaborative regularization can construct a common feature space for multi-view learning. However, these models also have certain limitations. 
While these methods construct the density distributions of each view solely based on the corresponding observed data, they overlook the correlated information among all views. Additionally, they separate the original sample space from the common space obtained through mapping. This approach solely utilizes the common space for learning, neglecting the discriminative information present in the original space.

To overcome such shortcomings, in this study, a shared hidden feature space method is constructed by using kernel density estimation, and it is extended to an expanded space by combining it with the original space. Then, SVM is introduced and a multi-view SVM based on the shared hidden space is proposed to take a careful consideration of the differences and relationships between samples from different views. Through experimental verification on different multi-view data sets, the effectiveness of this method in addressing the challenges mentioned above has also been confirmed. The contributions of this study are mainly reflected in the following aspects:

(1) The kernel density estimation (KDE) technique is used to construct a new shared hidden space, and it is combined with the original space to construct an expanded space for multi-view learning, thus being able to effectively address the special issue mentioned above on multi-view learning.

(2) By constructing the expanded space and utilizing the information of both the shared hidden space and the original space for learning, thereby fully utilizing the relevant information of samples within and across views, we can effectively solve the problem that the difference between samples of the same class from different views is greater than the difference between samples of different classes from the same view.

(3) During the optimization phase, the proposed model is transformed into a classical Quadratic Programming (QP) problem, allowing for the utilization of pre-existing optimization methods that offer both high effectiveness and theoretical guarantees. This transformation enables the application of readily available optimization techniques, which have proven to be highly efficient in solving QP problems.

The remaining sections are organized as follows. In ‘Data’, we introduce the EEG data used in this study and the corresponding multiple feature space representations. In ‘Methodology’, we present the proposed model. In ‘Experimental studies’, experimental results are reported. In the last section, the whole study is summarized.

Data

The EEG data of epileptic patients used in this study were authorized and provided by the University of Bonn in Germany (Andrzejak et al., 2001), as shown in Table 1. The dataset is divided into five groups, namely A, B, C, D, and E. Each group contains 100 single-channel EEG segments lasting 23.6 s, with a sampling rate of 173.6 Hz. The EEG signals of groups A and B were collected from healthy volunteers in a relaxed and conscious state, with eyes open during the data collection of group A and eyes closed during the data collection of group B. The signals of the remaining three groups were collected from epileptic volunteers: group C from the hippocampal formation of the contralateral hemisphere (see Table 1), and group D from the epileptic foci. The signals of groups C and D were measured during periods without epileptic seizures, while the signals of group E were recorded during epileptic seizures. Figure 1 provides an example of EEG signals from the five groups.

Table 1:

Basic collection information of epilepsy EEG signals.

Group	#Volunteers	Collection information
A	100	This group was collected from a group of healthy volunteers who were instructed to keep their eyes open during the recording process. These volunteers did not have any known neurological or psychiatric disorders and were not experiencing any abnormal symptoms at the time of data collection.
B	100	This group was collected from a group of healthy volunteers under conditions where they kept their eyes closed.
C	100	This group was collected from the hippocampal formation of the contralateral hemisphere of the brain during seizure-free intervals. These samples were obtained when the patient was not experiencing any epileptic seizures.
D	100	This group was collected from the epileptogenic zone during periods of seizure freedom. This implies that the recordings were obtained when the patient was not experiencing seizures.
E	100	The group was collected during seizure activity phase offering a unique opportunity to study the dynamics and temporal dynamics of epileptic seizures, paving the way for the development of more accurate and reliable seizure detection and prediction algorithms.

DOI:10.7717/peerj-cs.1874/table-1

Figure 1:EEG signals from five groups.
DOI:10.7717/peerj-cs.1874/fig-1

Frequency-domain representation extraction

Frequency-domain feature representation originates from the significant changes in energy in EEG during epileptic seizures. To extract the frequency-domain representation from EEG signals, the Daubechies4 wavelet coefficients are utilized to decompose the original signals into a series of binary wavelets. The frequency band of each Daubechies4 wavelet coefficient is provided in Table 2. By applying these settings, the EEG signals are divided into six distinct frequency bands. An illustrative example of the decomposed signals from group E is depicted in Fig. 2.

Table 2:

Frequency band of each Daubechies4 wavelet coefficient.

Coefficient	Frequency band
Daubechies4 (4, 0)	0–2 Hz
Daubechies4 (4, 5)	2–4 Hz
Daubechies4 (4, 4)	4–8 Hz
Daubechies4 (4, 3)	8–15 Hz
Daubechies4 (4, 2)	16–30 Hz
Daubechies4 (4, 1)	31–60 Hz

DOI:10.7717/peerj-cs.1874/table-2

Figure 2:Example of frequency-domain representation.
DOI:10.7717/peerj-cs.1874/fig-2

Time-domain feature extraction

Time-domain features are the fundamental features in EEG signal processing, primarily extracted by directly observing and calculating relevant characteristics from the raw signal. Their advantages lie in their simplicity of computation and ease of interpretation for researchers. In this study, we employ kernel principal component analysis (KPCA) (Li et al., 2022b) on the raw EEG signals to enable complex nonlinear mapping. Previous research has shown that KPCA features offer discriminative patterns suitable for pattern recognition. An example of KPCA features from group E is shown in Fig. 3.

Figure 3:Example of time-domain representation.
DOI:10.7717/peerj-cs.1874/fig-3

Time-frequency representation extraction

Pure time-domain or frequency-domain feature representations alone cannot comprehensively characterize an EEG signal, and EEG analysis based on the assumption of stationarity is not rigorous. Therefore, researchers have turned their attention to time-frequency analysis methods, such as time-frequency transformations, to re-represent non-stationary EEG signals and extract corresponding features. To capture time-frequency representation, researchers often employ the short-time Fourier transform (STFT) (Li et al., 2022a). STFT allows for the analysis of how the frequency content of a signal changes over time. It can be formulated as follows:

(1) $F_{time - fre} (time, fre) = \int_{- \infty}^{+ \infty} x (time) g (time - u) e^{- j 2 π \cdot fre \cdot time} d (time) .$

In the context of EEG signal analysis, Eq. (1) represents the transformation of the continuous EEG signal, denoted as $x (time)$, into the time-frequency plane using the window function $g (time - u)$, a window of limited width centered around $u$. This transformation, referred to as $F_{time - fre} (time, fre)$, provides a means to examine the time-varying nature of the EEG signals, revealing local spectrum discrepancies at different time points. To achieve this, STFT partitions the EEG signals into several segments that are treated as locally stationary. Through this process, the time-varying characteristics of the EEG signals are captured, highlighting variations in the spectrum. Six energy bands are then extracted as features using Eq. (1). A visualization of these six energy bands, exemplified by group E, is illustrated in Fig. 4.

Figure 4:Example of time-frequency representation.
DOI:10.7717/peerj-cs.1874/fig-4
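As a rough illustration of how band energies can be read off a discretized version of Eq. (1), the following sketch applies a sliding Hann window and a direct DFT to a synthetic tone. It is not the authors' implementation: the function name `stft_band_energy`, the window length, the hop size, and the band edges (loosely following Table 2, made contiguous) are all assumptions chosen for the example.

```python
import cmath
import math

def stft_band_energy(signal, fs, win_len, hop, bands):
    """Return, for each window position, the spectral energy inside each band."""
    # A Hann window plays the role of g(time - u) in Eq. (1) (assumed choice).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win_len - 1))
              for n in range(win_len)]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(win_len)]
        energies = [0.0] * len(bands)
        # Direct DFT over the non-negative frequency bins of this segment.
        for k in range(win_len // 2 + 1):
            freq = k * fs / win_len
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                        for n in range(win_len))
            for b, (lo, hi) in enumerate(bands):
                if lo <= freq < hi:
                    energies[b] += abs(coeff) ** 2
        frames.append(energies)
    return frames

fs = 173.6  # sampling rate reported for the Bonn recordings
# Illustrative band edges in Hz (assumption, not taken from the paper).
bands = [(0, 2), (2, 4), (4, 8), (8, 15), (15, 30), (30, 60)]
# Synthetic 10 Hz tone standing in for an EEG segment.
tone = [math.sin(2 * math.pi * 10.0 * n / fs) for n in range(512)]
frames = stft_band_energy(tone, fs, win_len=128, hop=64, bands=bands)
# Index of the band with the largest energy in each window.
dominant = [max(range(len(bands)), key=row.__getitem__) for row in frames]
```

Each row of `frames` is a six-dimensional feature vector for one window; for the 10 Hz tone the energy concentrates in the fourth band (8 to 15 Hz) in every window. A production pipeline would normally use an FFT routine rather than the direct DFT shown here.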

Methodology

In this section, we will design a shared hidden space-driven multi-view learning method to fuse time-frequency representation, frequency-domain representation and time-domain representation.

Construction of shared hidden feature space

Suppose that $Ω \in R^{r \times d}$ is an orthogonal matrix subject to $Ω Ω^{T} = I \in R^{r \times r}$, $f^{A} = \{x_{i}^{A}, y_{i} | x_{i}^{A} \in R^{d}, i = 1, 2, \dots, N\}$ represents one kind of feature space, e.g., the time-domain feature space, and $f^{B} = \{x_{i}^{B}, y_{i} | x_{i}^{B} \in R^{d}, i = 1, 2, \dots, N\}$ represents another kind of feature space. Then the hidden feature spaces of $f^{A}$ and $f^{B}$ can be generated by $Ω x_{i}^{A} \in R^{r}$ and $Ω x_{i}^{B} \in R^{r}$, respectively, where $r$ represents the number of hidden features. To obtain a consistent hidden feature space between $Ω x_{i}^{A}$ and $Ω x_{i}^{B}$, the difference between them should be minimized as much as possible. Kernel density estimation (KDE), which is one of the non-parametric estimation methods in probability theory, is usually used to estimate an unknown probability density function (Wang, Wang & Chung, 2013). For a training set $X = \{x_{i}, y_{i} | x_{i} \in R^{d}, i = 1, 2, \dots, N\}$, its corresponding kernel density estimation function can be expressed as

(2) $P (x) = \frac{1}{N} \sum_{i = 1}^{N} \frac{1}{δ} K (\frac{x - x_{i}}{δ}),$ where $δ$ is the kernel width and $K (\cdot)$ is the kernel function. If the Gaussian kernel function is adopted, then Eq. (2) can be rewritten as $P (x) = \frac{1}{N} \sum_{i = 1}^{N} \frac{1}{δ \sqrt{2 π}} \exp (- \frac{1}{2} {(\frac{x - x_{i}}{δ})}^{2}) .$ Therefore, when using the Gaussian kernel function, the kernel density estimates of $Ω x_{i}^{A}$ and $Ω x_{i}^{B}$ can be expressed, respectively, as

(3) $P_{A} (\tilde{x}) = P_{A} (Ω x) = \frac{1}{N \cdot δ \sqrt{2 π}} \sum_{i = 1}^{N} e^{- \frac{∥ Ω x - Ω x_{i}^{A} ∥^{2}}{2 δ^{2}}},$

(4) $P_{B} (\tilde{x}) = P_{B} (Ω x) = \frac{1}{N \cdot δ \sqrt{2 π}} \sum_{i = 1}^{N} e^{- \frac{∥ Ω x - Ω x_{i}^{B} ∥^{2}}{2 δ^{2}}} .$
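To make the Gaussian form of Eq. (2) concrete, here is a minimal one-dimensional sketch. The function name `gaussian_kde`, the toy samples, and the kernel width are illustrative assumptions; the scalar samples stand in for projected points in a hidden space with $r = 1$.

```python
import math

def gaussian_kde(x, samples, delta):
    """Gaussian-kernel density estimate: the mean of Gaussians centred on the samples."""
    norm = 1.0 / (delta * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - xi) / delta) ** 2)
               for xi in samples) / len(samples)

samples = [-1.0, 0.0, 0.5, 2.0]  # toy 1-D samples (assumption)
delta = 0.4                      # kernel width (assumption)

# Riemann sum over [-6, 8]: a valid density estimate should integrate to about 1.
step = 0.01
grid = [-6.0 + step * k for k in range(1401)]
area = sum(gaussian_kde(x, samples, delta) for x in grid) * step
```

The value of `area` comes out very close to 1, which is a quick sanity check that the normalizing factor $\frac{1}{δ \sqrt{2 π}}$ is applied once per sample.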

In this study, the difference between $P_{A} (\tilde{x})$ and $P_{B} (\tilde{x})$ is measured by the mean square error, that is

(5) $J = \int {(P_{A} (\tilde{x}) - P_{B} (\tilde{x}))}^{2} d x .$

By minimizing $J$, the two-view data $x_{i}^{A}$ and $x_{i}^{B}$ can be made to have the maximum commonality in the shared hidden space, and thus the challenge of excessive variability between samples from different views can be addressed. In order to solve Eq. (5), we suppose that $G (Ω x, Ω x_{i}, δ^{2}) = \frac{1}{δ \sqrt{2 π}} e^{- \frac{∥ Ω x - Ω x_{i} ∥^{2}}{2 δ^{2}}}$, then $P_{A} (\tilde{x})$ and $P_{B} (\tilde{x})$ can be rewritten as $P_{A} (\tilde{x}) = \frac{1}{N} \sum_{i = 1}^{N} G (Ω x, Ω x_{i}^{A}, δ^{2})$ and $P_{B} (\tilde{x}) = \frac{1}{N} \sum_{i = 1}^{N} G (Ω x, Ω x_{i}^{B}, δ^{2})$. Therefore, Eq. (5) can be computed by $J = \int P_{A}^{2} (\tilde{x}) d x - 2 \int P_{A} (\tilde{x}) P_{B} (\tilde{x}) d x + \int P_{B}^{2} (\tilde{x}) d x$. According to Wang, Wang & Chung (2013) and Hansen, Jaumard & Xiong (1994), we have $\int G (x, x_{i}, δ_{1}^{2}) G (x, x_{j}, δ_{2}^{2}) d x = G (x_{i}, x_{j}, δ_{1}^{2} + δ_{2}^{2})$. Therefore, we have the following equations,

(6) $\int P_{A}^{2} (\tilde{x}) d x = \frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{A}, {\tilde{x}}_{j}^{A}, 2 δ^{2}) = \frac{1}{N} \sum_{i = 1}^{N} [\frac{1}{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{A}, {\tilde{x}}_{j}^{A}, 2 δ^{2})]$

(7) $\int P_{B}^{2} (\tilde{x}) d x = \frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{B}, {\tilde{x}}_{j}^{B}, 2 δ^{2}) = \frac{1}{N} \sum_{i = 1}^{N} [\frac{1}{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{B}, {\tilde{x}}_{j}^{B}, 2 δ^{2})]$

(8) $\int P_{A} (\tilde{x}) P_{B} (\tilde{x}) d x = \frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{A}, {\tilde{x}}_{j}^{B}, 2 δ^{2})$ where $\frac{1}{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{A}, {\tilde{x}}_{j}^{A}, 2 δ^{2})$ can be taken as another estimation of $P_{A} ({\tilde{x}}_{i}^{A})$. Therefore, $\int P_{A}^{2} (\tilde{x}) d x$ can be estimated by $\frac{1}{N} \sum_{i = 1}^{N} P_{A} ({\tilde{x}}_{i}^{A})$, and further by $\frac{1}{N}$. Similarly, $\int P_{B}^{2} (\tilde{x}) d x$ can be estimated by $\frac{1}{N}$. Thus, we finally have $J \approx \frac{1}{N} + \frac{1}{N} - \frac{2}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{A}, {\tilde{x}}_{j}^{B}, 2 δ^{2})$. Therefore, we have the following objective,

(9) $\begin{array}{l} \arg \min_{Ω} J \approx \arg \min_{Ω} - \sum_{i = 1}^{N} \sum_{j = 1}^{N} G ({\tilde{x}}_{i}^{A}, {\tilde{x}}_{j}^{B}, 2 δ^{2}) \\ s.t. \; Ω Ω^{T} = I_{r \times r} \end{array}$
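The Gaussian product identity used to obtain the closed forms of the integrals above can be checked numerically. The sketch below does so in one dimension with a midpoint rule; the helper names `gauss` and `product_integral`, the integration range, and the test points are assumptions made for illustration.

```python
import math

def gauss(x, mean, var):
    """1-D Gaussian G(x, mean, var) with variance var."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def product_integral(xi, xj, var1, var2, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule approximation of the integral of G(x, xi, var1) * G(x, xj, var2)."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        total += gauss(x, xi, var1) * gauss(x, xj, var2)
    return total * step

xi, xj, delta = 0.3, -1.1, 0.7
lhs = product_integral(xi, xj, delta ** 2, delta ** 2)  # numerical integral
rhs = gauss(xi, xj, 2.0 * delta ** 2)                   # closed form with variance 2*delta^2
```

Here `lhs` and `rhs` agree to well below `1e-6`, which is why each pairwise term in Eqs. (6) to (8) carries the doubled variance $2 δ^{2}$.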

However, it is difficult to solve Eq. (9) directly. Thus, a Taylor expansion can be used to obtain an approximate solution. Hence, we have

(10) $G ({\tilde{x}}_{i}^{A}, {\tilde{x}}_{j}^{B}, 2 δ^{2}) = \frac{1}{\sqrt{2 π} δ} e^{- \frac{∥ Ω x_{i}^{A} - Ω x_{j}^{B} ∥^{2}}{4 δ^{2}}} \approx \frac{1}{\sqrt{2 π} δ} (1 - {(Ω x_{i}^{A} - Ω x_{j}^{B})}^{2})$
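The first-order step behind Eq. (10) can be written out explicitly. The following is a sketch under the assumption that the exponent is small, i.e., the projected samples are close relative to $δ$; the symbol $u_{ij}$ is introduced here only for illustration.

```latex
\begin{aligned}
u_{ij} &= \frac{\| Ω x_{i}^{A} - Ω x_{j}^{B} \|^{2}}{4 δ^{2}}, \\
e^{-u_{ij}} &= 1 - u_{ij} + \frac{u_{ij}^{2}}{2} - \dots \approx 1 - u_{ij}, \\
\sum_{i=1}^{N} \sum_{j=1}^{N} e^{-u_{ij}} &\approx N^{2} - \frac{1}{4 δ^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \| Ω x_{i}^{A} - Ω x_{j}^{B} \|^{2} .
\end{aligned}
```

Because $\frac{1}{4 δ^{2}}$ is a positive constant that does not depend on $Ω$, it can be absorbed into the approximation. Making the sum of kernel values as large as possible is then, to first order, the same as making the sum of squared distances between the projected samples as small as possible.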

Therefore, Eq. (9) can be further updated as

(11) $\arg \min_{Ω} \sum_{i = 1}^{N} \sum_{j = 1}^{N} {({Ω x}_{i}^{A} - {Ω x}_{j}^{B})}^{2}, \; s.t. \; Ω Ω^{T} = I_{r \times r}$

In Eq. (11), the implicit feature transformation matrix $Ω$ still cannot be solved directly, but it can be solved by the gradient descent method. Thus, Eq. (11) can be rewritten as

(12) $\begin{array}{l} J = \underset{Ω}{argmin} \sum_{i = 1}^{N} \sum_{j = 1}^{N} ({(x_{i}^{A})}^{T} Ω^{T} Ω x_{i}^{A} + {(x_{j}^{B})}^{T} Ω^{T} Ω x_{j}^{B} - 2 {(x_{i}^{A})}^{T} Ω^{T} Ω x_{j}^{B}) \\ s . t . Ω Ω^{T} = I_{r \times r} \end{array}$

The partial derivative of $J$ w.r.t. $Ω$ is

(13) $\frac{\partial J}{\partial Ω} = \sum_{i = 1}^{N} \sum_{j = 1}^{N} (2 Ω x_{i}^{A} {(x_{i}^{A})}^{T} + 2 Ω x_{j}^{B} {(x_{j}^{B})}^{T} - 2 Ω (x_{i}^{A} {(x_{j}^{B})}^{T} + x_{j}^{B} {(x_{i}^{A})}^{T}))$

Then the transformation matrix $Ω$ can be solved by gradient descent method, that is,

(14) $Ω \leftarrow Ω - η \frac{\partial J}{\partial Ω} (I_{r \times r} - Ω Ω^{T}) = Ω - η \nabla Ω$ where $η$ is the step size that can be solved by

(15) $η = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} ({(x_{i}^{A})}^{T} (Ω^{T} \nabla Ω + \nabla Ω^{T} Ω) x_{i}^{A} + {(x_{j}^{B})}^{T} (Ω^{T} \nabla Ω + \nabla Ω^{T} Ω) x_{j}^{B} - 2 {(x_{i}^{A})}^{T} (Ω^{T} \nabla Ω + \nabla Ω^{T} Ω) x_{j}^{B})}{\sum_{i = 1}^{N} \sum_{j = 1}^{N} (2 {(x_{i}^{A})}^{T} \nabla Ω^{T} \nabla Ω x_{i}^{A} + 2 {(x_{j}^{B})}^{T} \nabla Ω^{T} \nabla Ω x_{j}^{B} - 4 {(x_{i}^{A})}^{T} \nabla Ω^{T} \nabla Ω x_{j}^{B})}$

According to the above analysis and derivation, the algorithm for solving the implicit feature transformation matrix $Ω$ is summarized in Algorithm 1.

Multi-view learning based on shared hidden feature space

After determining the shared hidden space between two views, the extended space can be generated by combining the original space and the shared hidden space. Then, a multi-view classifier based on SVM is designed for multi-view data classification in the extended space. In existing multi-view learning mechanisms, it is generally assumed that each view can provide a classifier containing specific information, and that classifiers constructed from different views tend to be consistent. Additionally, since views can provide specific information to each other, the proposed model establishes the objective function by considering the mutual information between two views. In summary, the proposed model, based on SVM, restructures the slack variables on each view, and then narrows the gap between the two views by using the corresponding regularization term. The objective function of multi-view learning based on shared hidden feature space can be formulated as

(16) $\begin{array}{l} \arg \min_{w_{A}, w_{B}, v_{A}, v_{B}, b_{A}, b_{B}} \frac{1}{2} ∥ w_{A} ∥^{2} + \frac{1}{2} ∥ w_{B} ∥^{2} + \frac{1}{2} ∥ v_{A} ∥^{2} + \frac{1}{2} ∥ v_{B} ∥^{2} + C^{A} \sum_{i = 1}^{N} ξ_{i}^{A} + C^{B} \sum_{i = 1}^{N} ξ_{i}^{B} + λ ∥ v_{A} - v_{B} ∥^{2} \\ s.t. \; y_{i} (w_{A}^{T} ϕ (x_{i}^{A}) + v_{A}^{T} ϕ (Ω x_{i}^{A}) + b_{A}) \geq 1 - ξ_{i}^{A} \\ y_{i} (w_{B}^{T} ϕ (x_{i}^{B}) + v_{B}^{T} ϕ (Ω x_{i}^{B}) + b_{B}) \geq 1 - ξ_{i}^{B} \\ ξ_{i}^{A}, ξ_{i}^{B} \geq 0, i = 1, 2, \dots, N \end{array}$ where $λ$, $C^{A}$ and $C^{B}$ are the regularization parameters. Observe that Eq. (16) consists of three parts: the first four terms reflect the structural risk in the original feature space and the shared hidden space, respectively; the next two terms represent the empirical risk; and the last term reflects the difference between the two views in the shared hidden space. The objective function in Eq. (16) strengthens the constraints of the traditional SVM through the implicit mapping, so that the probability distributions of data from different views in the shared hidden space are as consistent as possible, which addresses the problem described at the beginning of this study. To solve Eq. (16) efficiently, Lagrangian multipliers are introduced according to Lagrangian optimization theory, so that Eq. (16) can be converted into the corresponding dual form. The Lagrangian function corresponding to Eq. (16) is

(17) $\begin{array}{l} L = \frac{1}{2} ∥ w_{A} ∥^{2} + \frac{1}{2} ∥ w_{B} ∥^{2} + \frac{1}{2} ∥ v_{A} ∥^{2} + \frac{1}{2} ∥ v_{B} ∥^{2} + C^{A} \sum_{i = 1}^{N} ξ_{i}^{A} + C^{B} \sum_{i = 1}^{N} ξ_{i}^{B} + λ ∥ v_{A} - v_{B} ∥^{2} \\ + \sum_{i = 1}^{N} α_{i}^{A} (1 - ξ_{i}^{A} - y_{i} (w_{A}^{T} ϕ (x_{i}^{A}) + v_{A}^{T} ϕ (Ω x_{i}^{A}) + b_{A})) \\ + \sum_{i = 1}^{N} α_{i}^{B} (1 - ξ_{i}^{B} - y_{i} (w_{B}^{T} ϕ (x_{i}^{B}) + v_{B}^{T} ϕ (Ω x_{i}^{B}) + b_{B})) \\ - \sum_{i = 1}^{N} μ_{i}^{A} ξ_{i}^{A} - \sum_{i = 1}^{N} μ_{i}^{B} ξ_{i}^{B} \end{array}$ where $α_{i}^{A} \geq 0$, $α_{i}^{B} \geq 0$, $μ_{i}^{A} \geq 0$, and $μ_{i}^{B} \geq 0$ are Lagrangian multipliers. By setting the partial derivatives of the Lagrangian function $L$ with respect to $w_{A}$, $w_{B}$, $v_{A}$, $v_{B}$, $b_{A}$, $b_{B}$, $ξ_{i}^{A}$, and $ξ_{i}^{B}$ to 0, we have

(18) $w_{A} = \sum_{i = 1}^{N} α_{i}^{A} y_{i} ϕ (x_{i}^{A}), w_{B} = \sum_{i = 1}^{N} α_{i}^{B} y_{i} ϕ (x_{i}^{B}),$

(19) $v_{A} = \frac{1 + 2 λ}{1 + 4 λ} \sum_{i = 1}^{N} α_{i}^{A} y_{i} ϕ (Ω x_{i}^{A}) + \frac{2 λ}{1 + 4 λ} \sum_{i = 1}^{N} α_{i}^{B} y_{i} ϕ (Ω x_{i}^{B}),$

(20) $v_{B} = \frac{1 + 2 λ}{1 + 4 λ} \sum_{i = 1}^{N} α_{i}^{B} y_{i} ϕ (Ω x_{i}^{B}) + \frac{2 λ}{1 + 4 λ} \sum_{i = 1}^{N} α_{i}^{A} y_{i} ϕ (Ω x_{i}^{A}),$

(21) $\sum_{i = 1}^{N} α_{i}^{A} y_{i} = 0, \sum_{i = 1}^{N} α_{i}^{B} y_{i} = 0,$

(22) $C^{A} = α_{i}^{A} + μ_{i}^{A}, C^{B} = α_{i}^{B} + μ_{i}^{B}$
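The coefficients $\frac{1 + 2 λ}{1 + 4 λ}$ and $\frac{2 λ}{1 + 4 λ}$ in Eqs. (19) and (20) come from solving the two coupled stationarity conditions for $v_{A}$ and $v_{B}$. The sketch below checks this with scalars: `s_a` and `s_b` are hypothetical stand-ins for the two weighted sums on the right-hand sides, and the function names are introduced only for this illustration.

```python
def solve_v(s_a, s_b, lam):
    """Closed form of Eqs. (19)-(20) with scalar stand-ins for the weighted sums."""
    v_a = ((1 + 2 * lam) * s_a + 2 * lam * s_b) / (1 + 4 * lam)
    v_b = ((1 + 2 * lam) * s_b + 2 * lam * s_a) / (1 + 4 * lam)
    return v_a, v_b

def stationarity_residuals(v_a, v_b, s_a, s_b, lam):
    """dL/dv_A and dL/dv_B from Eq. (17); both should vanish at the solution."""
    r_a = v_a + 2 * lam * (v_a - v_b) - s_a
    r_b = v_b - 2 * lam * (v_a - v_b) - s_b
    return r_a, r_b

s_a, s_b, lam = 1.5, -0.4, 0.3  # arbitrary illustrative values
v_a, v_b = solve_v(s_a, s_b, lam)
residuals = stationarity_residuals(v_a, v_b, s_a, s_b, lam)
```

Both residuals are zero up to rounding. The denominator $1 + 4 λ$ is the determinant $(1 + 2 λ)^{2} - 4 λ^{2}$ of the 2-by-2 linear system, and setting $λ = 0$ decouples the two views.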

By substituting Eqs. (18)–(22) into Eq. (17), we obtain the dual problem of Eq. (16), which can be defined as

(23) $\underset{\tilde{α}}{\arg \max} - \frac{1}{2} {\tilde{α}}^{T} K \tilde{α} + {\tilde{α}}^{T} 1, \; s.t. \; {\tilde{α}}^{T} f = 0, \; f = {[y^{T}, y^{T}]}^{T}, \; {\tilde{α}}_{i} \geq 0, \forall i$ where

(24) $\tilde{α} = {[α_{1}^{A}, α_{2}^{A}, \dots, α_{N}^{A}, α_{1}^{B}, α_{2}^{B}, \dots, α_{N}^{B}]}^{T},$

(25) $K_{A} = K (x^{A}, x^{A}) y y^{T} + \frac{1 + 2 λ}{1 + 4 λ} K (Ω x^{A}, Ω x^{A}) y y^{T}$

(26) $K_{B} = K (x^{B}, x^{B}) y y^{T} + \frac{1 + 2 λ}{1 + 4 λ} K (Ω x^{B}, Ω x^{B}) y y^{T}$

(27) $K_{A B} = \frac{2 λ}{1 + 4 λ} K (Ω x^{A}, Ω x^{B}) y y^{T}$

(28) $K = [\begin{matrix} K_{A} & K_{A B} \\ K_{A B} & K_{B} \end{matrix}]$

(29) $y = {[y_{1}, y_{2}, \dots, y_{N}]}^{T}$ and $K (\cdot, \cdot)$ is the kernel function. It is obvious that the optimization of Eq. (23) can be considered as a QP problem, which can be solved according to Deng et al. (2013). The decision function of the proposed model in this study is defined as

(30) $f (x) = \frac{1}{2} (w_{A}^{T} ϕ (x^{A}) + v_{A}^{T} ϕ (Ω x^{A}) + b_{A} + w_{B}^{T} ϕ (x^{B}) + v_{B}^{T} ϕ (Ω x^{B}) + b_{B})$

The algorithm of multi-view learning based on shared hidden feature space is shown in Algorithm 2. From Algorithm 2, we can find that the time complexity is mainly contributed by steps 1, 3 and 4. The time complexity of Algorithm 1 is $O (N r d + r^{2})$. The time complexity of step 3 is $O ({(r + d)}^{2})$. The time complexity of step 4 is $O (N^{2})$. Therefore, the time complexity of Algorithm 2 is $O (N r d + r^{2} + {(r + d)}^{2} + N^{2})$.

Algorithm 1: Shared hidden feature space generation.

Input: $x_{i}^{A}$, $x_{i}^{B}$, and $y = {[y_{i}]}_{i = 1, 2, \dots, N}$

Output: $Ω$

Procedures:

1. Initialize $Ω_{0} \in R^{r \times d}$, $t = 0$, $iter_{max}$, and $δ = 1e-6$

2. Repeat:

3. $t = t + 1$

4. Compute $\frac{\partial J}{\partial Ω}$ and $η$ by Eqs. (13) and (15).

5. Update $Ω (t)$ by Eq. (14).

6. Until $∥ Ω (t) - Ω (t - 1) ∥ \leq δ$ or $t > iter_{max}$

DOI:10.7717/peerj-cs.1874/table-7
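The loop in Algorithm 1 can be sketched in plain Python as follows. This is not the authors' code and it deviates from the text in two labeled ways: the step size is found by simple backtracking instead of the closed form in Eq. (15), and the constraint $Ω Ω^{T} = I$ is restored after each step by Gram-Schmidt orthonormalization of the rows rather than by the projection in Eq. (14). All function names and the toy data are assumptions.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pair_scatter(XA, XB):
    """S = sum_i sum_j (x_i^A - x_j^B)(x_i^A - x_j^B)^T, so Eq. (12) is trace(Omega S Omega^T)."""
    d = len(XA[0])
    S = [[0.0] * d for _ in range(d)]
    for a in XA:
        for b in XB:
            z = [p - q for p, q in zip(a, b)]
            for m in range(d):
                for n in range(d):
                    S[m][n] += z[m] * z[n]
    return S

def objective(Omega, S):
    return sum(sum(p * q for p, q in zip(os_row, o_row))
               for os_row, o_row in zip(matmul(Omega, S), Omega))

def orthonormalize_rows(M):
    """Gram-Schmidt on the rows (assumed retraction to keep Omega Omega^T = I)."""
    rows = []
    for v in M:
        for u in rows:
            dot = sum(p * q for p, q in zip(v, u))
            v = [p - dot * q for p, q in zip(v, u)]
        norm = sum(p * p for p in v) ** 0.5
        rows.append([p / norm for p in v])
    return rows

def fit_omega(XA, XB, r, iter_max=200, tol=1e-6, seed=0):
    rng = random.Random(seed)
    d = len(XA[0])
    S = pair_scatter(XA, XB)
    Omega = orthonormalize_rows([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(r)])
    history = [objective(Omega, S)]
    eta = 1.0
    for _ in range(iter_max):
        grad = [[2.0 * g for g in row] for row in matmul(Omega, S)]  # dJ/dOmega = 2 Omega S

        def step(size):
            moved = [[o - size * g for o, g in zip(o_row, g_row)]
                     for o_row, g_row in zip(Omega, grad)]
            trial = orthonormalize_rows(moved)
            return trial, objective(trial, S)

        trial, value = step(eta)
        while value > history[-1] and eta > 1e-12:  # backtracking (assumption)
            eta *= 0.5
            trial, value = step(eta)
        if value > history[-1]:
            break  # no descent step found
        change = sum((p - q) ** 2 for t_row, o_row in zip(trial, Omega)
                     for p, q in zip(t_row, o_row)) ** 0.5
        Omega = trial
        history.append(value)
        if change <= tol:  # stopping rule of step 6
            break
    return Omega, history

rng = random.Random(1)
XA = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]           # toy view A
XB = [[a[0] + 2.0, a[1] + rng.gauss(0.0, 0.1), a[2]] for a in XA]          # toy view B
Omega, history = fit_omega(XA, XB, r=2)
```

In this toy setup the two views differ mostly along the first coordinate, so the learned rows of `Omega` tend to avoid that direction, and `history` records a non-increasing sequence of objective values.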

Algorithm 2: Multi-view learning based on shared hidden feature space.

Input: training samples of view-1: $\{x_{i}^{A}, y_{i}\}$, training samples of view-2: $\{x_{i}^{B}, y_{i}\}$, regularized parameters $C^{A}$, $C^{B}$ and $λ$

Output: $w_{A}^{T}$, $w_{B}^{T}$, $b_{A}$, $b_{B}$, $v_{A}$ and $v_{B}$

Procedures:

1. Use Algorithm 1 to obtain $Ω$

2. Use $Ω$ to obtain the shared hidden space

3. Solve the ${\tilde{α}}_{i}$ according to Eq. (23)

4. Solve the $w_{A}^{T}$, $w_{B}^{T}$, $b_{A}$, $b_{B}$, $v_{A}$ and $v_{B}$ by Eqs. (18)–(22)

5. Construct the decision function based on $w_{A}^{T}$, $w_{B}^{T}$, $b_{A}$, $b_{B}$, $v_{A}$ and $v_{B}$

DOI:10.7717/peerj-cs.1874/table-8

Experimental studies

Settings

To observe the merits of the proposed model, k-nearest neighbor (KNN) (Liu & Liu, 2016), support vector machine (SVM) (Liu & Liu, 2016), SVM-2K (Farquhar et al., 2005), multi-view L2-SVM (MV-L2-SVM) (Huang, Chung & Wang, 2016), and alternative multi-view MED (AMVMED) (Chao & Sun, 2015) are introduced for comparison studies. Accuracy is used as the evaluation indicator in this study. SVM, SVM-2K, MV-L2-SVM, and the proposed 2V-SVM-SH are all trained using a Gaussian kernel. For all methods, ten-fold cross-validation (CV) is used to determine the optimal parameters. Table 3 provides the specific parameters and ranges used for each method. All experiments are conducted on a PC with a 16-core CPU with a clock speed of 3.40 GHz and 32 GB of memory. The programming environment was Matlab R2016a.

Table 3:

Parameter settings.

Method	Parameter settings
KNN	k ∈{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
SVM	C ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},σ ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8}
SVM-2K	C^A ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},C^B ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},D ∈{2e−5, 2e−4, …, 2e0, 2e1, …, 2e4, 2e5},σ ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8}
MV-L2-SVM	C^A∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},C^B ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},σ ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8}
AMVMED	C^A ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},C^B∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8}, γ ∈{0.1, 0.2, …, 0.9}
Proposed model	C^A ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},C^B ∈ {2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},σ ∈{2e−8, 2e−7, …, 2e0, 2e1, …, 2e7, 2e8},λ ∈{0.1, 0.2, …, 0.9, 1};

DOI:10.7717/peerj-cs.1874/table-3

To construct two-view learning scenarios, based on ‘Data’, three feature extraction methods, namely wavelet packet decomposition (WPD), short-time Fourier transform (STFT) and kernel principal component analysis (KPCA), are adopted to extract frequency-domain features, time-frequency features and time-domain features, respectively, from the original EEG signals, as shown in Fig. 2. Finally, 12 datasets are constructed, as shown in Table 4.

Table 4:

Two-view learning scenarios.

Datasets	Classification tasks	Views (view-A, view-B)	#Sample size
DS1	AB vs CDE	WPD, STFT	500
DS2	AB vs CDE	WPD, KPCA	500
DS3	AB vs CDE	STFT, KPCA	500
DS4	AB vs CD	WPD, STFT	400
DS5	AB vs CD	WPD, KPCA	400
DS6	AB vs CD	STFT, KPCA	400
DS7	AB vs DE	WPD, STFT	400
DS8	AB vs DE	WPD, KPCA	400
DS9	AB vs DE	STFT, KPCA	400
DS10	AB vs CE	WPD, STFT	400
DS11	AB vs DE	WPD, KPCA	400
DS12	AB vs CE	STFT, KPCA	400

DOI:10.7717/peerj-cs.1874/table-4

Experimental results and analysis

The experimental results are reported in Table 5. We can see from Table 5 that the proposed model achieves the best performance on most datasets. Only on DS5 and DS9 does the proposed model perform worse than SVM-2K and MV-L2-SVM, respectively. The advantages of the proposed model indicate the promising ability of the shared hidden space. By constructing the expanded space and learning from both the shared hidden space and the original space, the proposed model uses the relevant information of samples within and across views. This helps it address the problem that the difference between samples of the same class from different views can be greater than the difference between samples of different classes from the same view. The experimental results also indicate the usefulness of KDE, which is used to construct the shared hidden space.

Table 5:

Classification performance in terms of accuracy on all multi-view learning scenarios.

Datasets	KNN_A (KNN on view-A)	KNN_B (KNN on view-B)	SVM_A (SVM on view-A)	SVM_B (SVM on view-B)	SVM-2K	MV-L2-SVM	AMVMED	Proposed model
DS1	0.9098 (0.0019)	0.9176 (0.0045)	0.9432 (0.0076)	0.9521 (0.0087)	0.9754 (0.0063)	0.9543 (0.0065)	0.9643 (0.0043)	0.9876 (0.0023)
DS2	0.9213 (0.0032)	0.9098 (0.0021)	0.9583 (0.0065)	0.9321 (0.0087)	0.9654 (0.0063)	0.9431 (0.0065)	0.9546 (0.0043)	0.9768 (0.0023)
DS3	0.9223 (0.0034)	0.9098 (0.0021)	0.9345 (0.0022)	0.9321 (0.0087)	0.9654 (0.0023)	0.9437 (0.0013)	0.9554 (0.0063)	0.9764 (0.0034)
DS4	0.9214 (0.0034)	0.9097 (0.0011)	0.9067 (0.0073)	0.9164 (0.0027)	0.9567 (0.0032)	0.9511 (0.0023)	0.9598 (0.0044)	0.9690 (0.0036)
DS5	0.9214 (0.0034)	0.9481 (0.0023)	0.9875 (0.0046)	0.9467 (0.0056)	0.9892 (0.0017)	0.9564 (0.0054)	0.9578 (0.0023)	0.9743 (0.0045)
DS6	0.9324 (0.0052)	0.9481 (0.0023)	0.9875 (0.0046)	0.9467 (0.0056)	0.9653 (0.0018)	0.9511 (0.0034)	0.9587 (0.0033)	0.9811 (0.0056)
DS7	0.9331 (0.0026)	0.9325 (0.0026)	0.9481 (0.0017)	0.9435 (0.0037)	0.9563 (0.0032)	0.9673 (0.0026)	0.9543 (0.0046)	0.9781 (0.0015)
DS8	0.9331 (0.0026)	0.9221 (0.0025)	0.9481 (0.0017)	0.9387 (0.0026)	0.9612 (0.0018)	0.9671 (0.0056)	0.9409 (0.0055)	0.9812 (0.0035)
DS9	0.9631 (0.0015)	0.9221 (0.0025)	0.9511 (0.0090)	0.9387 (0.0026)	0.9654 (0.0143)	0.9786 (0.0087)	0.9765 (0.0049)	0.9760 (0.0054)
DS10	0.9318 (0.0079)	0.9543 (0.0056)	0.9345 (0.0054)	0.9245 (0.0064)	0.9534 (0.0048)	0.9501 (0.0047)	0.9534 (0.0019)	0.9756 (0.0087)
DS11	0.9134 (0.0078)	0.9215 (0.0056)	0.9381 (0.0054)	0.9275 (0.0034)	0.9452 (0.0036)	0.9517 (0.0045)	0.9732 (0.0017)	0.9789 (0.0087)
DS12	0.9532 (0.0035)	0.9378 (0.0043)	0.9785 (0.0038)	0.9634 (0.0014)	0.9763 (0.0013)	0.9587 (0.0054)	0.9661 (0.0064)	0.9898 (0.0034)
Average	0.9311	0.9333	0.9472	0.9434	0.9646	0.9561	0.9596	0.9787

DOI:10.7717/peerj-cs.1874/table-5

Note:

Bold entries indicate the best performance achieved by the corresponding method.

Statistical analysis

We use the Friedman test (Zimmerman & Zumbo, 1993; Sakamoto et al., 2015) to statistically analyze the experimental results of all methods across all datasets. The Friedman test is a non-parametric test for determining whether multiple methods differ significantly in performance over multiple datasets. It first computes the average rank of each method over all datasets and then tests whether these average ranks are equal. If they are, the methods are considered to perform equally; otherwise, there are significant differences among them. When significant differences are found, we further apply a Holm post-hoc test to identify which methods differ significantly from the proposed model.

From Fig. 5, we see that 2V-SVM-SH achieves the best ranking. The p-value reported in Fig. 5, computed by the Friedman test, indicates significant differences among the models. Table 6 shows that all hypotheses are rejected except for the proposed model vs AMVMED and the proposed model vs SVM-2K. These results indicate that the proposed model performs significantly better than KNN-A, KNN-B, SVM-B, SVM-A, and MV-L2-SVM. Although the hypotheses for the proposed model vs AMVMED and the proposed model vs SVM-2K are not rejected, the relatively low p-values of these two comparisons still suggest that the proposed model is competitive.

Figure 5: Friedman rankings of all models.
DOI:10.7717/peerj-cs.1874/fig-5
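
The ranking step of the Friedman test can be sketched as follows. This is an illustration only: the function names and the toy accuracy matrix are hypothetical, the statistic is the basic chi-square form without tie correction, and the article does not say which variant or software was used to produce Fig. 5.

```python
def average_ranks(scores):
    """Average rank of each method over all datasets.

    scores[d][m] is the accuracy of method m on dataset d.
    Rank 1 is the best (highest) accuracy; tied methods share the mean rank.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        # Method indices sorted from highest to lowest accuracy.
        order = sorted(range(n_methods), key=lambda m: -row[m])
        i = 0
        while i < n_methods:
            j = i
            # Extend j over the run of methods tied with position i.
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for m in order[i:j + 1]:
                totals[m] += shared
            i = j + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction), k - 1 degrees of freedom."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


if __name__ == "__main__":
    # Hypothetical accuracies: 4 datasets (rows) x 3 methods (columns).
    toy = [
        [0.90, 0.93, 0.97],
        [0.91, 0.94, 0.96],
        [0.89, 0.95, 0.98],
        [0.92, 0.93, 0.99],
    ]
    print(average_ranks(toy), friedman_statistic(toy))
```

A large statistic relative to the chi-square distribution with k - 1 degrees of freedom means the average ranks are unlikely to be equal, which is the condition under which the post-hoc comparison is applied.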

Table 6:

Holm test results with α = 0.05.

$i$	Algorithm	$z = (R_{0} - R_{i}) / SE$	$p$	$Holm = \alpha / i$	Hypothesis
7	KNN-A	5.583333	0	0.007143	Rejected
6	KNN-B	5.25	0	0.008333	Rejected
5	SVM-B	4.166667	0.000031	0.01	Rejected
4	SVM-A	3.666667	0.000246	0.0125	Rejected
3	MV-L2-SVM	2.5	0.012419	0.016667	Rejected
2	AMVMED	2.125	0.033587	0.025	Not rejected
1	SVM-2K	1.375	0.169131	0.05	Not rejected

DOI:10.7717/peerj-cs.1874/table-6
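
The decisions in Table 6 can be checked with a short sketch of the Holm step-down procedure. It assumes that each p-value is the two-sided normal tail probability of the listed z statistic, an assumption that reproduces the tabulated p-values but is not stated in the article. The names `two_sided_p`, `holm`, and `TABLE6_Z` are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic (assumed form)."""
    return math.erfc(abs(z) / math.sqrt(2))


def holm(z_values, alpha=0.05):
    """Holm step-down test against a control method.

    z_values maps a method name to its z statistic.
    Returns (i, name, p, threshold, rejected) tuples, largest |z| first.
    """
    ordered = sorted(z_values.items(), key=lambda kv: -abs(kv[1]))
    results, stopped = [], False
    for pos, (name, z) in enumerate(ordered):
        i = len(ordered) - pos          # i = 7, 6, ..., 1 for seven comparisons
        p = two_sided_p(z)
        threshold = alpha / i
        # Once one hypothesis is retained, all remaining ones are retained too.
        if not stopped and p >= threshold:
            stopped = True
        results.append((i, name, p, threshold, not stopped))
    return results


# z statistics copied from Table 6.
TABLE6_Z = {
    "KNN-A": 5.583333, "KNN-B": 5.25, "SVM-B": 4.166667, "SVM-A": 3.666667,
    "MV-L2-SVM": 2.5, "AMVMED": 2.125, "SVM-2K": 1.375,
}

if __name__ == "__main__":
    for i, name, p, threshold, rejected in holm(TABLE6_Z):
        verdict = "Rejected" if rejected else "Not rejected"
        print(f"{i}\t{name}\t{p:.6f}\t{threshold:.6f}\t{verdict}")
```

The step-down rule matters for the last row: SVM-2K is retained both because its own p-value exceeds 0.05 and because the procedure stops rejecting once AMVMED fails its threshold of 0.025.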

Conclusions

In this study, a multi-view support vector machine based on a shared hidden space is constructed using kernel density estimation. The method is designed to address the loss of recognition performance caused by differences in sample characteristics between views in multi-view learning. By incorporating SVM into the shared hidden space, the resulting model can be trained by solving a classic quadratic programming (QP) problem. Experimental results on EEG-based epilepsy diagnosis demonstrate that the proposed method extracts complementary information between views more effectively than the compared methods.

In practical applications, annotating training samples is often a time-consuming task. Therefore, in subsequent research, we intend to extend the multi-view algorithm proposed in this article to transfer learning scenarios, aiming to reduce the reliance on labeled samples.

Supplemental Information

Source code.

DOI:10.7717/peerj-cs.1874/supp-1

EEG datasets of five groups.

DOI:10.7717/peerj-cs.1874/supp-2

Electroencephalography (EEG) based epilepsy diagnosis via multiple feature space fusion using shared hidden space-driven multi-view learning
