Technical Field

The present invention belongs to the field of data processing, and specifically relates to a cross-modal method for generating ECG from PPG based on a diffusion model.
Background

As the gold standard of cardiac monitoring, the electrocardiogram (ECG) provides valuable insight for screening and diagnosing cardiovascular disease, and is of particular clinical significance in detecting cardiac abnormalities such as arrhythmia and myocardial infarction. In practice, however, ECG monitoring depends on specialized equipment and trained personnel, which not only raises the cost of monitoring but also limits its accessibility in resource-limited settings.

Compared with conventional ECG, photoplethysmography (PPG) has emerged as a convenient and economical alternative for cardiovascular monitoring. PPG captures blood-volume changes in the microvascular bed of tissue through a simple optical sensor, allowing continuous, non-invasive monitoring of the cardiovascular system and making it an attractive option for personal health tracking. The clinical diagnostic use of PPG is limited, however, because the cardiac information it provides is less rich than that of ECG: it mainly reflects hemodynamics and is better suited to measuring heart rate and blood-oxygen level than to diagnosing specific cardiac conditions such as arrhythmia or ischemic heart disease.

Given these limitations, synthesizing ECG signals from PPG inputs presents a compelling but challenging new direction.
Summary of the Invention

The present invention provides a diffusion-model-based cross-modal method for generating ECG from PPG to solve the above technical problems, specifically adopting the following technical solution:

A cross-modal method for generating ECG from PPG based on a diffusion model, comprising:

obtaining a paired ECG-PPG dataset and dividing it proportionally into a training set, a validation set, and a test set;

converting the ECG and PPG data into image data using a method for converting time-series data into image data;

building a deep learning model;

a first stage of training the preliminary models of the deep learning model, comprising:

training a VQGAN model by self-supervised learning: the VQGAN model processes the ECG image data, compressing it into a lower-dimensional latent space and decompressing it from that latent space back to the original space, thereby learning the core features of the data in depth;

training a CLIP model by contrastive learning on the PPG and ECG image data: the contrastive-learning structure of the CLIP model is used to optimize the alignment of the latent-space features of paired ECG and PPG image data;

a second stage of training the diffusion model, using the parameters of the pre-trained VQGAN model and of the PPG encoder of the CLIP model; during training, the parameters of the VQGAN model are frozen to keep its encoding ability stable, while the parameters of the PPG encoder remain active for further optimization and learning; in the implementation of the diffusion model, the features of the ECG image data are first encoded into the latent space by the VQGAN model; these encoded features then undergo the forward step of the diffusion model, in which Gaussian noise is added to simulate the randomness of data generation, followed by reverse denoising training; in this process the PPG features provided by the PPG encoder of the CLIP model are introduced and, through a cross-attention mechanism, are used to guide the regeneration of the ECG data, ensuring the accuracy and relevance of the generated data; after denoising training is complete, the decoder of the VQGAN model decodes the latent-space features and reconstructs the ECG image in real space; finally, the reconstruction of the ECG time series is completed by an inverse conversion from image data back to one-dimensional time-series data;

defining a total loss function;

using hyperparameter search: training on the training set with different hyperparameter combinations, verifying their performance on the validation set, selecting the best combination, and testing on the test set the deep learning model trained with the best hyperparameters.
Furthermore, the paired ECG-PPG datasets include the Bidmc, UQVital, and TBME datasets.
Furthermore, the VQGAN model is used as an autoencoder to process the ECG image data. The VQGAN model comprises four parts: an encoder, a codebook, a decoder, and a discriminator. Here the VQGAN model is used only to encode and decode the electrocardiogram signal and involves no generative component. When an ECG signal is converted into an image $x\in\mathbb{R}^{H\times W}$, where $H$ and $W$ are the height and width of the image obtained from the ECG signal, and processed by the encoder, an intermediate variable $\hat{z}\in\mathbb{R}^{h\times w\times n_z}$ is produced, where $h$, $w$, and $n_z$ are the height, width, and latent feature dimension of the intermediate variable; the nearest code $z_q$ is then searched for in the codebook. The self-supervised training additionally incorporates the adversarial loss of the GAN component; the total loss is:

$$\mathcal{Q}^{*}=\arg\min_{E,G,Z}\max_{D}\;\mathbb{E}_{x\sim p(x)}\Big[\mathcal{L}_{VQ}(E,G,Z)+\lambda\,\mathcal{L}_{GAN}\big(\{E,G,Z\},D\big)\Big];$$

where $\mathcal{L}_{VQ}$ is the loss of the reconstruction and vector-quantization operations, $E$ is the encoder, $G$ is the generator, $Z$ is the codebook, $\mathcal{L}_{GAN}$ is the adversarial GAN loss of the generation operation, $D$ is the GAN discriminator, and $\lambda$ is a weighting coefficient.
Furthermore, the CLIP model architecture is adopted, with a Vision Transformer as the encoder for both the PPG and ECG image data, encoding them into a 128-dimensional space. Training uses the InfoNCE loss, defined as:

$$\mathcal{L}_{\mathrm{InfoNCE}}=-\log\frac{\exp\!\big(e_i\cdot p_i/\tau\big)}{\exp\!\big(e_i\cdot p_i/\tau\big)+\sum_{j=1}^{N}\exp\!\big(e_i\cdot p_j^{-}/\tau\big)};$$

where the dot product expresses the similarity between the encoded representations $e_i$ and $p_i$, which denote the latent-space features of the ECG and PPG image data respectively, $\tau$ is a temperature parameter that scales the similarity scores, and $N$ is the number of negative samples in the batch.
Furthermore, the VQGAN model trained in the first stage and the PPG encoder of the CLIP model are used to control the generation of the ECG signal.

In the forward diffusion step, the ECG signal is mapped by the CNN encoder of the VQGAN model to a latent-space vector $z_0$. Assuming $z_0$ follows the latent distribution $q(z_0)$, the forward diffusion process then gradually adds Gaussian noise to $z_0$ over $T$ time steps, transforming it into $z_T$; the process can be expressed as:

$$q(z_t\mid z_{t-1})=\mathcal{N}\big(z_t;\sqrt{1-\beta_t}\,z_{t-1},\;\beta_t\mathbf{I}\big);$$

$$q(z_t\mid z_0)=\mathcal{N}\big(z_t;\sqrt{\bar{\alpha}_t}\,z_0,\;(1-\bar{\alpha}_t)\mathbf{I}\big);$$

where $\alpha_t=1-\beta_t$, $\bar{\alpha}_t=\prod_{s=1}^{t}\alpha_s$, the noise-schedule parameter $\beta_t$ increases with the time step $t$ and controls the noise strength, $\mathbf{I}$ is the identity matrix, and $t$ is the time step;
the reverse step derives the reverse distribution $q(z_{t-1}\mid z_t,z_0)$ step by step via Bayes' rule to reconstruct the original $z_0$; a neural network predicts the intermediate variable:

$$p_\theta(z_{t-1}\mid z_t,c)=\mathcal{N}\Big(z_{t-1};\;\frac{1}{\sqrt{\alpha_t}}\Big(z_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(z_t,t,c)\Big),\;\beta_t\mathbf{I}\Big);$$

where $\epsilon_\theta$ is predicted by the neural network, $c$ is the control condition obtained from the PPG encoder, and $\alpha_t$, $\beta_t$, $\bar{\alpha}_t$ are the noise-strength quantities defined above;
the optimization objective of the whole process is:

$$\mathcal{L}_{LDM}=\mathbb{E}_{\mathcal{E}(x),\,y,\,\epsilon\sim\mathcal{N}(0,1),\,t}\Big[\big\|\epsilon-\epsilon_\theta\big(z_t,t,\tau_\phi(y)\big)\big\|_2^2\Big];$$

where $x$ denotes the ECG signal, $y$ the PPG signal, $\mathcal{E}$ the encoder of the VQGAN model, $\epsilon_\theta$ the noise predicted by a U-Net with an attention mechanism, and $\tau_\phi$ the encoder of the PPG signal from the CLIP model.
Furthermore, the deep learning model is trained by minimizing the total loss function with the Adam optimizer, and the hyperparameters are determined by grid search.

Furthermore, the time-series ECG and PPG data are converted into image data using the Gramian Angular Field and the Markov Transition Field.
A benefit of the present invention is that the provided diffusion-model-based cross-modal method for generating ECG from PPG adopts up-to-date diffusion-model techniques, combining the strong representational power of the latent space with high-quality image reconstruction; it can effectively generate high-quality ECG signals from PPG signals, improving the accuracy and reliability of signal generation.

A further benefit is that, by converting time-series data into image data through innovative methods such as the Gramian Angular Field (GAF) and the Markov Transition Field (MTF), the method can exploit existing image-based deep learning models more effectively, opening a new technical path in the field of cardiac health monitoring.
Brief Description of the Drawings

To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 shows the first-stage training method of the VQGAN autoencoder model for ECG provided by an embodiment of the present invention;

Figure 2 shows the first-stage training method of the CLIP model for PPG and ECG provided by an embodiment of the present invention;

Figure 3 shows the second-stage training method for diffusion-model-based PPG-guided ECG generation provided by an embodiment of the present invention.
Detailed Description of the Embodiments

The embodiments of the present application are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present application, and should not be construed as limiting it.
The present application discloses a cross-modal method for generating ECG from PPG based on a diffusion model, comprising:

S1: Obtain a paired ECG-PPG dataset and divide it proportionally into a training set, a validation set, and a test set.
The paired ECG-PPG datasets include the Bidmc, UQVital, and TBME datasets. These datasets come from diverse sources and cover a wide range of physiological states and conditions, making them well suited to analysis by machine learning methods. Each dataset is briefly introduced below:

Bidmc: collected from 53 intensive-care-unit patients, this dataset includes about 7 hours of multi-lead ECG (leads II, V, and AVR) and PPG signals, all recorded at 125 Hz. Only the lead II ECG is used in this work.

UQVital: this dataset contains paired lead II ECG and PPG signals sampled at 300 Hz, recorded from 32 anesthetized patients under medical supervision.

TBME: this dataset contains approximately 5.6 hours of paired lead II ECG and PPG signals sampled at 300 Hz, recorded from 42 subjects under medical supervision.
S2: Convert the ECG and PPG data into image data using a method for converting time-series data into image data.

In the present application, the time-series data are converted into image data using the Gramian Angular Field (GAF) and the Markov Transition Field (MTF), so that model parameters pre-trained on natural images can be exploited effectively. This not only saves considerable hardware resources and time but also improves the efficiency of model training. The Gramian Angular Field converts the time-series data points into angles and constructs an image from cosine values computed between these angles; the resulting image effectively captures the periodicity and trend of the time series and supplies rich features for a deep learning model to learn from and analyze. The Markov Transition Field instead focuses on the state transitions of the time series: by discretizing the series values into states and computing the transition probabilities between states, the constructed image reflects the dynamics of the data in detail. By converting time-series data into image data, the method of the present application allows deep learning models originally trained on natural images to be applied seamlessly to time-series analysis, greatly extending the scope and effectiveness of these models.
Specifically, for the Gramian Angular Field, given a time series $X=\{x_1, x_2, \dots, x_n\}$ of $n$ real-valued observations, $X$ is rescaled so that all values fall in the interval $[-1, 1]$ or $[0, 1]$:

$$\tilde{x}_i=\frac{\big(x_i-\max(X)\big)+\big(x_i-\min(X)\big)}{\max(X)-\min(X)}\quad\text{or}\quad\tilde{x}_i=\frac{x_i-\min(X)}{\max(X)-\min(X)};\qquad(1)$$

which is then expressed in polar coordinates:

$$\phi_i=\arccos(\tilde{x}_i),\ \tilde{x}_i\in[-1,1];\qquad r_i=\frac{t_i}{N},\ t_i\in\mathbb{N};$$

Finally, the Gramian Angular Summation Field (GASF) and the Gramian Angular Difference Field (GADF) are defined as follows:

$$GASF=\big[\cos(\phi_i+\phi_j)\big]=\tilde{X}'\tilde{X}-\sqrt{I-\tilde{X}^2}\,'\sqrt{I-\tilde{X}^2};$$

$$GADF=\big[\sin(\phi_i-\phi_j)\big]=\sqrt{I-\tilde{X}^2}\,'\tilde{X}-\tilde{X}'\sqrt{I-\tilde{X}^2};$$

where $\phi_i$ is the angle value of the $i$-th time-series data point after normalization and polar-coordinate conversion, $\tilde{X}$ is the normalized, polar-transformed time-series representation (as a row vector), and $\tilde{X}'$ denotes its transpose. The second equality in each definition is the angle-sum (angle-difference) identity for cosine and sine.
The Markov Transition Field is defined as follows:

$$M_{ij}=w_{kl}\ \big|\ x_i\in q_k,\ x_j\in q_l;$$

Each element $M_{ij}$ of this matrix denotes the transition probability $w_{kl}$ between the state $q_k$ containing the $i$-th data point of the time series and the state $q_l$ containing the $j$-th data point. The full matrix thus provides a comprehensive view of the transition relationships among the states of the time series.
Based on the above, GASF, GADF, and MTF can be used to convert a time series into a three-channel image. Moreover, for the GASF the time series can be reconstructed from the main diagonal of the Gramian matrix, provided the $(0, 1)$ normalization is chosen in Equation (1). The conversion formula is:

$$\tilde{x}_i=\sqrt{\frac{G_{ii}+1}{2}};$$

where $G_{ii}$ are the main-diagonal elements of the Gramian matrix and $\tilde{x}_i$ is the reconstructed time series.
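The three transforms are compact enough to state directly in code. Below is a minimal NumPy sketch of GASF, GADF, and MTF; the quantile-based discretization of the MTF states and the bin count `n_bins` are assumptions, since the text does not fix a binning scheme.

```python
import numpy as np

def gaf(x, method="summation"):
    """Gramian Angular Field of a 1-D series, rescaled to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    # Rescale to [-1, 1] per Equation (1); the (0, 1) variant would be (x - min) / (max - min).
    x_tilde = ((x - x_max) + (x - x_min)) / (x_max - x_min)
    x_tilde = np.clip(x_tilde, -1.0, 1.0)       # guard against rounding error
    phi = np.arccos(x_tilde)                    # polar-coordinate angle
    if method == "summation":                   # GASF_ij = cos(phi_i + phi_j)
        return np.cos(phi[:, None] + phi[None, :])
    return np.sin(phi[:, None] - phi[None, :])  # GADF_ij = sin(phi_i - phi_j)

def mtf(x, n_bins=8):
    """Markov Transition Field: M_ij = transition probability between the
    quantile bins holding x_i and x_j (binning scheme assumed)."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)              # state index q_k of every point
    W = np.zeros((n_bins, n_bins))              # first-order transition counts
    for s, t in zip(states[:-1], states[1:]):
        W[s, t] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalise to probabilities
    return W[states[:, None], states[None, :]]

# Stack GASF, GADF and MTF into the three-channel image described in the text.
x = np.sin(np.linspace(0, 4 * np.pi, 256))      # stand-in for one ECG/PPG window
image = np.stack([gaf(x, "summation"), gaf(x, "difference"), mtf(x)])
print(image.shape)  # (3, 256, 256)
```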
In practice, each subject's ECG and PPG signals are segmented into 4-second windows at 64 Hz, which are fed to the model for processing.
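As a sketch of that preprocessing step, the snippet below resamples a recording to 64 Hz and cuts non-overlapping 4-second windows (256 samples each); Fourier resampling via `scipy.signal.resample` and the non-overlapping windowing are assumptions, as the text fixes only the rate and window length.

```python
import numpy as np
from scipy.signal import resample

def segment(signal, fs_in, fs_out=64, win_s=4):
    """Resample to fs_out Hz and cut non-overlapping win_s-second windows."""
    sig = resample(signal, int(len(signal) * fs_out / fs_in))
    win = fs_out * win_s                         # 64 Hz x 4 s = 256 samples
    n = len(sig) // win
    return sig[: n * win].reshape(n, win)

windows = segment(np.random.randn(125 * 60), fs_in=125)  # 1 min at the Bidmc rate
print(windows.shape)                             # (15, 256)
```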
S3: Build the deep learning model. The deep learning model of the present application comprises a first-stage vector-quantized generative adversarial network (VQGAN) model and CLIP model, and a second-stage diffusion model.

S4: First stage: train the preliminary models of the deep learning model, comprising:

Training the VQGAN model by self-supervised learning: the VQGAN model processes the ECG image data, compressing it into a lower-dimensional latent space and decompressing it from that latent space back to the original space, thereby learning the core features of the data in depth. This not only improves data-compression efficiency but also provides the basis for the subsequent generation process.
As shown in Figure 1, the ECG autoencoder is specifically implemented with VQGAN. VQGAN combines vector quantization (VQ) with the principles of generative adversarial networks (GANs) and learns stable, high-quality image reconstruction through discrete latent representations, which makes it particularly suitable for medical information processing with stringent fidelity requirements. The VQGAN model is used as an autoencoder for the ECG image data and mainly comprises four parts: an encoder, a codebook, a decoder, and a discriminator. In the present application, the VQGAN model is used only to encode and decode the electrocardiogram signal and involves no generative component. When an ECG signal is converted into an image $x\in\mathbb{R}^{H\times W}$, where $H$ and $W$ are the height and width of the image obtained from the ECG signal, and processed by the encoder, an intermediate variable $\hat{z}\in\mathbb{R}^{h\times w\times n_z}$ is produced, where $h$, $w$, and $n_z$ are the height, width, and latent feature dimension of the intermediate variable; the nearest code $z_q$ is then searched for in the codebook.

The process can be described as:

$$z_q=\mathbf{q}(\hat{z}):=\Big(\arg\min_{z_k\in Z}\big\|\hat{z}_{ij}-z_k\big\|\Big)_{ij}\in\mathbb{R}^{h\times w\times n_z};$$
对公式内变量解释如下:The variables in the formula are explained as follows:
: 这是从量化字典中查找到的向量,它将被用来替换输入的向量化表示。 : This is the vector looked up from the quantization dictionary that will be used to replace the vectorized representation of the input.
: 这是输入的向量化表示,通常是由编码器生成的连续特征向量。 : This is a vectorized representation of the input, typically a continuous feature vector produced by the encoder.
: 这表示要最小化的目标函数,其中是输入特征图在位置 (i, j) 的特征向量,是预先训练好的离散编码字典中的向量。目标是找到字典Z中与最接近的向量。 : This represents the objective function to be minimized, where is the feature vector of the input feature map at position (i, j), is a vector in the pre-trained discrete coding dictionary. The goal is to find the vector in the dictionary Z that corresponds to The closest vector .
Z: 这是一个预训练的编码向量集合,也称为“码本”,包含了所有可能的离散向量 。 : 这表示量化向量的形状,其中 h和 w 分别是特征图的高度和宽度,是向量的维度。Z: This is a pre-trained set of encoding vectors, also called a "codebook", which contains all possible discrete vectors . : This represents the quantized vector The shape is, where h and w are the height and width of the feature map respectively, is the dimension of the vector.
The overall reconstruction loss is:

$$\mathcal{L}_{VQ}(E,G,Z)=\|x-\hat{x}\|^2+\big\|\mathrm{sg}[E(x)]-z_q\big\|_2^2+\beta\big\|\mathrm{sg}[z_q]-E(x)\big\|_2^2;$$

where $E$, $G$, and $Z$ are the encoder, generator, and codebook respectively, $x$ is the original input, and $\hat{x}=G(z_q)$ is the model output. The term $\|\mathrm{sg}[E(x)]-z_q\|_2^2$ is the quantization error, measuring the difference between the stop-gradient (sg) version of the encoder output $E(x)$ and the quantized vector $z_q$. The term $\beta\|\mathrm{sg}[z_q]-E(x)\|_2^2$ is the commitment error, where $\beta$ is a weighting coefficient that adjusts the importance of this term; it measures the difference between the vector $z_q$ selected from the codebook (with the stop-gradient applied) and the actual encoder output $E(x)$.
The self-supervised training additionally incorporates the adversarial loss of the GAN component; the total loss is:

$$\mathcal{Q}^{*}=\arg\min_{E,G,Z}\max_{D}\;\mathbb{E}_{x\sim p(x)}\Big[\mathcal{L}_{VQ}(E,G,Z)+\lambda\,\mathcal{L}_{GAN}\big(\{E,G,Z\},D\big)\Big];$$

where $\mathcal{L}_{VQ}$ is the loss of the reconstruction and vector-quantization operations, $E$ is the encoder, $G$ is the generator, $Z$ is the codebook, $\mathcal{L}_{GAN}$ is the adversarial GAN loss of the generation operation, $D$ is the GAN discriminator, and $\lambda$ is a weighting coefficient.
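The quantization step and the two stop-gradient terms of $\mathcal{L}_{VQ}$ translate almost line for line into code. Below is a minimal PyTorch sketch, assuming a commitment weight $\beta=0.25$ and the codebook and latent sizes shown (none of these values are fixed by the text); the reconstruction term $\|x-\hat{x}\|^2$ and the adversarial term weighted by $\lambda$ would be added by the full training loop.

```python
import torch
import torch.nn.functional as F

def vector_quantize(z_hat, codebook, beta=0.25):
    """Nearest-codebook lookup with straight-through gradients and the two
    stop-gradient terms of L_VQ. z_hat: (B, h, w, n_z); codebook: (K, n_z)."""
    flat = z_hat.reshape(-1, z_hat.shape[-1])            # (B*h*w, n_z)
    # Squared distance to every codebook entry, then argmin over entries.
    d = (flat.pow(2).sum(1, keepdim=True)
         - 2 * flat @ codebook.t()
         + codebook.pow(2).sum(1))
    idx = d.argmin(dim=1)
    z_q = codebook[idx].view_as(z_hat)
    # Quantization error ||sg[E(x)] - z_q||^2 and commitment error beta*||sg[z_q] - E(x)||^2.
    loss_vq = F.mse_loss(z_hat.detach(), z_q) + beta * F.mse_loss(z_hat, z_q.detach())
    # Straight-through estimator: copy decoder gradients past the argmin.
    z_q = z_hat + (z_q - z_hat).detach()
    return z_q, loss_vq

# Example: a 16x16 latent grid with n_z = 256 and a 1024-entry codebook (assumed sizes).
z_hat = torch.randn(8, 16, 16, 256, requires_grad=True)
codebook = torch.randn(1024, 256)
z_q, loss_vq = vector_quantize(z_hat, codebook)
```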
Training the CLIP model: contrastive learning is performed on the PPG and ECG image data, using the contrastive-learning structure of the CLIP model to optimize the alignment of the latent-space features of paired ECG and PPG image data.

As shown in Figure 2, the ECG-PPG alignment method adopts the CLIP model architecture and aims to establish the correspondence between these signals in the latent space. Through contrastive learning, the PPG signal can then be used as a condition to guide ECG generation in the denoising stage of the diffusion process. Specifically, a Vision Transformer is used as the encoder for both the PPG and ECG image data, encoding them into a 128-dimensional space. Training uses the InfoNCE loss, defined as:

$$\mathcal{L}_{\mathrm{InfoNCE}}=-\log\frac{\exp\!\big(e_i\cdot p_i/\tau\big)}{\exp\!\big(e_i\cdot p_i/\tau\big)+\sum_{j=1}^{N}\exp\!\big(e_i\cdot p_j^{-}/\tau\big)};$$

where the dot product expresses the similarity between the encoded representations $e_i$ and $p_i$, which denote the latent-space features of the ECG and PPG image data respectively, $\tau$ is a temperature parameter that scales the similarity scores, and $N$ is the number of negative samples in the batch.
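A minimal PyTorch sketch of this loss over a batch follows, taking each matched (ECG, PPG) pair as the positive and the remaining pairings in the batch as negatives; the L2 normalization of the embeddings and the temperature value 0.07 are assumptions not fixed by the text.

```python
import torch
import torch.nn.functional as F

def info_nce(ecg_emb, ppg_emb, tau=0.07):
    """InfoNCE over a batch: the matched pair (e_i, p_i) is the positive;
    the other B-1 PPG embeddings in the batch serve as negatives."""
    e = F.normalize(ecg_emb, dim=-1)        # (B, 128) ECG features e_i
    p = F.normalize(ppg_emb, dim=-1)        # (B, 128) PPG features p_i
    logits = e @ p.t() / tau                # dot-product similarities / temperature
    labels = torch.arange(e.size(0), device=e.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)  # -log softmax at the positive pair

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```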
By precisely adjusting and aligning the feature representations of the ECG and PPG signals, this method effectively improves the quality and accuracy of PPG-to-ECG signal generation and is suitable for medical health monitoring and diagnosis.
S5: Second stage: train the diffusion model. The parameters of the pre-trained VQGAN model and of the PPG encoder of the CLIP model are used. During training, the parameters of the VQGAN model are frozen to keep its encoding ability stable, while the parameters of the PPG encoder remain active for further optimization and learning. In the implementation of the diffusion model, the features of the ECG image data are first encoded into the latent space by the VQGAN model. These encoded features then undergo the forward step of the diffusion model, in which Gaussian noise is added to simulate the randomness of data generation, followed by reverse denoising training. In this process the PPG features provided by the PPG encoder of the CLIP model are introduced and, through a cross-attention mechanism, are used to guide the regeneration of the ECG data, ensuring the accuracy and relevance of the generated data. After denoising training is complete, the decoder of the VQGAN model decodes the latent-space features and reconstructs the ECG image in real space. Finally, the reconstruction of the ECG time series is completed by an inverse conversion from image data back to one-dimensional time-series data.

In the second-stage diffusion training, a latent diffusion model is used for generation, with the first-stage vector-quantized generative adversarial network (VQGAN) and the PPG signal encoder of the CLIP model controlling the generation of the ECG signal.
As shown in Figure 3, in the forward diffusion step, an input ECG signal $x$ is mapped by the CNN encoder of the VQGAN model to a latent-space vector $z_0$. Assuming $z_0$ follows the latent distribution $q(z_0)$, the forward diffusion process then gradually adds Gaussian noise to $z_0$ over $T$ time steps, transforming it into $z_T$. The process can be expressed as:

$$q(z_t\mid z_{t-1})=\mathcal{N}\big(z_t;\sqrt{1-\beta_t}\,z_{t-1},\;\beta_t\mathbf{I}\big);$$

$$q(z_t\mid z_0)=\mathcal{N}\big(z_t;\sqrt{\bar{\alpha}_t}\,z_0,\;(1-\bar{\alpha}_t)\mathbf{I}\big);$$

where $\alpha_t=1-\beta_t$, $\bar{\alpha}_t=\prod_{s=1}^{t}\alpha_s$, the noise-schedule parameter $\beta_t$ increases with the time step $t$ and controls the noise strength, $\mathbf{I}$ is the identity matrix, and $t$ is the time step. In the present application, $\beta_t$ is linearly interpolated from 0.0001 to 0.02. As $t$ increases, $z_t$ gradually approaches pure noise; as $t$ tends to infinity, $z_T$ becomes completely indistinguishable from Gaussian noise.
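The closed-form expression for $q(z_t\mid z_0)$ allows $z_t$ to be sampled in one step. Below is a minimal PyTorch sketch using the linear 0.0001-0.02 schedule from the text; the step count $T=1000$ and the latent shape are assumptions.

```python
import torch

T = 1000                                       # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)          # beta_t: linear schedule from the text
alphas = 1.0 - betas                           # alpha_t = 1 - beta_t
alpha_bar = torch.cumprod(alphas, dim=0)       # \bar{alpha}_t = prod_s alpha_s

def q_sample(z0, t, noise=None):
    """Sample z_t ~ q(z_t | z_0) in closed form (second equation above)."""
    if noise is None:
        noise = torch.randn_like(z0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)        # broadcast over (B, C, H, W) latents
    return ab.sqrt() * z0 + (1.0 - ab).sqrt() * noise

z0 = torch.randn(8, 4, 16, 16)                 # stand-in for VQGAN latents E(x)
zt = q_sample(z0, torch.randint(0, T, (8,)))   # noised latents at random steps
```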
Once the forward diffusion step is complete, if the reverse distribution $q(z_{t-1}\mid z_t)$ could be derived step by step, one could sample directly from a Gaussian to reconstruct the original $z_0$. There is no direct way to do this; however, when $z_0$ is known, the reverse distribution can be derived with Bayes' rule:

$$q(z_{t-1}\mid z_t,z_0)=\mathcal{N}\big(z_{t-1};\tilde{\mu}_t(z_t,z_0),\;\tilde{\beta}_t\mathbf{I}\big);$$

where $\tilde{\beta}_t$ is assumed equal to $\beta_t$, and $\tilde{\mu}_t(z_t,z_0)$ can be expressed as:

$$\tilde{\mu}_t(z_t,z_0)=\frac{1}{\sqrt{\alpha_t}}\Big(z_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(z_t,t,c)\Big);$$

where $\epsilon_\theta$ is predicted by a neural network (in the present application, a U-Net with attention), $c$ is the control condition obtained from the PPG encoder, and $\alpha_t$, $\beta_t$, $\bar{\alpha}_t$ are the noise-strength quantities defined above. The final sampled $z_0$ is converted back to pixel space $\hat{x}$ by the CNN decoder of the VQGAN.
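One reverse step then follows directly from $\tilde{\mu}_t$, with the variance taken as $\beta_t$ per the assumption above. The sketch below iterates it from pure noise; the dummy U-Net, the step count $T$, and the latent shape are placeholders standing in for the trained conditional model.

```python
import torch

T = 1000                                        # number of steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)           # linear schedule from the text
alphas, alpha_bar = 1 - betas, torch.cumprod(1 - betas, 0)

@torch.no_grad()
def p_sample(model, z_t, t, c):
    """One reverse step z_t -> z_{t-1}, taking sigma_t^2 = beta_t
    (the text assumes beta~_t equal to beta_t)."""
    eps = model(z_t, torch.tensor([t]), c)      # eps_theta(z_t, t, c) from the U-Net
    mean = (z_t - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean                             # final z_0, decoded by the VQGAN decoder
    return mean + betas[t].sqrt() * torch.randn_like(z_t)

# Dummy conditional U-Net so the sketch runs end to end.
unet = lambda z, t, c: torch.zeros_like(z)
z = torch.randn(1, 4, 16, 16)                   # start from pure noise z_T
for t in reversed(range(T)):
    z = p_sample(unet, z, t, c=None)
```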
The optimization objective of the whole process is:

$$\mathcal{L}_{LDM}=\mathbb{E}_{\mathcal{E}(x),\,y,\,\epsilon\sim\mathcal{N}(0,1),\,t}\Big[\big\|\epsilon-\epsilon_\theta\big(z_t,t,\tau_\phi(y)\big)\big\|_2^2\Big];$$

where $x$ denotes the ECG signal, $y$ the PPG signal, $\mathcal{E}$ the encoder of the VQGAN model, $\epsilon_\theta$ the noise predicted by a U-Net with an attention mechanism, and $\tau_\phi$ the encoder of the PPG signal from the CLIP model. This approach allows ECG signals to be generated accurately from PPG inputs, suiting highly accurate medical monitoring and diagnostic applications.
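A single training step of this objective can be sketched as follows, with the VQGAN encoder frozen and the CLIP PPG encoder trainable as described above; all three modules and the tensor shapes are dummy stand-ins, since the real networks come from the two training stages.

```python
import torch
import torch.nn.functional as F

T = 1000                                        # number of steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1 - betas, 0)

def ldm_loss(unet, vq_encoder, ppg_encoder, x_img, y_img):
    """||eps - eps_theta(z_t, t, tau_phi(y))||^2 for a random t, with the
    VQGAN encoder E frozen and the CLIP PPG encoder tau_phi trainable."""
    with torch.no_grad():
        z0 = vq_encoder(x_img)                  # frozen E(x)
    c = ppg_encoder(y_img)                      # trainable condition tau_phi(y)
    t = torch.randint(0, T, (z0.size(0),))
    eps = torch.randn_like(z0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * eps  # closed-form forward sample
    return F.mse_loss(unet(z_t, t, c), eps)

# Dummy modules so the sketch runs; the real ones come from the two stages.
vq_encoder = lambda x: torch.randn(x.size(0), 4, 16, 16)   # stands in for E(x)
ppg_encoder = lambda y: torch.randn(y.size(0), 128)        # stands in for tau_phi(y)
unet = lambda z, t, c: torch.zeros_like(z)                 # stands in for eps_theta
loss = ldm_loss(unet, vq_encoder, ppg_encoder,
                torch.randn(8, 3, 128, 128), torch.randn(8, 3, 128, 128))
```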
S6: Define the total loss function. Specifically, it comprises the loss functions of all the training processes described above.
S7: Use hyperparameter search: train on the training set with different hyperparameter combinations, verify their performance on the validation set, select the best combination, and test on the test set the deep learning model trained with the best hyperparameters.

Specifically, the dataset is divided into a training set, a test set, and a validation set in a given ratio. Models are built with different hyperparameter combinations and trained on the training-set samples by minimizing the loss function with the Adam optimizer, and each trained model's performance is verified on the validation set. After the optimal hyperparameter combination is selected on the validation set, the corresponding model is tested on the test set, yielding the final performance of the method.
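The selection loop itself is a plain grid search, sketched below; the hyperparameter names, their ranges, and the `train`/`evaluate` stubs are all hypothetical, since the text fixes neither the grid nor the validation metric.

```python
import itertools

def train(params, data):
    """Placeholder: build the model with `params` and train it with Adam,
    minimising the total loss (hypothetical stub)."""
    return params                                # stands in for a trained model

def evaluate(model, data):
    """Placeholder: return a validation metric such as RMSE (hypothetical stub)."""
    return sum(model.values())                   # dummy score

# Hypothetical search space; the text fixes neither the parameters nor their ranges.
grid = {"lr": [1e-4, 3e-4], "batch_size": [32, 64], "tau": [0.05, 0.07]}
train_set, val_set, test_set = None, None, None  # dataset splits from step S1

best, best_score = None, float("inf")
for values in itertools.product(*grid.values()):
    params = dict(zip(grid, values))
    score = evaluate(train(params, train_set), val_set)
    if score < best_score:
        best, best_score = params, score
print(evaluate(train(best, train_set), test_set))  # final test-set performance
```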
With the model obtained by the present application, when new PPG data are input, the model outputs generated ECG data that can be used for various downstream tasks, making it possible to obtain ECG data and assess the cardiovascular state of the human body when only PPG is available.

The deep learning model of the present application is trained by minimizing the above loss functions with the Adam optimizer, and the hyperparameters are determined by grid search.
Specifically, taking the experimental dataset as an example (27,255 segments, with 6,814 in the validation set and 3,984 in the test set), Table 1 shows that the prediction method of the present invention performs better.

Table 1: Performance of the prediction method of the present invention and current state-of-the-art models on the dataset
The evaluation metrics are explained as follows:

HR_mae: mean absolute error (MAE) of heart rate in beats per minute (bpm);

Rmse: root-mean-square error between the generated ECG and the real ECG;

Fid: Fréchet Inception Distance, a metric for evaluating the quality of generated data;

Mae: mean absolute error between the generated ECG and the real ECG.
The foregoing shows and describes the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the above embodiments do not limit the present invention in any form, and any technical solution obtained by equivalent replacement or equivalent transformation falls within the protection scope of the present invention.