CN114550033A - Video sequence guide wire segmentation method and device, electronic equipment and readable medium - Google Patents

Video sequence guide wire segmentation method and device, electronic equipment and readable medium

Info

Publication number
CN114550033A
Authority
CN
China
Prior art keywords
picture frame
guide wire
video sequence
neural network
sequence guide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210110680.7A
Other languages
Chinese (zh)
Inventor
王澄 (Wang Cheng)
滕皋军 (Teng Gaojun)
朱建军 (Zhu Jianjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Hengle Medical Technology Co ltd
Original Assignee
Zhuhai Hengle Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Zhuhai Hengle Medical Technology Co., Ltd.
Priority to CN202210110680.7A
Publication of CN114550033A
Legal status: Pending


Abstract

The invention relates to a video sequence guide wire segmentation method and device, an electronic device, and a readable medium. The method comprises the following steps: in response to a segmentation request, acquiring a picture frame of a video sequence guide wire; dividing the picture frame into a plurality of image blocks; encoding the image blocks through a CNN convolutional neural network and a Transformer neural network to obtain a plurality of coding blocks, the coding blocks representing the global feature information of the picture frame; and performing upsampling, convolution, and concatenation processing on the coding blocks to obtain a segmentation result of the picture frame. The beneficial effects of the invention are as follows: it addresses the inability of a CNN alone to capture the global information of an image by combining it with a Transformer, whose self-attention mechanism captures global information; combining the CNN and the Transformer makes their advantages complementary and improves the segmentation accuracy of the video sequence guide wire catheter.

Description

Translated from Chinese
Video sequence guide wire segmentation method, device, electronic device and readable medium

Technical Field

The present invention relates to the field of computers, and in particular to a video sequence guide wire segmentation method and device, an electronic device, and a readable medium.

Background Art

Guide wire segmentation, a vital component of cardiovascular robotic interventional systems, has long been a research hotspot. However, designing a real-time and accurate guide wire segmentation model is very difficult: the guide wire moves irregularly, and the low signal-to-noise ratio of X-ray images, together with the extreme imbalance between the numbers of target and background pixels, further increases the difficulty of guide wire segmentation.

Many guide wire segmentation methods have been proposed to date. Early guide wire segmentation models were mainly based on traditional methods such as curve fitting. These methods can work in specific situations, but segmentation accuracy suffers greatly when the length and shape of the guide wire change substantially. Existing deep-learning-based guide wire segmentation models mainly use convolutional neural networks (CNNs) and do not make good use of global feature information. In addition, most of these models process each frame independently, without using the temporal information in the guide wire sequence. Effective use of the global feature information and the temporal information of the guide wire sequence, however, can help a model segment the guide wire better.

Summary of the Invention

The purpose of the present invention is to solve at least one of the technical problems in the prior art by providing a video sequence guide wire segmentation method and device, an electronic device, and a readable medium, which improve the segmentation accuracy of the video sequence guide wire catheter.

The technical solution of the present invention includes a video sequence guide wire segmentation method, characterized in that the method comprises: in response to a segmentation request, acquiring a picture frame of a video sequence guide wire; dividing the picture frame into a plurality of image blocks; encoding the image blocks through a CNN convolutional neural network and a Transformer neural network to obtain a plurality of coding blocks, the coding blocks being used to represent the global feature information of the picture frame; and performing upsampling, convolution, and concatenation processing on the coding blocks to obtain a segmentation result of the picture frame.

According to the video sequence guide wire segmentation method, acquiring the picture frame of the video sequence guide wire further comprises: acquiring the picture frame and the previous picture frame of the video sequence guide wire; and determining the temporal relationship of the picture frame from the picture frame and the previous picture frame.

According to the video sequence guide wire segmentation method, dividing the picture frame into a plurality of image blocks comprises: reshaping the picture frame into a plurality of flattened image blocks through the CNN convolutional neural network, obtaining image blocks

$x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$,

where $(P \times P)$ is the resolution of each block, $N = HW/P^2$ is the number of image blocks, that is, the length of the sequence, and $(H \times W)$ is the resolution of the picture frame.
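As an illustration of this block-splitting step, the reshaping can be sketched in numpy as follows (the frame size, channel count, and block size P are assumed values for the example, not taken from the patent):

```python
import numpy as np

def patchify(frame, P):
    """Reshape an (H, W, C) frame into N = H*W / P**2 flattened blocks,
    each of length P*P*C, matching x_p in R^{N x (P^2 * C)}."""
    H, W, C = frame.shape
    assert H % P == 0 and W % P == 0, "frame must divide evenly into P x P blocks"
    N = (H * W) // (P * P)
    # split into an (H/P, W/P) grid of P x P blocks, then flatten each block
    return (frame
            .reshape(H // P, P, W // P, P, C)
            .transpose(0, 2, 1, 3, 4)
            .reshape(N, P * P * C))

frame = np.zeros((512, 512, 1))      # e.g. a single-channel X-ray frame (assumed size)
x_p = patchify(frame, P=16)
print(x_p.shape)                     # (1024, 256): N = 512*512/16**2, P*P*C = 256
```

The transpose step keeps each P×P block contiguous before flattening, so block k corresponds to one spatial tile of the frame in row-major order.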

According to the video sequence guide wire segmentation method, encoding the image blocks through the CNN convolutional neural network and the Transformer neural network to obtain a plurality of coding blocks comprises: mapping the image blocks to an embedding space with a trainable linear projection, obtaining coding blocks that retain spatial information, computed as

$z_0 = [x_p^1 E; \, x_p^2 E; \, \cdots; \, x_p^N E] + E_{pos}$,

where $E \in \mathbb{R}^{(P^2 \cdot C) \times D}$ denotes the block embedding projection and $E_{pos} \in \mathbb{R}^{N \times D}$ denotes the position embedding;

Taking the coding blocks as input, the Transformer neural network is trained to compute

$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}$

$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l$,

where $\mathrm{LN}(\cdot)$ denotes the layer normalization operation, $z_l$ is the encoded image representation, and $\mathrm{MSA}(\cdot)$ and $\mathrm{MLP}(\cdot)$ denote multi-head self-attention processing and multi-layer perceptron processing, respectively.

According to the video sequence guide wire segmentation method, the multi-head attention processing and the perceptron processing are implemented through the interaction layers of the Transformer neural network, each interaction layer comprising a multi-head attention block and a multi-layer perceptron block.

According to the video sequence guide wire segmentation method, the method further comprises: acquiring the low-level feature information of the picture frames from the previous picture frame through the CNN convolutional neural network.

According to the video sequence guide wire segmentation method, performing upsampling, convolution, and concatenation processing on the coding blocks to obtain the segmentation result of the picture frame comprises: performing feature conversion on the global feature information to obtain a feature conversion result; performing concatenation processing on the coding blocks and the low-level feature information to obtain mixed features; and performing multiple rounds of upsampling with the mixed features and the feature conversion result to obtain the segmentation result.

The technical solution of the present invention also includes a video sequence guide wire segmentation device, comprising: a first module for acquiring a picture frame of a video sequence guide wire according to a segmentation request; a second module for dividing the picture frame into a plurality of image blocks; a third module for encoding the image blocks through a CNN convolutional neural network and a Transformer neural network to obtain a plurality of coding blocks, the coding blocks being used to represent the global feature information of the picture frame; and a fourth module for performing upsampling, convolution, and concatenation processing on the coding blocks to obtain a segmentation result of the picture frame.

The technical solution of the present invention also includes an electronic device comprising a processor and a memory; the memory is used for storing a program, and the processor executes the program to implement the video sequence guide wire segmentation method according to any of the above.

The technical solution of the present invention also includes a computer-readable storage medium, characterized in that the storage medium stores a program, and the program, when executed by a processor, implements the video sequence guide wire segmentation method according to any of the above.

The beneficial effects of the invention are as follows: the deficiency of a CNN in acquiring the global information of an image is addressed by combining it with a Transformer, which uses a self-attention mechanism to capture global information; combining the CNN and the Transformer makes their advantages complementary and improves the segmentation accuracy of the video sequence guide wire catheter.

Description of the Drawings

The present invention is further described below with reference to the accompanying drawings and embodiments.

FIG. 1 is a flowchart of a video sequence guide wire segmentation method according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of guide wire segmentation with the CNN convolutional neural network and the Transformer neural network according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of the Transformer neural network according to an embodiment of the present invention.

FIG. 4 is a diagram of a video sequence guide wire segmentation device according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below; examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the present invention and have no specific meaning in themselves; accordingly, "module", "component", and "unit" may be used interchangeably. "First", "second", and the like are used only to distinguish technical features and should not be understood as indicating or implying relative importance, the number of the indicated technical features, or their order. The consecutive numbering of the method steps is for ease of review and understanding; given the overall technical solution of the present invention and the logical relationships between the steps, adjusting the order in which the steps are performed does not affect the technical effect achieved by the technical solution. The embodiments described below with reference to the accompanying drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting it.

FIG. 1 is a flowchart of a video sequence guide wire segmentation method according to an embodiment of the present invention. The process comprises:

S100: in response to a segmentation request, acquire a picture frame of the video sequence guide wire;

S200: divide the picture frame into a plurality of image blocks;

S300: encode the image blocks through the CNN convolutional neural network and the Transformer neural network to obtain a plurality of coding blocks, the coding blocks representing the global feature information of the picture frame;

S400: perform upsampling, convolution, and concatenation processing on the coding blocks to obtain a segmentation result of the picture frame.
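As a purely illustrative sketch, the four steps S100-S400 can be strung together end to end as follows. The random linear "encoder" and mean-thresholding stand in for the trained CNN/Transformer and segmentation head, and all shapes and thresholds are assumptions; this is not the patented model:

```python
import numpy as np

def segment_guidewire(frame, P=16, D=64, seed=0):
    """Toy end-to-end pipeline for steps S100-S400 (illustrative only)."""
    # S100: the acquired picture frame of the video-sequence guide wire is the input
    H, W, C = frame.shape
    # S200: divide the picture frame into N = H*W / P**2 flattened image blocks
    N = (H * W) // (P * P)
    blocks = (frame.reshape(H // P, P, W // P, P, C)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(N, P * P * C))
    # S300: encode the blocks (random projection standing in for CNN + Transformer)
    rng = np.random.default_rng(seed)
    coded = blocks @ rng.standard_normal((P * P * C, D))
    # S400: upsample block scores back to pixel resolution and threshold to a mask
    score = coded.mean(axis=-1).reshape(H // P, W // P)
    mask = score.repeat(P, axis=0).repeat(P, axis=1) > score.mean()
    return mask

mask = segment_guidewire(np.random.rand(64, 64, 1))
print(mask.shape)   # (64, 64)
```

The point of the sketch is the data flow: frame, blocks, encoded sequence, per-pixel mask, mirroring the order of the four steps.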

In some embodiments, the technical solution of this embodiment can also be used as a segmentation method for catheters.

FIG. 2 is a schematic diagram of guide wire segmentation with the CNN convolutional neural network and the Transformer neural network according to an embodiment of the present invention. In the figure, "Current frame" and "Previous frame" are the current and previous frames of the video sequence; the two CNN convolutional neural networks handle the image-block processing of the picture frames and the extraction of low-level feature information. "Reshape" denotes matrix reshaping; "Conv3x3, ReLU" denotes convolution with a 3x3 kernel followed by ReLU activation; "Upsample" denotes upsampling; "Feature Concatenation" marks the feature-fusion region inside the dashed box; and "Segmentation head" is the segmentation head.

The model mainly comprises encoding and decoding. The encoding is as follows:

The encoder is mainly composed of CNN and Transformer modules. Since the Transformer takes a sequence as input, the guide wire image X must first be reshaped into a series of flattened blocks

$x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$,

where $(P \times P)$ is the resolution of each block, $N = HW/P^2$ is the number of image blocks, that is, the length of the sequence, and $(H \times W)$ is the resolution of the original image. Next, a trainable linear projection maps these blocks into a D-dimensional embedding space. To encode the spatial information of the blocks, a position embedding is added to the block embedding to preserve positional information. The computation is as follows:

$z_0 = [x_p^1 E; \, x_p^2 E; \, \cdots; \, x_p^N E] + E_{pos}$,

where $E \in \mathbb{R}^{(P^2 \cdot C) \times D}$ denotes the block embedding projection and $E_{pos} \in \mathbb{R}^{N \times D}$ denotes the position embedding.
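The projection and position-embedding step can be sketched in numpy as below; the sequence length N, flattened block size, embedding dimension D, and the random initialization scale are all assumed values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, patch_dim, D = 1024, 256, 768     # sequence length, P*P*C, embedding dimension (assumed)

x_p   = rng.standard_normal((N, patch_dim))         # flattened image blocks
E     = rng.standard_normal((patch_dim, D)) * 0.02  # trainable block-embedding projection
E_pos = rng.standard_normal((N, D)) * 0.02          # trainable position embedding

# z0 = [x_p^1 E; x_p^2 E; ...; x_p^N E] + E_pos
z0 = x_p @ E + E_pos
print(z0.shape)   # (1024, 768)
```

A single matrix multiply applies the same projection E to every block row, which is exactly what the bracketed row-wise notation expresses.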

The Transformer module consists of multiple interaction layers of multi-head self-attention (MSA) and multi-layer perceptron (MLP) blocks. The structure of a Transformer layer is shown in FIG. 3.

The output of layer $l$ is given by:

$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}$

$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l$,

where $\mathrm{LN}(\cdot)$ denotes the layer normalization operation and $z_l$ is the encoded image representation.
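The two update equations can be sketched in numpy as one pre-norm Transformer layer. This is a minimal single-example sketch: the head count, dimensions, and initialization are assumptions, and ReLU stands in where a standard ViT layer would use GELU:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LN(.): normalize each token vector to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def msa(x, Wq, Wk, Wv, Wo, heads):
    """MSA(.): multi-head self-attention over an (N, D) token sequence."""
    N, D = x.shape
    d = D // heads
    split = lambda W: (x @ W).reshape(N, heads, d).transpose(1, 0, 2)
    q, k, v = split(Wq), split(Wk), split(Wv)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))   # (heads, N, N)
    return (att @ v).transpose(1, 0, 2).reshape(N, D) @ Wo

def mlp(x, W1, b1, W2, b2):
    """MLP(.): two-layer perceptron (ReLU here; ViT proper uses GELU)."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def transformer_layer(z_prev, p, heads=4):
    # z'_l = MSA(LN(z_{l-1})) + z_{l-1}
    z_mid = msa(layer_norm(z_prev), p["Wq"], p["Wk"], p["Wv"], p["Wo"], heads) + z_prev
    # z_l = MLP(LN(z'_l)) + z'_l
    return mlp(layer_norm(z_mid), p["W1"], p["b1"], p["W2"], p["b2"]) + z_mid

rng = np.random.default_rng(0)
N, D, Dff = 16, 32, 64
p = {k: rng.standard_normal(s) * 0.02 for k, s in
     {"Wq": (D, D), "Wk": (D, D), "Wv": (D, D), "Wo": (D, D),
      "W1": (D, Dff), "b1": (Dff,), "W2": (Dff, D), "b2": (D,)}.items()}
z = transformer_layer(rng.standard_normal((N, D)), p)
print(z.shape)   # (16, 32)
```

Note the residual connections: each sub-block (MSA, MLP) adds its input back, matching the "+ z" terms in the equations.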

The decoder is as follows:

The decoder mainly consists of upsampling, convolution, and concatenation operations. The feature representation generated by the encoder could be upsampled directly to produce the final segmentation result, but such a simple operation does not achieve an ideal segmentation effect: the gap between the size of the feature representation generated by the Transformer, $\frac{HW}{P^2} \times D$, and the size of the original image, $(H \times W)$, is too large, and direct upsampling would lose much low-level information such as shape and contour. Therefore, to obtain this low-level information, in addition to the feature representation generated by the Transformer, embodiments of the present invention also use the feature representation generated directly by the CNN. Moreover, to obtain temporal information, embodiments of the present invention use not only the guide wire information of the current frame but also that of the previous frame. These features are concatenated to generate a mixed feature representation, which is then upsampled and concatenated with the features of the previous level; the final feature representation is generated in a top-down manner, and a segmentation head finally produces the segmentation result.
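The upsample-and-concatenate fusion described here can be sketched minimally in numpy; the channel counts and spatial sizes are assumptions, and a real decoder would also apply learned 3x3 convolutions after each fusion:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# assumed shapes, illustrative only
trans_feat = np.random.rand(64, 32, 32)   # Transformer features reshaped to a coarse grid
low_cur    = np.random.rand(16, 64, 64)   # CNN low-level features of the current frame
low_prev   = np.random.rand(16, 64, 64)   # CNN low-level features of the previous frame

# upsample the coarse Transformer features, then concatenate along the channel axis
mixed = np.concatenate([upsample2x(trans_feat), low_cur, low_prev], axis=0)
print(mixed.shape)   # (96, 64, 64)
```

Concatenating along channels keeps the spatial alignment of the three sources, so the mixed representation carries both global (Transformer) and low-level (CNN, current and previous frame) information at the same resolution.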

Referring to FIG. 4, a diagram of a video sequence guide wire segmentation device according to an embodiment of the present invention: the device comprises a first processing module 401, a second processing module 402, a third processing module 403, and a fourth processing module 404.

The first module is used to acquire a picture frame of the video sequence guide wire according to a segmentation request; the second module is used to divide the picture frame into a plurality of image blocks; the third module is used to encode the image blocks through the CNN convolutional neural network and the Transformer neural network to obtain a plurality of coding blocks, the coding blocks representing the global feature information of the picture frame; and the fourth module is used to perform upsampling, convolution, and concatenation processing on the coding blocks to obtain the segmentation result of the picture frame. This embodiment addresses the deficiency of a CNN in acquiring the global information of an image by combining it with a Transformer, which uses a self-attention mechanism to capture global information; combining the CNN and the Transformer makes their advantages complementary and improves the segmentation accuracy of the video sequence guide wire catheter.

An embodiment of the present invention also provides an electronic device comprising a processor and a memory.

The memory stores a program, and the processor executes the program to perform the aforementioned video sequence guide wire segmentation method. The electronic device has the function of carrying and running the software system for video sequence guide wire segmentation provided by the embodiments of the present invention, for example, a personal computer, a mobile phone, a smartphone, or a tablet computer.

An embodiment of the present invention further provides a computer-readable storage medium, the storage medium storing a program that, when executed by a processor, implements the aforementioned video sequence guide wire segmentation method.

It should be appreciated that the method steps in the embodiments of the present invention may be implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable memory. The methods may use standard programming techniques. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system; however, if desired, the program can be implemented in assembly or machine language. In any case, the language may be compiled or interpreted. Furthermore, the program can run on a programmed application-specific integrated circuit for this purpose.

Furthermore, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or a combination thereof. The computer program includes a plurality of instructions executable by one or more processors.

Further, the methods may be implemented in any type of suitable, operably connected computing platform, including but not limited to a personal computer, minicomputer, mainframe, workstation, network or distributed computing environment, a separate or integrated computer platform, or one communicating with charged-particle tools or other imaging devices. Aspects of the present invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, an optically readable and/or writable storage medium, RAM, or ROM, such that it can be read by a programmable computer; when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. Furthermore, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other various types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein.

The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments; within the scope of knowledge possessed by those of ordinary skill in the art, various changes can also be made without departing from the spirit of the present invention.

Claims (10)

Translated from Chinese
1.一种视频序列导丝分割方法,其特征在于,包括:1. a video sequence guide wire segmentation method, is characterized in that, comprises:响应于分割请求,获取视频序列导丝的图片帧;in response to the segmentation request, obtaining a picture frame of the video sequence guidewire;将所述图片帧划分为多个图像块;dividing the picture frame into a plurality of image blocks;通过CNN卷积神经网络和Transformer神经网络对所述图像块进行编码处理,得到多个编码块,所述编码块用于表征所述图片帧的全局特征信息;The image block is encoded by the CNN convolutional neural network and the Transformer neural network to obtain a plurality of encoding blocks, and the encoding blocks are used to represent the global feature information of the picture frame;对所述编码块执行上采样、卷积和级联处理,得到所述图片帧的分割结果。Upsampling, convolution and concatenation processing are performed on the coding block to obtain the segmentation result of the picture frame.2.根据权利要求1所述的视频序列导丝分割方法,其特征在于,所述获取视频序列导丝的图片帧还包括:2. The video sequence guide wire segmentation method according to claim 1, wherein the acquiring the picture frame of the video sequence guide wire further comprises:获取所述视频序列导丝的所述图片帧和上一图片帧;obtaining the picture frame and the previous picture frame of the video sequence guide wire;通过所述图片帧和所述上一图片帧确定所述图片帧的时序关系。The timing relationship of the picture frame is determined by the picture frame and the previous picture frame.3.根据权利要求2所述的视频序列导丝分割方法,其特征在于,所述将所述图片帧划分为多个图像块包括:3. The video sequence guide wire segmentation method according to claim 2, wherein the dividing the picture frame into a plurality of image blocks comprises:通过RNN卷积神经网络将所述图片帧变形为多个扁平化的图像块,得到图像块The image frame is deformed into a plurality of flattened image blocks through the RNN convolutional neural network to obtain image blocks
Figure FDA0003494989210000011
Figure FDA0003494989210000011
其中(P×P)表示每个块的分辨率,N=HW/P2表示所述图像块的个数,也就是序列的长度,(H×W)表示所述图片帧的分辨率。(P×P) represents the resolution of each block, N=HW/P2 represents the number of the image blocks, that is, the length of the sequence, and (H×W) represents the resolution of the picture frame.4.根据权利要求3所述的视频序列导丝分割方法,其特征在于,所述通过CNN卷积神经网络和Transformer神经网络对所述图像块进行编码处理,得到多个编码块包括:4. video sequence guide wire segmentation method according to claim 3, is characterized in that, described image block is encoded by CNN convolutional neural network and Transformer neural network, and obtaining a plurality of encoding blocks comprises:通过RNN卷积神经网络采用可训练的线性投影将所述图像块映射到嵌入空间,得到保留空间信息的所述编码块,其计算公式为The image block is mapped to the embedding space by using a trainable linear projection through the RNN convolutional neural network to obtain the coding block that retains the spatial information. The calculation formula is as follows:
Figure FDA0003494989210000012
Figure FDA0003494989210000012
其中
Figure FDA0003494989210000013
表示块嵌入投影,Epos∈RN×D表示位置嵌入;
in
Figure FDA0003494989210000013
represents the block embedding projection, and Epos ∈ RN×D represents the position embedding;
以所述编码块作为输入,通过Transformer神经网络训练得到Taking the coding block as input, it is obtained through Transformer neural network trainingz'l=MSA(LN(z'l))+z'l-1z'l =MSA(LN(z'l ))+z'l-1zl=MLP(LN(z'l))+z'lzl =MLP(LN(z'l ))+z'l ,其中LN(·)表示层正则化操作,zl是编码的图像表征,其中MSA(·)和MLP(·)分别表示多头注意力处理和感知机处理。where LN( ) denotes the layer regularization operation andzl is the encoded image representation, where MSA( ) and MLP( ) denote multi-head attention processing and perceptron processing, respectively.
5.根据权利要求4所述的视频序列导丝分割方法,其特征在于,所述对所述编码块执行上采样、卷积和级联处理,得到所述图片帧的分割结果包括:5. The video sequence guide wire segmentation method according to claim 4, wherein the performing upsampling, convolution and concatenation processing on the coding block to obtain the segmentation result of the picture frame comprises:所述多头注意力处理和所述感知机处理分别通过所述Transformer神经网络交互层实现,所述交互层包括多头注意力块和多层感知机块,所述多头注意力块和多层感知机块。The multi-head attention processing and the perceptron processing are respectively implemented by the Transformer neural network interaction layer, and the interaction layer includes a multi-head attention block and a multi-layer perceptron block, the multi-head attention block and the multi-layer perceptron block. piece.6.根据权利要求4所述的视频序列导丝分割方法,其特征在于,所述方法还包括:6. The video sequence guide wire segmentation method according to claim 4, wherein the method further comprises:对所述上一图片帧通过所述CNN卷积神经网络获取所述图片帧的低级特征信息。Obtain low-level feature information of the picture frame through the CNN convolutional neural network for the previous picture frame.7.根据权利要求6所述的视频序列导丝分割方法,其特征在于,所述对所述编码块执行上采样、卷积和级联处理,得到所述图片帧的分割结果包括:7. The video sequence guide wire segmentation method according to claim 6, wherein the performing upsampling, convolution and concatenation processing on the coding block to obtain the segmentation result of the picture frame comprises:对所述全局特征信息执行特征转换,得到特征转换结果;Perform feature transformation on the global feature information to obtain a feature transformation result;将所述编码块和所述低级特征信息执行级联处理,得到混合特征,通过所述特征与所述特征转换结果执行多次上采样,得到分割结果。Perform cascade processing on the coding block and the low-level feature information to obtain a mixed feature, and perform multiple upsampling on the feature and the feature conversion result to obtain a segmentation result.8.一种视频序列导丝分割装置,其特征在于,包括:8. 
A video sequence guide wire segmentation device, comprising:
a first module, configured to obtain a picture frame of a video sequence guide wire according to a segmentation request;
a second module, configured to divide the picture frame into a plurality of image blocks;
a third module, configured to encode the image blocks through a CNN convolutional neural network and a Transformer neural network to obtain a plurality of coding blocks, the coding blocks being used to represent global feature information of the picture frame;
a fourth module, configured to perform upsampling, convolution and concatenation processing on the coding blocks to obtain a segmentation result of the picture frame.
9. An electronic device, comprising a processor and a memory;
the memory is configured to store a program;
the processor executes the program to implement the video sequence guide wire segmentation method according to any one of claims 1-7.
10. A computer-readable storage medium, wherein the storage medium stores a program, and the program, when executed by a processor, implements the video sequence guide wire segmentation method according to any one of claims 1-7.
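A minimal end-to-end sketch of the device's module pipeline (claim 2's patch division, then the concatenation and repeated upsampling of claim 7) under assumed sizes: a toy 16x16 frame, 4x4 patches, hypothetical channel counts, and nearest-neighbour upsampling in place of the network's learned upsampling. None of these values come from the patent:

```python
import numpy as np

def split_into_patches(frame, patch=4):
    # claim 2: divide the picture frame into a plurality of image blocks
    h, w = frame.shape
    return (frame.reshape(h // patch, patch, w // patch, patch)
                 .swapaxes(1, 2)
                 .reshape(-1, patch, patch))

def upsample2x(feat):
    # nearest-neighbour 2x upsampling over (channels, H, W) features;
    # a stand-in for the network's learned upsampling
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def decode(global_feat, low_level_feat):
    # claim 7 (sketch): concatenate encoder output with low-level CNN
    # features to form a mixed feature, then upsample repeatedly back
    # to frame resolution
    mixed = np.concatenate([upsample2x(global_feat), low_level_feat], axis=0)
    return upsample2x(upsample2x(mixed))

frame = np.arange(16 * 16, dtype=float).reshape(16, 16)  # toy 16x16 frame
patches = split_into_patches(frame)       # 16 image blocks of 4x4
global_feat = np.zeros((8, 2, 2))         # hypothetical encoder output
low_level = np.zeros((4, 4, 4))           # hypothetical low-level CNN features
out = decode(global_feat, low_level)
print(patches.shape, out.shape)  # (16, 4, 4) (12, 16, 16)
```

The reshape/swapaxes pattern blocks the frame into non-overlapping patches without copying pixel values out of order; the decoder then recovers the full 16x16 resolution, matching the claimed flow from image blocks to a per-frame segmentation result.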
CN202210110680.7A | 2022-01-29 (priority) | 2022-01-29 (filed) | Video sequence guide wire segmentation method and device, electronic equipment and readable medium | Pending | CN114550033A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210110680.7A CN114550033A (en) | 2022-01-29 | 2022-01-29 | Video sequence guide wire segmentation method and device, electronic equipment and readable medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210110680.7A CN114550033A (en) | 2022-01-29 | 2022-01-29 | Video sequence guide wire segmentation method and device, electronic equipment and readable medium

Publications (1)

Publication Number | Publication Date
CN114550033A | 2022-05-27

Family

ID=81673969

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210110680.7A (Pending, CN114550033A, en) | Video sequence guide wire segmentation method and device, electronic equipment and readable medium | 2022-01-29 | 2022-01-29

Country Status (1)

CountryLink
CN (1)CN114550033A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115097941A (en)* | 2022-07-13 | 2022-09-23 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Human interaction detection method, apparatus, device and storage medium
CN115601649A (en)* | 2022-11-04 | 2023-01-13 | Shanghai Maritime University (CN) | TransUnnet-based ocean internal wave stripe segmentation method, equipment and storage medium
WO2024001886A1 (en)* | 2022-06-30 | 2024-01-04 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Coding unit division method, electronic device and computer readable storage medium
WO2024027616A1 (en)* | 2022-08-01 | 2024-02-08 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Intra-frame prediction method and apparatus, computer device, and readable medium

Citations (3)

Publication number | Priority date | Publication date | Assignee | Title
US10482603B1 (en)* | 2019-06-25 | 2019-11-19 | Artificial Intelligence, Ltd. | Medical image segmentation using an integrated edge guidance module and object segmentation network
CN113688813A (en)* | 2021-10-27 | 2021-11-23 | Changsha University of Science and Technology | Multi-scale feature fusion remote sensing image segmentation method, device, equipment and storage
CN113888466A (en)* | 2021-09-03 | 2022-01-04 | Wuhan University of Science and Technology | Pulmonary nodule image detection method and system based on CT image

Patent Citations (3)

Publication number | Priority date | Publication date | Assignee | Title
US10482603B1 (en)* | 2019-06-25 | 2019-11-19 | Artificial Intelligence, Ltd. | Medical image segmentation using an integrated edge guidance module and object segmentation network
CN113888466A (en)* | 2021-09-03 | 2022-01-04 | Wuhan University of Science and Technology | Pulmonary nodule image detection method and system based on CT image
CN113688813A (en)* | 2021-10-27 | 2021-11-23 | Changsha University of Science and Technology | Multi-scale feature fusion remote sensing image segmentation method, device, equipment and storage

Non-Patent Citations (1)

Title
GUIFANG ZHANG et al.: "A Temporary Transformer Network for Guide-Wire Segmentation", 2021 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, 7 December 2021, pages 1-5 *

Cited By (7)

Publication number | Priority date | Publication date | Assignee | Title
WO2024001886A1 (en)* | 2022-06-30 | 2024-01-04 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Coding unit division method, electronic device and computer readable storage medium
CN115097941A (en)* | 2022-07-13 | 2022-09-23 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Human interaction detection method, apparatus, device and storage medium
CN115097941B (en)* | 2022-07-13 | 2023-10-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Character interaction detection method, device, equipment and storage medium
WO2024027616A1 (en)* | 2022-08-01 | 2024-02-08 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Intra-frame prediction method and apparatus, computer device, and readable medium
CN117544774A (en)* | 2022-08-01 | 2024-02-09 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Intra-frame prediction method, device, computer equipment and readable medium
CN117544774B (en)* | 2022-08-01 | 2025-10-03 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Intra-frame prediction method, device, computer equipment and readable medium
CN115601649A (en)* | 2022-11-04 | 2023-01-13 | Shanghai Maritime University (CN) | TransUnnet-based ocean internal wave stripe segmentation method, equipment and storage medium

Similar Documents

Publication | Title
CN114550033A (en) | Video sequence guide wire segmentation method and device, electronic equipment and readable medium
US10922793B2 | Guided hallucination for missing image content using a neural network
CN119151769A | Method and apparatus for performing dense prediction using a transformer block
US11496773B2 | Using residual video data resulting from a compression of original video data to improve a decompression of the original video data
CN118134952A | A medical image segmentation method based on feature interaction
CN112950471A | Video super-resolution processing method and device, super-resolution reconstruction model and medium
CN116205962B | Monocular depth estimation method and system based on complete context information
CN112990219A | Method and apparatus for image semantic segmentation
CN113807361A | Neural network, target detection method, neural network training method and related products
CN116962657B | Color video generation method, device, electronic equipment and storage medium
CN114581462A | Image segmentation method, device, equipment and storage medium
WO2022216521A1 | Dual-flattening transformer through decomposed row and column queries for semantic segmentation
CN114419277A | Image-based step-by-step generative human reconstruction method and device
CN116385454A | Medical image segmentation method based on multi-stage aggregation
CN117635822A | A method, device, storage medium and electronic equipment for model training
CN117197727B | Global space-time feature learning-based behavior detection method and system
CN114913196A | A dense optical flow calculation method based on attention mechanism
DE102018127265A1 | Multi-picture video interpolation with optical flow
CN117635478A | Low-light image enhancement method based on spatial channel attention
CN119941817A | A lightweight monocular image depth estimation method and device
CN117036436A | Monocular depth estimation method and system based on double encoder-decoder
CN119672051A | Medical image segmentation method and device based on hierarchical multi-scale CNN-Transformer
CN113409291B | A method and system for lung 4D-CT medical image registration
CN114581493A | Bidirectional optical flow estimation method and device
CN114418845A | Image resolution improving method and device, storage medium and electronic equipment

Legal Events

Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
CB02 | Change of applicant information
Country or region after: China
Address after: Room 101, building 1, No. 36, Doukou Road, Guangdong Macao cooperative traditional Chinese medicine science and Technology Industrial Park, Hengqin New District, Zhuhai City, Guangdong Province 519000
Applicant after: Zhuhai Hengle Medical Technology Co.,Ltd.
Address before: Room 101, building 1, No. 36, Doukou Road, Guangdong Macao cooperative traditional Chinese medicine science and Technology Industrial Park, Hengqin New District, Zhuhai City, Guangdong Province 519000
Applicant before: Zhuhai Hengle Medical Technology Co.,Ltd.
Country or region before: China
