CN113362314A

Movatterモバイル変換

Info

Publication number: CN113362314A
Application number: CN202110680322.5A
Authority: CN
Inventors: 王兆玮; 杨叶辉; 尚方信; 黄海峰; 王磊
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2021-09-07
Anticipated expiration: 2041-06-18
Also published as: CN113362314B

Abstract

Translated fromChinese

本申请公开了医学图像识别方法、识别模型训练方法及装置，涉及人工智能领域，进一步涉及智慧医疗、计算机视觉技术领域。具体实现方案为：确定医学图像对应的目标候选点集合；确定目标候选点集合中每个目标候选点对应的局部目标图像和全局目标图像；基于预设的三维识别模型确定局部目标图像的局部信息，以及基于预设的二维识别模型确定全局目标图像的全局信息；基于局部信息、全局信息和预设的图像识别模型，确定医学图像的识别结果。本实现方式可以平衡图像识别精准度与识别效率，在确保图像识别精准度的前提下能够提高识别效率。

The present application discloses a medical image recognition method, a recognition model training method and device, and relates to the field of artificial intelligence, and further relates to the technical fields of smart medical care and computer vision. The specific implementation scheme is: determine the target candidate point set corresponding to the medical image; determine the local target image and the global target image corresponding to each target candidate point in the target candidate point set; determine the local information of the local target image based on the preset three-dimensional recognition model , and determine the global information of the global target image based on the preset two-dimensional recognition model; determine the recognition result of the medical image based on the local information, the global information and the preset image recognition model. This implementation method can balance the image recognition accuracy and the recognition efficiency, and can improve the recognition efficiency on the premise of ensuring the image recognition accuracy.

Description

Medical image recognition method, recognition model training method and device

Technical Field

The disclosure relates to the field of artificial intelligence, and further relates to the technical field of intelligent medical treatment and computer vision, in particular to a medical image recognition method, a recognition model training method and a device.

Background

At present, with the continuous development of computer technology, computer technology is often applied to the field of medical image recognition to improve the efficiency and accuracy of medical image recognition.

In practice, it has been found that in the identification of computed tomography images to determine the pulmonary nodule class, a two-dimensional identification model or a three-dimensional identification model is typically used. However, the problem of poor recognition accuracy exists only by adopting a two-dimensional recognition model for recognition, and the problem of low recognition efficiency exists only by adopting a three-dimensional recognition model for recognition.

Disclosure of Invention

The disclosure provides a medical image recognition method, a recognition model training method and a device.

According to a first aspect, there is provided a medical image recognition method comprising: determining a target candidate point set corresponding to the medical image; determining a local target image and a global target image corresponding to each target candidate point in a target candidate point set; determining local information of a local target image based on a preset three-dimensional recognition model, and determining global information of a global target image based on a preset two-dimensional recognition model; and determining the recognition result of the medical image based on the local information, the global information and a preset image recognition model.

According to a second aspect, there is provided a recognition model training method, comprising: acquiring a sample image; marking each sample candidate point and a real identification result corresponding to each sample candidate point in the sample image; for each sample candidate point, determining local sample information and global sample information corresponding to the sample candidate point; determining a sample identification result output by the initial identification model based on the local sample information, the global sample information and the initial identification model; and adjusting model parameters of the initial recognition model based on the sample recognition result and the real recognition result until the initial recognition model is converged to obtain the trained image recognition model.

According to a third aspect, there is provided a medical image recognition apparatus comprising: a candidate point determining unit configured to determine a target candidate point set corresponding to the medical image; a target determination unit configured to determine a local target image and a global target image corresponding to each target candidate point in a set of target candidate points; an information determination unit configured to determine local information of a local target image based on a preset three-dimensional recognition model and determine global information of a global target image based on a preset two-dimensional recognition model; and the image recognition unit is configured to determine a recognition result of the medical image based on the local information, the global information and a preset image recognition model.

According to a fourth aspect, there is provided a recognition model training apparatus comprising: a sample acquisition unit configured to acquire a sample image; marking each sample candidate point and a real identification result corresponding to each sample candidate point in the sample image; a sample information determination unit configured to determine, for each sample candidate point, local sample information and global sample information corresponding to the sample candidate point; the sample recognition unit is configured to determine a sample recognition result output by the initial recognition model based on the local sample information, the global sample information and the initial recognition model; and the model training unit is configured to adjust model parameters of the initial recognition model based on the sample recognition result and the real recognition result until the initial recognition model converges to obtain a trained image recognition model.

According to a fifth aspect, there is provided an electronic device for performing a medical image recognition method or a recognition model training method, comprising: one or more processors; a memory for storing one or more programs; when executed by one or more processors, cause the one or more processors to implement a medical image recognition method or a recognition model training method as described above.

According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the medical image recognition method or the recognition model training method as any one of the above.

According to a seventh aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a medical image recognition method or a recognition model training method as any one of the above.

According to the technology of the application, a medical image recognition method is provided, which can determine local information of a local target image of each target candidate point in a medical image based on a preset three-dimensional recognition model, determine global information of a global target image of each target candidate point in the medical image based on a preset two-dimensional recognition model, and determine a recognition result of the medical image based on the local information, the global information and the preset image recognition model. The image recognition is carried out based on the preset three-dimensional recognition model and the two-dimensional recognition model in the process, the image recognition accuracy can be improved, the local target image is recognized by the preset three-dimensional recognition model, the recognition efficiency can be improved, and therefore the image recognition accuracy and the image recognition efficiency are balanced.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a medical image recognition method according to the present application;

FIG. 3 is a schematic diagram of an application scenario of a medical image recognition method according to the present application;

FIG. 4 is a flow chart of another embodiment of a medical image recognition method according to the present application;

FIG. 5 is a flow diagram of one embodiment of a recognition model training method according to the present application;

FIG. 6 is a schematic structural diagram of an embodiment of a medical image recognition apparatus according to the present application;

FIG. 7 is a schematic diagram of an embodiment of a recognition model training apparatus according to the present application;

fig. 8 is a block diagram of an electronic device for implementing a medical image recognition method or a recognition model training method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

As shown in fig. 1, thesystem architecture 100 may include

terminal devices

101, 102, 103, anetwork 104, and aserver 105. Thenetwork 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and theserver 105.Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the

terminal devices

101, 102, 103 to interact with theserver 105 via thenetwork 104 to receive or send messages or the like. The

terminal devices

101, 102, and 103 may be electronic devices such as a mobile phone, a computer, and a tablet, and the

terminal devices

101, 102, and 103 may acquire a medical image based on a camera or locally stored information, and transmit the medical image to theserver 105 based on thenetwork 104, so that theserver 105 returns a recognition result of the medical image. Still alternatively, in training the image recognition model, the

terminal apparatuses

101, 102, 103 may acquire a sample image based on the camera or the locally stored information, and transmit the sample image to theserver 105 based on thenetwork 104, so that theserver 105 trains the initial recognition model.

The

terminal apparatuses

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, televisions, smart phones, tablet computers, e-book readers, car-mounted computers, laptop portable computers, desktop computers, and the like. When the

terminal apparatuses

101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.

Theserver 105 may be a server providing various services, for example, after the

terminal devices

101, 102, 103 transmit a medical image to be subjected to medical image recognition, theserver 105 may determine a target candidate point set corresponding to the medical image, determine a local target image and a global target image of each target candidate point in the target candidate point set, determine local information of the local target image based on a three-dimensional recognition model, determine global information of the global target image based on a two-dimensional recognition model, determine a recognition result based on the local information, the global information and a preset image recognition model, and return the recognition result to the

terminal devices

101, 102, 103 based on thenetwork 104. Still alternatively, theserver 105 may also receive sample images transmitted by the

terminal devices

101, 102, and 103, and train an initial recognition model based on the sample images to obtain an image recognition model.

Theserver 105 may be hardware or software. When theserver 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When theserver 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.

It should be noted that the medical image recognition method or the recognition model training method provided in the embodiment of the present application may be executed by the

terminal devices

101, 102, and 103, or may be executed by theserver 105. Accordingly, the medical image recognition apparatus or the recognition model training apparatus may be provided in the

terminal devices

101, 102, 103, or may be provided in theserver 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, aflow 200 of one embodiment of a medical image recognition method according to the present application is shown. The medical image recognition method of the embodiment comprises the following steps:

step 201, a target candidate point set corresponding to the medical image is determined.

In the embodiment, an executing subject (such as theserver 105 or the

terminal devices

101, 102, 103 in fig. 1) may acquire a medical image that needs to be subjected to image recognition, wherein the medical image may be stored locally in advance, and the locally stored data is directly read for acquisition. The medical image may also be stored in other electronic devices which establish connection in advance, and the execution subject acquires the medical image from the other electronic devices based on the connection relationship with the other electronic devices. Optionally, the executing body may be provided with a camera device or a scanning device in advance, and the target object is photographed or scanned based on the camera device or the scanning device to obtain the medical image. The medical image refers to an internal tissue image acquired from a human body or a part of a human body for medical treatment or medical research, such as a lung electron computed tomography image, a thyroid electron computed tomography image, a breast electron computed tomography image, and the like. The medical image includes an object to be identified, which may include, but is not limited to, a nodule, a stone, and other objects belonging to an abnormal state. After the execution main body obtains the medical image, rough image analysis can be performed firstly to determine the positions of targets in the medical image, each target candidate point is obtained based on the positions of the targets, and a target candidate point set is obtained based on each target candidate point.

Step 202, determining a local target image and a global target image corresponding to each target candidate point in the target candidate point set.

In this embodiment, the execution subject may extract, for a position of each target candidate point in the medical image, a local target image and a global target image corresponding to the position. The local target image is an image of a first size intercepted aiming at the position of each target in the medical image, and the global target image is an image of a second size intercepted aiming at the position of each target in the medical image, wherein the second size is larger than the first size. Optionally, the executing subject may use the coordinates of the position of each target in the medical image as the extracted center of the target image, and then extract the medical images around the center as the local target image according to the size of the first size. And extracting the medical images around the center as a global target image according to the size of the second size.

Step 203, determining local information of the local target image based on the preset three-dimensional recognition model, and determining global information of the global target image based on the preset two-dimensional recognition model.

In this embodiment, the preset three-dimensional recognition model is used for recognizing the three-dimensional features of the image, and the preset two-dimensional recognition model is used for recognizing the two-dimensional features of the image. Optionally, the preset three-dimensional recognition model and the two-dimensional recognition model may both be obtained by performing model training using an effective-net (a convolutional neural network) as a basic model, or by performing model training using other existing deep learning models as a basic model, which is not limited in this embodiment. The execution main body can obtain local information which is output by the preset three-dimensional recognition model and corresponds to the local target image by inputting the local target image into the preset three-dimensional recognition model, and can obtain global information which is output by the preset two-dimensional recognition model and corresponds to the global target image by inputting the global target image into the preset two-dimensional recognition model. The local information is used for describing the image characteristics of the local target image, the global information is used for describing the image characteristics of the global target image, and the expression forms of the local information and the global information can be the form of a feature map. The characteristic diagram refers to the characteristic diagram of the convolutional layer output in the convolutional neural network.

And step 204, determining the recognition result of the medical image based on the local information, the global information and the preset image recognition model.

In this embodiment, the preset image recognition model is used for recognizing image information, and obtaining recognition results of each target in the medical image. For example, where the target is a nodule, the recognition result may be a probability that the nodule is positive. Specifically, after the execution main body acquires the local information and the global information, the execution main body may integrate the local information and the global information to obtain integrated information. And inputting the integrated information into a preset image recognition model to obtain a recognition result output by the preset image recognition model. For example, when the expression form of the local information and the global information is the form of the feature map, convolution calculation may be performed based on the feature map corresponding to the local information and the feature map corresponding to the global information to realize feature fusion, and the data after the feature fusion may be used as the integrated information.

With continued reference to fig. 3, a schematic illustration of an application scenario of the medical image recognition method according to the present application is shown. In the application scenario of fig. 3, an executing subject may first acquire lung CT data (lung computed tomography data) 301, use thelung CT data 301 as the medical image, and then determine suspected nodule candidate points 302 that are suspected to be positive in thelung CT data 301, where the number of the suspected nodule candidate points 302 is usually multiple. For each suspectednodule candidate 302, a correspondinglocal target image 303 and aglobal target image 304 may be extracted, and thelocal target image 303 is input into the three-dimensional recognition model 305 to obtainlocal information 307. And inputting theglobal target image 304 into the two-dimensional recognition model 306 to obtainglobal information 308. Thelocal information 307 in this scenario may be a local feature map of a node suspected to be positive, and theglobal information 308 may be a global feature map of a node suspected to be positive. Then, the execution subject may perform feature fusion on thelocal information 307 and theglobal information 308 to obtain feature-fused information. Based on the feature-fused information and theimage recognition model 309, arecognition result 310 can be obtained. Therecognition result 310 at this time is the probability that each node suspected to be positive is true positive.

The medical image recognition method provided by the above embodiment of the application can determine the local information of the local target image in the medical image based on the preset three-dimensional recognition model, determine the global information of the global target image in the medical image based on the preset two-dimensional recognition model, and determine the recognition result of the medical image based on the local information, the global information and the preset image recognition model. The image recognition is carried out based on the preset three-dimensional recognition model and the two-dimensional recognition model in the process, the image recognition accuracy can be improved, the local target image is recognized by the preset three-dimensional recognition model, the recognition efficiency can be improved, and therefore the image recognition accuracy and the image recognition efficiency are balanced.

With continued reference to fig. 4, aflow 400 of another embodiment of a medical image recognition method according to the present application is shown. As shown in fig. 4, the medical image recognition method of the present embodiment may include the following steps:

step 401, determining a target candidate point set corresponding to the medical image.

In this embodiment, for the detailed description ofstep 401, please refer to the detailed description ofstep 201, which is not described herein again.

Step 402, determining a target candidate point set corresponding to the medical image based on the medical image and a preset slice detection model.

In this embodiment, a preset slice detection model is used to detect a slice image in a medical image, and determine target candidate points of a suspected target in the medical image, where each target candidate point constitutes a target candidate point set. The medical image at this time is an electronic computed tomography image, the electronic computed tomography image corresponds to a plurality of adjacent slice images, and the adjacent slice images can be used as input data of a preset slice detection model, so that the slice detection model outputs position information of target candidate points of each suspected target in the medical image. Based on the position information, each target candidate point is determined and obtained in the medical image, and a target candidate point set is formed. And when constructing the target candidate point set based on the position information, the target candidate point set can be obtained by combining the existing three-dimensional reconstruction technology.

In some optional implementations of the present embodiment, determining the target candidate point set corresponding to the medical image may further include: determining a target slice image based on a preset number of adjacent two-dimensional slices in the medical image; and determining a target candidate point set corresponding to the medical image based on the target slice image and a preset slice detection model.

In this implementation, the execution body may be preset with a preset number, such as 5. When determining the target candidate point set based on the preset slice detection model, a preset number of adjacent two-dimensional slices, such as 5 adjacent two-dimensional slices above and below, may be determined from the medical image. Then, the execution subject may use a preset number of adjacent two-dimensional slices as a group of multi-channel pictures to obtain a target slice image. And inputting the target slice image into a preset slice detection model to obtain a target candidate point set output by the slice detection model. The process can enable the multi-channel picture to reflect the interlayer information among the two-dimensional slices, thereby reducing the probability of losing the interlayer information and improving the determination accuracy of the target candidate point set.

In other optional implementations of this embodiment, the preset slice detection model is trained based on the following steps: acquiring a sample slice image and sample labeling information; inputting a preset number of adjacent two-dimensional slices in the sample slice image into an initial detection model to obtain a detection result output by the initial detection model; and adjusting the model parameters of the initial detection model based on the detection result and the sample labeling information until the initial detection model converges to obtain the trained slice detection model.

In this implementation, when training the slice detection model, the executing subject may use a sample slice image with a target as a training sample, and the target may be a positive lung nodule. And the execution subject may also label the sample slice image, such as labeling positive lung nodule locations in the sample slice image. Then, a preset number of adjacent two-dimensional slices, such as 5 two-dimensional slices adjacent up and down, are selected from the sample slice image, the adjacent two-dimensional slices form a multi-channel picture, and the multi-channel picture is input into an initial detection model, so that the initial detection model outputs a detection result, such as the position of a positive lung nodule. Wherein, Cascade R-CNN (a target detection model) can be adopted as the initial detection model. The execution subject can continuously adjust the model parameters based on the detection result and the pre-labeled sample labeling information until the model converges to obtain the trained slice detection model.

Further, please refer to table one, which is a corresponding relationship between different preset numbers and the pulmonary nodule detection rate, as shown in table one:

watch 1

As shown in table one, the lung nodule detection rate of the slice detection models with the preset number of 5, 9 and 3 layers is respectively tested, wherein the lung nodule detection rate refers to the probability of detecting lung nodules in the medical image by using the slice detection model. The first table contains a plurality of different detection rate thresholds, and under various detection rate thresholds, it can be found that the detection rate of lung nodules in a preset number of 5 layers is generally greater than the detection rate of lung nodules in 9 layers and 3 layers.

Please refer to table two, which shows the correspondence between the preset number and the Average Recall (ACR), as follows:

watch two

As shown in table two, the ACR conditions of slice inspection models with the preset number of 5 layers, 9 layers and 3 layers are respectively tested, and for each preset number, the recall rate of 8, 4, 2, 1, 1/2, 1/4 and 1/8 false positive nodules appearing on the medical image by the slice inspection model corresponding to the preset number is determined, and finally the ACR is calculated. It can be seen that the predetermined number of ACRs of 5 layers is significantly greater than the ACRs of 9 and 3 layers. Therefore, in this embodiment, the preset number is preferably 5, and 5 adjacent layers of two-dimensional slices are selected as input data of the slice detection model.

Step 403, determining a local target image and a global target image corresponding to each target candidate point in the target candidate point set.

In this embodiment, for each target candidate point in the target candidate point set, the executing entity may determine the position coordinates of the target candidate point in the medical image, take the position coordinates as the central coordinates of the local target image and the global target image, and extract the local target image and the global target image with preset sizes. For example, based on the position coordinates as center coordinates, an image of size 36 × 36 is extracted as a local target image, and an image of size 5 × 512 is extracted as a global target image.

Instep 404, local information of the local target image is determined based on the preset three-dimensional recognition model, and global information of the global target image is determined based on the preset two-dimensional recognition model.

In this embodiment, please refer to the detailed description ofstep 203 for the detailed description ofstep 404, which is not repeated herein.

Step 405, for each target candidate point in the target candidate point set, performing image feature fusion on the local information and the global information of the target candidate point to obtain target image information.

In this embodiment, the execution subject may perform image feature fusion on the local information and the global information of each target candidate point, and specifically, may adopt the existing channel stacking technology and convolution calculation technology to perform channel stacking on the local feature map corresponding to the local information and the global feature map corresponding to the global information, and then perform convolution calculation to implement image feature fusion, so as to obtain the target image information after image feature fusion.

And step 406, determining nodule category information and nodule offset information corresponding to the target candidate point based on the target image information and a preset image recognition model.

In this embodiment, the recognition result of the medical image includes nodule category information and nodule offset information corresponding to each target candidate point in the target candidate point set. Wherein the nodule class information refers to a probability that the nodule type is positive or a probability that the nodule type is negative. The nodule offset information refers to an offset between the position coordinates of the target candidate point and the predicted position coordinates. The nodule offset information may be embodied in the offset values and offset directions corresponding to the respective coordinate dimensions. And if the node type information indicates that the probability that the node type is positive is smaller than a preset threshold value, determining that the node type is negative, and removing the target candidate point corresponding to the node type. And if the node category information indicates that the probability that the node type is positive is greater than a preset threshold value, determining that the node category is positive.

Further, please refer to table three, which is based on the Tianchi test set, and tests the comparison of ACR index conditions using the image recognition model and not using the image recognition model.

Watch III

As shown in table three, scheme 1 refers to image recognition directly based on the existing detection model, and scheme 2 refers to image recognition based on the preset image recognition model, local information and global information in this embodiment. And respectively determining the recall rate of 8, 4, 2, 1, 1/2, 1/4 and 1/8 false positive nodules appearing on the medical image by the scheme 1 and the scheme 2, and finally calculating the ACR index. It can be seen that, by using the image recognition based on the preset image recognition model, the local information and the global information in the embodiment to perform the image recognition, the ACR index value is higher.

Referring to table four, the comparison of ACR indices using and without an image recognition model was tested based on the LUNA-16(Lung Nodule Analysis 16, a pulmonary Nodule detection dataset derived in 16 years) dataset.

Watch four

As shown in table four, the test results on the LUNA-16 dataset also show that the ACR index value is higher when the image recognition is performed based on the preset image recognition model, the local information and the global information in the embodiment. Therefore, the image recognition effect in the application is better.

Step 407, for each target candidate point, in response to determining that the node category information corresponding to the target candidate point is a preset node category, correcting the position coordinate corresponding to the target candidate point based on the node offset information corresponding to the target candidate point to obtain a corrected position coordinate.

In this embodiment, the preset nodule category may be a positive category, and if the nodule category corresponding to the target candidate point is the positive category, the position coordinate corresponding to the target candidate point is corrected based on the nodule offset information corresponding to the target candidate point. The specific correction method may be that, for each coordinate dimension, the value in the coordinate dimension is corrected based on the value of the target candidate point in the coordinate dimension and the offset value and the offset direction in the coordinate dimension in the nodule offset information to obtain a corrected value in the coordinate dimension, and the corrected position coordinate is obtained based on the corrected values in the coordinate dimensions.

The medical image identification method provided by the embodiment of the application can also determine the target candidate point set by using a preset slice detection model, and then determine the local target image and the global target image corresponding to each target candidate point in the target candidate point set, so that the target candidate points are roughly determined, and then the final identification result is determined based on the local information and the global information of the target candidate points, and the identification result is more accurate. And the slice detection model can be based on the adjacent two-dimensional slices of predetermined quantity and detect as input data, because contain the information between the layer in the adjacent two-dimensional slice, consequently detect the precision higher. In addition, the finally identified and recorded data can include nodule type information and nodule offset information, and for target candidate points in a preset nodule type, position correction can be further performed on the basis of the nodule offset information, so that the accuracy of the finally acquired position is improved.

With continued reference to FIG. 5, aflow 500 of one embodiment of a recognition model training method according to the present application is shown. The recognition model training method of the embodiment comprises the following steps:

step 501, obtaining a sample image; the sample image is marked with each sample candidate point and the real recognition result corresponding to each sample candidate point.

In this embodiment, the sample image may be a medical image used as a sample, and the medical image includes a target to be identified, such as a positive lung nodule. Each sample candidate point labeled in the sample image is a point where each target is located, and the real recognition result may be a recognition result of a target corresponding to the sample candidate point, such as a lung nodule type and a nodule offset.

In some optional implementations of this embodiment, the sample candidate points may be labeled based on the following steps: determining a sample candidate point set corresponding to the sample image based on the sample image and a preset slice detection model; and marking each sample candidate point in the sample candidate point set and a real identification result corresponding to each sample candidate point in the sample image.

In this implementation, after the execution subject acquires the sample image, the sample candidate point set may be determined based on the sample image and a preset slice detection model. The specific steps for determining the sample candidate point set are similar to the steps for determining the target candidate point set, and please refer to the detailed description ofstep 402, which is not repeated herein. Further, the execution subject may mark each sample candidate point and a real recognition result corresponding to the sample candidate point in the sample image.

Step 502, for each sample candidate point, determining local sample information and global sample information corresponding to the sample candidate point.

In this embodiment, the specific steps for determining the local sample information and the global sample information corresponding to the sample candidate point are similar to the steps for determining the local information and the global information corresponding to the target candidate point, please refer to the detailed description ofsteps 403 to 404, and are not repeated herein.

In some optional implementations of this embodiment, for each sample candidate point, determining the local sample information and the global sample information corresponding to the sample candidate point may include: for each sample candidate point, determining a local sample image and a global sample image corresponding to the sample candidate point; the method comprises the steps of determining local sample information of each local sample image based on a preset three-dimensional recognition model, and determining global sample information of each global sample image based on a preset two-dimensional recognition model.

Step 503, determining a sample recognition result output by the initial recognition model based on the local sample information, the global sample information and the initial recognition model.

In this embodiment, the execution subject may perform image feature fusion on the local sample information and the global sample information, and input the image feature fused sample information into the initial recognition model, so that the initial recognition model outputs a sample recognition result. The initial recognition model can adopt various existing deep learning models. The sample recognition result is used to indicate nodule class information and nodule offset corresponding to each sample candidate point.

And step 504, based on the sample recognition result and the real recognition result, adjusting model parameters of the initial recognition model until the initial recognition model is converged to obtain the trained image recognition model.

In this embodiment, the execution subject may substitute a preset loss function to calculate a loss value based on the probability that each sample candidate point in the sample recognition result belongs to a positive lung nodule and the nodule offset of each sample candidate point, and whether each sample candidate point in the true recognition result belongs to a positive lung nodule and the nodule offset. And continuously adjusting parameters of the initial recognition model to enable the loss function to meet a preset convergence condition, so as to obtain the trained image recognition model. And the execution main body can be preset with a loss function corresponding to the lung nodule positive probability and a loss function corresponding to the offset, so that the loss function corresponding to the lung nodule positive probability and the loss function corresponding to the offset both meet preset convergence conditions, and a trained image recognition model is obtained. The loss function corresponding to the positive probability of the lung nodule can adopt focal loss (a loss function for solving imbalance of positive and negative sample proportions), and the loss function corresponding to the offset can adopt smooth L1 loss (a relatively smooth loss function).

According to the recognition model training method provided by the embodiment of the application, local information and global information can be acquired based on the preset three-dimensional recognition model and the two-dimensional recognition model in the training process of the image recognition model, the accuracy of image characteristics can be improved, the local target image is recognized by the preset three-dimensional recognition model, the recognition efficiency can be improved, and therefore the image recognition accuracy and the image recognition efficiency are balanced.

With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a medical image recognition apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various servers or terminal devices.

As shown in fig. 6, the medical image recognition apparatus 600 of the present embodiment includes: a candidate point determining unit 601, a target determining unit 602, an information determining unit 603, an image recognizing unit 604.

A candidate point determining unit 601 configured to determine a target candidate point set corresponding to the medical image.

A target determining unit 602 configured to determine a local target image and a global target image corresponding to each target candidate point in the set of target candidate points.

An information determination unit 603 configured to determine local information of a local target image based on a preset three-dimensional recognition model, and determine global information of a global target image based on a preset two-dimensional recognition model.

An image recognition unit 604 configured to determine a recognition result of the medical image based on the local information, the global information, and a preset image recognition model.

In some optional implementations of this embodiment, the candidate point determining unit 601 is further configured to: and determining a target candidate point set corresponding to the medical image based on the medical image and a preset slice detection model.

In some optional implementations of this embodiment, the preset slice detection model is trained based on the following steps: acquiring a sample slice image and sample labeling information; inputting a preset number of adjacent two-dimensional slices in the sample slice image into an initial detection model to obtain a detection result output by the initial detection model; and adjusting the model parameters of the initial detection model based on the detection result and the sample labeling information until the initial detection model converges to obtain the trained slice detection model.

In some optional implementations of this embodiment, the candidate point determining unit 601 is further configured to: determining a target slice image based on a preset number of adjacent two-dimensional slices in the medical image; and determining a target candidate point set corresponding to the medical image based on the target slice image and a preset slice detection model.

In some optional implementation manners of this embodiment, the recognition result of the medical image includes nodule category information and nodule offset information corresponding to each target candidate point in the target candidate point set; and, the image recognition unit 604 is further configured to: for each target candidate point in the target candidate point set, carrying out image feature fusion on local information and global information of the target candidate point to obtain target image information; and determining nodule category information and nodule offset information corresponding to the target candidate point based on the target image information and a preset image recognition model.

In some optional implementations of this embodiment, the apparatus further includes: and the position correction unit is configured to respond to the determination that the node type information corresponding to the target candidate point is a preset node type for each target candidate point, correct the position coordinate corresponding to the target candidate point based on the node offset information corresponding to the target candidate point, and obtain the corrected position coordinate.

It should be understood that units 601 to 604 recited in the medical image recognition apparatus 600 correspond to respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the medical image recognition method are equally applicable to the apparatus 600 and the units comprised therein and will not be described in further detail herein.

With further reference to fig. 7, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a recognition model training apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 5, and the apparatus may be specifically applied to various servers or terminal devices.

As shown in fig. 7, the recognition model training apparatus 700 of the present embodiment includes: a sample acquisition unit 701, a sample information determination unit 702, a sample recognition unit 703, and a model training unit 704.

A sample acquisition unit 701 configured to acquire a sample image; the sample image is marked with each sample candidate point and the real recognition result corresponding to each sample candidate point.

A sample information determining unit 702 configured to determine, for each sample candidate point, local sample information and global sample information corresponding to the sample candidate point.

And a sample identification unit 703 configured to determine a sample identification result output by the initial identification model based on the local sample information, the global sample information, and the initial identification model.

A unit 704 configured to adjust model parameters of the initial recognition model based on the sample recognition result and the real recognition result until the initial recognition model converges, resulting in a trained image recognition model.

In some optional implementations of this embodiment, the sample information determining unit 702 is further configured to: for each sample candidate point, determining a local sample image and a global sample image corresponding to the sample candidate point; the method comprises the steps of determining local sample information of each local sample image based on a preset three-dimensional recognition model, and determining global sample information of each global sample image based on a preset two-dimensional recognition model.

In some optional implementations of this embodiment, the apparatus further includes: the sample labeling unit is configured to determine a sample candidate point set corresponding to the sample image based on the sample image and a preset slice detection model; and marking each sample candidate point in the sample candidate point set and a real identification result corresponding to each sample candidate point in the sample image.

It should be understood that the units 701 to 704 recited in the recognition model training apparatus 700 correspond to respective steps in the method described with reference to fig. 5. Thus, the operations and features described above with respect to the recognition model training method are also applicable to the apparatus 700 and the units included therein, and are not described herein again.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present application.

Fig. 8 shows a block diagram of anelectronic device 800 for implementing a medical image recognition method or a recognition model training method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, theapparatus 800 includes acomputing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)802 or a computer program loaded from astorage unit 808 into a Random Access Memory (RAM) 803. In theRAM 803, various programs and data required for the operation of thedevice 800 can also be stored. Thecalculation unit 801, theROM 802, and theRAM 803 are connected to each other by abus 804. An input/output (I/O)interface 805 is also connected tobus 804.

A number of components in thedevice 800 are connected to the I/O interface 805, including: aninput unit 806, such as a keyboard, a mouse, or the like; anoutput unit 807 such as various types of displays, speakers, and the like; astorage unit 808, such as a magnetic disk, optical disk, or the like; and acommunication unit 809 such as a network card, modem, wireless communication transceiver, etc. Thecommunication unit 809 allows thedevice 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

Thecomputing unit 801 may be a variety of general purpose and/or specialized push information having processing and computing capabilities. Some examples of thecomputing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. Thecalculation unit 801 performs the respective methods and processes described above, such as a medical image recognition method or a recognition model training method. For example, in some embodiments, the medical image recognition method or the recognition model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such asstorage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed ontodevice 800 viaROM 802 and/orcommunications unit 809. When the computer program is loaded into theRAM 803 and executed by thecomputing unit 801, one or more steps of the medical image recognition method or the recognition model training method described above may be performed. Alternatively, in other embodiments, thecomputing unit 801 may be configured to perform the medical image recognition method or the recognition model training method by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A medical image recognition method, comprising:

determining a target candidate point set corresponding to the medical image;

determining a local target image and a global target image corresponding to each target candidate point in the target candidate point set;

determining local information of the local target image based on a preset three-dimensional recognition model, and determining global information of the global target image based on a preset two-dimensional recognition model;

and determining the recognition result of the medical image based on the local information, the global information and a preset image recognition model.

2. The method of claim 1, wherein the determining a set of target candidate points to which the medical image corresponds comprises:

and determining the target candidate point set corresponding to the medical image based on the medical image and a preset slice detection model.

3. The method of claim 2, wherein the preset slice detection model is trained based on the following steps:

acquiring a sample slice image and sample labeling information;

inputting the preset number of adjacent two-dimensional slices in the sample slice image into an initial detection model to obtain a detection result output by the initial detection model;

and adjusting the model parameters of the initial detection model based on the detection result and the sample labeling information until the initial detection model converges to obtain the trained slice detection model.

4. The method of claim 1, wherein the determining a set of target candidate points to which the medical image corresponds comprises:

determining a target slice image based on a preset number of adjacent two-dimensional slices in the medical image;

and determining the target candidate point set corresponding to the medical image based on the target slice image and a preset slice detection model.

5. The method according to claim 1, wherein the medical image recognition result includes nodule class information and nodule offset information corresponding to each target candidate point in the target candidate point set; and

the determining the recognition result of the medical image based on the local information, the global information and a preset image recognition model comprises:

for each target candidate point in the target candidate point set, performing image feature fusion on local information and global information of the target candidate point to obtain target image information;

and determining nodule category information and nodule offset information corresponding to the target candidate point based on the target image information and the preset image recognition model.

6. The method of claim 5, wherein the method further comprises:

and for each target candidate point, in response to the fact that the node type information corresponding to the target candidate point is determined to be a preset node type, correcting the position coordinate corresponding to the target candidate point based on the node offset information corresponding to the target candidate point to obtain a corrected position coordinate.

7. A recognition model training method, comprising:

acquiring a sample image; marking each sample candidate point and a real identification result corresponding to each sample candidate point in the sample image;

for each sample candidate point, determining local sample information and global sample information corresponding to the sample candidate point;

determining a sample identification result output by an initial identification model based on the local sample information, the global sample information and the initial identification model;

and adjusting the model parameters of the initial recognition model based on the sample recognition result and the real recognition result until the initial recognition model is converged to obtain a trained image recognition model.

8. The method of claim 7, wherein the determining, for each sample candidate point, local sample information and global sample information corresponding to the sample candidate point comprises:

for each sample candidate point, determining a local sample image and a global sample image corresponding to the sample candidate point;

the method comprises the steps of determining local sample information of each local sample image based on a preset three-dimensional recognition model, and determining global sample information of each global sample image based on a preset two-dimensional recognition model.

9. The method of claim 7, wherein the method further comprises:

determining a sample candidate point set corresponding to the sample image based on the sample image and a preset slice detection model;

and marking each sample candidate point in the sample candidate point set and a real identification result corresponding to each sample candidate point in the sample image.

10. A medical image recognition apparatus comprising:

a candidate point determining unit configured to determine a target candidate point set corresponding to the medical image;

a target determining unit configured to determine a local target image and a global target image corresponding to each target candidate in the target candidate point set;

an information determination unit configured to determine local information of the local target image based on a preset three-dimensional recognition model and determine global information of the global target image based on a preset two-dimensional recognition model;

an image recognition unit configured to determine a recognition result of the medical image based on the local information, the global information, and a preset image recognition model.

11. The apparatus of claim 10, wherein the candidate point determination unit is further configured to:

12. The apparatus of claim 11, wherein the preset slice detection model is trained based on:

acquiring a sample slice image and sample labeling information;

13. The apparatus of claim 10, wherein the candidate point determination unit is further configured to:

14. The apparatus according to claim 10, wherein the recognition result of the medical image includes nodule class information and nodule offset information corresponding to each target candidate point in the target candidate point set; and

the image recognition unit is further configured to:

15. The apparatus of claim 14, wherein the apparatus further comprises:

and the position correction unit is configured to respond to the determination that the node type information corresponding to the target candidate point is a preset node type for each target candidate point, correct the position coordinate corresponding to the target candidate point based on the node offset information corresponding to the target candidate point, and obtain the corrected position coordinate.

16. A recognition model training apparatus comprising:

a sample acquisition unit configured to acquire a sample image; marking each sample candidate point and a real identification result corresponding to each sample candidate point in the sample image;

a sample information determination unit configured to determine, for each sample candidate point, local sample information and global sample information corresponding to the sample candidate point;

a sample identification unit configured to determine a sample identification result output by an initial identification model based on the local sample information, the global sample information and the initial identification model;

and the model training unit is configured to adjust model parameters of the initial recognition model based on the sample recognition result and the real recognition result until the initial recognition model converges to obtain a trained image recognition model.

17. The apparatus of claim 16, wherein the sample information determination unit is further configured to:

18. The apparatus of claim 16, wherein the apparatus further comprises:

the sample labeling unit is configured to determine a sample candidate point set corresponding to the sample image based on the sample image and a preset slice detection model; and marking each sample candidate point in the sample candidate point set and a real identification result corresponding to each sample candidate point in the sample image.

19. An electronic device that performs a medical image recognition method or a recognition model training method, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.

21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.