Method and system for automatically learning online and intelligently assisting in labeling medical images
Technical Field
The invention relates to a method and a system for automatically learning online and intelligently assisting in labeling medical images, and belongs to the technical field of medical imaging.
Background
In 2002, people aged 65 and above accounted for more than 7% of China's population, marking China's entry into an aging society, and China entered a stage of deep aging in 2010. The old-age dependency ratio will continue to rise, while on the supply side the number of doctors per 10,000 people in China lags far behind that of developed countries. The resulting growth in demand for medical and health care creates an imbalance between supply and demand that calls for technical means to improve doctors' diagnostic efficiency, and AI (artificial intelligence) medicine has developed rapidly as a result.
The concept of artificial intelligence (AI) has existed for a long time, but for many years a lack of data resources, insufficient computing power and immature algorithms prevented it from being applied on a large scale. With the great improvement in GPU computing power, the effective integration of big data resources and major advances in algorithms, artificial intelligence is now being deployed in all walks of life.
In the medical field, the total amount of medical data has grown explosively over the last decade, and medical big data platforms built around data storage, data security, data annotation and similar functions are being developed steadily. At present, more than 90% of medical data come from medical images, including CT, PET, MR and DR, and the simple data structure of images together with mature image recognition technology has driven the rapid development of AI medical imaging.
In the field of image recognition, artificial intelligence algorithms usually rely on supervised learning and must be trained on massive amounts of manually labeled data. Medical image data, however, are usually held within medical institutions and are harder to acquire than natural images; labeling them requires professional doctors, and different doctors may interpret the same medical image differently. Although AI medicine has broad prospects, its current development is not optimistic for various reasons: in terms of data alone, the whole industry faces the major challenges of scarce medical data and difficult labeling.
A search for related patents on image annotation found, among others, an interactive method and system for semi-automatic image annotation, an artificial intelligence data annotation method and device, and an intelligent image annotation method based on the YOLOv3 deep learning network. All of them use artificial intelligence to assist data annotation, but none automates data acquisition or data set production, and none addresses the question of data set scale.
The present application was developed in view of these shortcomings.
Disclosure of Invention
To overcome these defects of the prior art, the invention provides a method and a system for automatically learning online and intelligently assisting in labeling medical images.
To achieve this purpose, the invention adopts the following technical solution:
A method for automatically learning online and intelligently assisting in labeling medical images comprises the following steps:
(1) connecting the intelligent labeling system to the PACS (picture archiving and communication system) of a hospital;
(2) selecting screening conditions according to the data and the labeling requirements;
(3) automatically scanning DICOM files, diagnosis reports and pathology reports according to the screening conditions, desensitizing (de-identifying) the DICOM sequences that meet the screening conditions, and automatically exporting them, together with the corresponding diagnosis and pathology reports, to the intelligent labeling system;
(4) doctors labeling the data according to the diagnosis reports and pathology reports;
(5) when the number of labeled cases reaches a specified threshold, assigning these cases to the test set of the deep learning model, and assigning the cases labeled afterwards to the training set;
(6) when the size of the data set reaches another specified threshold value, starting a deep learning model training program;
(7) extracting part of the data as a model tuning set, saving the model parameters when the model's loss function stops decreasing, and starting the intelligent auxiliary recommendation labeling program to generate recommended labels;
(8) doctors adding to, deleting, modifying and reviewing the recommended labels according to the diagnosis reports and pathology reports;
(9) starting a model test program, evaluating the saved model parameters on the test set, and calculating the test set metrics of interest;
(10) if the test set metrics are not yet stable, the doctor continuing to label data, the newly labeled data being divided among the training set, the tuning set and the test set, and transfer (migration) training of the model continuing until the test set metrics are stable, whereupon labeling ends and the data set is established.
In step (7), the tuning set accounts for one quarter of the data used for model training, that is, of the data set excluding the test set.
In step (10), the training set, the tuning set and the test set are kept in a data volume ratio of 3:1:1.
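The two ratios above are consistent with each other: with a 3:1:1 split, the tuning set is one fifth of the whole data set and one quarter of the non-test data. A minimal Python sketch, assuming the labeled cases are held in an in-memory list; the helper name `split_3_1_1` is hypothetical, and the patent does not prescribe how the random division is implemented.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_3_1_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Randomly split labeled cases into (train, tune, test) at roughly 3:1:1.

    Hypothetical helper: the tuning and test sets each receive floor(n / 5)
    cases, and any remainder goes to the training set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = len(shuffled) // 5
    n_tune = len(shuffled) // 5
    test = shuffled[:n_test]
    tune = shuffled[n_test:n_test + n_tune]
    train = shuffled[n_test + n_tune:]
    return train, tune, test
```

For example, 100 labeled cases yield 60 training, 20 tuning and 20 test cases, so the tuning set is one quarter of the 80 non-test cases.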
The system of the invention for automatically learning online and intelligently assisting in labeling medical images comprises:
a data screening module, used for selecting screening conditions according to the data and labeling requirements;
a data cleaning and export module, used for automatically scanning DICOM files, diagnosis reports and pathology reports according to the screening conditions, and automatically exporting the desensitized DICOM sequences that meet the screening conditions, together with the diagnosis and pathology reports, to the intelligent labeling system;
a data dividing module, used for assigning the first labeled data to the test set of the deep learning model and the data labeled afterwards to the training set;
a model training module, used for starting the deep learning model training program;
an auxiliary recommendation label generation module, used for extracting part of the data as a model tuning set, saving the model parameters when the loss function stops decreasing, starting the intelligent auxiliary recommendation labeling program, and generating recommended labels;
a model testing and metric calculation module, used for starting a model test program, evaluating the saved model parameters on the test set, and calculating the test set metrics of interest; and
a data set completion module, used for dividing the data subsequently labeled by the doctor among the training set, the tuning set and the test set and continuing transfer training until the test set metrics are stable, whereupon labeling ends and the data set is established.
Specifically, the model testing and metric calculation module comprises a deep neural network model test module, used for testing on the labeled data, and a test set evaluation metric module, used for calculating evaluation metrics on the labeled data.
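The data dividing module can be illustrated with the thresholds given in the detailed description: the first N labeled cases form the test set, the next 4N form the training data, training starts when the data set reaches 5N (with N of the training cases moved to the tuning set), and later cases are divided at random in a 3:1:1 ratio. The following is a minimal Python sketch under those assumptions; the class `DataDivider` and its method names are hypothetical, and a real system would persist the assignments and trigger the training program rather than just setting a flag.

```python
import random
from typing import List


class DataDivider:
    """Hypothetical data dividing module driven by the thresholds N and 5N."""

    def __init__(self, n: int, seed: int = 0) -> None:
        self.n = n  # threshold N: number of cases in the initial test set
        self.rng = random.Random(seed)
        self.train: List[str] = []
        self.tune: List[str] = []
        self.test: List[str] = []
        self.training_started = False

    def total(self) -> int:
        return len(self.train) + len(self.tune) + len(self.test)

    def add_labeled(self, case_id: str) -> None:
        """Route one newly labeled case to the appropriate subset."""
        if len(self.test) < self.n:
            # Step (5): the first N labeled cases form the test set.
            self.test.append(case_id)
        elif not self.training_started:
            # Cases labeled after the first N accumulate as training data (up to 4N).
            self.train.append(case_id)
            if self.total() >= 5 * self.n:
                self._start_training()
        else:
            # Step (10): later cases are assigned at random with weights 3:1:1,
            # which keeps the ratio only in expectation.
            target = self.rng.choices([self.train, self.tune, self.test], weights=[3, 1, 1])[0]
            target.append(case_id)

    def _start_training(self) -> None:
        # Steps (6)-(7): shuffle the 4N training cases and move N of them to the
        # tuning set, giving a 3:1 training/tuning split.
        self.rng.shuffle(self.train)
        self.tune = self.train[: self.n]
        self.train = self.train[self.n :]
        self.training_started = True
        # A real system would launch the model training program at this point.
```

With `DataDivider(n=100)`, labeling 500 cases leaves `training_started` set to `True` with 300 training, 100 tuning and 100 test cases. Weighted random assignment was chosen here to mirror the "randomly dividing" wording of step (10); a deterministic round-robin would keep the ratio exact instead.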
The working principle of the invention is as follows: diagnosis reports, pathology reports and a doctor's own clinical experience are often insufficient to meet labeling requirements, and generating recommended labels with an artificial intelligence algorithm compensates for this shortfall. Intelligent labeling uses a deep neural network model to make predictions on the image and presents them to the doctor, who labels the image by combining the model's predictions, the examination report findings and years of clinical experience. However, a deep neural network model must be trained on data before it can make predictions, and how much training data is needed is not known in advance. The system trains the model, labels intelligently, generates the data set, and determines the data set scale.
The invention can realize the following technical effects:
(1) The system of the invention connects directly to the hospital system and can automatically screen the patient cases to be labeled according to specified conditions.
(2) The invention automatically scans DICOM files, diagnosis reports and pathology reports according to the screening conditions and, after desensitization, automatically exports the DICOM sequences that meet the conditions, together with the diagnosis and pathology reports, to the intelligent labeling system. Case images are thus cleaned automatically, overcoming the prior-art need to acquire and clean the required data manually.
(3) The invention automatically scans examination reports and pathology reports to generate auxiliary labeling information, overcoming the prior-art limitations that labeling must be done entirely by hand and that no labeling prompts are provided.
(4) The invention intelligently suggests recommended labels while the doctor is labeling and continuously optimizes the model during the labeling process, thereby training the model online; in the prior art, by contrast, the model is trained only after the whole data set has been labeled.
(5) The invention determines the required data set scale from the model training results, overcoming the prior-art difficulty that the number of cases to be labeled cannot be known in advance.
Drawings
FIG. 1 is a diagram illustrating the steps executed by the system for automatically learning and intelligently assisting in labeling medical images on line according to the present embodiment;
FIG. 2 is a diagram illustrating the steps executed by the intelligent labeling module in the system for automatically learning online and intelligently assisting in labeling medical images according to this embodiment;
FIG. 3 is a diagram illustrating the steps executed by the model training module in the system for automatically learning online and intelligently assisting in labeling medical images according to this embodiment.
Detailed Description
To disclose the technical means of the present invention and the technical effects they achieve more clearly and completely, the following embodiment is described in detail with reference to the accompanying drawings.
As shown in FIG. 1, the method for automatically learning online and intelligently assisting in labeling medical images includes the following steps:
(1) The intelligent labeling system is connected to the hospital's PACS.
(2) Screening conditions (e.g., pulmonary nodules, fibrous strands, arteriosclerosis, calcification) are selected according to the data and labeling requirements.
(3) The system automatically scans the DICOM files, diagnosis reports and pathology reports according to the screening conditions; the DICOM sequences that meet the conditions are desensitized and, together with the diagnosis and pathology reports, automatically exported to the intelligent labeling system.
To build a data set, case data must first be acquired. The labeling system is connected to the hospital's PACS, through which nearly ten years of the hospital's data can be consulted. However, the medical image data in a hospital are diverse and highly specialized: a single patient's data often include items irrelevant to model training, such as scout (localizer) images and information sheets, and deep learning training usually requires data that meet specific conditions, such as a particular disease type, a particular organ, or a slice thickness requirement. The system therefore provides multiple screening conditions, including image type (CT, MR, PET, etc.), scanned body part (head, chest, abdomen, etc.), scanning protocol (lung window, mediastinal window, etc.) and lesion type (stroke, nodules, pneumonia, etc.). According to these conditions, the system scans DICOM header files, diagnosis reports and pathology reports, exports the qualifying DICOM data after desensitization, and exports the scanning results of the diagnosis and pathology reports as auxiliary labels.
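The screening and desensitization described above can be sketched in Python. This sketch assumes each case's DICOM header has already been parsed into a plain dictionary keyed by standard DICOM attribute keywords (for example with pydicom's `dcmread(path, stop_before_pixels=True)`), and that lesion type is matched by a naive keyword search in the report text, which ignores negations such as "no nodule". The names `ScreeningCondition`, `matches`, `desensitize` and `screen_and_export` are hypothetical, and the attribute list removed here is only an illustrative subset; a real desensitization step should follow a profile such as the one in DICOM PS3.15 Annex E.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Header = Dict[str, object]

# Illustrative subset of identifying DICOM attributes (by keyword); a real
# de-identification profile removes or replaces many more.
IDENTIFYING_KEYWORDS = ("PatientName", "PatientID", "PatientBirthDate", "InstitutionName")


@dataclass
class ScreeningCondition:
    modality: str                                # image type, e.g. "CT", "MR", "PT"
    body_part: str                               # scanned body part, e.g. "CHEST"
    lesion_keyword: str                          # lesion type searched for in the report
    max_slice_thickness: Optional[float] = None  # slice thickness requirement in mm


def matches(header: Header, report: str, cond: ScreeningCondition) -> bool:
    """Check one case against the screening condition."""
    if header.get("Modality") != cond.modality:
        return False
    if str(header.get("BodyPartExamined", "")).upper() != cond.body_part.upper():
        return False
    if cond.max_slice_thickness is not None:
        thickness = header.get("SliceThickness")
        if thickness is None or float(thickness) > cond.max_slice_thickness:
            return False
    # Naive keyword match on the report text (assumption; no negation handling).
    return cond.lesion_keyword.lower() in report.lower()


def desensitize(header: Header) -> Header:
    """Return a copy of the header without the identifying attributes."""
    return {k: v for k, v in header.items() if k not in IDENTIFYING_KEYWORDS}


def screen_and_export(cases: List[Tuple[Header, str]], cond: ScreeningCondition) -> List[Tuple[Header, str]]:
    """Keep qualifying cases, desensitize their headers and pass reports along as auxiliary labels."""
    return [(desensitize(h), report) for h, report in cases if matches(h, report, cond)]
```

The report text is passed through unchanged here; in practice it would also need to be desensitized before export.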
(4) The labeling doctor begins labeling according to the diagnosis reports and pathology reports.
(5) When the number of labeled cases reaches a specified threshold N, these N cases are assigned to the test set of the deep learning model, and the 4N cases labeled afterwards are assigned to the training set.
(6) When the data set size reaches a specified threshold of 5N, a deep learning model training procedure is initiated.
(7) The 4N training cases are divided in a 3:1 ratio, with N cases used as the model tuning set. When the model's loss function no longer decreases, the model parameters are saved, and the model test program and the intelligent labeling recommendation program are started.
(8) Once the intelligent labeling recommendation program has started, the deep learning model provides recommended labels while the doctor is labeling, and the doctor can add to, delete, modify and review the recommended labels according to the diagnosis report, the pathology report and his or her own clinical experience.
(9) The model test program is started, the parameters just saved are evaluated on the test set, and the metrics of interest (such as sensitivity and specificity) are calculated.
(10) As the doctor continues labeling, the new data are randomly divided among the training set, the tuning set and the test set, keeping the data volume ratio at 3:1:1.
(11) As the data volume keeps increasing, transfer training of the model continues until the test set metrics of interest level off and no longer change; labeling of the data set then ends, and the data set is complete.
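Step (9) names sensitivity and specificity as example metrics. A minimal Python sketch of their calculation, assuming the model's test set predictions have already been reduced to one binary decision per case (1 = lesion present, 0 = absent); for detection or segmentation tasks other metrics may be more appropriate, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = lesion present."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # lesions correctly detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # lesions missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

For instance, with 4 positive cases of which 3 are detected and 5 negative cases of which 4 are correctly rejected, the sketch returns a sensitivity of 0.75 and a specificity of 0.8.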
Diagnosis reports, pathology reports and a doctor's own clinical experience often cannot meet labeling requirements, and generating recommended labels with an artificial intelligence algorithm compensates for this. Intelligent labeling uses a deep neural network model to make predictions on the image and provides them to the doctor, who labels the image by combining the model's predictions, the examination report findings and years of clinical experience. However, a deep neural network model must be trained on data before it can make predictions, and how much training data is needed is not known in advance. The system trains the model, labels intelligently, generates the data set, and determines the data set scale.
The doctor first labels a certain number N of cases, which are assigned to the model test set. Once the test set contains N cases, subsequently labeled data are assigned to the training set, and when the training set reaches 4N cases, the model training program shown in FIG. 3 is started. N of the training cases are split off as the tuning set. Let $f_{\mathrm{loss}}$ denote the model's loss function on the tuning set; the minimum loss value $\mathrm{min}_{\mathrm{loss}}$ is recorded and updated continuously during training. If, as training continues, the values of $f_{\mathrm{loss}}$ computed on the tuning set in $k$ consecutive rounds are all greater than $\mathrm{min}_{\mathrm{loss}}$, the model is no longer improving; its parameters are saved, and the intelligent auxiliary recommendation labeling program and the test set evaluation metric program are started. The intelligent auxiliary recommendation labeling program provides predictions for the doctor to consult while labeling, and the doctor can directly add to, delete, modify and review these predictions.
In the model test program, let the model test evaluation metric be $T$. $T$ changes with the size of the data set and eventually stabilizes as the data set grows and the model undergoes continued transfer training. Define the stability coefficient $S = T_{\max} - T_{\min}$, the difference between the maximum and minimum values of $T$ over a given interval of data set growth, and set a threshold $S_{\mathrm{threshold}}$. If $S > S_{\mathrm{threshold}}$, the test set metric is unstable: the data the doctor labels next are added to the data set and transfer training continues, keeping the training, tuning and test sets in a 3:1:1 ratio. When $S \le S_{\mathrm{threshold}}$, the doctor finishes labeling and the data set is complete.
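The two stopping rules above, patience-based early stopping on the tuning loss and the stability coefficient $S = T_{\max} - T_{\min}$, can be sketched in Python. Assumptions: the "given interval of data set growth" is taken to be the last `window` metric evaluations (a hypothetical parameter), a tuning loss equal to $\mathrm{min}_{\mathrm{loss}}$ counts as no improvement, and the epoch that reached $\mathrm{min}_{\mathrm{loss}}$ is returned as the one whose parameters are kept, which is a common choice but is not stated in the text. Both function names are hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence


def stop_epoch(tuning_losses: Iterable[float], k: int) -> Optional[int]:
    """Return the epoch whose parameters to save, or None if training should go on.

    Training stops once k consecutive epochs fail to beat min_loss, the
    minimum f_loss seen so far on the tuning set.
    """
    min_loss = math.inf
    best_epoch: Optional[int] = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(tuning_losses):
        if loss < min_loss:
            # New minimum: record it and reset the patience counter.
            min_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= k:
                return best_epoch
    return None


def is_stable(metric_history: Sequence[float], window: int, s_threshold: float) -> bool:
    """Compute S = T_max - T_min over the last `window` evaluations of T.

    Returns True (stop labeling) when S <= S_threshold.
    """
    if len(metric_history) < window:
        return False  # not enough evaluations yet to judge stability
    recent = metric_history[-window:]
    s = max(recent) - min(recent)
    return s <= s_threshold
```

For example, with tuning losses 1.0, 0.8, 0.7, 0.75, 0.72, 0.71 and $k = 3$, `stop_epoch` returns epoch 2, the last epoch that lowered $\mathrm{min}_{\mathrm{loss}}$.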
The intelligent labeling module of step (7), that is, the intelligent labeling recommendation program (supporting both 2D and 3D image labeling), is shown in FIG. 2. It includes the common labeling functions, such as drawing target bounding boxes, drawing target shapes, and adding, modifying and deleting target attributes. If a diagnosis report or pathology report exists, it is shown to the labeling doctor as a prompt; once the deep neural network has learned its parameters, its predictions are also shown to the labeling doctor, giving the doctor intelligent auxiliary labeling prompts.
The above description further elaborates the technical solution of the invention in connection with its preferred embodiments, and the embodiments of the invention should not be understood as limited to this description. Those skilled in the art can make various simple deductions or substitutions without departing from the spirit of the invention, and all such alternatives fall within the scope of the invention.
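The labeling functions and prompts described above can be sketched as a small per-image session object in Python. This is a hypothetical illustration covering only 2D bounding boxes: the classes `Annotation` and `AnnotationSession`, the `source` field, and the rule that an edited recommendation becomes a doctor label are assumptions made for the sketch, not details given in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Annotation:
    ann_id: int
    box: Box
    attributes: Dict[str, str] = field(default_factory=dict)
    source: str = "doctor"  # "model" for recommendations, "doctor" for manual labels


class AnnotationSession:
    """Hypothetical per-image session combining report prompts and model recommendations."""

    def __init__(self, report_text: Optional[str] = None) -> None:
        self.report_text = report_text
        self.annotations: Dict[int, Annotation] = {}
        self._next_id = 0

    def add(self, box: Box, attributes: Optional[Dict[str, str]] = None, source: str = "doctor") -> int:
        ann_id = self._next_id
        self._next_id += 1
        self.annotations[ann_id] = Annotation(ann_id, box, dict(attributes or {}), source)
        return ann_id

    def load_recommendations(self, predictions: List[Tuple[Box, Dict[str, str]]]) -> None:
        # Model predictions enter the session as recommended labels.
        for box, attributes in predictions:
            self.add(box, attributes, source="model")

    def modify(self, ann_id: int, box: Optional[Box] = None, attributes: Optional[Dict[str, str]] = None) -> None:
        ann = self.annotations[ann_id]
        if box is not None:
            ann.box = box
        if attributes:
            ann.attributes.update(attributes)
        ann.source = "doctor"  # assumption: once edited, the label counts as the doctor's

    def delete(self, ann_id: int) -> None:
        del self.annotations[ann_id]

    def prompts(self) -> List[str]:
        """Prompts shown to the labeling doctor: the report (if any) and pending recommendations."""
        messages: List[str] = []
        if self.report_text:
            messages.append(f"Report: {self.report_text}")
        pending = sum(1 for a in self.annotations.values() if a.source == "model")
        if pending:
            messages.append(f"{pending} model recommendation(s) pending review")
        return messages
```

Tracking the origin of each label in this way would also let the system tell which recommendations the doctor accepted unchanged, which could be useful when judging the model during online training.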