wherein n is the number of classes of the recognition result; z_i is the probability predicted by the model for each class; z_y is the true value of each item of identification preprocessing data;

(2.6) back-propagating the Loss value to obtain the update amount of each network parameter, updating all parameters other than those of the convolutional layers by using a gradient descent algorithm, and adding 1 to the iteration number;

and (2.7) judging whether the iteration number reaches a set value, if not, repeating the steps (2.5) - (2.6), if so, finishing training to obtain a neural network recognition model, and storing the neural network recognition model in a recognition model unit.

Further, after the step (2.7), the method further comprises:

step (2.8) of extracting identification preprocessing data from the test set, importing the identification preprocessing data into the neural network identification model obtained in the step (2.7) for identification, and comparing the identification result with the real result of the patient to obtain four values of TP, FP, FN and TN, wherein TP represents the number of positive examples predicted to be positive, FP represents the number of negative examples predicted to be positive, FN represents the number of positive examples predicted to be negative, and TN represents the number of negative examples predicted to be negative;

and (2.9) drawing an ROC curve of the classification result through four values of TP, FP, FN and TN, wherein the abscissa of the curve is a false positive rate FPR, the ordinate is a true positive rate TPR, and the calculation modes of the false positive rate FPR and the true positive rate TPR are shown as the following formula:

$$FPR = \frac{FP}{FP + TN}, \qquad TPR = \frac{TP}{TP + FN}$$

wherein FPR represents the proportion of all actually negative examples that are predicted to be positive, and TPR represents the proportion of all actually positive examples that are predicted to be positive;

step (2.10) drawing an ROC curve from the TPR and FPR values obtained at a series of continuous threshold values between 0 and 1, and calculating the area under the ROC curve;

and (2.11) repeating the steps (2.2) to (2.10) to obtain the ROC curve areas of a plurality of neural network identification models, selecting the neural network identification model with the largest ROC curve area, and storing the neural network identification model into an identification model unit.

Further, in the step (3), the method includes the following steps:

(3.1) converting the return visit and review chest image information of the patient to be predicted in 1 st to 4 th months after the lung nodule is confirmed into an image in a jpg format;

(3.2) carrying out standardization processing on the converted image obtained in the step (3.1);

(3.3) cutting out the lung nodule focus body part image from the image obtained after standardization in the step (3.2), and adjusting it to obtain a unified map of the lung nodule focus body part;

and (3.4) combining the unified map of the lung nodule focus body part obtained in the step (3.3) and the real result of the patient into pre-processing data to be predicted and identified.

Compared with the prior art, the invention has the following beneficial effects:

1. The lung nodule identification system based on time sequence images comprises a data preprocessing module and an identification model module, wherein the data preprocessing module is connected with the identification model module. The revisit and recheck chest image information of the patient from the 1st to the 4th month after the lung nodule is diagnosed is obtained through an interactive mode and preprocessed to obtain identification preprocessing data; an initialized neural network identification model is then constructed and trained with the identification preprocessing data of each patient in the case base; and the trained neural network identification model identifies the lesion property of the lung nodule from the revisit and recheck chest image information of the patient from the 1st to the 4th month after the lung nodule is diagnosed. The identification model module comprises a case base, an identification model unit and a training unit, wherein the identification model unit is respectively connected with the case base and the training unit. The identification model unit comprises a CNN-based feature extractor group, a Concatenate module and an RNN module, and the CNN-based feature extractor group is connected with the RNN module through the Concatenate module. The CNN-based feature extractor group extracts the lung nodule image features of each period from the identification preprocessing data, the Concatenate module splices these features into a two-dimensional tensor, and the RNN module treats the two-dimensional tensor as a time sequence and outputs the identification result. An end-to-end deep learning model based on a convolutional neural network and a recurrent neural network is thereby constructed from the perspective of analyzing time sequence medical images of lung nodule patients, and the benign or malignant property of the lung nodule is predicted by capturing the change rule of the lung nodule, so that the identification speed is high and the accuracy is high.

2. The invention relates to a pulmonary nodule identification method based on time sequence images, which comprises the steps of firstly obtaining the revisit and recheck chest image information of a patient in 1 to 4 months after the patient confirms the diagnosis of pulmonary nodules, preprocessing the obtained image information to obtain identification preprocessing data; storing the obtained identification preprocessing data into a case library, and training a neural network identification model by using a training unit; calling and preprocessing the revisit and rechecking chest image information of the patient to be predicted in 1 to 4 months after the lung nodule is diagnosed, and obtaining preprocessing data to be predicted and identified; and importing the pre-processing data to be predicted into a neural network recognition model to obtain a recognition result, so that the lesion property of the lung nodule can be effectively recognized.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic of the present system;

FIG. 2 is a schematic diagram of the structure of the recognition model unit in the present system;

FIG. 3 is a schematic diagram of the structure of the feature extractor in the present system.

Detailed Description

The technical solution of the present invention will be described in further detail with reference to the following embodiments, but the present invention is not limited thereto.

As shown in fig. 1 to 3, a lung nodule recognition system based on time series images of the present invention includes:

a data preprocessing module, used for acquiring, through an interactive mode, the revisit and recheck chest image information of the patient from the 1st to the 4th month after the lung nodule is diagnosed, and preprocessing it to obtain identification preprocessing data;

an identification model module, used for training a neural network identification model with the obtained identification preprocessing data, or for identifying the obtained identification preprocessing data and outputting a judgment result.

The data preprocessing module is connected with the identification model module. Obtaining the revisit and recheck chest image information of the patient from 1 to 4 months after the lung nodule is diagnosed by the patient through an interactive mode, preprocessing the revisit and recheck chest image information to obtain recognition preprocessing data, then constructing an initialized neural network recognition model, training the neural network recognition model by using the recognition preprocessing data of each patient in a case, and recognizing the lesion property of the lung nodule by the trained neural network recognition model according to the revisit and recheck chest image information of the patient from 1 to 4 months after the lung nodule is diagnosed by the patient.

The identification model module comprises:

a case base, used for storing the identification preprocessing data of each patient;

an identification model unit, used for storing the trained neural network identification model, or for analyzing and identifying the input identification preprocessing data of a patient and then outputting a result;

the identification model unit is respectively connected with the case base and the training unit.

The identification model unit comprises:

a Concatenate module, used for splicing the extracted lung nodule image features of each period into a two-dimensional tensor;

an RNN module, used for converting the two-dimensional tensor into a time sequence and then outputting an information stream;

the CNN-based feature extractor group is connected with the RNN module through the Concatenate module, so that an end-to-end deep learning model based on a convolutional neural network and a recurrent neural network is constructed from the perspective of analyzing time sequence medical images of lung nodule patients, and the benign or malignant property of the lung nodule is predicted by capturing the change rule of the lung nodule, with high identification speed and high accuracy. In order to capture the change rule of the lung nodule, and based on the facts that a Convolutional Neural Network (CNN) can extract image features and a Recurrent Neural Network (RNN) can process time sequence data, the invention provides a neural network identification model for processing time sequence image data, wherein the model consists of a CNN-based feature extractor group, a Concatenate module and an RNN module.

The data preprocessing module comprises:

the interaction unit is used for acquiring the revisit and review chest image information of the patient in 1 st to 4 th months after the lung nodule is confirmed through the interaction interface;

the image cutting unit is used for standardizing the acquired revisit and recheck chest image information of the patient from the 1st to the 4th month after the lung nodule is diagnosed, cutting out the image of the lung nodule focus body part, and then unifying the cut images;

the image enhancement unit is used for enhancing the images of the nodule focus body part subjected to the unified processing to obtain identification preprocessing data;

the interaction unit is connected with the image enhancement unit through the image cutting unit.

The CNN-based feature extractor group comprises 4 feature extractors which are arranged in parallel, each feature extractor comprises a convolution-pooling layer and a full-connection layer, the output end of each convolution-pooling layer is connected with the input end of each full-connection layer, and the output end of each full-connection layer outputs 2048-dimensional feature vectors.

The feature extractor further comprises an input judgment layer, and the output end of the input judgment layer is respectively connected with the input end of the convolution-pooling layer and the input end of the Concatenate module.

The invention discloses a lung nodule identification method based on a time sequence image, which comprises the following steps:

(1) Obtaining the revisit and recheck chest image information of the patient from the 1st to the 4th month after the lung nodule is diagnosed, and preprocessing it to obtain identification preprocessing data. After the lung nodule is diagnosed, the patient returns for re-examination regularly, usually once a month, so that chest image data of the patient within a certain time range can be obtained; after the chest image data is screened, preprocessed and sorted, a time sequence image data set of the patient can be formed. Lung nodule patients need to be screened before their cases are collected, with the following criteria: 1) the lung nodule property of the patient has been determined to be malignant or benign through pathological examination and diagnosis; 2) chest examination images exist for at least the 1st, 2nd, 3rd and 4th months after the lung nodule is diagnosed. A case that meets the above conditions can be preprocessed by the following steps:

(1.1) converting the revisit and recheck chest image information of the patient from the 1st to the 4th month after the lung nodule is diagnosed into images in jpg format. A revisit and recheck chest examination generally comprises several series, such as thick-layer images, thin-layer images, coronal images, sagittal images, arterial images and venous images. The collected original images are generally stored in dicom format, which is a standard format for medical images; besides the image pixels, a dicom file contains additional auxiliary information such as the image type and the image time. The images stored in the case base for training the neural network recognition model are in jpg format, so the dicom images need to be converted into jpg images; the pydicom toolkit of python can be used to convert the dicom images in batches.

(1.2) normalizing the converted image obtained in the step (1.1). Since the collected revisit and recheck chest images are generally acquired by different devices, their pixel intensities differ, which may affect the training effect of the neural network recognition model; the image pixels therefore need to be normalized by the Z-Score normalization method, where the normalization formula is as follows:

$$x^{*} = \frac{x - \mu}{\sigma}$$

in the above formula, x is the original pixel value, μ is the mean of all pixels in the image, σ is the standard deviation of all pixels in the image, and x^* is the normalized pixel value.
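As an illustration of the Z-Score normalization in step (1.2), the following is a minimal sketch in plain Python. The function name `z_score_normalize`, the representation of an image as a nested list of pixel values, and the handling of a constant image (returning all zeros to avoid division by zero) are assumptions made for this example and are not specified in the method.

```python
from statistics import fmean, pstdev


def z_score_normalize(pixels):
    """Return a Z-Score normalized copy of a 2-D list of pixel values."""
    flat = [value for row in pixels for value in row]
    mu = fmean(flat)            # mean of all pixels in the image
    sigma = pstdev(flat, mu)    # population standard deviation of all pixels
    if sigma == 0:
        # Assumption: a constant image is mapped to all zeros.
        return [[0.0 for _ in row] for row in pixels]
    return [[(value - mu) / sigma for value in row] for row in pixels]


image = [[10, 20], [30, 40]]
normalized = z_score_normalize(image)
```

After normalization the pixels of each image have mean 0 and standard deviation 1, which removes intensity offsets between acquisition devices.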

(1.3) cutting out the lung nodule focus body part image from the image obtained after standardization in the step (1.2), and adjusting it to obtain a unified map of the lung nodule focus body part. If the whole image were input into the neural network recognition model for training, irrelevant information outside the lung nodule focus would also be learned, so the region of interest of the image needs to be extracted. This process is carried out under the guidance of a professional physician: the lung nodule focus body part is first located in the image, then completely enclosed by a rectangular frame, and the rectangular frame is then cut out. Because the lung nodule parts in different slices differ in size while the neural network recognition model requires input images of a consistent size, the cut lung nodule focus body part images are uniformly resized to 64 × 64 by bilinear interpolation. Each image is then subjected to operations such as horizontal flipping, rotation, Gaussian blur and elastic distortion, so as to expand the case library.
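The resizing and one of the augmentation operations in step (1.3) can be sketched as follows. This is a plain-Python illustration only: the function names, the nested-list image representation, and the corner-aligned sampling convention used for the bilinear interpolation are assumptions for this example; an image library would normally be used in practice.

```python
def resize_bilinear(image, out_h=64, out_w=64):
    """Resize a 2-D list of pixel values with bilinear interpolation."""
    in_h, in_w = len(image), len(image[0])
    resized = []
    for i in range(out_h):
        # Assumption: corner-aligned mapping from output to source rows.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        resized.append(row)
    return resized


def flip_horizontal(image):
    """Mirror an image left to right (one of the augmentation operations)."""
    return [row[::-1] for row in image]


crop = [[0.0, 10.0], [20.0, 30.0]]      # hypothetical cropped lesion region
unified = resize_bilinear(crop)          # 64 x 64 unified map
augmented = flip_horizontal(unified)     # additional sample for the case library
```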

The identification preprocessing data is then labeled, with the following specific steps: 1) one image is taken from the revisit and recheck chest images of each period; since these images reflect the change rule of the lung nodule to a certain extent, they form a group of time sequence images; 2) since the number of image slices in each period is not necessarily equal, the period with the largest number of images is taken as the reference, and where another period has fewer images, the corresponding position in the time sequence images is left empty; 3) the group of images is given a classification label, wherein 0 represents malignant and 1 represents benign, so as to obtain a group of identification preprocessing data.

According to the above rules, the format of a group of identification preprocessing data is (img1, img2, img3, img4, label), wherein img1 represents the lung nodule image of the 1st month; img2 represents the lung nodule image of the 2nd month; img3 represents the lung nodule image of the 3rd month; img4 represents the lung nodule image of the 4th month; and label represents the corresponding label, which takes the value 0 or 1, where 0 indicates that the lung nodule is malignant and 1 indicates that the lung nodule is benign. If the image of a certain period is missing, the corresponding data format is, for example, (img1, null, img3, img4, label), where null indicates that the lung nodule image of the 2nd month is missing.
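The (img1, img2, img3, img4, label) record can be represented, for illustration, as a small Python data structure. The names `Sample` and `missing_months` and the use of `None` for null are assumptions of this sketch, not part of the described system.

```python
from typing import List, NamedTuple, Optional

Image = List[List[float]]  # a 64 x 64 unified map as a nested list


class Sample(NamedTuple):
    """One group of identification preprocessing data."""
    img1: Optional[Image]  # month 1 lung nodule image, None if missing
    img2: Optional[Image]  # month 2
    img3: Optional[Image]  # month 3
    img4: Optional[Image]  # month 4
    label: int             # 0 = malignant, 1 = benign


def missing_months(sample):
    """Return the months (1 to 4) whose image is null in this sample."""
    return [month for month, img in enumerate(sample[:4], start=1) if img is None]


placeholder = [[0.5] * 64 for _ in range(64)]   # hypothetical image content
sample = Sample(placeholder, None, placeholder, placeholder, label=0)
```

Here `missing_months(sample)` reports month 2 as missing, which corresponds to the (img1, null, img3, img4, label) case.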

(2) And storing the obtained recognition preprocessing data into a case library, and training a neural network recognition model by using a training unit.

The method comprises the following steps:

and (2.1) storing the obtained identification preprocessing data into a case library.

(2.2) randomly dividing the identification preprocessing data in the case library into two groups, namely a training set and a test set, wherein the ratio of the identification preprocessing data of patients in the training set to that in the test set is 8:2; the training set is used for learning and training the model, and the test set is used for evaluating and screening a qualified model.
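A minimal sketch of the random 8:2 division in step (2.2) is given below. The function name, the `seed` parameter, and the choice to split at the level of individual records are assumptions of this example; if several records come from the same patient, the split would need to be made per patient so that no patient appears in both sets.

```python
import random


def split_case_library(samples, train_ratio=0.8, seed=0):
    """Randomly divide samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


case_library = [f"sample_{i}" for i in range(10)]   # hypothetical records
training_set, test_set = split_case_library(case_library)
```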

(2.3) setting a neural network recognition model: initializing the weight parameters of all convolutional layers in the neural network recognition model with the parameters of a ResNet model pre-trained on the large data set ImageNet, and setting the trainable parameter of the convolutional layers to false, that is, fixing the convolutional layer parameters so that they are ignored when the model parameters are updated by the gradient descent algorithm; the weight parameters of all other (non-convolutional) layers in the neural network recognition model are initialized with Gaussian random numbers with a mean of 0 and a variance of 1.

The set neural network recognition model consists of a CNN-based feature extractor group and an RNN module, wherein the CNN-based feature extractor group comprises 4 feature extractors arranged in parallel, each feature extractor comprises a convolution-pooling layer and a fully connected layer, the output end of the convolution-pooling layer is connected with the input end of the fully connected layer, and the output end of the fully connected layer outputs a 2048-dimensional feature vector. The feature extractor further comprises an input judgment layer, and the output end of the input judgment layer is connected with the input end of the convolution-pooling layer. The construction process is as follows:

(2.3.1) for the input image of each period, using the CNN-based feature extractor group to extract the features of the image; a partial network structure of ResNet is used as the backbone network, and the structure is improved on the basis of the backbone network, specifically: 1) removing all fully connected layers of ResNet and using only the network structure of the convolution-pooling part of ResNet; because most weight parameters of ResNet are concentrated in the fully connected layers, this greatly reduces the number of parameters of the model and thereby reduces the training difficulty; 2) flattening the neurons of the last layer of the backbone network through a Flatten layer and then connecting them with a fully connected layer of 2048 neurons, which outputs a 2048-dimensional feature vector representing the feature information extracted by the feature extractor from the lung nodule image; 3) setting an input judgment layer at the initial position of the feature extractor to handle the case where the image data of a certain period is missing: if the input is null, the convolution-pooling layer and the fully connected layer are masked and null is output directly from the input judgment layer; otherwise the image is passed on to the convolution-pooling layer. The resulting structure of the feature extractor is shown in fig. 3.

(2.3.2) extracting the lung nodule image features of the 4 periods by using the 4 feature extractors arranged in parallel to obtain four 2048-dimensional feature vectors; these feature vectors are ordered along the time axis to form a group of time sequence feature vectors, which are spliced by the Concatenate layer into a two-dimensional tensor of the format ((x₁, x₂, ..., x₂₀₄₈)₁, (x₁, x₂, ..., x₂₀₄₈)₂, (x₁, x₂, ..., x₂₀₄₈)₃, (x₁, x₂, ..., x₂₀₄₈)₄). If the input image of a certain period is missing, an all-zero vector is spliced at the position corresponding to that period; for example, ((x₁, x₂, ..., x₂₀₄₈)₁, (x₁, x₂, ..., x₂₀₄₈)₂, (0, 0, ..., 0)₃, (x₁, x₂, ..., x₂₀₄₈)₄) indicates that the lung nodule image of the 3rd month is missing.
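The splicing behaviour of step (2.3.2), including the all-zero vector for a missing period, can be illustrated with the following sketch. The function name `concatenate_features` and the use of `None` for the null output of a feature extractor are assumptions of this example; the feature values are placeholders, not real extractor outputs.

```python
FEATURE_DIM = 2048  # dimension of each feature vector


def concatenate_features(period_features):
    """Splice per-period feature vectors into a 4 x 2048 two-dimensional tensor.

    A period whose feature extractor output is None (null) is replaced
    by an all-zero vector at the corresponding position.
    """
    tensor = []
    for features in period_features:
        if features is None:
            tensor.append([0.0] * FEATURE_DIM)
        else:
            if len(features) != FEATURE_DIM:
                raise ValueError("each feature vector must have 2048 entries")
            tensor.append(list(features))
    return tensor


vec = [0.1] * FEATURE_DIM                               # placeholder features
tensor = concatenate_features([vec, vec, None, vec])    # month 3 missing
```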

(2.3.3) the two-dimensional tensor obtained in the step (2.3.2) can be regarded as a time sequence of length 4, so an RNN module with 4 time steps is constructed to receive the time sequence, with an LSTM network as the main body of the RNN module. The RNN module is constructed as follows: 1) constructing a Masking layer at the initial position of the module and setting the parameter mask_value to 0; this layer handles variable-length input sequences: when the input of a certain time step is detected to be all 0, the input at that position is masked and that time step is ignored in the subsequent calculation; 2) adding an LSTM logic unit with 4 time steps after the Masking layer, wherein each time step processes the feature vector of one period; the LSTM network comprises 3 intermediate hidden layers, each hidden layer comprises 512 neurons and is fully connected with the preceding layer, and the function of the LSTM logic unit is to further extract features; 3) constructing a fully connected layer of 256 neurons and connecting it with the output of the last time step of the LSTM network to aggregate the abstract information output by the LSTM network; 4) constructing a Softmax layer of 2 neurons and connecting it with the fully connected layer in 3); this layer is the last layer of the whole neural network recognition model and outputs the final recognition result.
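The effect of the Masking layer in step (2.3.3) can be illustrated without a deep learning framework. In the sketch below, `time_step_mask` marks the time steps that are not entirely equal to `mask_value`, and `run_recurrent` applies a recurrent update only at unmasked steps, carrying the state forward unchanged otherwise. Both function names and the toy `step_fn` are assumptions of this example; the toy update stands in for the LSTM cell and is not an LSTM.

```python
def time_step_mask(sequence, mask_value=0.0):
    """Return True for each time step that should be processed."""
    return [any(value != mask_value for value in step) for step in sequence]


def run_recurrent(sequence, step_fn, initial_state, mask_value=0.0):
    """Apply step_fn over the sequence, skipping fully masked time steps."""
    state = initial_state
    for step, keep in zip(sequence, time_step_mask(sequence, mask_value)):
        if keep:
            state = step_fn(state, step)
        # A masked step leaves the state unchanged.
    return state


# Toy sequence of 4 time steps; the second step is an all-zero (missing) period.
sequence = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0], [5.0, 6.0]]
final_state = run_recurrent(sequence, lambda s, x: s + sum(x), 0.0)
```

The state returned after the last time step plays the role of the output that is passed on to the 256-neuron fully connected layer.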

In summary, the main architecture of the neural network recognition model is that 4 feature extractors are connected in parallel with 1 RNN module, input information streams are transmitted in the 4 feature extractors at the same time, and then are respectively transmitted to 4 time step inputs of the RNN module, and output information streams are obtained at the last time step of the RNN module for output.

The training mode is transfer learning, the transfer learning refers to transferring the knowledge or pattern learned in the source domain to different but related target domains, and the invention realizes the transfer learning by using a fine-tuning mode.

(2.4) setting the initial learning rate of training to 10^-4, setting the number of items of identification preprocessing data extracted from the training set in each iteration, and setting the number of iterations to 100000 epochs.

wherein n is the number of classes of the recognition result and its value is 2, since the prediction result has only two classes (benign and malignant); z_i is the probability predicted by the model for each class; z_y is the true value of each item of identification preprocessing data, i.e., the true value of the sample.

The number of items of identification preprocessing data extracted from the case library is defined as batch_size; batch_size items of identification preprocessing data are taken from the case library as training samples and input into the neural network identification model, the 4 input images of each sample are respectively passed into the 4 feature extractors of the model for forward propagation, and the identification result is finally output from the last time step of the RNN module.
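For orientation, a two-class cross-entropy loss over one batch can be sketched as follows. This example assumes the commonly used softmax form of cross-entropy, which may differ in detail from the formula of step (2.5); the function names and the use of raw network outputs (logits) as input are likewise assumptions.

```python
import math


def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    shift = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(probabilities[true_class])


def batch_loss(batch_logits, labels):
    """Mean cross-entropy over a batch of batch_size samples."""
    losses = [cross_entropy(softmax(logits), label)
              for logits, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Two samples (batch_size = 2), n = 2 classes: 0 = malignant, 1 = benign.
loss = batch_loss([[2.0, 0.0], [0.0, 0.0]], [0, 1])
```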

(2.6) back-propagating the Loss value to obtain the update amount of each network parameter, updating all parameters other than those of the convolutional layers by using a gradient descent algorithm, and adding 1 to the iteration number.
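The parameter update of step (2.6), in which the frozen convolutional parameters are skipped, can be sketched as below. Parameters are reduced to single numbers for readability, and the dictionary layout, the parameter names, and the function name are hypothetical.

```python
def gradient_descent_step(params, grads, trainable, learning_rate=1e-4):
    """Update only the parameters whose trainable flag is True."""
    return {
        name: value - learning_rate * grads[name] if trainable[name] else value
        for name, value in params.items()
    }


params = {"conv1.weight": 1.0, "fc.weight": 1.0}        # hypothetical parameters
grads = {"conv1.weight": 10.0, "fc.weight": 10.0}       # from back-propagation
trainable = {"conv1.weight": False, "fc.weight": True}  # convolutional layers frozen

updated = gradient_descent_step(params, grads, trainable)
iteration = 0
iteration += 1  # the iteration number is increased by 1 after each update
```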

Further, the method also comprises the following steps:

(2.8) extracting identification preprocessing data from the test set, importing it into the neural network identification model obtained in the step (2.7) for identification, and comparing the identification result with the real result of the patient to obtain four values of TP, FP, FN and TN, wherein TP represents the number of positive examples predicted to be positive, FP represents the number of negative examples predicted to be positive, FN represents the number of positive examples predicted to be negative, and TN represents the number of negative examples predicted to be negative.

(2.9) drawing an ROC curve of the classification result through four values of TP, FP, FN and TN, wherein the abscissa of the curve is a false positive rate FPR, the ordinate of the curve is a true positive rate TPR, and the calculation modes of the false positive rate FPR and the true positive rate TPR are shown as the following formula:

$$FPR = \frac{FP}{FP + TN}, \qquad TPR = \frac{TP}{TP + FN}$$

wherein FPR represents the proportion of all actually negative examples that are predicted to be positive, and TPR represents the proportion of all actually positive examples that are predicted to be positive;

(2.10) drawing an ROC curve from the TPR and FPR values obtained at a series of continuous threshold values between 0 and 1, and calculating the area under the ROC curve, AUC (Area Under Curve), which is used for measuring the classification performance of the model. A threshold value between 0 and 1 is set; if the model prediction result is greater than the threshold value, the sample is predicted as a positive example, otherwise it is predicted as a negative example; the values of TPR and FPR are then calculated through the formula in the step (2.9) to obtain one point on the ROC curve, and the ROC curve is drawn by setting a series of continuous threshold values between 0 and 1.
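Steps (2.8) to (2.10) can be sketched together in plain Python. The sketch assumes that each model prediction is a score between 0 and 1 for the positive class, that the positive class is encoded as label 1, that 101 evenly spaced thresholds are used, and that the area is computed with the trapezoidal rule; the function names are hypothetical and none of these choices is fixed by the method.

```python
def confusion_counts(scores, labels, threshold, positive=1):
    """Count TP, FP, FN and TN at one threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        actual_positive = label == positive
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_points(scores, labels, steps=100, positive=1):
    """Return sorted (FPR, TPR) points for thresholds 0, 1/steps, ..., 1."""
    points = {(0.0, 0.0), (1.0, 1.0)}   # end points of every ROC curve
    for k in range(steps + 1):
        tp, fp, fn, tn = confusion_counts(scores, labels, k / steps, positive)
        if tp + fn == 0 or fp + tn == 0:
            raise ValueError("both classes must be present in the test set")
        points.add((fp / (fp + tn), tp / (tp + fn)))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal area under the sorted ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.3, 0.2]   # hypothetical model outputs on the test set
labels = [1, 1, 0, 0]           # real results
auc = area_under_curve(roc_points(scores, labels))
```

An AUC close to 1 indicates that positive examples receive higher scores than negative examples, while an AUC near 0.5 corresponds to random ranking.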

(2.11) repeating the steps (2.2) to (2.10) to obtain the ROC curve areas of a plurality of neural network identification models, selecting the neural network identification model with the largest ROC curve area as the best-performing model, and storing it in the identification model unit.

(3) Retrieving the revisit and recheck chest image information of the patient to be predicted from the 1st to the 4th month after the lung nodule is diagnosed, and preprocessing it to obtain the identification preprocessing data to be predicted.

The method comprises the following steps:

(3.1) converting the revisit and recheck chest image information of the patient to be predicted from the 1st to the 4th month after the lung nodule is diagnosed into images in jpg format;

The above description is only exemplary of the invention, and any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention should be considered within the scope of the present invention.

Claims

1. A pulmonary nodule identification system based on time series images, comprising:

the data preprocessing module is used for acquiring, through an interactive mode, the revisit and recheck chest image information of the patient from the 1st to the 4th month after the lung nodule is diagnosed, and preprocessing it to obtain identification preprocessing data;

the data preprocessing module is connected with the identification model module;

the identification model module comprises:

the recognition model unit is used for storing the trained neural network recognition model or outputting a result after analyzing and recognizing the input recognition preprocessing data of the patient;

the training unit is used for training the recognition preprocessing data in the case base into a neural network recognition model in the recognition model unit;

the recognition model unit is respectively connected with the case library and the training unit;

the identification model unit comprises:

2. The system of claim 1, wherein the data preprocessing module comprises:

3. The pulmonary nodule identification system based on time series images as claimed in claim 1, wherein the CNN-based feature extractor group comprises 4 feature extractors arranged in parallel, the feature extractors comprise a convolution-pooling layer and a full-connected layer, an output end of the convolution-pooling layer is connected with an input end of the full-connected layer, and an output end of the full-connected layer outputs 2048-dimensional feature vectors.

4. The time-series image-based lung nodule identification system of claim 3, wherein the feature extractor further comprises an input judgment layer, and an output end of the input judgment layer is connected to an input end of the convolution-pooling layer and an input end of the Concatenate module, respectively.

5. The method for lung nodule identification based on time series images as claimed in claim 1, characterized in that, the method comprises the following steps:

(1) Obtaining the revisit and recheck chest image information of the patient from 1 st to 4 th months after the lung nodule is diagnosed, and preprocessing the revisit and recheck chest image information to obtain recognition preprocessing data;

(3) Calling and preprocessing the revisit and rechecking chest image information of the patient to be predicted in 1 to 4 months after the lung nodule is diagnosed, and obtaining preprocessing data to be predicted and identified;

6. The method for lung nodule identification based on time series image according to claim 5, wherein the step (1) comprises the following steps:

and (1.4) combining the unified picture of the lung nodule focus body part obtained in the step (1.3) and the real result of the patient into identification preprocessing data.

7. The method for lung nodule identification based on time series image according to claim 5, wherein the step (2) comprises the following steps:

(2.1) storing the obtained identification preprocessing data into a case base;

(2.5) introducing the extracted recognition preprocessing data into a neural network recognition model to obtain a recognition result, using a cross entropy Loss function as a trained optimization objective function, and calculating a Loss value of the network output and a true value, wherein the trained optimization objective function formula is as follows:

wherein n is the number of classes of the recognition result; z_i is the probability predicted by the model for each class; z_y is the true value of each item of identification preprocessing data;

8. The method as claimed in claim 7, wherein the step (2.7) is further followed by:

step (2.8) of extracting identification preprocessing data from the test set, importing the identification preprocessing data into the neural network identification model obtained in the step (2.7) for identification, and comparing an identification result with a real result of a patient to obtain four values of TP, FP, FN and TN, wherein TP represents the number of positive examples predicted to be positive, FP represents the number of negative examples predicted to be positive, FN represents the number of positive examples predicted to be negative, and TN represents the number of negative examples predicted to be negative;

and (2.9) drawing an ROC curve of the classification result through four values of TP, FP, FN and TN, wherein the abscissa of the curve is false positive rate FPR, the ordinate of the curve is true positive rate TPR, and the calculation modes of the false positive rate FPR and the true positive rate TPR are shown as the following formula:

and (2.11) repeating the steps (2.2) - (2.10) to obtain the ROC curve areas of a plurality of neural network identification models, selecting the neural network identification model with the largest ROC curve area, and storing the neural network identification model with the largest ROC curve area into the identification model unit.

9. The method for lung nodule identification based on time series image according to claim 5, wherein the step (3) comprises the following steps:

and (3.4) combining the unified picture of the lung nodule focus body part obtained in the step (3.3) and the real result of the patient into to-be-predicted identification preprocessing data.