CN107832801B

Info

Publication number: CN107832801B
Application number: CN201711180099.8A
Authority: CN
Inventors: 王岩; 宋建锋; 秦鑫龙; 蒋均; 苗启广; 李东升
Original assignee: Xidian University; Urit Medical Electronic Co Ltd
Current assignee: Xidian University; Urit Medical Electronic Co Ltd
Priority date: 2017-11-23
Filing date: 2017-11-23
Publication date: 2021-03-05
Anticipated expiration: 2037-11-23
Also published as: CN107832801A

Abstract

The invention relates to a model construction method for cell image classification. To address the limitations of existing urine sediment recognition technology in real use scenarios, in particular low accuracy when many categories are present and recognition speed that is insufficient for practical production needs, an efficient classification algorithm for urine formed-component cells based on multi-model hybrid recognition is proposed. In the invention, the algorithm and its software implementation comprise four parts: data preprocessing, obtaining the trained models, obtaining recognition results after threshold-controlled judgment, and image recognition and acceleration.

Description

Model construction method for cell image classification

Technical Field

The invention belongs to the field of applied deep learning research. It relates to a multi-model hybrid recognition algorithm based on deep convolutional neural networks, together with matching recognition software, which can be applied to urine analyzers of different models and can meet real-time requirements.

Background

Urine examination is a clinical examination means with wide application because of its simplicity, rapidness and easy availability of specimens, and is one of the current clinical routine examination items in hospitals.

The type and morphology of urine formed-component cells reflect, to some extent, substantive changes in kidney function and are an objective manifestation of certain cumulative lesions. For a long time, medical images were captured by machine and then selected manually, an approach with low efficiency, high work intensity and error rates that vary widely with the skill of the technician. Methods based on support vector machines and BP neural networks appeared later, but they remain severely limited in accuracy, recall and the number of categories they can identify.

At present, the technologies applied in clinical urine formed-component analyzers are mainly of two types. The first uses flow cytometry and electrical impedance measurement. Its working principle is as follows: the cells to be detected are stained with a specific fluorescent dye, placed in a sample tube, and driven under gas pressure into a flow chamber filled with sheath fluid. Constrained by the sheath fluid, the cells line up in single file and are ejected from the nozzle of the flow chamber as a cell column, which intersects the incident laser beam perpendicularly; the laser excites the cells in the column to fluoresce. The optical system of the instrument collects the fluorescence, cell impedance and other signals, and the computer system acquires, stores, displays and analyzes the measured signals. The second type is morphological detection using an optical lens.

Research has focused mainly on the second method, and the principal work to date is as follows.

Ressom, D. Wang et al., in "Ressom H, Wang D, Natarajan P. Adaptive double self-organizing maps for clustering gene expression profiles [J]. Neural Networks, 2003, 16(5): 633-.

In "Huang C L, Wang C J. A GA-based feature selection and parameters optimization for support vector machines [J]. Expert Systems with Applications, 2006, 31(2): 231-240", Cheng-Lung Huang et al. use an SVM for pattern recognition: features are first selected with a genetic algorithm and then fed to the SVM, whose output is the predicted category. The recognition accuracy is satisfactory, but recognition is slow and the method is difficult to apply in practice.

In "Qin Yingbo, Sun Jie, Chenping. Research on identification and classification of urine cell images based on a support vector machine [J]. Computer Engineering and Design, 2013, 34(6): 2185-2189", Qin Yingbo et al. compared the results of using a support vector machine to identify and classify urine cells in two color coordinate systems, RGB and HIS, and compared the results of combining color feature parameters with spatial feature parameters for identification and classification. They proposed optimizing the support vector machine parameters by grid search with cross-validation, which works well for identifying and classifying urine cells, but the number of identifiable types is small.

As described above, although existing technologies for identifying urine formed-component cells have achieved certain results, most have limitations, and further research on accuracy and recognition efficiency is needed, particularly when there are many samples, many categories and large differences in illumination conditions.

Disclosure of Invention

Aiming at the defects or shortcomings of the prior art, the invention provides a model construction method for cell image classification.

The model construction method for cell image classification provided by the invention comprises the following steps:

step one, image preprocessing

(1) Dividing the plurality of images in the cell image set A into two groups, wherein the images with the width w and the height h satisfying the formula (1) form a first group of image sets, and the images with the width w and the height h satisfying the formula (2) form a second group of image sets:

(2) classifying the images in each group of image sets according to the biological characteristics of the cells to obtain a plurality of coarse categories, and subdividing each coarse category to obtain its fine categories;

(3) enlarging each of the images in the cell image set A:

the method for enlarging any image in the first group of image sets comprises: scaling the width w of the original image to M pixels, where M ≤ 100; scaling the height h of the same image by a factor of (M/w) to obtain a preparation image; then, taking the center of the preparation image as the base point, filling the blank between the original image and the preparation image, the fill pixels taking the average of the pixel values at the four corners of the original image;

the method for enlarging any image in the second group of image sets comprises: scaling the width w of the original image to N pixels, where 140 ≤ N ≤ 526; scaling the height h of the same image by a factor of (N/w) to obtain a preparation image; then, taking the center of the preparation image as the base point, filling the blank between the original image and the preparation image, the fill pixels taking the average of the pixel values at the four corners of the original image;

(4) training an AlexNet network separately on the first group of image sets and the second group of image sets enlarged in step one, to obtain an identification model file of the first group of image sets and an identification model file of the second group of image sets;

(5) calculating the distance between every two fine categories among all the fine categories of the cell image set A, and obtaining the two fine categories with the smallest distance, denoted the θ1 fine category and the θ2 fine category,

step two, constructing a cell image set B, wherein the fine categories of the cell image set B are the same as the fine categories of the cell image set A, the number of images in the θ1 fine category of the cell image set B is the same as the number of images in the θ1 fine category of the cell image set A, the number of images in the θ2 fine category of the cell image set B is the same as the number of images in the θ2 fine category of the cell image set A, and the number of images in every other fine category of the cell image set B is one fourth of the number of images in the same fine category of the cell image set A;

step three, applying steps one (1) and (3) to the cell image set B to obtain a third group of image sets and a fourth group of image sets; and, using an AlexNet network, training on the cell image set B so processed, on the basis of the identification model file of the first group of image sets and the identification model file of the second group of image sets, to obtain an identification model file of the third group of image sets and an identification model file of the fourth group of image sets, wherein the identification model files of the third and fourth groups of image sets are the models for cell image classification.
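The enlargement in step one (3) can be sketched in a few lines of Python. This is an illustrative reading of the text rather than the patented implementation, and it rests on several assumptions that the text does not settle: the image is grayscale and stored as a list of rows, nearest-neighbour interpolation is used for scaling, and the square canvas has a side equal to the larger of the two scaled dimensions. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image stored as rows of pixel values (assumption)


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Nearest-neighbour resize; the interpolation method is an assumption."""
    h, w = len(img), len(img[0])
    return [
        [img[min(h - 1, y * h // new_h)][min(w - 1, x * w // new_w)]
         for x in range(new_w)]
        for y in range(new_h)
    ]


def corner_mean(img: Image) -> int:
    """Average of the four corner pixels of the original image."""
    h, w = len(img), len(img[0])
    corners = [img[0][0], img[0][w - 1], img[h - 1][0], img[h - 1][w - 1]]
    return round(sum(corners) / 4)


def enlarge_and_pad(img: Image, target_w: int) -> Image:
    """Scale width to target_w, scale height by target_w / w, then centre the
    result on a square canvas filled with the corner mean."""
    h, w = len(img), len(img[0])
    new_w = target_w
    new_h = max(1, round(h * target_w / w))
    scaled = resize_nearest(img, new_w, new_h)

    side = max(new_w, new_h)  # assumed canvas size
    fill = corner_mean(img)
    canvas = [[fill] * side for _ in range(side)]

    top = (side - new_h) // 2
    left = (side - new_w) // 2
    for y in range(new_h):
        canvas[top + y][left:left + new_w] = scaled[y]
    return canvas


if __name__ == "__main__":
    tiny = [[10, 20, 30, 40],
            [50, 60, 70, 80]]
    for row in enlarge_and_pad(tiny, 8):
        print(row)
```

Scaling both dimensions by the same factor preserves the aspect ratio of the cell, and the corner-mean fill approximates the background so that the padded border does not introduce a sharp edge.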

Further, in the step one (4) of the present invention, the basic initial learning rate of the training parameters is 0.01, and the number of iterations is 20000.

Further, the distance between two sub-categories in step one (5) of the present invention is calculated by using equation (3):

wherein:

(x_i, y_i) is the pixel value of the i-th pixel in one fine category, i = 1, 2, 3, ..., I, and (x_j, y_j) is the pixel value of the j-th pixel in the other fine category, j = 1, 2, 3, ..., J.
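Step one (5) selects the two fine categories with the smallest pairwise distance. The sketch below shows only that selection. Formula (3) is not reproduced in this text, so the distance is passed in by the caller; the toy distance in the usage lines (absolute difference of mean values) is purely hypothetical and is not the patent's formula. The function and variable names are illustrative.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def closest_pair(categories: Dict[str, Sequence[float]],
                 distance: Distance) -> Tuple[str, str]:
    """Return the names of the two categories with the smallest distance.

    `categories` maps a fine-category name to whatever representation the
    chosen distance expects (here a flat sequence of values).
    """
    names = sorted(categories)  # fixed order so the result is deterministic
    return min(
        combinations(names, 2),
        key=lambda pair: distance(categories[pair[0]], categories[pair[1]]),
    )


if __name__ == "__main__":
    # Hypothetical stand-in distance, NOT formula (3) of the patent.
    def mean_gap(a: Sequence[float], b: Sequence[float]) -> float:
        return abs(sum(a) / len(a) - sum(b) / len(b))

    toy = {"a": [0.0, 2.0], "b": [10.0, 12.0], "c": [11.0, 13.0]}
    theta1, theta2 = closest_pair(toy, mean_gap)
    print(theta1, theta2)
```

Keeping the distance as a parameter lets the same selection logic be reused once the actual formula is available.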

Further, the basic initial learning rate of the training parameters in the third step of the invention is 0.001, and the iteration times are 4000.
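The two training stages differ in the stated learning rate and iteration count. The sketch below records those values and renders them as Caffe-style solver fields. It assumes that `base_lr` and `max_iter` are the solver fields corresponding to the "basic initial learning rate" and the "number of iterations"; every other solver setting (network definition, learning-rate policy, snapshotting) is not given in the text and is deliberately left out. The names `TrainingStage`, `STAGE_ONE` and `STAGE_THREE` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingStage:
    name: str
    base_lr: float   # "basic initial learning rate" in the text
    max_iter: int    # "number of iterations" in the text


# Values taken from the description; everything else is an assumption.
STAGE_ONE = TrainingStage("step one (4): train on image set A", 0.01, 20000)
STAGE_THREE = TrainingStage("step three: continue training on image set B", 0.001, 4000)


def solver_fields(stage: TrainingStage) -> str:
    """Render only the two hyperparameters that the text specifies."""
    return f"base_lr: {stage.base_lr}\nmax_iter: {stage.max_iter}"


if __name__ == "__main__":
    for stage in (STAGE_ONE, STAGE_THREE):
        print(f"# {stage.name}")
        print(solver_fields(stage))
```

The second stage uses a learning rate ten times smaller and one fifth of the iterations, which is consistent with continuing training from the stage-one model files rather than starting from scratch.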

Images to be identified, after processing by steps one (1) and (3), are identified using the identification model file of the third group of image sets and the identification model file of the fourth group of image sets.

Compared with the prior art, the invention has the following advantages:

First, the invention studies real images of urine formed-component cells encountered in actual testing, greatly increases the scale of the data set and the number of cell types, and greatly improves classification accuracy by using a deep learning method, so the algorithm performs well in practical application.

Second, the dual-model hybrid recognition mode adopted by the invention makes full use of the statistical data while also taking into account the recall requirements of practical applications; dividing the cells into two groups avoids, to a certain extent, the influence of some impurities on the result.

Third, the preprocessing step uses proportional scaling with parametric filling, which keeps deformation to a minimum when the image is converted to a square and also reduces the number of preprocessing passes, which is of practical value for operating efficiency.

Fourth, GPU acceleration is used when computing the category from the model parameters, the recognition model is pruned and optimized, and the original cell classification algorithm is optimized, so that the algorithm runs in real time while retaining high classification accuracy. The computation is accelerated by a factor of 176 compared with the original algorithm.

Drawings

Fig. 1(a) is one of the original images from the enlargement of the first group of image sets in step one (3) of Example 1, and fig. 1(b) is fig. 1(a) after filling.

Fig. 2(a) is one of the original images from the enlargement of the second group of image sets in step one (3) of Example 1, and fig. 2(b) is fig. 2(a) after filling.

Fig. 3(a) is the original image to be classified in Example 2, and fig. 3(b) is the enlarged version of fig. 3(a).

Detailed Description

Urine examination is a widely used clinical examination because it is simple and rapid and specimens are easy to obtain, and it is currently one of the routine clinical examination items in hospitals. The type and morphology of urine formed-component cells reflect, to some extent, substantive changes in kidney function and are an objective manifestation of certain cumulative lesions. In morphological detection using an optical lens, the formed components in urine are stained and their morphology is observed under an optical microscope. This method can detect and accurately identify each component in urine, but detection is relatively slow, automation and standardization are difficult to achieve, and in practice the results are often affected by factors such as focal length, focus point, illumination and concentration. For a long time, medical images were captured by machine and then selected manually, an approach with low efficiency, high work intensity and error rates that vary widely with the skill of the technician; the later methods based on support vector machines and BP neural networks are severely limited in accuracy, recall and the number of categories they can identify.

The data set used by the invention consists of first-hand data collected from major hospitals in a certain city using a certain model of automatic urine analyzer, which ensures the authenticity and reliability of the samples; the positive samples of various kinds are of particular research value.

The following are specific examples provided by the inventors to further explain the technical solutions of the present invention.

Example 1:

step one, image preprocessing

(1) Dividing a plurality of images in a cell image set A (52000) into two groups, wherein images with width w and height h satisfying formula (1) constitute a first group of image sets (28000), and images with width w and height h satisfying formula (2) constitute a second group of image sets (10000):

(2) classifying the images in each group of image sets according to the biological characteristics of the cells to obtain 14 coarse categories, and subdividing each coarse category to obtain its fine categories, 26 fine categories in total. The basis for subdivision includes cell morphology, cell color contrast and degree of aggregation; for example, the coarse category 0-red blood cells is subdivided into normal red blood cells, wattle-type red blood cells and crinkle red blood cells. The specific classification results are shown in Tables 1 and 2.

TABLE 1

TABLE 2

(3) Enlarging each of the images in the cell image set A:

The method for enlarging any image in the first group of image sets comprises: scaling the width w of the original image to M pixels, where M ≤ 100; scaling the height h of the same image by a factor of (M/w) to obtain a preparation image; then, taking the center of the preparation image as the base point, filling the blank between the original image and the preparation image, the fill pixels taking the average of the pixel values at the four corners of the original image; as shown in fig. 1.

The method for enlarging any image in the second group of image sets comprises: scaling the width w of the original image to N pixels, where 140 ≤ N ≤ 526; scaling the height h of the same image by a factor of (N/w) to obtain a preparation image; then, taking the center of the preparation image as the base point, filling the blank between the original image and the preparation image, the fill pixels taking the average of the pixel values at the four corners of the original image; as shown in fig. 2.

(4) Calculating the distance between every two fine categories among all the fine categories of the cell image set A; the two fine categories with the smallest distance are obtained and denoted the θ1 fine category (bud red blood cells) and the θ2 fine category (abnormal white blood cells).

(5) Training an AlexNet network separately on the first group of image sets and the second group of image sets enlarged in step one, to obtain an identification model file of the first group of image sets and an identification model file of the second group of image sets; the base initial learning rate of the training parameters is 0.01 and the number of iterations is 20000;

step two, the cell image set B is processed as in step one to obtain a third group of image sets and a fourth group of image sets, together with a plurality of coarse categories and the fine categories of each coarse category; the fine categories of the cell image set B are the same as those of the cell image set A, the numbers of images in the θ1 and θ2 fine categories of the cell image set B are the same as in the cell image set A, and the number of images in every other fine category is one fourth of the number in the corresponding fine category of the cell image set A;

and step three, using an AlexNet network, training on the cell image set B processed in step one, on the basis of the identification model file of the first group of image sets and the identification model file of the second group of image sets, to obtain an identification model file of the third group of image sets and an identification model file of the fourth group of image sets, wherein the base initial learning rate of the training parameters is 0.001 and the number of iterations is 4000.
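The composition of cell image set B in step two can be illustrated as follows. The text fixes only the image counts per fine category; the sketch additionally assumes that set B is drawn from set A by random sampling and that the quarter is rounded down, neither of which is stated in the source. The function name and the file names in the usage lines are hypothetical.

```python
import random
from typing import Dict, List


def build_set_b(set_a: Dict[str, List[str]],
                theta1: str,
                theta2: str,
                seed: int = 0) -> Dict[str, List[str]]:
    """Keep every image of the two closest fine categories and one quarter
    of the images of every other fine category.

    `set_a` maps a fine-category name to a list of image identifiers.
    Random sampling from set A and floor division are assumptions.
    """
    rng = random.Random(seed)
    set_b: Dict[str, List[str]] = {}
    for name, images in set_a.items():
        if name in (theta1, theta2):
            set_b[name] = list(images)  # same count as in set A
        else:
            set_b[name] = rng.sample(images, len(images) // 4)
    return set_b


if __name__ == "__main__":
    toy_a = {
        "bud red blood cells": [f"rbc_{i}.png" for i in range(8)],
        "abnormal white blood cells": [f"wbc_{i}.png" for i in range(5)],
        "other": [f"other_{i}.png" for i in range(12)],
    }
    toy_b = build_set_b(toy_a, "bud red blood cells", "abnormal white blood cells")
    print({name: len(images) for name, images in toy_b.items()})
```

Keeping the two closest fine categories at full size while reducing the rest to a quarter gives the hardest-to-separate pair more weight in the second training stage.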

Example 2:

The image to be identified (shown in fig. 3), after processing by steps one (1) and (3) of the invention, is identified using the identification model file of the third group of image sets and the identification model file of the fourth group of image sets from Example 1. The classification example project in the Caffe framework can be used as the specific identification method. The cell shown in fig. 3 is identified as a non-squamous epithelial cell.

Claims

1. A model construction method for cell image classification, characterized in that the method comprises:

Step one, image preprocessing

(1) dividing the plurality of images in the cell image set A into two groups, wherein the images whose width w and height h satisfy formula (1) form a first group of image sets, and the images whose width w and height h satisfy formula (2) form a second group of image sets:

(2) classifying the images in each group of image sets according to the biological characteristics of the cells to obtain a plurality of coarse categories, and subdividing each coarse category to obtain the fine categories of each coarse category;

(3) enlarging each of the images in the cell image set A:

for any image in the first group of image sets, the enlargement method is: scaling the width w of the original image to M pixels, M ≤ 100, then scaling the height h of the same image by a factor of (M/w) to obtain a preparation image, then, taking the center of the preparation image as the base point, filling the blank between the original image and the preparation image, the fill pixels taking the average of the pixel values at the four corners of the original image;

for any image in the second group of image sets, the enlargement method is: scaling the width w of the original image to N pixels, 140 ≤ N ≤ 526, then scaling the height h of the same image by a factor of (N/w) to obtain a preparation image, then, taking the center of the preparation image as the base point, filling the blank between the original image and the preparation image, the fill pixels taking the average of the pixel values at the four corners of the original image;

(4) training an AlexNet network separately on the first group of image sets and the second group of image sets enlarged in step one, to obtain a recognition model file of the first group of image sets and a recognition model file of the second group of image sets;

(5) calculating the distance between every two fine categories among all the fine categories of the cell image set A, and obtaining the two fine categories with the smallest distance, denoted the θ1 fine category and the θ2 fine category,

Step two, constructing a cell image set B, wherein the fine categories of the cell image set B are the same as the fine categories of the cell image set A, the number of images in the θ1 fine category of the cell image set B is the same as the number of images in the θ1 fine category of the cell image set A, the number of images in the θ2 fine category of the cell image set B is the same as the number of images in the θ2 fine category of the cell image set A, and the number of images in every other fine category of the cell image set B is one quarter of the number of images in the same fine category of the cell image set A;

Step three, applying steps one (1) and (3) to the cell image set B to obtain a third group of image sets and a fourth group of image sets; using an AlexNet network, training on the cell image set B processed by steps one (1) and (3), on the basis of the recognition model file of the first group of image sets and the recognition model file of the second group of image sets, to obtain a recognition model file of the third group of image sets and a recognition model file of the fourth group of image sets, the recognition model file of the third group of image sets and the recognition model file of the fourth group of image sets being the models for cell image classification.

2. The model construction method for cell image classification according to claim 1, characterized in that the base initial learning rate of the training parameters in step one (4) is 0.01 and the number of iterations is 20000.

3. The model construction method for cell image classification according to claim 1, characterized in that the distance between two fine categories in step one (5) is calculated using formula (3):

wherein:

(x_i, y_i) is the pixel value of the i-th pixel in one of the fine categories, i = 1, 2, 3, ..., I, and (x_j, y_j) is the pixel value of the j-th pixel in the other fine category, j = 1, 2, 3, ..., J.

4. The model construction method for cell image classification according to claim 1, characterized in that the base initial learning rate of the training parameters in step three is 0.001 and the number of iterations is 4000.

5. A cell image classification method, characterized in that the recognition model file of the third group of image sets and the recognition model file of the fourth group of image sets according to any one of claims 1-4 are used to recognize images to be recognized that have been processed by steps one (1) and (3) of claim 1.