Teppei Kanayama, Yusuke Kurose, Kiyohito Tanaka, Kento Aida, Shin’ichi Satoh, Masaru Kitsuregawa & Tatsuya Harada
Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11768)
Abstract
Datasets for training gastric cancer detection models are usually imbalanced, because the number of available images showing lesions is limited. This imbalance can be a serious obstacle to realizing a high-performance automatic gastric cancer detection system. In this paper, we propose a method that lessens this dataset bias by generating new images using a generative model. The generative model synthesizes an image from two images in a dataset. The synthesis network can produce realistic images, even if the dataset of lesion images is small. In our experiment, we trained gastric cancer detection models using the synthesized images. The results show that the performance of the system was improved.
1 Introduction
The performance of computer vision systems has improved dramatically because of recent developments in deep learning techniques. The automatic detection of gastric cancer in endoscopic images is one of the most important applications of these techniques. This detection task consists of detecting a cancerous tumor, regardless of its size. Automatically locating cancers in images is expected to decrease the diagnostic burden on doctors.
However, datasets for training the detection models are usually imbalanced, because the number of images showing lesions is limited. This is because the number of patients who have lesions is small and the cost of annotation for indicating the location of lesions in the images is high. This dataset imbalance can be a serious obstacle to realizing a high-performance automatic gastric cancer detection system.
In this paper, we propose a method that lessens the bias by using generative models and thus improves the performance of gastric cancer detection models, even when the dataset includes bias.
2 Related Work
The detection of objects in general images has been widely explored in recent years [12,13]. Research has also been conducted on object detection in medical images. For example, in [6] the detection of tumors in endoscopic images using the Single Shot Detector (SSD) [9] was described. In [10], a system for glomerulus detection in light microscopic images using a faster region-based convolutional neural network (Faster R-CNN) [13] was presented. The task of detecting anomalies in images is similar to object detection tasks. In this task, an entire image is divided into regions using a grid, and the model recognizes whether each region contains anomalies. This method is advantageous in situations where the target image has stepwise anomalies. Xiao et al., for example, proposed a high-performance unsupervised anomaly detection system that uses a spatio-temporal pyramid to utilize not only local but also global features [16,17]. Studies have also been conducted on detecting anomalies, i.e., lesions, in medical images. As for general images, methods that improve the performance of systems for medical images by utilizing global context information have been proposed [5,8,14]. Hayakawa et al., for example, detected lesions in endoscopic images by extracting multi-scale features using two types of convolutional neural networks (CNNs) [5].
A major problem that arises in the case of medical images in particular is that the number of available images is limited. To address this problem, methods for expanding datasets using generative adversarial networks (GANs) [4] have been widely explored [1,2,3,18]. In [3], for example, computed tomography (CT) images of livers were generated using deep convolutional GANs (DCGANs) [11]. The study’s results showed that image classification performance improved when the generated images were used. In [1], a network that can generate high-resolution images from a limited dataset was proposed, and experiments on skin images were described. In the context of general image synthesis, GANs that use both local and global information were proposed in [15] and [19]. In [15], the authors proposed a method to synthesize images guided by sketch, color, and texture. In [19], the authors dealt with the task of generating photographic images conditioned on image descriptions expressed in natural language. Both methods require relatively large datasets compared with medical image settings.
As described above, in research studies GANs have been applied to augment datasets for image classification tasks involving medical images. These studies, however, were focused on generating entire images, and thus did not consider the lesion detection task. Therefore, images generated using these methods cannot be used for supervised lesion detection tasks, because they do not indicate the location of lesions. The studies were, furthermore, focused on reproducing the distribution in the original dataset, which cannot lessen the dataset bias mentioned in the previous section. When these generated data are used to train lesion detection models, the models detect only bright regions. In this research study, we focused on the gastric lesion detection task in endoscopic images. We propose a data augmentation method that improves the performance of the models by using generative models to produce additional images and thus lessen the dataset bias.
3 Method
Figure 1 shows an overview of the gastric cancer image synthesizing system using a GAN. The system consists of three networks: a synthesizer and global and local discriminators. A normal image, i.e., an image with no lesions, and an image showing a lesion are input into the synthesizer, which outputs a new image in which the two input images are synthesized smoothly. The global discriminator determines whether the synthesized image is consistent. The local discriminator has two roles: the first is to determine whether the lesion part in the generated image is realistic, and the second is to determine whether the lesion part and the normal part are connected smoothly. When designing this architecture, we used the architecture proposed in [7] as a reference.
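For orientation, the following is a minimal PyTorch sketch of this three-network layout. The layer configurations and class names are our own assumptions for illustration; the paper specifies the architecture only by reference to [7].

```python
import torch
import torch.nn as nn

class Synthesizer(nn.Module):
    """Maps a normal image and a padded lesion patch (concatenated along
    the channel axis, 6 channels in total) to a synthesized 3-channel image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x_n, x_l):
        # x_n: normal image, x_l: zero-padded lesion patch (same shape)
        return self.net(torch.cat([x_n, x_l], dim=1))

class Discriminator(nn.Module):
    """Shared shape for the global and local discriminators. Returns a raw
    logit; a sigmoid of this logit gives the probability in [0, 1]
    described in the text."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x)
```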
The input into the synthesizer is a normal image and a padded patch showing a lesion. To obtain a padded lesion patch, first the patch is cropped from the part of the image that shows the lesion and then it is zero-padded such that it is the same size as the normal image (Fig. 2). The position of the lesion patch relative to the normal image is represented by changing the position of the zero padding. The position of the padding is determined randomly. In other words, both \(X_{n}\) and \(X_{l}\), where \(X_{n}\) is a normal image and \(X_{l}\) is a padded lesion patch, are three-dimensional tensors of the same shape. The normal image \(X_{n}\) is resized to the prescribed size in advance.
When the normal image \(X_{n}\) and the padded lesion patch \(X_{l}\) have been concatenated along the channel axis, the result is input into the synthesizer. The output of the synthesizer is a new image in which the two images are synthesized smoothly. At this point, the position of the lesion in the synthesized image corresponds to the position of the input lesion patch (\(x_{pad}, y_{pad}\)).
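As an illustration, here is a sketch of this padding step, assuming images are stored as NumPy arrays of shape (H, W, 3); the function name is hypothetical.

```python
import numpy as np

def pad_lesion_patch(patch, target_h, target_w, rng=None):
    """Zero-pad a cropped lesion patch to the size of the normal image,
    placing it at a random position (x_pad, y_pad)."""
    rng = rng or np.random.default_rng()
    ph, pw = patch.shape[:2]
    y_pad = int(rng.integers(0, target_h - ph + 1))
    x_pad = int(rng.integers(0, target_w - pw + 1))
    padded = np.zeros((target_h, target_w, 3), dtype=patch.dtype)
    padded[y_pad:y_pad + ph, x_pad:x_pad + pw] = patch
    return padded, (x_pad, y_pad)

# The synthesizer input is then the channel-wise concatenation of the
# normal image x_n and the padded patch x_l:
#   x = np.concatenate([x_n, x_l], axis=-1)  # shape (H, W, 6)
```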
Fig. 2. Padding a lesion image. (a) Lesion image with bounding box; (b) lesion patch cropped from the lesion part of the image; (c) padded lesion patch.
The input into the global discriminator is a lesion image that is either taken from the dataset or synthesized. Each image in the dataset is resized to a prescribed size in advance. The output of the global discriminator is a scalar within [0, 1], which indicates the probability that the input image belongs to the dataset. The local discriminator receives as input either a lesion patch from the dataset or the lesion part of the synthesized image. Here, the synthesized image is cropped from a region slightly larger than the padded region of the input image. The purpose is to ensure that the lesion patch is synthesized at the same location as the padded position in the input lesion image and, furthermore, that the boundary between the normal part and the lesion part is smooth. The output, like that of the global discriminator, is a scalar within [0, 1].
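A sketch of the local discriminator’s input crop, taking a region slightly larger than the padded patch; the margin width is our assumption, since the paper does not state it.

```python
def crop_local_region(image, x_pad, y_pad, ph, pw, margin=16):
    """Crop the lesion part of a (synthesized or real) image, enlarged by
    a small margin so the discriminator also sees the boundary between
    the lesion part and the surrounding normal part."""
    h, w = image.shape[:2]
    y0, y1 = max(0, y_pad - margin), min(h, y_pad + ph + margin)
    x0, x1 = max(0, x_pad - margin), min(w, x_pad + pw + margin)
    return image[y0:y1, x0:x1]
```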
Two types of loss functions are used to optimize the three networks: reconstruction loss and adversarial loss. The reconstruction loss ensures that the synthesized image faithfully reconstructs the original input images, while the adversarial loss allows the boundary region between a normal image and a lesion patch to be generated flexibly. Together, these two loss functions produce smoothly synthesized images, even when the lesion image dataset is small.
The reconstruction loss is represented as

\(L_{rc}^{global} = \mathrm{Mean}\big(F_1 \odot (G(X_n, X_a) - X_n)^{2}\big)\),
\(L_{rc}^{local} = \mathrm{Mean}\big(F_2 \odot (G(X_n, X_a) - X_a)^{2}\big)\),
\(L_{rc} = L_{rc}^{global} + \alpha L_{rc}^{local}\),    (1)

where \(L_{rc}^{global}\) and \(L_{rc}^{local}\) are the global and local reconstruction losses, respectively, \(L_{rc}\) is the final reconstruction loss, \(X_n\) and \(X_a\) are the images from the normal and the lesion image dataset, respectively, and \(G()\) is the output of the synthesizer. The squaring denotes the element-wise (Hadamard) product of the difference with itself, and \(\odot\) likewise denotes element-wise multiplication. \(\mathrm{Mean()}\) is the function for calculating the mean of all the elements in a tensor. \(\alpha\) is a hyperparameter for adjusting the weight between the local and global reconstruction losses. \(F_1\) and \(F_2\) are weighting tensors with the same shape as \(X_{n}\). Adopting a smoothing function such as a two-dimensional Gaussian for \(F_1\) and \(F_2\) makes the boundary between the normal part and the lesion part smooth.
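A minimal PyTorch sketch of this reconstruction loss, following the equations as reconstructed above; tensor layouts and the helper name are our assumptions.

```python
import torch

def reconstruction_loss(g_out, x_n, x_a, f1, f2, alpha=7.0):
    """Weighted element-wise L2 reconstruction loss. f1 and f2 are
    smoothing weight maps (e.g. built from a 2-D Gaussian) broadcastable
    to the image shape; alpha = 7.0 as reported in Sect. 4."""
    l_global = torch.mean(f1 * (g_out - x_n) ** 2)
    l_local = torch.mean(f2 * (g_out - x_a) ** 2)
    return l_global + alpha * l_local
```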
Adversarial loss is related to the classification by the discriminators. The losses of the generator and the discriminator are respectively defined as

\(L_{adv}^{gen} = \mathrm{Softplus}(-P_{fake}^{global}) + \beta \, \mathrm{Softplus}(-P_{fake}^{local})\),
\(L_{adv}^{dis} = \mathrm{Softplus}(-P_{real}^{global}) + \mathrm{Softplus}(P_{fake}^{global}) + \beta \big(\mathrm{Softplus}(-P_{real}^{local}) + \mathrm{Softplus}(P_{fake}^{local})\big)\),

where \(L_{adv}^{gen}\) and \(L_{adv}^{dis}\) are the adversarial losses of the generator and discriminator, respectively, \(\beta\) is a hyperparameter for adjusting the weight between the global and local adversarial losses, and \(\mathrm{Softplus}\) is the standard softplus function. \(P^{global}_{real}\), \(P^{global}_{fake}\), \(P^{local}_{real}\), and \(P^{local}_{fake}\) are respectively defined as

\(P^{global}_{real} = D_{global}(X)\), \(P^{global}_{fake} = D_{global}(G(X_n, X_a))\),
\(P^{local}_{real} = D_{local}(\lfloor X' \rfloor)\), \(P^{local}_{fake} = D_{local}(\lfloor G(X_n, X_a) \rfloor)\),

where \(D_{global}()\) and \(D_{local}()\) are the outputs of the global and local discriminators, respectively, and \(\lfloor \cdot \rfloor\) denotes cropping the lesion part. Both \(X\) and \(X'\) are images in the training dataset, which in general are different. The subscript of \(X_*\) indicates whether the image comes from the normal or the lesion image dataset.
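A sketch of these adversarial losses in PyTorch. Here the discriminators are assumed to return raw logits (a sigmoid of which gives the probabilities described above), which is the usual way to combine a softplus loss with a [0, 1] output.

```python
import torch.nn.functional as F

def adversarial_losses(p_global_real, p_global_fake,
                       p_local_real, p_local_fake, beta=1.0):
    """Softplus-form adversarial losses for the generator and the two
    discriminators; beta = 1.0 as reported in Sect. 4."""
    l_gen = (F.softplus(-p_global_fake).mean()
             + beta * F.softplus(-p_local_fake).mean())
    l_dis = ((F.softplus(-p_global_real) + F.softplus(p_global_fake)).mean()
             + beta * (F.softplus(-p_local_real)
                       + F.softplus(p_local_fake)).mean())
    return l_gen, l_dis
```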
Based on the reconstruction and adversarial losses above, the synthesizer minimizes \(L_{rc} + \gamma L_{adv}^{gen}\), and both discriminators minimize \(L_{adv}^{dis}\), where \(\gamma\) is a hyperparameter for adjusting the weight between the reconstruction and adversarial losses. These optimizations are conducted simultaneously. After generating images, we replace the normal part with the original input image stepwise. The weight of this stepwise replacement is \(F_1\) in Formula (1). This is effective because the newly generated part is mainly around the lesion patch, so the original normal image can be reused in regions distant from the lesion part.
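A one-line sketch of this stepwise replacement, under our reading that \(F_1\) weights how much of the original normal image is kept at each pixel.

```python
def blend_with_original(g_out, x_n, f1):
    """Reuse the original normal image far from the lesion (where f1 is
    close to 1) and keep the generated pixels around the lesion patch
    (where f1 is close to 0)."""
    return f1 * x_n + (1.0 - f1) * g_out
```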
4 Experiments
Conditions. In this study, we used our original endoscopic image dataset, which was extracted from an electronic medical record system. Each image was annotated by the patient’s attending doctor, and the images showing lesions have bounding boxes on the lesion parts. The dataset contains 129,692 normal and 1,315 lesion images. By lesion type, there are 1,309 tumors and 6 ulcers. The average image height is 458 pixels, and the average width is 405 pixels.
First, we divided the dataset as follows. The normal images were divided into 129,518 training, 44 validation, and 130 test images, and the lesion images were divided into 1,142 training, 45 validation, and 128 test images. Note that all images from an individual patient were assigned to a single split (training, validation, or test).
We conducted two experiments. In the first experiment, we visualized the images synthesized by our method and compared them with images generated by DCGANs [11], to confirm that our method can generate clear images when the number of lesion images is small. The optimizer used was Adam (\(\alpha = 0.0002, \beta = 0.5\)), and the weight decay ratio was 0.0001. The minibatch size was 64, and the number of training iterations was 150,000. The values of the loss-weighting hyperparameters \(\alpha\), \(\beta\), and \(\gamma\) (Sect. 3) were 7.0, 1.0, and 0.002, respectively. For the DCGANs, the optimizer and the weight decay ratio were the same as above. The model was trained from scratch. The minibatch size was 16, and the number of training iterations was 80,000.
In the second experiment, we trained the gastric cancer detection model using the synthesized lesion images and compared its performance with that obtained when only the lesion images in the dataset were used for training. As the gastric cancer detection model, the model proposed in [5] was used. For this detection model, the optimizer used was MomentumSGD (momentum: 0.9) and the learning rate was 0.01. The weight decay ratio was 0.0005. The model was trained from scratch. The minibatch size was 64, and the number of training iterations was 15,000. For data augmentation, we applied flipping, rotation by 90 degrees, grayscale conversion, and channel shuffling. In the training phase, we cropped classification targets from normal images randomly. In the case of the lesion images, however, we cropped randomly from the entire image with 50% probability and from inside the annotated bounding box with 50% probability. When determining whether a cropped part contained lesions, we considered it to contain a lesion when the intersection-over-union (IoU) value between the cropped part and the annotated bounding box was greater than 0.4. To lessen the imbalance between the numbers of images with and without lesions, we applied oversampling to the lesion images. In other words, we adjusted the parameter \(k\) to ensure that
\(k \, (N_a + N_g) = N_n\),

where \(k\) is the oversampling ratio and \(N_a\), \(N_g\), and \(N_n\) are the numbers of lesion images, synthesized images, and normal images, respectively. In this experiment, we tried several different numbers of synthesized images.
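A small sketch of this balancing rule; the exact form of the equation is our assumption based on the variable definitions above.

```python
def oversampling_ratio(n_lesion, n_synth, n_normal):
    """Oversampling ratio k that balances lesion-class images (original
    plus synthesized) against normal images: k * (N_a + N_g) = N_n."""
    return n_normal / (n_lesion + n_synth)

# With the dataset sizes in this paper and 20,000 synthesized images:
#   oversampling_ratio(1142, 20000, 129518)  # ~6.1
```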
In the second experiment, the average precision (AP) metric was used to evaluate the performance of the trained models. First, we divided each of the 258 images in the test dataset into 100 (\(10 \times 10\)) grid regions. Then, the probability that a region contained lesions was calculated by the trained model for the 64 (\(8 \times 8\)) interior regions, i.e., excluding the edge regions. Finally, we calculated the AP score over the 16,512 (\(258 \times 64\)) predictions and annotated labels, and this AP score was taken as the model’s score. We assigned labels as in the training phase. In the experiment, we trained the models from four different initial values and considered the mean of the AP values as the final score of the model.
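A sketch of this evaluation, assuming the per-region probabilities and per-region IoU values with the annotated boxes have already been computed; array shapes and function names are our assumptions.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def grid_average_precision(probs, region_ious, iou_thresh=0.4):
    """AP over the 8x8 interior regions of all test images.
    probs:       (n_images, 8, 8) lesion probabilities from the model.
    region_ious: (n_images, 8, 8) IoU of each region with the annotated
                 bounding box (zero for normal images).
    Regions with IoU > iou_thresh are labeled positive, as in training."""
    y_true = (np.asarray(region_ious) > iou_thresh).ravel().astype(int)
    y_score = np.asarray(probs).ravel()
    return average_precision_score(y_true, y_score)
```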
Fig. 3. Left: two images synthesized by our method, with bounding boxes on the synthesized lesion patches. Right: images generated by a deep convolutional generative adversarial network (DCGAN).
5 Results
Figure 3 shows the results of the first experiment. Because the lesion image dataset was small, mode collapse occurred with the conventional method, and the resolution of the generated images was not very high. In contrast, our method generated clearer and more varied images than the conventional method, because it uses the original normal and lesion images effectively.
Table 1 shows the results of the second experiment. Using images synthesized by our proposed method improved the scores of the gastric cancer detection models. This indicates that the dataset bias was lessened, because the method allows lesion patches to be attached to various parts of normal images. When we changed the number of synthesized images added to the training dataset, we observed that the model achieved the highest AP score when 20,000 synthesized images were added and that the performance of the model dropped when a larger number was added. This indicates that the synthesized images have biases of their own, which harm performance when an excessive number of synthesized images is added.
6 Summary
In this study, we focused on the imbalanced data problem for gastric cancer detection systems. To lessen the bias, we proposed a method to synthesize new lesion images by using GANs. Furthermore, we showed that the performance of a gastric cancer detection model was improved when the synthesized images were added to the training dataset.
References
Baur, C., Albarqouni, S., Navab, N.: MelanoGANs: high resolution skin lesion synthesis with GANs. arXiv:1804.04338 (2018)
Beers, A., et al.: High-resolution medical image synthesis using progressively grown generative adversarial networks. arXiv:1805.03144 (2018)
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing (2018). http://www.sciencedirect.com/science/article/pii/S0925231218310749
Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Hayakawa, A., et al.: Gastric cancer detection for gastroenterological endoscopy with local and multi-scale global information. In: CARS (2019)
Hirasawa, T., et al.: Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 21, 653–660 (2018)
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. 36, 107 (2017)
Kawahara, J., Hamarneh, G.: Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers. In: MICCAI (2016)
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Lo, Y.C., et al.: Glomerulus detection on light microscopic images of renal pathology with the faster R-CNN. In: Cheng, L., Leung, A.C.S., Ozawa, S. (eds.) Neural Information Processing (2018)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv:1804.02767 (2018)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
Shen, W., Zhou, M., Yang, F., Yang, C., Tian, J.: Multi-scale convolutional neural networks for lung nodule classification. In: IPMI (2015)
Xian, W., et al.: TextureGAN: controlling deep image synthesis with texture patches. In: CVPR (2018)
Xiao, T., Zhang, C., Zha, H.: Learning to detect anomalies in surveillance video. IEEE Signal Process. Lett. 22, 1477–1481 (2015)
Xiao, T., Zhang, C., Zha, H., Wei, F.: Anomaly detection via local coordinates factorization and spatio-temporal pyramid. In: ACCV (2014)
Yi, X., Walia, E., Babyn, P.: Generative adversarial network in medical imaging: a review. Med. Syst. (2018)
Zhang, Z., Xie, Y., Yang, L.: Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In: CVPR (2018)
Acknowledgements
This work was supported by a Grant for ICT infrastructure establishment and implementation of artificial intelligence for clinical and medical research from the Japan Agency for Medical Research and Development (AMED) (JP18lk1010028).
Author information
Authors and Affiliations
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Teppei Kanayama, Yusuke Kurose & Tatsuya Harada
Kyoto Second Red Cross Hospital, Kyoto, Japan
Kiyohito Tanaka
Research Center for Medical Bigdata, National Institute of Informatics, Tokyo, Japan
Kento Aida, Shin’ichi Satoh & Tatsuya Harada
Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
Masaru Kitsuregawa
National Institute of Informatics, Tokyo, Japan
Masaru Kitsuregawa
Corresponding author
Correspondence to Teppei Kanayama.
Editor information
Editors and Affiliations
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Dinggang Shen
University of Georgia, Athens, GA, USA
Tianming Liu
Western University, London, ON, Canada
Terry M. Peters
Yale University, New Haven, CT, USA
Lawrence H. Staib
University of Strasbourg, Illkirch, France
Caroline Essert
United Imaging Intelligence, Shanghai, China
Sean Zhou
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Pew-Thian Yap
Western University, London, ON, Canada
Ali Khan
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kanayama, T. et al. (2019). Gastric Cancer Detection from Endoscopic Images Using Synthesis by GAN. In: Shen, D. et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science, vol. 11768. Springer, Cham. https://doi.org/10.1007/978-3-030-32254-0_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32253-3
Online ISBN: 978-3-030-32254-0