Teppei Kanayama, Yusuke Kurose, Kiyohito Tanaka, Kento Aida, Shin’ichi Satoh, Masaru Kitsuregawa & Tatsuya Harada
Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11768)
Abstract
Datasets for training gastric cancer detection models are usually imbalanced, because the number of available images showing lesions is limited. This imbalance can be a serious obstacle to realizing a high-performance automatic gastric cancer detection system. In this paper, we propose a method that lessens this dataset bias by generating new images using a generative model. The generative model synthesizes an image from two images in a dataset. The synthesis network can produce realistic images, even if the dataset of lesion images is small. In our experiment, we trained gastric cancer detection models using the synthesized images. The results show that the performance of the system was improved.
1 Introduction
The performance of computer vision systems has improved dramatically because of recent developments in deep learning techniques. The automatic detection of gastric cancer in endoscopic images is one of the most important applications of these techniques. This detection task consists of detecting a cancerous tumor, regardless of its size. Automatically locating cancers in images is expected to decrease the diagnostic burden on doctors.
However, datasets for training the detection models are usually imbalanced, because the number of images showing lesions is limited. This is because the number of patients who have lesions is small and the cost of annotation for indicating the location of lesions in the images is high. This dataset imbalance can be a serious obstacle to realizing a high-performance automatic gastric cancer detection system.
In this paper, we propose a method that lessens the bias by using generative models and thus improves the performance of gastric cancer detection models, even when the dataset includes bias.
2 Related Work
The detection of objects in general images has been widely explored in recent years [12,13]. Research has also been conducted on object detection in medical images. For example, in [6] the detection of tumors in endoscopic images using the Single Shot Detector (SSD) [9] was described. In [10], a system for glomerulus detection in light microscopic images using a faster region-based convolutional neural network (Faster R-CNN) [13] was presented. The task of detecting anomalies in images is similar to object detection tasks. In this task, an entire image is divided into regions using a grid, and the model recognizes whether each region contains anomalies. This method is advantageous in situations where the target image has stepwise anomalies. Xiao et al., for example, proposed a high-performance unsupervised anomaly detection system that uses a spatio-temporal pyramid to utilize not only local but also global features [16,17]. Studies have also been conducted on detecting anomalies, i.e., lesions, in medical images. As for general images, methods that improve the performance of systems for medical images by utilizing global context information have been proposed [5,8,14]. Hayakawa et al., for example, detected lesions in endoscopic images by extracting multi-scale features using two types of convolutional neural networks (CNNs) [5].
A major problem that arises in the case of medical images in particular is that the number of available images is limited. To address this problem, methods for expanding datasets using generative adversarial networks (GANs) [4] have been widely explored [1,2,3,18]. In [3], for example, computed tomography (CT) images of livers were generated using deep convolutional GANs (DCGANs) [11]. The study’s results showed that image classification performance improved when the generated images were used. In [1], a network that can generate high-resolution images from a limited dataset was proposed, and experiments on skin images were described. In the context of general image synthesis, GANs that use both local and global information were proposed in [15] and [19]. In [15], the authors proposed a method to synthesize images guided by sketch, color, and texture. In [19], the authors dealt with the task of generating photographic images conditioned on image descriptions expressed in natural language. Both methods require relatively large datasets compared with medical image settings.
As described above, in research studies GANs have been applied to augment datasets for image classification tasks involving medical images. These studies, however, were focused on generating entire images, and thus did not consider the lesion detection task. Therefore, images generated using these methods cannot be used for supervised lesion detection tasks, because they do not indicate the location of lesions. The studies were, furthermore, focused on reproducing the distribution in the original dataset, which cannot lessen the dataset bias mentioned in the previous section. When these generated data are used to train lesion detection models, the models detect only bright regions. In this research study, we focused on the gastric lesion detection task in endoscopic images. We propose a data augmentation method that improves the performance of the models by using generative models to produce additional images and thus lessen the dataset bias.
3 Method
Figure 1 shows an overview of the gastric cancer image synthesizing system using a GAN. The system consists of three networks: a synthesizer and global and local discriminators. A normal image, i.e., an image with no lesions, and an image showing a lesion are input into the synthesizer, which outputs a new image in which the two input images are synthesized smoothly. The global discriminator determines whether the synthesized image is consistent. The local discriminator has two roles: the first is to determine whether the lesion part in the generated image is realistic, and the second is to determine whether the lesion part and the normal part are connected smoothly. When designing this architecture, we used the architecture proposed in [7] as a reference.
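For orientation, the following is a minimal PyTorch sketch of this three-network layout. The layer configurations and class names are our own assumptions for illustration; the paper specifies the architecture only by reference to [7].

```python
import torch
import torch.nn as nn

class Synthesizer(nn.Module):
    """Maps a normal image and a padded lesion patch (concatenated along
    the channel axis, 6 channels in total) to a synthesized 3-channel image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x_n, x_l):
        # x_n: normal image, x_l: zero-padded lesion patch (same shape)
        return self.net(torch.cat([x_n, x_l], dim=1))

class Discriminator(nn.Module):
    """Shared shape for the global and local discriminators. Returns a raw
    logit; a sigmoid of this logit gives the probability in [0, 1]
    described in the text."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x)
```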
The input into the synthesizer is a normal image and a padded patch showing a lesion. To obtain a padded lesion patch, first the patch is cropped from the part of the image that shows the lesion and then it is zero-padded such that it is the same size as the normal image (Fig. 2). The position of the lesion patch relative to the normal image is represented by changing the position of the zero padding. The position of the padding is determined randomly. In other words, both \(X_{n}\) and \(X_{l}\), where \(X_{n}\) is a normal image and \(X_{l}\) is a padded lesion patch, are three-dimensional tensors of the same shape. The normal image \(X_{n}\) is resized to the prescribed size in advance.
When the normal image \(X_{n}\) and the padded lesion patch \(X_{l}\) have been concatenated along the channel axis, the result is input into the synthesizer. The output of the synthesizer is a new image in which the two images are synthesized smoothly. At this point, the position of the lesion in the synthesized image corresponds to the position of the input lesion patch (\(x_{pad}, y_{pad}\)).
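As an illustration, here is a sketch of this padding step, assuming images are stored as NumPy arrays of shape (H, W, 3); the function name is hypothetical.

```python
import numpy as np

def pad_lesion_patch(patch, target_h, target_w, rng=None):
    """Zero-pad a cropped lesion patch to the size of the normal image,
    placing it at a random position (x_pad, y_pad)."""
    rng = rng or np.random.default_rng()
    ph, pw = patch.shape[:2]
    y_pad = int(rng.integers(0, target_h - ph + 1))
    x_pad = int(rng.integers(0, target_w - pw + 1))
    padded = np.zeros((target_h, target_w, 3), dtype=patch.dtype)
    padded[y_pad:y_pad + ph, x_pad:x_pad + pw] = patch
    return padded, (x_pad, y_pad)

# The synthesizer input is then the channel-wise concatenation of the
# normal image x_n and the padded patch x_l:
#   x = np.concatenate([x_n, x_l], axis=-1)  # shape (H, W, 6)
```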
Fig. 2. Padding a lesion image. (a) Lesion image with bounding box; (b) lesion patch cropped from the lesion part of the image; (c) padded lesion patch.
The input into the global discriminator is a lesion image that is either taken from the dataset or synthesized. Each image in the dataset is resized to a prescribed size in advance. The output of the global discriminator is a scalar within [0, 1], which indicates the probability that the input image belongs to the dataset. The local discriminator receives as input either a lesion patch from the dataset or the lesion part of the synthesized image. Here, the synthesized image is cropped from a region slightly larger than the padded region of the input image. The purpose is to ensure that the lesion patch is synthesized at the same location as the padded position in the input lesion image and, furthermore, that the boundary between the normal part and the lesion part is smooth. The output, like that of the global discriminator, is a scalar within [0, 1].
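A sketch of the local discriminator’s input crop, taking a region slightly larger than the padded patch; the margin width is our assumption, since the paper does not state it.

```python
def crop_local_region(image, x_pad, y_pad, ph, pw, margin=16):
    """Crop the lesion part of a (synthesized or real) image, enlarged by
    a small margin so the discriminator also sees the boundary between
    the lesion part and the surrounding normal part."""
    h, w = image.shape[:2]
    y0, y1 = max(0, y_pad - margin), min(h, y_pad + ph + margin)
    x0, x1 = max(0, x_pad - margin), min(w, x_pad + pw + margin)
    return image[y0:y1, x0:x1]
```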
Two types of loss functions are used to optimize the three networks: reconstruction loss and adversarial loss. The reconstruction loss ensures that the synthesized image faithfully reconstructs the original input images, while the adversarial loss allows the boundary region between a normal image and a lesion patch to be generated flexibly. Together, these two loss functions produce smoothly synthesized images, even when the lesion image dataset is small.
The reconstruction loss is represented as

\(L_{rc}^{global} = \mathrm{Mean}\big(F_1 \odot (G(X_n, X_a) - X_n)^{2}\big)\),
\(L_{rc}^{local} = \mathrm{Mean}\big(F_2 \odot (G(X_n, X_a) - X_a)^{2}\big)\),
\(L_{rc} = L_{rc}^{global} + \alpha L_{rc}^{local}\),    (1)

where \(L_{rc}^{global}\) and \(L_{rc}^{local}\) are the global and local reconstruction losses, respectively, \(L_{rc}\) is the final reconstruction loss, \(X_n\) and \(X_a\) are the images from the normal and the lesion image dataset, respectively, and \(G()\) is the output of the synthesizer. The squaring denotes the element-wise (Hadamard) product of the difference with itself, and \(\odot\) likewise denotes element-wise multiplication. \(\mathrm{Mean()}\) is the function for calculating the mean of all the elements in a tensor. \(\alpha\) is a hyperparameter for adjusting the weight between the local and global reconstruction losses. \(F_1\) and \(F_2\) are weighting tensors with the same shape as \(X_{n}\). Adopting a smoothing function such as a two-dimensional Gaussian for \(F_1\) and \(F_2\) makes the boundary between the normal part and the lesion part smooth.
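A minimal PyTorch sketch of this reconstruction loss, following the equations as reconstructed above; tensor layouts and the helper name are our assumptions.

```python
import torch

def reconstruction_loss(g_out, x_n, x_a, f1, f2, alpha=7.0):
    """Weighted element-wise L2 reconstruction loss. f1 and f2 are
    smoothing weight maps (e.g. built from a 2-D Gaussian) broadcastable
    to the image shape; alpha = 7.0 as reported in Sect. 4."""
    l_global = torch.mean(f1 * (g_out - x_n) ** 2)
    l_local = torch.mean(f2 * (g_out - x_a) ** 2)
    return l_global + alpha * l_local
```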
Adversarial loss is related to the classification by the discriminators. The losses of the generator and the discriminator are respectively defined as

\(L_{adv}^{gen} = \mathrm{Softplus}(-P_{fake}^{global}) + \beta \, \mathrm{Softplus}(-P_{fake}^{local})\),
\(L_{adv}^{dis} = \mathrm{Softplus}(-P_{real}^{global}) + \mathrm{Softplus}(P_{fake}^{global}) + \beta \big(\mathrm{Softplus}(-P_{real}^{local}) + \mathrm{Softplus}(P_{fake}^{local})\big)\),

where \(L_{adv}^{gen}\) and \(L_{adv}^{dis}\) are the adversarial losses of the generator and discriminator, respectively, \(\beta\) is a hyperparameter for adjusting the weight between the global and local adversarial losses, and \(\mathrm{Softplus}\) is the standard softplus function. \(P^{global}_{real}\), \(P^{global}_{fake}\), \(P^{local}_{real}\), and \(P^{local}_{fake}\) are respectively defined as

\(P^{global}_{real} = D_{global}(X)\), \(P^{global}_{fake} = D_{global}(G(X_n, X_a))\),
\(P^{local}_{real} = D_{local}(\lfloor X' \rfloor)\), \(P^{local}_{fake} = D_{local}(\lfloor G(X_n, X_a) \rfloor)\),

where \(D_{global}()\) and \(D_{local}()\) are the outputs of the global and local discriminators, respectively, and \(\lfloor \cdot \rfloor\) denotes cropping the lesion part. Both \(X\) and \(X'\) are images in the training dataset, which in general are different. The subscript of \(X_*\) indicates whether the image comes from the normal or the lesion image dataset.
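A sketch of these adversarial losses in PyTorch. Here the discriminators are assumed to return raw logits (a sigmoid of which gives the probabilities described above), which is the usual way to combine a softplus loss with a [0, 1] output.

```python
import torch.nn.functional as F

def adversarial_losses(p_global_real, p_global_fake,
                       p_local_real, p_local_fake, beta=1.0):
    """Softplus-form adversarial losses for the generator and the two
    discriminators; beta = 1.0 as reported in Sect. 4."""
    l_gen = (F.softplus(-p_global_fake).mean()
             + beta * F.softplus(-p_local_fake).mean())
    l_dis = ((F.softplus(-p_global_real) + F.softplus(p_global_fake)).mean()
             + beta * (F.softplus(-p_local_real)
                       + F.softplus(p_local_fake)).mean())
    return l_gen, l_dis
```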
Based on the reconstruction and adversarial losses above, the synthesizer minimizes \(L_{rc} + \gamma L_{adv}^{gen}\), and both discriminators minimize \(L_{adv}^{dis}\), where \(\gamma\) is a hyperparameter for adjusting the weight between the reconstruction and adversarial losses. These optimizations are conducted simultaneously. After generating images, we replace the normal part with the original input image stepwise. The weight of this stepwise replacement is \(F_1\) in Formula (1). This is effective because the newly generated part is mainly around the lesion patch, so the original normal image can be reused in regions distant from the lesion part.
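A one-line sketch of this stepwise replacement, under our reading that \(F_1\) weights how much of the original normal image is kept at each pixel.

```python
def blend_with_original(g_out, x_n, f1):
    """Reuse the original normal image far from the lesion (where f1 is
    close to 1) and keep the generated pixels around the lesion patch
    (where f1 is close to 0)."""
    return f1 * x_n + (1.0 - f1) * g_out
```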
4 Experiments
Conditions. In this study, we used our original endoscopic image dataset, which was extracted from an electronic medical record system. Each image was annotated by the patient’s attending doctor, and the images showing lesions have bounding boxes on the lesion parts. The dataset contains 129,692 normal and 1,315 lesion images. By lesion type, there are 1,309 tumors and 6 ulcers. The average image height is 458 pixels, and the average width is 405 pixels.
First, we divided the dataset as follows. The normal images were divided into 129,518 training, 44 validation, and 130 test images, and the lesion images were divided into 1,142 training, 45 validation, and 128 test images. Note that all images from an individual patient were assigned to a single split (training, validation, or test).
We conducted two experiments. In the first experiment, we visualized the images synthesized by our method and compared them with images generated by DCGANs [11], to confirm that our method can generate clear images when the number of lesion images is small. The optimizer used was Adam (\(\alpha = 0.0002, \beta = 0.5\)), and the weight decay ratio was 0.0001. The minibatch size was 64, and the number of training iterations was 150,000. The values of the loss-weighting hyperparameters \(\alpha\), \(\beta\), and \(\gamma\) (Sect. 3) were 7.0, 1.0, and 0.002, respectively. For the DCGANs, the optimizer and the weight decay ratio were the same as above. The model was trained from scratch. The minibatch size was 16, and the number of training iterations was 80,000.
In the second experiment, we trained the gastric cancer detection model using the synthesized lesion images and compared its performance with that obtained when only the lesion images in the dataset were used for training. As the gastric cancer detection model, the model proposed in [5] was used. For this detection model, the optimizer used was MomentumSGD (momentum: 0.9) and the learning rate was 0.01. The weight decay ratio was 0.0005. The model was trained from scratch. The minibatch size was 64, and the number of training iterations was 15,000. For data augmentation, we applied flipping, rotation by 90 degrees, grayscale conversion, and channel shuffling. In the training phase, we cropped classification targets from normal images randomly. In the case of the lesion images, however, we cropped randomly from the entire image with 50% probability and from inside the annotated bounding box with 50% probability. When determining whether a cropped part contained lesions, we considered it to contain a lesion when the intersection-over-union (IoU) value between the cropped part and the annotated bounding box was greater than 0.4. To lessen the imbalance between the numbers of images with and without lesions, we applied oversampling to the lesion images. In other words, we adjusted the parameter \(k\) to ensure that
\(k \, (N_a + N_g) = N_n\),

where \(k\) is the oversampling ratio and \(N_a\), \(N_g\), and \(N_n\) are the numbers of lesion images, synthesized images, and normal images, respectively. In this experiment, we tried several different numbers of synthesized images.
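A small sketch of this balancing rule; the exact form of the equation is our assumption based on the variable definitions above.

```python
def oversampling_ratio(n_lesion, n_synth, n_normal):
    """Oversampling ratio k that balances lesion-class images (original
    plus synthesized) against normal images: k * (N_a + N_g) = N_n."""
    return n_normal / (n_lesion + n_synth)

# With the dataset sizes in this paper and 20,000 synthesized images:
#   oversampling_ratio(1142, 20000, 129518)  # ~6.1
```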
In the second experiment, the average precision (AP) metric was used to evaluate the performance of the trained models. First, we divided each of the 258 images in the test dataset into 100 (\(10 \times 10\)) grid regions. Then, the probability that a region contained lesions was calculated by the trained model for the 64 (\(8 \times 8\)) interior regions, i.e., excluding the edge regions. Finally, we calculated the AP score over the 16,512 (\(258 \times 64\)) predictions and annotated labels, and this AP score was taken as the model’s score. We assigned labels as in the training phase. In the experiment, we trained the models from four different initial values and considered the mean of the AP values as the final score of the model.
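A sketch of this evaluation, assuming the per-region probabilities and per-region IoU values with the annotated boxes have already been computed; array shapes and function names are our assumptions.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def grid_average_precision(probs, region_ious, iou_thresh=0.4):
    """AP over the 8x8 interior regions of all test images.
    probs:       (n_images, 8, 8) lesion probabilities from the model.
    region_ious: (n_images, 8, 8) IoU of each region with the annotated
                 bounding box (zero for normal images).
    Regions with IoU > iou_thresh are labeled positive, as in training."""
    y_true = (np.asarray(region_ious) > iou_thresh).ravel().astype(int)
    y_score = np.asarray(probs).ravel()
    return average_precision_score(y_true, y_score)
```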
Fig. 3. Left: two images synthesized by our method, with bounding boxes on the synthesized lesion patches. Right: images generated by a deep convolutional generative adversarial network (DCGAN).
5 Results
Figure 3 shows the results of the first experiment. Because the lesion image dataset was small, mode collapse occurred with the conventional method, and the resolution of the generated images was not very high. In contrast, our method generated clearer and more varied images than the conventional method, because it uses the original normal and lesion images effectively.
Table 1 shows the results of the second experiment. Using images synthesized by our proposed method improved the scores of the gastric cancer detection models. This indicates that the dataset bias was lessened, because the method allows lesion patches to be attached to various parts of normal images. When we changed the number of synthesized images added to the training dataset, we observed that the model achieved the highest AP score when 20,000 synthesized images were added and that the performance of the model dropped when a larger number was added. This indicates that the synthesized images have biases of their own, which harm performance when an excessive number of synthesized images is added.
6 Summary
In this study, we focused on the imbalanced data problem for gastric cancer detection systems. To lessen the bias, we proposed a method to synthesize new lesion images by using GANs. Furthermore, we showed that the performance of a gastric cancer detection model was improved when the synthesized images were added to the training dataset.
References
Baur, C., Albarqouni, S., Navab, N.: MelanoGANs: high resolution skin lesion synthesis with GANs. arXiv:1804.04338 (2018)
Beers, A., et al.: High-resolution medical image synthesis using progressively grown generative adversarial networks. arXiv:1805.03144 (2018)
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing (2018). http://www.sciencedirect.com/science/article/pii/S0925231218310749
Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Hayakawa, A., et al.: Gastric cancer detection for gastroenterological endoscopy with local and multi-scale global information. In: CARS (2019)
Hirasawa, T., et al.: Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 21, 653–660 (2018)
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. 36, 107 (2017)
Kawahara, J., Hamarneh, G.: Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers. In: MICCAI (2016)
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Lo, Y.C., et al.: Glomerulus detection on light microscopic images of renal pathology with the faster R-CNN. In: Cheng, L., Leung, A.C.S., Ozawa, S. (eds.) Neural Information Processing (2018)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv:1804.02767 (2018)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
Shen, W., Zhou, M., Yang, F., Yang, C., Tian, J.: Multi-scale convolutional neural networks for lung nodule classification. In: IPMI (2015)
Xian, W., et al.: TextureGAN: controlling deep image synthesis with texture patches. In: CVPR (2018)
Xiao, T., Zhang, C., Zha, H.: Learning to detect anomalies in surveillance video. IEEE Signal Process. Lett. 22, 1477–1481 (2015)
Xiao, T., Zhang, C., Zha, H., Wei, F.: Anomaly detection via local coordinates factorization and spatio-temporal pyramid. In: ACCV (2014)
Yi, X., Walia, E., Babyn, P.: Generative adversarial network in medical imaging: a review. Med. Syst. (2018)
Zhang, Z., Xie, Y., Yang, L.: Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In: CVPR (2018)
Acknowledgements
This work was supported by a Grant for ICT infrastructure establishment and implementation of artificial intelligence for clinical and medical research from the Japan Agency for Medical Research and Development (AMED) (JP18lk1010028).
Author information
Authors and Affiliations
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Teppei Kanayama, Yusuke Kurose & Tatsuya Harada
Kyoto Second Red Cross Hospital, Kyoto, Japan
Kiyohito Tanaka
Research Center for Medical Bigdata, National Institute of Informatics, Tokyo, Japan
Kento Aida, Shin’ichi Satoh & Tatsuya Harada
Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
Masaru Kitsuregawa
National Institute of Informatics, Tokyo, Japan
Masaru Kitsuregawa
Corresponding author
Correspondence to Teppei Kanayama.
Editor information
Editors and Affiliations
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Dinggang Shen
University of Georgia, Athens, GA, USA
Tianming Liu
Western University, London, ON, Canada
Terry M. Peters
Yale University, New Haven, CT, USA
Lawrence H. Staib
University of Strasbourg, Illkirch, France
Caroline Essert
United Imaging Intelligence, Shanghai, China
Sean Zhou
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Pew-Thian Yap
Western University, London, ON, Canada
Ali Khan
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kanayama, T. et al. (2019). Gastric Cancer Detection from Endoscopic Images Using Synthesis by GAN. In: Shen, D. et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science, vol. 11768. Springer, Cham. https://doi.org/10.1007/978-3-030-32254-0_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32253-3
Online ISBN: 978-3-030-32254-0