- Bin Xiao,
- Xiaoqing Cheng,
- Qingfeng Li,
- Qian Wang,
- Lichi Zhang,
- Dongming Wei,
- Yiqiang Zhan,
- Xiang Sean Zhou,
- Zhong Xue,
- Guangming Lu &
- Feng Shi
Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11861)
Abstract
Automatic dense parcellation of brain MR images, which labels hundreds of regions of interest (ROIs), plays an important role in neuroimage analysis. Brain image parcellation using deep learning has been widely recognized for its effective performance, but it remains limited in actual application due to its demand for sufficient training data and intensive GPU memory allocation. Due to the high cost of manual segmentation, it is usually not feasible to provide a large dataset for training the network. On the other hand, it is relatively easy to transfer labeling information to many new unlabeled images and thus augment the training data, although the augmented data can only be considered as weakly labeled. Therefore, in this paper, we propose a cascaded weakly supervised confidence integration network (CINet). The contributions of our method are twofold. First, we propose an image registration-based data augmentation method and evaluate the confidence of the labeling information for each augmented image. The augmented data, together with the original small training dataset, jointly contribute to the modeling of CINet for segmentation. Second, we propose a random crop strategy to handle the large number of feature channels needed to label hundreds of neural ROIs. The demand on GPU memory is thus relieved, while better accuracy is also achieved. In experiments, we use 37 manually labeled subjects and augment 96 images with weak labels for training. The overall Dice score over 112 brain regions on the testing set reaches 75%, which is higher than using the original training data only.
1 Introduction
Whole brain segmentation of structural MRI is of great significance for both brain research and clinical applications. As the topic has been investigated for a long time, there are several ways to implement whole brain segmentation, which can be generally grouped as follows:
- (1)
Registration-based methods [2, 4]. One can label a brain image in two steps, by first precisely registering the atlas images, as well as their label maps, to the target image, and then deriving the segmentation result for the target by fusing the warped label maps (e.g., through voting). However, registration-based methods may suffer from large structural variations and registration errors;
- (2)
Patch-based methods [5]. Atlas images are first roughly registered to the target image (e.g., by affine registration). Then, sophisticated patch-based label fusion algorithms are adopted to evaluate the patch-wise similarity between the target image and the atlases. If patches have similar intensity appearance in a spatially non-local area, they should have similar labels. Though non-local label fusion has shown superior accuracy and alleviated the demanding requirement for precise registration, the process can be very time-consuming due to the patch-by-patch processing;
- (3)
Learning-based methods. Supervised machine learning has long been applied to brain image segmentation. The fully convolutional network (FCN) [7] in particular has become the state-of-the-art solution for many medical image segmentation problems. We can directly learn a network from the atlas dataset, in which every atlas comes with its labels for supervision. At the same time, inference on a target image is very fast with the FCN framework.
However, it is challenging to automatically label the whole brain into a large number of small ROIs (e.g., >100 labels corresponding to different neural structural and functional areas) with an FCN. Limited GPU memory restricts the number of channels in the later convolutional layers of the network, making it hard for the network to handle hundreds of ROIs simultaneously. Meanwhile, with many ROIs to label, it is more difficult to prepare sufficient training data, which obviously hurts the generalization capability of the trained network.
In this paper, we propose a cascaded weakly supervised confidence integration network (CINet) to address the problems of limited GPU memory and limited training data. Specifically, we design a registration-based data augmentation method as well as a confidence evaluation mechanism for the augmented data. The augmented images, with their evaluated confidences, act as weak supervision and help optimize the network in addition to the original training dataset. Moreover, we extend the V-Net [8] framework with a novel random crop strategy, such that hundreds of ROIs can be handled simultaneously in both training and testing. We apply our method to whole brain parcellation and achieve superior performance in segmenting brain MR images.
2 Method
In this section, we present the proposed data augmentation method and the details of CINet. Particularly, we propose a registration-based approach to augment the training data with weak labels in Sect. 2.1. The confidence evaluation of the weak labeling information of the augmented images is described in Sect. 2.2. Then, we introduce CINet and apply it to brain image segmentation in Sect. 2.3.
2.1 Augmentation Toward Weakly Labeled Images
An important factor that limits the accuracy of a trained segmentation network is the insufficient number of high-quality training images. Given the difficulty of generating many high-quality expert labelings, we propose to augment the limited number of initial training images. Although the augmented images have only weak labels, the segmentation network can benefit from incorporating more training images, i.e., by combining the augmented images with the initial high-quality training data.
The augmentation toward weakly labeled data is attained through conventional registration-based segmentation, which can only label the ROIs roughly for an unlabeled image. Specifically, given an arbitrary unlabeled image, we first use SyN [1] to register the well-labeled images (i.e., from the initial training dataset) to it and warp their label maps accordingly. Then, for label fusion, we adopt majority voting weighted by local patch-to-patch similarity. In our implementation, the patch size is set to \( 5 \times 5 \times 5 \).
Since the above registration and label fusion process inevitably introduces errors, the new image can only be considered as weakly labeled. For convenience, the weakly labeled images form a dataset \( \left\{ {\left( {F_{1} ,W_{1} } \right),\left( {F_{2} ,W_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} } \right)} \right\} \), where \( F_{i} \) has its weak label map \( W_{i} \). Since the unlabeled images can be arbitrarily selected, we can theoretically acquire an unlimited number of augmented images and their weak labels in this way. A sketch of the label fusion step is given below.
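To make the fusion step concrete, the following is a minimal sketch of similarity-weighted majority voting, assuming the atlas images and label maps have already been warped to the unlabeled target with SyN. The exponential weighting of the local patch difference is an illustrative choice; the paper only specifies similarity-weighted voting over \( 5 \times 5 \times 5 \) patches.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_labels(target, warped_images, warped_labels, num_labels=113, patch=5):
    """Similarity-weighted majority voting over warped atlas label maps.

    target: (X, Y, Z) unlabeled image; warped_images / warped_labels:
    atlas images and label maps already registered to `target`.
    """
    votes = np.zeros(target.shape + (num_labels,), dtype=np.float32)
    idx = np.indices(target.shape)
    for img, lab in zip(warped_images, warped_labels):
        # mean squared difference over each 5x5x5 neighborhood
        ssd = uniform_filter((target - img) ** 2, size=patch)
        weight = np.exp(-ssd)  # patch-wise similarity weight (assumed form)
        # accumulate each voxel's weighted vote for its atlas label
        np.add.at(votes, (idx[0], idx[1], idx[2], lab), weight)
    return votes.argmax(axis=-1).astype(np.int32)  # weak label map W_i
```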
2.2 Confidence of the Weakly Labeled Data
In order to make proper use of the weakly labeled images in \( \left\{ {\left( {F_{1} ,W_{1} } \right),\left( {F_{2} ,W_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} } \right)} \right\} \), we train a confidence network to predict the confidence that measures the quality of the label at each point of \( F_{j} \). The role of the confidence network is illustrated in Fig. 1. The confidence network is trained on the initial training dataset, designated by \( \left\{ {\left( {M_{1} ,L_{1} } \right),\left( {M_{2} ,L_{2} } \right), \cdots ,\left( {M_{k} ,L_{k} } \right)} \right\} \). For a certain image \( M_{i} \), we can apply the protocol in Sect. 2.1 and generate its corresponding weak label map (i.e., \( L_{i}^{'} \)) by using the other training images in the dataset. Then, \( M_{i} \) and \( L_{i}^{'} \) are input to the confidence network, which aims to predict the errors in \( L_{i}^{'} \) by referring to the supervision of \( L_{i} \). That is, for location \( x \) in the image space, the desired output of the confidence network is 1 if \( L_{i}^{'} \left( x \right) = L_{i} \left( x \right) \), or 0 if \( L_{i}^{'} \left( x \right) \ne L_{i} \left( x \right) \). The architecture of the confidence network, which has two dense blocks, is shown in Fig. 1, together with exemplar inputs and output.

Fig. 1. The architecture of the confidence network. The unlabeled image and its weak label map (left) are input to the network, while the errors between the weak labels and the real labels (right) supervise the training of the network. At inference, the output of the network can be regarded as a confidence measure for the weak labels of the input image.
Once the training of the confidence network is completed, it can be used to evaluate the weakly labeled data from Sect. 2.1. We apply it to the weakly labeled dataset and obtain a confidence map \( C_{i} \) for each \( W_{i} \). At each position, \( C_{i} \) gives \( W_{i} \) a confidence flag. If the flag equals 1, the label in \( W_{i} \) can be used as a correct label for network training; if it equals 0, the label in \( W_{i} \) is wrong and should not participate in the training of the network. Thus, we have the set \( \left\{ {\left( {F_{1} ,W_{1} ,C_{1} } \right),\left( {F_{2} ,W_{2} ,C_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} ,C_{s} } \right)} \right\} \), which will be used to train CINet in the next section.
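A minimal sketch of the two ingredients described above: the voxel-wise binary target used to supervise the confidence network, and the mask derived from the predicted confidence map (the threshold value is an assumption; the paper only introduces \( \gamma \) in Sect. 2.3).

```python
import numpy as np

def confidence_target(weak_label, true_label):
    """Supervision for the confidence network: 1 where the weak label
    agrees with the expert label L_i, 0 where it is wrong."""
    return (weak_label == true_label).astype(np.float32)

def confidence_mask(confidence_map, gamma=0.5):
    """Voxels whose predicted confidence exceeds gamma; only these
    participate in the weakly supervised loss of Sect. 2.3.
    gamma = 0.5 is an illustrative value."""
    return confidence_map > gamma
```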
2.3 Confidence Integration Network (CINet) and Loss Function
Besides the insufficient number of high-quality training images, network training also suffers from large GPU memory consumption. The root of the problem lies in one-hot encoding. For example, with the image size of \( 208 \times 208 \times 160 \) used in this paper, the output of the network for 112 brain regions has the size \( 208 \times 208 \times 160 \times 113 \). This can take up to 20 GB of GPU memory (see the arithmetic below). To tackle this dilemma, previous studies follow the region-based method [6], which trains more than one network to handle different parts of the brain; as a result, the whole process becomes quite complex. Another way to address the problem is to train the network on random patches of the image and test it on the whole brain image. However, the segmentation accuracy then decreases, as the field of view (FOV) at the test stage is different from that at the training stage. Therefore, there seems to be an inevitable trade-off between GPU memory consumption and input image size.
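The arithmetic behind the memory claim is easy to verify: a single float32 one-hot output tensor already takes about 3 GiB, and the activations and gradients kept alive during backpropagation multiply this severalfold.

```python
# Rough memory footprint of the one-hot network output.
voxels = 208 * 208 * 160        # 6,922,240 voxels per image
channels = 113                  # 112 ROIs + background
bytes_per_float32 = 4
one_tensor_gib = voxels * channels * bytes_per_float32 / 1024 ** 3
print(f"{one_tensor_gib:.1f} GiB")  # ~2.9 GiB for a single tensor
# During backpropagation the softmax output, its gradient, and the
# preceding full-resolution feature maps coexist in memory, which is
# how the footprint reaches the ~20 GB regime mentioned above.
```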
Here we make three changes to the basic V-Net framework to resolve this contradiction (a code sketch follows Fig. 2): (1) Decompose V-Net into two sub-networks as described in Fig. 2. At any time point in the training stage, only the intermediate variables associated with one of the sub-networks are stored in GPU memory; this can be easily implemented with the checkpointing utility provided by PyTorch. (2) Replace all convolution operations in the second sub-network by \( 1 \times 1 \times 1 \) convolutions, which keeps the FOV unchanged at this stage. (3) In the training stage, randomly crop a patch from the output of the first sub-network as the input of the second sub-network. Naturally, the corresponding patch of the ground truth is used to calculate the loss of the network. In the test stage, the crop operation is skipped. We name this strategy the random crop mechanism.
Fig. 2. Illustration of the two-stage process in CINet. We use part of the V-Net as Sub-network 1 of our CINet. The input image size is \( 208 \times 208 \times 160 \). After Sub-network 1 (V-Net-like), the feature map has a size of \( 208 \times 208 \times 160 \times 32 \). In the training stage, Sub-network 2 uses a patch randomly cropped from this feature map to optimize the network parameters, while in the test stage, Sub-network 2 uses the whole feature map to predict the label map without the crop operation. The label map has 113 channels, as we have 112 regions of interest plus the background.
Clearly, we can freely adjust the memory occupied by Sub-network 2 according to the available GPU memory, and the FOV of the whole network remains identical between the training and testing stages. Meanwhile, the proposed network can process the entire brain image in one network, instead of training multiple networks for different parts of the brain as in SLANT [6].
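The following PyTorch sketch illustrates the random crop mechanism and the checkpointed two-sub-network design; the layer counts, the crop size, and the helper names are illustrative placeholders rather than the exact published configuration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class SubNetwork2(nn.Module):
    """Only 1x1x1 convolutions: each voxel is classified from its own
    32-dim feature vector, so the FOV is identical for a cropped patch
    and for the whole feature map."""
    def __init__(self, in_ch=32, num_classes=113):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv3d(in_ch, 64, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv3d(64, num_classes, kernel_size=1))

    def forward(self, x):
        return self.head(x)

def random_crop(features, labels, size=(64, 64, 64)):
    """Crop matching patches from the stage-1 feature map and the
    ground-truth label map (training only); 64^3 is an assumed size."""
    _, _, D, H, W = features.shape
    d = torch.randint(0, D - size[0] + 1, (1,)).item()
    h = torch.randint(0, H - size[1] + 1, (1,)).item()
    w = torch.randint(0, W - size[2] + 1, (1,)).item()
    return (features[:, :, d:d + size[0], h:h + size[1], w:w + size[2]],
            labels[:, d:d + size[0], h:h + size[1], w:w + size[2]])

def train_step(subnet1, subnet2, image, label, criterion):
    # Checkpointing discards Sub-network 1's intermediate activations and
    # recomputes them in the backward pass, trading compute for memory.
    features = checkpoint(subnet1, image, use_reentrant=False)
    f_patch, l_patch = random_crop(features, label)  # skipped at test time
    return criterion(subnet2(f_patch), l_patch)
```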
For the atlas dataset \( \left\{ {\left( {M_{1} ,L_{1} } \right),\left( {M_{2} ,L_{2} } \right), \cdots ,\left( {M_{k} ,L_{k} } \right)} \right\} \) and the weakly labeled dataset \( \left\{ {\left( {F_{1} ,W_{1} ,C_{1} } \right),\left( {F_{2} ,W_{2} ,C_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} ,C_{s} } \right)} \right\} \), we use two different loss functions in the training stage. With the atlas dataset, we use the cross-entropy loss. For the weakly labeled dataset, we use the confidence maps \( C_{i} \) as masks, by setting a threshold \( \gamma \) to select the confidently labeled parts of each image and thus enhance the performance of the segmentation network. Denoting by \( p\left( {l|I,x} \right) \) the probability that the network assigns label \( l \) at voxel \( x \) of image \( I \), our proposed loss can be defined by:

\( \mathcal{L} = - \sum\nolimits_{i,x} {\log p\left( {L_{i} \left( x \right)|M_{i} ,x} \right)} - \sum\nolimits_{j,x} {1\left[ {C_{j} \left( x \right) > \gamma } \right]\log p\left( {W_{j} \left( x \right)|F_{j} ,x} \right)} \)
We name the modified model, trained on the confidence-integrated dataset of Sect. 2.2, CINet. The framework of CINet is outlined in Fig. 2, and a sketch of the masked loss follows below.
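A minimal sketch of this combined loss, with the confidence threshold applied as a voxel mask (the value of \( \gamma \) is not reported, so it appears here as a placeholder argument):

```python
import torch
import torch.nn.functional as F

def cinet_loss(logits_atlas, labels_atlas,
               logits_weak, labels_weak, confidence, gamma=0.5):
    """Cross-entropy on atlas data plus confidence-masked cross-entropy
    on weakly labeled data. Logits: (B, 113, D, H, W); labels: (B, D, H, W)."""
    loss_atlas = F.cross_entropy(logits_atlas, labels_atlas)
    per_voxel = F.cross_entropy(logits_weak, labels_weak, reduction='none')
    mask = (confidence > gamma).float()      # keep confident voxels only
    loss_weak = (per_voxel * mask).sum() / mask.sum().clamp(min=1.0)
    return loss_atlas + loss_weak
```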
3 Materials and Experiments
3.1 Materials
Our brain dataset consists of 67 T1-weighted brain MR images from a clinical partner hospital. We randomly chose 37 subjects as the training set, 10 as the validation set, and the remaining 20 as the test set. All 67 subjects were first processed by FreeSurfer and then manually corrected by an experienced doctor. Each subject has 112 labeled brain areas. In addition, we acquired 96 MR images for the data augmentation proposed in Sects. 2.1 and 2.2. The image size of all these images is \( 208 \times 208 \times 160 \), and the voxel size is \( 1 \times 1 \times 1\,{\text{mm}}^{3} \). In this study, we first processed all MR images using a standard pipeline. More specifically, rigid registration was employed to align all training and testing data to the colin27 template [3], followed by skull stripping to ensure that the skull is cleanly removed. Then histogram matching was applied to transform the intensity scale into the intensity space of the colin27 template.
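As an example of the last preprocessing step, histogram matching to the template can be done with SimpleITK; the file names and filter parameters below are illustrative, not the authors' exact settings.

```python
import SimpleITK as sitk

# Match a subject's intensity distribution to the colin27 template,
# assuming rigid alignment and skull stripping have already been done.
template = sitk.ReadImage("colin27.nii.gz", sitk.sitkFloat32)
subject = sitk.ReadImage("subject_t1.nii.gz", sitk.sitkFloat32)
matched = sitk.HistogramMatching(
    subject, template,
    numberOfHistogramLevels=256,
    numberOfMatchPoints=7,
    thresholdAtMeanIntensity=True)
sitk.WriteImage(matched, "subject_t1_matched.nii.gz")
```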
3.2 Experiments and Results
We conducted ablation experiments to demonstrate the effectiveness of the proposed method. First, with an image patch as the network input, we trained the original V-Net as the baseline on the atlas dataset, with the parameters set as in the original V-Net. Next, we trained our CINet on the same atlas dataset of 37 brain images (CINet-37). Using the extra weakly labeled dataset obtained as in Sect. 2.1, we then trained CINet on it and fine-tuned the network on the atlas dataset (CINet-37+96W). Finally, we combined the confidence maps provided in Sect. 2.2 with the augmented data to train our network, where V-Net was modified as in Sect. 2.3 to fit the GPU memory limitation (CINet-37+96C).
We used PyTorch to implement all the experiments. For all networks, we used the same hyper-parameters: batch size = 4, input resolution = \( 208 \times 208 \times 160 \) (\( 96 \times 96 \times 96 \) for the original V-Net), input channels = 1, output channels = 113, optimizer = Adam, and max iterations = 1000. The learning rates of the segmentation network and the confidence network are initialized to 0.001, and every 100 epochs the learning rate is automatically decreased by 50%. Two GPUs with 32 GB of memory each are utilized to train the networks. After training each network, we use the atlas image set to fine-tune the segmentation network.
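The reported optimizer settings map directly onto PyTorch's built-in scheduler; `model`, `train_one_epoch`, and `num_epochs` below are placeholders.

```python
import torch

# Adam with lr = 0.001, halved every 100 epochs, as reported above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)

for epoch in range(num_epochs):          # num_epochs: hypothetical budget
    train_one_epoch(model, optimizer)    # hypothetical training routine
    scheduler.step()
```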
We evaluated the accuracy of each segmentation experiment in terms of the Dice score; the quantitative results are shown in Table 1. The mean Dice coefficient of the original V-Net trained in the patch-based manner achieves only 0.69. Meanwhile, the proposed CINet achieves a Dice of 0.70, which demonstrates that training the network on the whole brain image can improve segmentation accuracy; in our opinion, this is because CINet does not suffer from FOV inconsistency between training and testing. Consistent with the conclusion of [9], the weakly labeled dataset brings a 3.7% improvement on the test set. This shows that although the data obtained by augmentation contain incorrect labels, they still bring diversity to the training samples and improve the generalization performance of the model. Further, the effectiveness of the confidence map is confirmed by the final result of CINet-37+96C, which is the best among all methods. This is easy to explain, as the confidence map reduces the misleading information that incorrect labels feed to the network during training. Figure 3 shows a visual comparison of the labeling results of V-Net, CINet, CINet-37+96W, and CINet-37+96C.
Fig. 3. Visual comparison of the labeling results of V-Net, CINet, CINet-37+96W, and CINet-37+96C. The proposed CINet-37+96C produces the most accurate labels for the regions inside the box. The segmentation results improve progressively from V-Net to CINet-37+96C: CINet is better than V-Net as it overcomes the FOV inconsistency between the training and testing phases, while, thanks to the weakly labeled data, the generalization ability of CINet-37+96W and CINet-37+96C to unseen data gradually improves, yielding better segmentation results.
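For reference, the per-region Dice evaluation used above can be sketched as follows (label 0 is assumed to be background).

```python
import numpy as np

def dice_per_region(pred, truth, num_labels=113):
    """Dice score for each of the 112 ROIs; the mean over regions is the
    summary figure reported in Table 1."""
    scores = {}
    for k in range(1, num_labels):       # skip background (label 0)
        p, t = pred == k, truth == k
        denom = p.sum() + t.sum()
        if denom > 0:                    # region present in pred or truth
            scores[k] = 2.0 * np.logical_and(p, t).sum() / denom
    return scores
```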
4 Conclusion and Discussion
In this study, we developed CINet to reduce GPU memory consumption, which enables us to train a neural network on the whole brain volume. Because of this advantage, we can greatly improve the speed of the segmentation algorithm and make it possible to explore more complex network structures. For the same network, the 96 extra weakly annotated images acquired automatically help us achieve better performance than using the atlas image set alone. We demonstrate that our method reduces the cost of labeling and, through automatic data augmentation, achieves segmentation accuracy close to that of training entirely on accurately labeled data.
References
Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12(1), 26–41 (2008)
Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans. Med. Imaging PP(99), 1 (2018)
Collins, D.L., et al.: Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imaging 17(3), 463–468 (1998)
Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast probabilistic diffeomorphic registration. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 729–738. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00928-1_82
Fang, L., et al.: Automatic brain labeling via multi-atlas guided fully convolutional networks. Med. Image Anal. 51, 157–168 (2019)
Huo, Y., et al.: Spatially localized atlas network tiles enables 3D whole brain segmentation from limited data. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11072, pp. 698–705. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00931-1_80
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Fourth International Conference on 3D Vision (2016)
Zlateski, A., Jaroensri, R., Sharma, P., Durand, F.: On the importance of label quality for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1479–1487 (2018)
Acknowledgement
This research was supported by the grants from the National Key Research and Development Program of China (No. 2018YFC0116400).
Author information
Authors and Affiliations
Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Bin Xiao, Yiqiang Zhan, Xiang Sean Zhou, Zhong Xue & Feng Shi
School of Biomedical Engineering, Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, China
Bin Xiao, Qian Wang, Lichi Zhang & Dongming Wei
Department of Medical Imaging, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
Xiaoqing Cheng & Guangming Lu
School of Biomedical Engineering, Southern Medical University, Guangdong, China
Qingfeng Li
Corresponding author
Correspondence to Feng Shi.
Editor information
Editors and Affiliations
Korea University, Seoul, Korea (Republic of)
Heung-Il Suk
University of North Carolina, Chapel Hill, NC, USA
Mingxia Liu
Rensselaer Polytechnic Institute, Troy, NY, USA
Pingkun Yan
University of North Carolina, Chapel Hill, NC, USA
Chunfeng Lian
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xiao, B. et al. (2019). Weakly Supervised Confidence Learning for Brain MR Image Dense Parcellation. In: Suk, H.-I., Liu, M., Yan, P., Lian, C. (eds) Machine Learning in Medical Imaging. MLMI 2019. Lecture Notes in Computer Science, vol 11861. Springer, Cham. https://doi.org/10.1007/978-3-030-32692-0_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32691-3
Online ISBN: 978-3-030-32692-0