- Bin Xiao,
- Xiaoqing Cheng,
- Qingfeng Li,
- Qian Wang,
- Lichi Zhang,
- Dongming Wei,
- Yiqiang Zhan,
- Xiang Sean Zhou,
- Zhong Xue,
- Guangming Lu &
- Feng Shi
Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11861)
Abstract
Automatic dense parcellation of brain MR images, which labels hundreds of regions of interest (ROIs), plays an important role in neuroimage analysis. Brain image parcellation using deep learning has been widely recognized for its effective performance, but it remains limited in actual application due to its demand for sufficient training data and intensive GPU memory allocation. Due to the high cost of manual segmentation, it is usually not feasible to provide a large dataset for training the network. On the other hand, it is relatively easy to transfer labeling information to many new unlabeled images and thus augment the training data, although the augmented data can only be considered as weakly labeled. Therefore, in this paper, we propose a cascaded weakly supervised confidence integration network (CINet). The contributions of our method are twofold. First, we propose an image registration-based data augmentation method and evaluate the confidence of the labeling information for each augmented image. The augmented data, together with the original small training dataset, jointly contribute to the modeling of CINet for segmentation. Second, we propose a random crop strategy to handle the large number of feature channels needed to label hundreds of neural ROIs. The demand on GPU memory is thus relieved, while better accuracy is also achieved. In experiments, we use 37 manually labeled subjects and augment 96 images with weak labels for training. The overall Dice score over 112 brain regions on the testing set reaches 75%, which is higher than using the original training data only.
1 Introduction
Whole brain segmentation of structural MRI is of great significance for both brain research and clinical applications. As the topic has been investigated for a long time, there are several ways to implement whole brain segmentation, which can be generally grouped as follows:
- (1)
Registration-based methods [2, 4]. One can label a brain image in two steps, by first precisely registering the atlas images, as well as their label maps, to the target image, and then deriving the segmentation result for the target by fusing the warped label maps (e.g., through voting). However, registration-based methods may suffer from large structural variations and registration errors;
- (2)
Patch-based methods [5]. Atlas images are first roughly registered to the target image (e.g., by affine registration). Then, sophisticated patch-based label fusion algorithms are adopted to evaluate the patch-wise similarity between the target image and the atlases. If patches have similar intensity appearance in a spatially non-local area, they should have similar labels. Though non-local label fusion has shown superior accuracy and alleviated the demanding requirement for precise registration, the process can be very time-consuming due to the patch-by-patch processing;
- (3)
Learning-based methods. Supervised machine learning has long been applied to brain image segmentation. The fully convolutional network (FCN) [7] in particular has become the state-of-the-art solution for many medical image segmentation problems. We can directly learn a network from the atlas dataset, in which every atlas comes with its labels for supervision. At the same time, inference on a target image is very fast with the FCN framework.
However, it is challenging to automatically label the whole brain into a large number of small ROIs (e.g., >100 labels corresponding to different neural structural and functional areas) with an FCN. Limited GPU memory restricts the number of channels in the later convolutional layers of the network, making it hard for the network to handle hundreds of ROIs simultaneously. Meanwhile, with many ROIs to label, it is more difficult to prepare sufficient training data, which obviously hurts the generalization capability of the trained network.
In this paper, we propose a cascaded weakly supervised confidence integration network (CINet) to address the problems of limited GPU memory and limited training data. Specifically, we design a registration-based data augmentation method as well as a confidence evaluation mechanism for the augmented data. The augmented images, with their evaluated confidences, act as weak supervision and help optimize the network in addition to the original training dataset. Moreover, we extend the V-Net [8] framework with a novel random crop strategy, such that hundreds of ROIs can be handled simultaneously in both training and testing. We apply our method to whole brain parcellation and achieve superior performance in segmenting brain MR images.
2 Method
In this section, we present the proposed data augmentation method and the details of CINet. Particularly, we propose a registration-based approach to augment the training data with weak labels in Sect. 2.1. The confidence evaluation of the weak labeling information of the augmented images is described in Sect. 2.2. Then, we introduce CINet and apply it to brain image segmentation in Sect. 2.3.
2.1 Augmentation Toward Weakly Labeled Images
An important factor that limits the accuracy of a trained segmentation network is the insufficient number of high-quality training images. Given the difficulty of generating many high-quality expert labelings, we propose to augment the limited number of initial training images. Although the augmented images have only weak labels, the segmentation network can benefit from incorporating more training images, i.e., by combining the augmented images with the initial high-quality training data.
The augmentation toward weakly labeled data is attained through conventional registration-based segmentation, which can only label the ROIs roughly for an unlabeled image. Specifically, given an arbitrary unlabeled image, we first use SyN [1] to register the well-labeled images (i.e., from the initial training dataset) to it and warp their label maps accordingly. Then, for label fusion, we adopt majority voting weighted by local patch-to-patch similarity. In our implementation, the patch size is set to \( 5 \times 5 \times 5 \).
Since the above registration and label fusion process inevitably introduces errors, the new image can only be considered as weakly labeled. For convenience, the weakly labeled images form a dataset \( \left\{ {\left( {F_{1} ,W_{1} } \right),\left( {F_{2} ,W_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} } \right)} \right\} \), where \( F_{i} \) has its weak label map \( W_{i} \). Since the unlabeled images can be arbitrarily selected, we can theoretically acquire an unlimited number of augmented images and their weak labels in this way. A sketch of the label fusion step is given below.
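To make the fusion step concrete, the following is a minimal sketch of similarity-weighted majority voting, assuming the atlas images and label maps have already been warped to the unlabeled target with SyN. The exponential weighting of the local patch difference is an illustrative choice; the paper only specifies similarity-weighted voting over \( 5 \times 5 \times 5 \) patches.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_labels(target, warped_images, warped_labels, num_labels=113, patch=5):
    """Similarity-weighted majority voting over warped atlas label maps.

    target: (X, Y, Z) unlabeled image; warped_images / warped_labels:
    atlas images and label maps already registered to `target`.
    """
    votes = np.zeros(target.shape + (num_labels,), dtype=np.float32)
    idx = np.indices(target.shape)
    for img, lab in zip(warped_images, warped_labels):
        # mean squared difference over each 5x5x5 neighborhood
        ssd = uniform_filter((target - img) ** 2, size=patch)
        weight = np.exp(-ssd)  # patch-wise similarity weight (assumed form)
        # accumulate each voxel's weighted vote for its atlas label
        np.add.at(votes, (idx[0], idx[1], idx[2], lab), weight)
    return votes.argmax(axis=-1).astype(np.int32)  # weak label map W_i
```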
2.2 Confidence of the Weakly Labeled Data
In order to make proper use of the weakly labeled images in \( \left\{ {\left( {F_{1} ,W_{1} } \right),\left( {F_{2} ,W_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} } \right)} \right\} \), we train a confidence network to predict the confidence that measures the quality of the label at each point of \( F_{j} \). The role of the confidence network is illustrated in Fig. 1. The confidence network is trained on the initial training dataset, designated by \( \left\{ {\left( {M_{1} ,L_{1} } \right),\left( {M_{2} ,L_{2} } \right), \cdots ,\left( {M_{k} ,L_{k} } \right)} \right\} \). For a certain image \( M_{i} \), we can apply the protocol in Sect. 2.1 and generate its corresponding weak label map (i.e., \( L_{i}^{'} \)) by using the other training images in the dataset. Then, \( M_{i} \) and \( L_{i}^{'} \) are input to the confidence network, which aims to predict the errors in \( L_{i}^{'} \) by referring to the supervision of \( L_{i} \). That is, for location \( x \) in the image space, the desired output of the confidence network is 1 if \( L_{i}^{'} \left( x \right) = L_{i} \left( x \right) \), or 0 if \( L_{i}^{'} \left( x \right) \ne L_{i} \left( x \right) \). The architecture of the confidence network, which has two dense blocks, is shown in Fig. 1, together with exemplar inputs and output.

Fig. 1. The architecture of the confidence network. The unlabeled image and its weak label map (left) are input to the network, while the errors between the weak labels and the real labels (right) supervise the training of the network. At inference, the output of the network can be regarded as a confidence measure for the weak labels of the input image.
Once the training of the confidence network is completed, it can be used to evaluate the weakly labeled data from Sect. 2.1. We apply it to the weakly labeled dataset and obtain a confidence map \( C_{i} \) for each \( W_{i} \). At each position, \( C_{i} \) gives \( W_{i} \) a confidence flag. If the flag equals 1, the label in \( W_{i} \) can be used as a correct label for network training; if it equals 0, the label in \( W_{i} \) is wrong and should not participate in the training of the network. Thus, we have the set \( \left\{ {\left( {F_{1} ,W_{1} ,C_{1} } \right),\left( {F_{2} ,W_{2} ,C_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} ,C_{s} } \right)} \right\} \), which will be used to train CINet in the next section.
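A minimal sketch of the two ingredients described above: the voxel-wise binary target used to supervise the confidence network, and the mask derived from the predicted confidence map (the threshold value is an assumption; the paper only introduces \( \gamma \) in Sect. 2.3).

```python
import numpy as np

def confidence_target(weak_label, true_label):
    """Supervision for the confidence network: 1 where the weak label
    agrees with the expert label L_i, 0 where it is wrong."""
    return (weak_label == true_label).astype(np.float32)

def confidence_mask(confidence_map, gamma=0.5):
    """Voxels whose predicted confidence exceeds gamma; only these
    participate in the weakly supervised loss of Sect. 2.3.
    gamma = 0.5 is an illustrative value."""
    return confidence_map > gamma
```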
2.3 Confidence Integration Network (CINet) and Loss Function
Besides the insufficient number of high-quality training images, network training also suffers from large GPU memory consumption. The root of the problem lies in one-hot encoding. For example, with the image size of \( 208 \times 208 \times 160 \) used in this paper, the output of the network for 112 brain regions has the size \( 208 \times 208 \times 160 \times 113 \). This can take up to 20 GB of GPU memory (see the arithmetic below). To tackle this dilemma, previous studies follow the region-based method [6], which trains more than one network to handle different parts of the brain; as a result, the whole process becomes quite complex. Another way to address the problem is to train the network on random patches of the image and test it on the whole brain image. However, the segmentation accuracy then decreases, as the field of view (FOV) at the test stage is different from that at the training stage. Therefore, there seems to be an inevitable trade-off between GPU memory consumption and input image size.
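The arithmetic behind the memory claim is easy to verify: a single float32 one-hot output tensor already takes about 3 GiB, and the activations and gradients kept alive during backpropagation multiply this severalfold.

```python
# Rough memory footprint of the one-hot network output.
voxels = 208 * 208 * 160        # 6,922,240 voxels per image
channels = 113                  # 112 ROIs + background
bytes_per_float32 = 4
one_tensor_gib = voxels * channels * bytes_per_float32 / 1024 ** 3
print(f"{one_tensor_gib:.1f} GiB")  # ~2.9 GiB for a single tensor
# During backpropagation the softmax output, its gradient, and the
# preceding full-resolution feature maps coexist in memory, which is
# how the footprint reaches the ~20 GB regime mentioned above.
```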
Here we make three changes to the basic V-Net framework to resolve this contradiction (a code sketch follows Fig. 2): (1) Decompose V-Net into two sub-networks as described in Fig. 2. At any time point in the training stage, only the intermediate variables associated with one of the sub-networks are stored in GPU memory; this can be easily implemented with the checkpointing utility provided by PyTorch. (2) Replace all convolution operations in the second sub-network by \( 1 \times 1 \times 1 \) convolutions, which keeps the FOV unchanged at this stage. (3) In the training stage, randomly crop a patch from the output of the first sub-network as the input of the second sub-network. Naturally, the corresponding patch of the ground truth is used to calculate the loss of the network. In the test stage, the crop operation is skipped. We name this strategy the random crop mechanism.
Fig. 2. Illustration of the two-stage process in CINet. We use part of the V-Net as Sub-network 1 of our CINet. The input image size is \( 208 \times 208 \times 160 \). After Sub-network 1 (V-Net-like), the feature map has a size of \( 208 \times 208 \times 160 \times 32 \). In the training stage, Sub-network 2 uses a patch randomly cropped from this feature map to optimize the network parameters, while in the test stage, Sub-network 2 uses the whole feature map to predict the label map without the crop operation. The label map has 113 channels, as we have 112 regions of interest plus the background.
Clearly, we can freely adjust the memory occupied by Sub-network 2 according to the available GPU memory, and the FOV of the whole network remains identical between the training and testing stages. Meanwhile, the proposed network can process the entire brain image in one network, instead of training multiple networks for different parts of the brain as in SLANT [6].
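The following PyTorch sketch illustrates the random crop mechanism and the checkpointed two-sub-network design; the layer counts, the crop size, and the helper names are illustrative placeholders rather than the exact published configuration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class SubNetwork2(nn.Module):
    """Only 1x1x1 convolutions: each voxel is classified from its own
    32-dim feature vector, so the FOV is identical for a cropped patch
    and for the whole feature map."""
    def __init__(self, in_ch=32, num_classes=113):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv3d(in_ch, 64, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv3d(64, num_classes, kernel_size=1))

    def forward(self, x):
        return self.head(x)

def random_crop(features, labels, size=(64, 64, 64)):
    """Crop matching patches from the stage-1 feature map and the
    ground-truth label map (training only); 64^3 is an assumed size."""
    _, _, D, H, W = features.shape
    d = torch.randint(0, D - size[0] + 1, (1,)).item()
    h = torch.randint(0, H - size[1] + 1, (1,)).item()
    w = torch.randint(0, W - size[2] + 1, (1,)).item()
    return (features[:, :, d:d + size[0], h:h + size[1], w:w + size[2]],
            labels[:, d:d + size[0], h:h + size[1], w:w + size[2]])

def train_step(subnet1, subnet2, image, label, criterion):
    # Checkpointing discards Sub-network 1's intermediate activations and
    # recomputes them in the backward pass, trading compute for memory.
    features = checkpoint(subnet1, image, use_reentrant=False)
    f_patch, l_patch = random_crop(features, label)  # skipped at test time
    return criterion(subnet2(f_patch), l_patch)
```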
For the atlas dataset \( \left\{ {\left( {M_{1} ,L_{1} } \right),\left( {M_{2} ,L_{2} } \right), \cdots ,\left( {M_{k} ,L_{k} } \right)} \right\} \) and the weakly labeled dataset \( \left\{ {\left( {F_{1} ,W_{1} ,C_{1} } \right),\left( {F_{2} ,W_{2} ,C_{2} } \right), \cdots ,\left( {F_{s} ,W_{s} ,C_{s} } \right)} \right\} \), we use two different loss functions in the training stage. With the atlas dataset, we use the cross-entropy loss. For the weakly labeled dataset, we use the confidence maps \( C_{i} \) as masks, by setting a threshold \( \gamma \) to select the confidently labeled parts of each image and thus enhance the performance of the segmentation network. Denoting by \( p\left( {l|I,x} \right) \) the probability that the network assigns label \( l \) at voxel \( x \) of image \( I \), our proposed loss can be defined by:

\( \mathcal{L} = - \sum\nolimits_{i,x} {\log p\left( {L_{i} \left( x \right)|M_{i} ,x} \right)} - \sum\nolimits_{j,x} {1\left[ {C_{j} \left( x \right) > \gamma } \right]\log p\left( {W_{j} \left( x \right)|F_{j} ,x} \right)} \)
We name the modified model, trained on the confidence-integrated dataset of Sect. 2.2, CINet. The framework of CINet is outlined in Fig. 2, and a sketch of the masked loss follows below.
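A minimal sketch of this combined loss, with the confidence threshold applied as a voxel mask (the value of \( \gamma \) is not reported, so it appears here as a placeholder argument):

```python
import torch
import torch.nn.functional as F

def cinet_loss(logits_atlas, labels_atlas,
               logits_weak, labels_weak, confidence, gamma=0.5):
    """Cross-entropy on atlas data plus confidence-masked cross-entropy
    on weakly labeled data. Logits: (B, 113, D, H, W); labels: (B, D, H, W)."""
    loss_atlas = F.cross_entropy(logits_atlas, labels_atlas)
    per_voxel = F.cross_entropy(logits_weak, labels_weak, reduction='none')
    mask = (confidence > gamma).float()      # keep confident voxels only
    loss_weak = (per_voxel * mask).sum() / mask.sum().clamp(min=1.0)
    return loss_atlas + loss_weak
```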
3 Materials and Experiments
3.1 Materials
Our brain dataset consists of 67 T1-weighted brain MR images from a clinical partner hospital. We randomly chose 37 subjects as the training set, 10 as the validation set, and the remaining 20 as the test set. All 67 subjects were first processed by FreeSurfer and then manually corrected by an experienced doctor. Each subject has 112 labeled brain areas. In addition, we acquired 96 MR images for the data augmentation proposed in Sects. 2.1 and 2.2. The image size of all these images is \( 208 \times 208 \times 160 \), and the voxel size is \( 1 \times 1 \times 1\,{\text{mm}}^{3} \). In this study, we first processed all MR images using a standard pipeline. More specifically, rigid registration was employed to align all training and testing data to the colin27 template [3], followed by skull stripping to ensure that the skull is cleanly removed. Then histogram matching was applied to transform the intensity scale into the intensity space of the colin27 template.
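As an example of the last preprocessing step, histogram matching to the template can be done with SimpleITK; the file names and filter parameters below are illustrative, not the authors' exact settings.

```python
import SimpleITK as sitk

# Match a subject's intensity distribution to the colin27 template,
# assuming rigid alignment and skull stripping have already been done.
template = sitk.ReadImage("colin27.nii.gz", sitk.sitkFloat32)
subject = sitk.ReadImage("subject_t1.nii.gz", sitk.sitkFloat32)
matched = sitk.HistogramMatching(
    subject, template,
    numberOfHistogramLevels=256,
    numberOfMatchPoints=7,
    thresholdAtMeanIntensity=True)
sitk.WriteImage(matched, "subject_t1_matched.nii.gz")
```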
3.2 Experiments and Results
We conducted ablation experiments to demonstrate the effectiveness of the proposed method. First, with an image patch as the network input, we trained the original V-Net as the baseline on the atlas dataset, with the parameters set as in the original V-Net. Next, we trained our CINet on the same atlas dataset of 37 brain images (CINet-37). Using the extra weakly labeled dataset obtained as in Sect. 2.1, we then trained CINet on it and fine-tuned the network on the atlas dataset (CINet-37+96W). Finally, we combined the confidence maps provided in Sect. 2.2 with the augmented data to train our network, where V-Net was modified as in Sect. 2.3 to fit the GPU memory limitation (CINet-37+96C).
We used PyTorch to implement all the experiments. For all networks, we used the same hyper-parameters: batch size = 4, input resolution = \( 208 \times 208 \times 160 \) (\( 96 \times 96 \times 96 \) for the original V-Net), input channels = 1, output channels = 113, optimizer = Adam, and max iterations = 1000. The learning rates of the segmentation network and the confidence network are initialized to 0.001, and every 100 epochs the learning rate is automatically decreased by 50%. Two GPUs with 32 GB of memory each are utilized to train the networks. After training each network, we use the atlas image set to fine-tune the segmentation network.
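The reported optimizer settings map directly onto PyTorch's built-in scheduler; `model`, `train_one_epoch`, and `num_epochs` below are placeholders.

```python
import torch

# Adam with lr = 0.001, halved every 100 epochs, as reported above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)

for epoch in range(num_epochs):          # num_epochs: hypothetical budget
    train_one_epoch(model, optimizer)    # hypothetical training routine
    scheduler.step()
```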
We evaluated the accuracy of each segmentation experiment in terms of the Dice score; the quantitative results are shown in Table 1. The mean Dice coefficient of the original V-Net trained in the patch-based manner achieves only 0.69. Meanwhile, the proposed CINet achieves a Dice of 0.70, which demonstrates that training the network on the whole brain image can improve segmentation accuracy; in our opinion, this is because CINet does not suffer from FOV inconsistency between training and testing. Consistent with the conclusion of [9], the weakly labeled dataset brings a 3.7% improvement on the test set. This shows that although the data obtained by augmentation contain incorrect labels, they still bring diversity to the training samples and improve the generalization performance of the model. Further, the effectiveness of the confidence map is confirmed by the final result of CINet-37+96C, which is the best among all methods. This is easy to explain, as the confidence map reduces the misleading information that incorrect labels feed to the network during training. Figure 3 shows a visual comparison of the labeling results of V-Net, CINet, CINet-37+96W, and CINet-37+96C.
Fig. 3. Visual comparison of the labeling results of V-Net, CINet, CINet-37+96W, and CINet-37+96C. The proposed CINet-37+96C produces the most accurate labels for the regions inside the box. The segmentation results improve progressively from V-Net to CINet-37+96C: CINet is better than V-Net as it overcomes the FOV inconsistency between the training and testing phases, while, thanks to the weakly labeled data, the generalization ability of CINet-37+96W and CINet-37+96C to unseen data gradually improves, yielding better segmentation results.
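For reference, the per-region Dice evaluation used above can be sketched as follows (label 0 is assumed to be background).

```python
import numpy as np

def dice_per_region(pred, truth, num_labels=113):
    """Dice score for each of the 112 ROIs; the mean over regions is the
    summary figure reported in Table 1."""
    scores = {}
    for k in range(1, num_labels):       # skip background (label 0)
        p, t = pred == k, truth == k
        denom = p.sum() + t.sum()
        if denom > 0:                    # region present in pred or truth
            scores[k] = 2.0 * np.logical_and(p, t).sum() / denom
    return scores
```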
4 Conclusion and Discussion
In this study, we developed CINet to reduce GPU memory consumption, which enables us to train a neural network on the whole brain volume. Because of this advantage, we can greatly improve the speed of the segmentation algorithm and make it possible to explore more complex network structures. For the same network, the 96 extra weakly annotated images acquired automatically help us achieve better performance than using the atlas image set alone. We demonstrate that our method reduces the cost of labeling and, through automatic data augmentation, achieves segmentation accuracy close to that of training entirely on accurately labeled data.
References
Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12(1), 26–41 (2008)
Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans. Med. Imaging PP(99), 1 (2018)
Collins, D.L., et al.: Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imaging 17(3), 463–468 (1998)
Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast probabilistic diffeomorphic registration. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 729–738. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00928-1_82
Fang, L., et al.: Automatic brain labeling via multi-atlas guided fully convolutional networks. Med. Image Anal. 51, 157–168 (2019)
Huo, Y., et al.: Spatially localized atlas network tiles enables 3D whole brain segmentation from limited data. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11072, pp. 698–705. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00931-1_80
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Fourth International Conference on 3D Vision (2016)
Zlateski, A., Jaroensri, R., Sharma, P., Durand, F.: On the importance of label quality for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1479–1487 (2018)
Acknowledgement
This research was supported by the grants from the National Key Research and Development Program of China (No. 2018YFC0116400).
Author information
Authors and Affiliations
Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
Bin Xiao, Yiqiang Zhan, Xiang Sean Zhou, Zhong Xue & Feng Shi
School of Biomedical Engineering, Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, China
Bin Xiao, Qian Wang, Lichi Zhang & Dongming Wei
Department of Medical Imaging, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
Xiaoqing Cheng & Guangming Lu
School of Biomedical Engineering, Southern Medical University, Guangdong, China
Qingfeng Li
Corresponding author
Correspondence to Feng Shi.
Editor information
Editors and Affiliations
Korea University, Seoul, Korea (Republic of)
Heung-Il Suk
University of North Carolina, Chapel Hill, NC, USA
Mingxia Liu
Rensselaer Polytechnic Institute, Troy, NY, USA
Pingkun Yan
University of North Carolina, Chapel Hill, NC, USA
Chunfeng Lian
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xiao, B. et al. (2019). Weakly Supervised Confidence Learning for Brain MR Image Dense Parcellation. In: Suk, H.-I., Liu, M., Yan, P., Lian, C. (eds) Machine Learning in Medical Imaging. MLMI 2019. Lecture Notes in Computer Science, vol 11861. Springer, Cham. https://doi.org/10.1007/978-3-030-32692-0_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32691-3
Online ISBN: 978-3-030-32692-0