arXiv:2307.12591 (cs)
[Submitted on 24 Jul 2023]
Title: SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation
Authors: Yiqing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou
Abstract: Recent advancements in large-scale Vision Transformers have made significant strides in improving pre-trained models for medical image segmentation. However, these methods face a notable challenge in acquiring a substantial amount of pre-training data, particularly within the medical field. To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis. Our strategy harnesses the potential of multi-view information by incorporating two principal components. In the pre-training phase, we deploy a masked multi-view encoder devised to concurrently train masked multi-view observations through a range of diverse proxy tasks. These tasks span image reconstruction, rotation, contrastive learning, and a novel task that employs a mutual learning paradigm. This new task capitalizes on the consistency between predictions from various perspectives, enabling the extraction of hidden multi-view information from 3D medical data. In the fine-tuning stage, a cross-view decoder is developed to aggregate the multi-view information through a cross-attention block. Compared with the previous state-of-the-art self-supervised learning method Swin UNETR, SwinMM demonstrates a notable advantage on several medical image segmentation tasks. It allows for a smooth integration of multi-view information, significantly boosting both the accuracy and data-efficiency of the model. Code and models are available at this https URL.
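The mutual learning task in the abstract rewards agreement between predictions made from different views of the same volume. The abstract does not give the loss, so the following is only a minimal sketch under stated assumptions: predictions from two views are assumed to have already been mapped back to a common orientation, and agreement is measured with a symmetric KL divergence, a common choice in mutual learning that may differ from what SwinMM actually uses. The names `softmax`, `kl_divergence`, and `mutual_consistency_loss` are hypothetical and do not come from the paper or its code.

```python
import math


def softmax(logits):
    """Turn one voxel's class logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_consistency_loss(logits_a, logits_b):
    """Mean symmetric KL over voxels.

    ASSUMPTION: logits_a and logits_b are lists of per-voxel class logits
    from two views, already aligned so that index i refers to the same voxel.
    This is an illustrative stand-in, not the loss defined in the paper.
    """
    if len(logits_a) != len(logits_b) or not logits_a:
        raise ValueError("both views must cover the same non-empty voxel set")
    total = 0.0
    for voxel_a, voxel_b in zip(logits_a, logits_b):
        p, q = softmax(voxel_a), softmax(voxel_b)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(logits_a)


# Two voxels, three classes; the views agree on the first voxel only.
view_axial = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
view_sagittal = [[2.0, 0.5, -1.0], [3.0, 0.2, 0.1]]
loss = mutual_consistency_loss(view_axial, view_sagittal)
```

In this sketch the loss is zero when both views produce identical predictions and grows as they disagree, which is the behavior a consistency objective of this kind relies on.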
Comments: MICCAI 2023; project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2307.12591 [cs.CV]
(or arXiv:2307.12591v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2307.12591 (arXiv-issued DOI via DataCite)