
Token Sparsification for Faster Medical Image Segmentation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13939)


Abstract

Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding \(\rightarrow \) token completion \(\rightarrow \) dense decoding (SCD) pipeline. We first empirically show that naïvely applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality. Code is available here: https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg.
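For intuition, the sparse-encoding step can be sketched in a few lines of PyTorch: a lightweight sub-network scores each token, the scores are perturbed with noise so that the hard top-K selection admits an approximate gradient, and only the selected tokens are passed on. This is a minimal illustrative sketch, not the authors' released STP implementation (see the repository above); the scorer architecture, the Gaussian perturbation, and the straight-through gradient trick are our assumptions.

```python
import torch
import torch.nn as nn


class SoftTopKPruner(nn.Module):
    """Illustrative sketch of soft top-K token pruning (not the official STP code)."""

    def __init__(self, dim: int, k: int, sigma: float = 0.1):
        super().__init__()
        self.k, self.sigma = k, sigma
        # Lightweight scoring sub-network: one hidden layer, one score per token.
        self.scorer = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, 1),
        )

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, N, D) -> importance scores: (B, N)
        scores = self.scorer(tokens).squeeze(-1)
        if self.training:
            # Perturb scores with noise so that, in expectation, the hard
            # top-K selection behaves like a smooth function of the scores.
            scores = scores + self.sigma * torch.randn_like(scores)
        idx = scores.topk(self.k, dim=1).indices               # (B, K)
        hard = torch.zeros_like(scores).scatter(1, idx, 1.0)   # 0/1 keep mask
        soft = scores.sigmoid()
        # Straight-through estimator: the forward pass uses the hard mask,
        # the backward pass routes gradients through the soft scores.
        mask = hard + soft - soft.detach()
        kept = torch.gather(
            tokens * mask.unsqueeze(-1), 1,
            idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)),
        )                                                      # (B, K, D)
        return kept, idx


# Usage: prune 196 ViT tokens down to 98.
pruner = SoftTopKPruner(dim=768, k=98)
kept, idx = pruner(torch.randn(2, 196, 768))
print(kept.shape)  # torch.Size([2, 98, 768])
```

In the full SCD pipeline, the returned indices would also be consumed by the token-completion stage (MTA), which places the surviving tokens back at their original positions and assembles them with pruned multi-layer intermediate tokens before dense decoding.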


Notes

  1. \(\textrm{Gumbel}(0,1)\) samples are drawn by sampling \(-\textrm{log}(-\textrm{log}\;u)\) where \(u \sim \textrm{Uniform}(0, 1)\).
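For concreteness, the inverse-transform sampler in this footnote can be written directly in PyTorch; the small `eps` below is a standard numerical-stability guard and is our addition, not part of the footnote's formula.

```python
import torch

def sample_gumbel(shape, eps: float = 1e-20) -> torch.Tensor:
    """Draw Gumbel(0, 1) samples as -log(-log u) with u ~ Uniform(0, 1)."""
    u = torch.rand(shape)  # u ~ Uniform(0, 1)
    # eps guards against log(0); it is a numerical-stability tweak,
    # not part of the footnote's formula.
    return -torch.log(-torch.log(u + eps) + eps)

# Adding such noise to logits before a (soft) argmax yields the
# Gumbel-Softmax reparameterization cited in reference [10].
```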

References

  1. Antonelli, M., et al.: The medical segmentation decathlon. arXiv preprint arXiv:2106.05735 (2021)

  2. Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021)

  3. Chen, J.N.: TransUNet. https://github.com/Beckschen/TransUNet

  4. Chen, J., et al.: TransUNet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)

  5. Cordonnier, J.B., Mahendran, A., Dosovitskiy, A., Weissenborn, D., Uszkoreit, J., Unterthiner, T.: Differentiable patch selection for image recognition. In: CVPR, pp. 2351–2360 (2021)

  6. Dosovitskiy, A., et al.: An image is worth 16\(\times \)16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  7. Fu, S., et al.: Domain adaptive relational reasoning for 3D multi-organ segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 656–666. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_64

  8. Hatamizadeh, A., et al.: UNETR: transformers for 3D medical image segmentation. In: WACV (2022)

  9. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377 (2021)

  10. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144 (2016)

  11. Landman, B., Xu, Z., Iglesias, J., Styner, M., Langerak, T., Klein, A.: MICCAI multi-atlas labeling beyond the cranial vault-workshop and challenge. In: Proceedings of the MICCAI Multi-Atlas Labeling Beyond Cranial Vault-Workshop Challenge (2015)

  12. Li, J., Cotterell, R., Sachan, M.: Differentiable subset pruning of transformer heads. Trans. Assoc. Comput. Linguist. 9, 1442–1459 (2021)

  13. Li, Y., Mao, H., Girshick, R., He, K.: Exploring plain vision transformer backbones for object detection. arXiv preprint arXiv:2203.16527 (2022)

  14. Li, Y., Xie, S., Chen, X., Dollar, P., He, K., Girshick, R.: Benchmarking detection transfer learning with vision transformers. arXiv preprint arXiv:2111.11429 (2021)

  15. Liang, Y., Chongjian, G., Tong, Z., Song, Y., Wang, J., Xie, P.: EViT: expediting vision transformers via token reorganizations. In: ICLR (2022)

  16. Meng, L., et al.: AdaViT: adaptive ViTs for efficient image recognition. arXiv preprint arXiv:2111.15668 (2021)

  17. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 3DV, pp. 565–571. IEEE (2016)

  18. MONAI Consortium: MONAI: Medical Open Network for AI (2020). https://doi.org/10.5281/zenodo.4323058, https://github.com/Project-MONAI/MONAI

  19. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS, vol. 32 (2019)

  20. Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV (2021)

  21. Rao, Y., Zhao, W., Liu, B., Lu, J., Zhou, J., Hsieh, C.J.: DynamicViT: efficient vision transformers with dynamic token sparsification. In: NeurIPS, vol. 34 (2021)

  22. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  23. Schlemper, J., et al.: Attention gated networks: learning to leverage salient regions in medical images. Med. Image Anal. 53, 197–207 (2019)

  24. Tang, Y., et al.: Self-supervised pre-training of swin transformers for 3D medical image analysis. In: CVPR (2022)

  25. Vaswani, A., et al.: Attention is all you need. In: NeurIPS, vol. 30 (2017)

  26. Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., et al.: Deep high-resolution representation learning for visual recognition. IEEE Trans. PAMI 43(10), 3349–3364 (2020)

  27. Wu, Y., et al.: D-former: a U-shaped dilated transformer for 3D medical image segmentation. arXiv preprint arXiv:2201.00462 (2022)

  28. Xie, S.M., Ermon, S.: Reparameterizable subset sampling via continuous relaxations. arXiv preprint arXiv:1901.10517 (2019)

  29. Zheng, S., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: CVPR (2021)

  30. Zhou, H.Y., Guo, J., Zhang, Y., Yu, L., Wang, L., Yu, Y.: nnFormer: interleaved transformer for volumetric segmentation. arXiv preprint arXiv:2109.03201 (2021)


Acknowledgement

The reported research was partly supported by NIH award # 1R21CA258493-01A1, NSF awards IIS-2212046 and IIS-2123920, and Stony Brook OVPR seed grants. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

  1. Department of Computer Science, Stony Brook University, Stony Brook, NY, USA

    Lei Zhou, Huidong Liu & Dimitris Samaras

  2. Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA

    Joseph Bae & Prateek Prasanna

  3. Amazon, Seattle, WA, USA

    Huidong Liu

  4. Shanghai Artificial Intelligence Laboratory, Shanghai, China

    Junjun He

Authors
  1. Lei Zhou
  2. Huidong Liu
  3. Joseph Bae
  4. Junjun He
  5. Dimitris Samaras
  6. Prateek Prasanna

Corresponding author

Correspondence to Lei Zhou.

Editor information

Editors and Affiliations

  1. University of Leeds, Leeds, UK

    Alejandro Frangi

  2. University of Copenhagen, Copenhagen, Denmark

    Marleen de Bruijne

  3. Inria Saclay - Île-de-France Research Centre, Palaiseau, France

    Demian Wassermann

Technical University of Munich, Garching, Bayern, Germany

    Nassir Navab

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, L., Liu, H., Bae, J., He, J., Samaras, D., Prasanna, P. (2023). Token Sparsification for Faster Medical Image Segmentation. In: Frangi, A., de Bruijne, M., Wassermann, D., Navab, N. (eds) Information Processing in Medical Imaging. IPMI 2023. Lecture Notes in Computer Science, vol 13939. Springer, Cham. https://doi.org/10.1007/978-3-031-34048-2_57
