CLAD-Net: cross-layer aggregation attention network for real-time endoscopic instrument detection

  • Research
  • Published in Health Information Science and Systems

Abstract

As medical treatments continue to advance rapidly, minimally invasive surgery (MIS) has found extensive application across clinical procedures. Accurate identification of medical instruments plays a vital role in understanding the surgical situation and facilitating endoscopic image-guided procedures. However, endoscopic instrument detection remains challenging owing to the narrow operating space, various interfering factors (e.g., smoke, blood, body fluids), and unavoidable issues (e.g., mirror reflection, visual obstruction, illumination variation) during surgery. To improve surgical efficiency and safety in MIS, this paper proposes a cross-layer aggregation attention detection network (CLAD-Net) for accurate, real-time detection of endoscopic instruments in complex surgical scenes. We propose a cross-layer aggregation attention module that enhances feature fusion and improves the lateral propagation of feature information. We further propose a composite attention mechanism (CAM) that extracts contextual information at different scales and models the importance of each channel in the feature map, mitigating the information loss caused by feature fusion and effectively addressing inconsistent target sizes and low contrast in complex scenes. Moreover, the proposed feature refinement module (RM) strengthens the network's ability to extract target edges and details by adaptively adjusting feature weights when fusing features from different layers. CLAD-Net was evaluated on the public laparoscopic dataset Cholec80 and a neuroendoscopic dataset from Sun Yat-sen University Cancer Center, achieving an \(AP_{0.5}\) of 98.9% and 98.6%, respectively, outperforming advanced detection networks. A real-time detection video is available at https://github.com/A0268/video-demo.
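To make the CAM description above concrete, the following is a minimal PyTorch sketch of a composite attention block in that spirit: parallel pooled branches extract contextual information at different scales, and a squeeze-and-excitation style gate models per-channel importance. The class name, pool sizes, and reduction ratio are illustrative assumptions, not the authors' implementation, which is behind the paywall.

```python
# A minimal sketch of a "composite attention" block in the spirit of the CAM
# described in the abstract. All names, branch counts, and layer sizes are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompositeAttention(nn.Module):
    def __init__(self, channels: int, pool_sizes=(1, 2, 4), reduction: int = 16):
        super().__init__()
        self.pool_sizes = pool_sizes
        # Multi-scale context: pool the feature map to several grid sizes,
        # project each branch with a 1x1 conv, then upsample back.
        self.context_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in pool_sizes
        )
        self.fuse = nn.Conv2d(channels * (len(pool_sizes) + 1), channels, kernel_size=1)
        # Channel importance: global pooling followed by a bottleneck MLP
        # yields one sigmoid weight per channel (squeeze-and-excitation style).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        branches = [x]
        for size, conv in zip(self.pool_sizes, self.context_convs):
            ctx = conv(F.adaptive_avg_pool2d(x, size))   # context at one scale
            branches.append(F.interpolate(ctx, size=(h, w), mode="bilinear",
                                          align_corners=False))
        fused = self.fuse(torch.cat(branches, dim=1))    # aggregate the scales
        return fused * self.channel_gate(fused)          # reweight channels


if __name__ == "__main__":
    feat = torch.randn(1, 256, 32, 32)                   # a backbone feature map
    print(CompositeAttention(256)(feat).shape)           # torch.Size([1, 256, 32, 32])
```

Applied to each pyramid level before lateral fusion, a gate of this kind can suppress low-contrast background channels while preserving instrument edges, which is the effect the abstract attributes to the CAM and RM; the actual CLAD-Net modules may differ in structure and placement.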



Acknowledgements

This work was supported by the Fundamental and Applied Basic Research Program of Guangdong Province (Grant No. 2023A1515030179).

Author information

Authors and Affiliations

  1. School of Automation, Guangdong University of Technology, Guangzhou, 510006, China

    Xiushun Zhao, Jing Guo & Zhaoshui He

  2. Department of Neurosurgery, Sun Yat-Sen University Cancer Center, Guangzhou, 510006, China

    Xiaobing Jiang & Depei Li

  3. Department of Gastroenterology, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, 310006, China

    Haifang Lou

Authors
  1. Xiushun Zhao
  2. Jing Guo
  3. Zhaoshui He
  4. Xiaobing Jiang
  5. Haifang Lou
  6. Depei Li

Corresponding authors

Correspondence to Haifang Lou or Depei Li.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, X., Guo, J., He, Z. et al. CLAD-Net: cross-layer aggregation attention network for real-time endoscopic instrument detection. Health Inf Sci Syst 11, 58 (2023). https://doi.org/10.1007/s13755-023-00260-9
