
TOORNet: Training-Free Online Object Removal Network for Adversarial Attacks on Intelligent Video Surveillance Systems

  • Conference paper

Abstract

With the rapid advance of deep neural networks (DNNs) in recent years, modern video surveillance systems are often equipped with DNN-based vision methods such as object detection, improving operational efficiency. However, many adversarial attack methods that cause DNN-based vision systems to malfunction have been introduced concurrently. In this paper, we suggest that object removal methods, which erase particular objects from a scene by masking them and filling in the masked regions, can serve as a new form of adversarial attack against intelligent video surveillance systems that employ DNN-based object detection models. To this end, we propose a novel object removal network, the Training-free Online Object Removal Network (TOORNet). Unlike most existing object removal methods and the closely related video inpainting methods, our method works online (i.e., it processes each frame using only past frames, as soon as the frame arrives) and in real time, enabling object removal in live streaming videos. Experimental results show that our method successfully removes particular objects from live streaming videos, preventing intelligent video surveillance systems from detecting them.
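The paper's method itself is not reproduced here; the following is only a toy NumPy sketch of the online constraint the abstract describes — masked regions must be filled using past frames alone, never future ones. The function `remove_object_online`, the frames, and the mask are all hypothetical illustrations, not the authors' implementation:

```python
import numpy as np

def remove_object_online(frame, mask, prev_frame):
    """Fill masked pixels using only the previous frame.

    A stand-in for inpainting under the online constraint: the fill
    content may come from past frames only, so a live stream can be
    processed frame by frame with no look-ahead.
    """
    out = frame.copy()
    out[mask] = prev_frame[mask]  # copy past content into the hole
    return out

# Toy 2-frame grayscale "video": a bright object (value 255) appears
# in the current frame; the previous frame is empty background.
prev = np.zeros((4, 4), dtype=np.uint8)
cur = np.zeros((4, 4), dtype=np.uint8)
cur[1:3, 2:4] = 255          # the object to remove
mask = cur == 255            # e.g. a segmentation model's output
clean = remove_object_online(cur, mask, prev)
assert not (clean == 255).any()  # object erased from the frame
```

A real system would replace the single-previous-frame copy with a learned inpainting model and a tracked segmentation mask, but the data-flow restriction (current frame plus past frames only) is the same.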



Acknowledgments

This work was funded by the Korea Ministry of Land, Infrastructure and Transport (MOLIT) through the Innovative Talent Education Program for Smart City.

Author information

Authors and Affiliations

  1. Smart Systems Lab, Yonsei University, Seoul, South Korea

    Gitaek Lee, Byeongjin Kim & Wooju Kim

  2. Korea National Defense University, Nonsan, South Korea

    Taehwa Lee

Authors
  1. Gitaek Lee

  2. Taehwa Lee

  3. Byeongjin Kim

  4. Wooju Kim

Corresponding author

Correspondence to Wooju Kim.

Editor information

Editors and Affiliations

  1. Mahasarakham University, Mahasarakham, Thailand

    Chattrakul Sombattheera

  2. Duke Kunshan University, Kunshan, China

    Paul Weng

  3. University of Luxembourg, Esch-sur-Alzette, Luxembourg

    Jun Pang

Rights and permissions

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Lee, G., Lee, T., Kim, B., Kim, W. (2025). TOORNet: Training-Free Online Object Removal Network for Adversarial Attacks on Intelligent Video Surveillance Systems. In: Sombattheera, C., Weng, P., Pang, J. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2024. Lecture Notes in Computer Science, vol. 15431. Springer, Singapore. https://doi.org/10.1007/978-981-96-0692-4_30



