- Gitaek Lee (ORCID: orcid.org/0009-0003-0750-2759),
- Taehwa Lee,
- Byeongjin Kim &
- Wooju Kim (ORCID: orcid.org/0000-0001-5828-178X)
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15431)
Abstract
With the rapid advance of deep neural networks (DNNs) in recent years, modern video surveillance systems are often equipped with DNN-based vision methods such as object detection, improving operational efficiency. At the same time, many adversarial attack methods designed to make DNN-based vision systems malfunction have been introduced. In this paper, we suggest that object removal methods, which remove particular objects from a scene by masking them and filling in the masked regions, can serve as a new form of adversarial attack against intelligent video surveillance systems that employ DNN-based object detection models. To this end, we propose a novel object removal network, the Training-free Online Object Removal Network (TOORNet). Unlike most existing object removal methods and the closely related video inpainting methods, ours works online (i.e., it processes each frame using only past frames, as soon as the frame arrives) and in real time, enabling object removal in live streaming videos. Experimental results show that our method successfully removes particular objects from live streaming videos, preventing intelligent video surveillance systems from detecting them.
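The paper itself does not publish TOORNet's implementation here, but the core idea the abstract describes — causal, training-free removal that fills a masked object region using only past frames — can be sketched with a simple per-pixel background memory. All names below are hypothetical and this is a minimal illustration of the online setting, not the authors' method:

```python
import numpy as np

def remove_object_online(frames, masks):
    """Sketch of training-free ONLINE object removal (not TOORNet itself).

    For each incoming frame, pixels covered by the object mask are filled
    with the most recent unmasked observation of that pixel taken from
    PAST frames only, so the procedure is causal and needs no training.
    frames: list of HxWx3 uint8 arrays; masks: list of HxW bool arrays.
    """
    background = None            # running per-pixel memory of past frames
    outputs = []
    for frame, mask in zip(frames, masks):
        frame = frame.astype(np.float32)
        if background is None:
            background = frame.copy()      # bootstrap from the first frame
        out = frame.copy()
        out[mask] = background[mask]       # fill masked region from memory
        background[~mask] = frame[~mask]   # update memory where object is absent
        outputs.append(out.astype(np.uint8))
    return outputs

# Usage: a static 100-valued scene; an object (value 255) appears in frame 2
frames = [np.full((4, 4, 3), 100, np.uint8), np.full((4, 4, 3), 100, np.uint8)]
frames[1][1:3, 1:3] = 255
masks = [np.zeros((4, 4), bool), np.zeros((4, 4), bool)]
masks[1][1:3, 1:3] = True
cleaned = remove_object_online(frames, masks)   # object region restored to 100
```

A real system along these lines would pair such a fill step with an object segmenter/tracker to produce the masks online; the sketch only shows why no future frames or training are required.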
Acknowledgments
This work was funded by the Korea Ministry of Land, Infrastructure and Transport (MOLIT) under the Innovative Talent Education Program for Smart City.
Author information
Authors and Affiliations
Smart Systems Lab, Yonsei University, Seoul, South Korea
Gitaek Lee, Byeongjin Kim & Wooju Kim
Korea National Defense University, Nonsan, South Korea
Taehwa Lee
Corresponding author
Correspondence to Wooju Kim.
Editor information
Editors and Affiliations
Mahasarakham University, Mahasarakham, Thailand
Chattrakul Sombattheera
Duke Kunshan University, Kunshan, China
Paul Weng
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Jun Pang
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lee, G., Lee, T., Kim, B., Kim, W. (2025). TOORNet: Training-Free Online Object Removal Network for Adversarial Attacks on Intelligent Video Surveillance Systems. In: Sombattheera, C., Weng, P., Pang, J. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2024. Lecture Notes in Computer Science, vol. 15431. Springer, Singapore. https://doi.org/10.1007/978-981-96-0692-4_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0691-7
Online ISBN: 978-981-96-0692-4
eBook Packages: Computer Science (R0)