- Gitaek Lee (ORCID: orcid.org/0009-0003-0750-2759),
- Taehwa Lee,
- Byeongjin Kim &
- Wooju Kim (ORCID: orcid.org/0000-0001-5828-178X)
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15431)
Abstract
With the rapid advance of deep neural networks (DNNs) in recent years, modern video surveillance systems are often equipped with DNN-based vision methods such as object detection, improving operational efficiency. At the same time, many adversarial attack methods designed to make DNN-based vision systems malfunction have been introduced. In this paper, we suggest that object removal methods, which remove particular objects from a scene by masking them and filling in the masked regions, can serve as a new form of adversarial attack against intelligent video surveillance systems that employ DNN-based object detection models. To this end, we propose a novel object removal network, the Training-free Online Object Removal Network (TOORNet). Unlike most existing object removal methods and the closely related video inpainting methods, ours works online (i.e., it processes each frame using only past frames, as soon as the frame arrives) and in real time, enabling object removal in live streaming videos. Experimental results show that our method successfully removes particular objects from live streaming videos, preventing intelligent video surveillance systems from detecting them.
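The paper itself does not publish TOORNet's implementation here, but the core idea the abstract describes — causal, training-free removal that fills a masked object region using only past frames — can be sketched with a simple per-pixel background memory. All names below are hypothetical and this is a minimal illustration of the online setting, not the authors' method:

```python
import numpy as np

def remove_object_online(frames, masks):
    """Sketch of training-free ONLINE object removal (not TOORNet itself).

    For each incoming frame, pixels covered by the object mask are filled
    with the most recent unmasked observation of that pixel taken from
    PAST frames only, so the procedure is causal and needs no training.
    frames: list of HxWx3 uint8 arrays; masks: list of HxW bool arrays.
    """
    background = None            # running per-pixel memory of past frames
    outputs = []
    for frame, mask in zip(frames, masks):
        frame = frame.astype(np.float32)
        if background is None:
            background = frame.copy()      # bootstrap from the first frame
        out = frame.copy()
        out[mask] = background[mask]       # fill masked region from memory
        background[~mask] = frame[~mask]   # update memory where object is absent
        outputs.append(out.astype(np.uint8))
    return outputs

# Usage: a static 100-valued scene; an object (value 255) appears in frame 2
frames = [np.full((4, 4, 3), 100, np.uint8), np.full((4, 4, 3), 100, np.uint8)]
frames[1][1:3, 1:3] = 255
masks = [np.zeros((4, 4), bool), np.zeros((4, 4), bool)]
masks[1][1:3, 1:3] = True
cleaned = remove_object_online(frames, masks)   # object region restored to 100
```

A real system along these lines would pair such a fill step with an object segmenter/tracker to produce the masks online; the sketch only shows why no future frames or training are required.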
Acknowledgments
This work was funded by the Korea Ministry of Land, Infrastructure and Transport (MOLIT) under the Innovative Talent Education Program for Smart City.
Author information
Authors and Affiliations
Smart Systems Lab, Yonsei University, Seoul, South Korea
Gitaek Lee, Byeongjin Kim & Wooju Kim
Korea National Defense University, Nonsan, South Korea
Taehwa Lee
Corresponding author
Correspondence to Wooju Kim.
Editor information
Editors and Affiliations
Mahasarakham University, Mahasarakham, Thailand
Chattrakul Sombattheera
Duke Kunshan University, Kunshan, China
Paul Weng
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Jun Pang
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lee, G., Lee, T., Kim, B., Kim, W. (2025). TOORNet: Training-Free Online Object Removal Network for Adversarial Attacks on Intelligent Video Surveillance Systems. In: Sombattheera, C., Weng, P., Pang, J. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2024. Lecture Notes in Computer Science, vol. 15431. Springer, Singapore. https://doi.org/10.1007/978-981-96-0692-4_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0691-7
Online ISBN: 978-981-96-0692-4
eBook Packages: Computer Science (R0)