- Flávio Santos (ORCID: orcid.org/0000-0003-2378-5376),
- Dalila Durães (ORCID: orcid.org/0000-0002-8313-7023),
- Francisco S. Marcondes (ORCID: orcid.org/0000-0002-2221-2261),
- Niklas Hammerschmidt,
- Sascha Lange,
- José Machado (ORCID: orcid.org/0000-0003-4121-6169) &
- Paulo Novais (ORCID: orcid.org/0000-0002-3549-0754)
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13113)
Abstract
Detecting violence inside a car from audio touches on speech processing, music, and ambient sound, so it is necessary to understand the similarities and differences between these domains. Growing interest in deep learning has enabled practical applications in many areas of signal processing, often surpassing traditional signal-processing methods on a large scale. This paper presents a comparative study of state-of-the-art deep learning architectures applied to in-car violence detection based only on the audio signal. Following an in-depth review of the literature, the Mel-spectrogram was chosen as the audio signal representation. In the experiments, we build an In-Car video dataset and apply four different deep learning architectures to the classification problem. The results show that the ResNet-18 model achieves the best accuracy on the test set.
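As a rough illustration of the Mel-spectrogram representation named in the abstract, the sketch below computes log-mel energies from a raw signal using only the Python standard library. It is not the authors' pipeline: the HTK-style mel formula, the Hann window, the naive DFT, and the values of `n_fft`, `hop`, and `n_mels` are all assumptions chosen for readability, and every function name is hypothetical. In practice an FFT-based audio library would replace the slow DFT used here.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (one common convention; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns n_mels rows, each with n_fft // 2 + 1 weights (one per DFT bin).
    """
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points: each filter uses (left, centre, right).
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        left, centre, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - left) / (centre - left)
            falling = (right - f) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """One-sided power spectrum via a naive O(N^2) DFT with a Hann window."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each holding n_mels log-mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log10(0) for silent bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel_energy])
    return frames


# Usage: a synthetic 1 kHz tone sampled at 8 kHz (400 samples, 0.05 s).
sr = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(400)]
spec = mel_spectrogram(tone, sr)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The resulting frames-by-bands matrix can be treated as a single-channel image, which is what makes image-classification architectures such as ResNet-18 applicable to audio.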
Acknowledgments
This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project no. 039334; Funding Reference: POCI-01-0247-FEDER-039334].
Author information
Authors and Affiliations
Algorithm Center, University of Minho, Braga, Portugal
Flávio Santos, Dalila Durães, Francisco S. Marcondes, José Machado & Paulo Novais
Bosch Car Multimedia, Braga, Portugal
Niklas Hammerschmidt & Sascha Lange
Corresponding author
Correspondence to Dalila Durães.
Editor information
Editors and Affiliations
University of Manchester, Manchester, UK
Hujun Yin
Universidad Politecnica de Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Manchester, Manchester, UK
Richard Allmendinger
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
Southern University of Science and Technology, Shenzhen, China
Ke Tang
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
University of Minho, Braga, Portugal
Paulo Novais
NOVA University of Lisbon, Lisbon, Portugal
Susana Nascimento
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Santos, F., et al. (2021). In-Car Violence Detection Based on the Audio Signal. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91607-7
Online ISBN: 978-3-030-91608-4
eBook Packages: Computer Science, Computer Science (R0)