Multi-Scale, Class-Generic, Privacy-Preserving Video
Abstract
1. Introduction
2. Background and Related Works
2.1. Semantic Image Segmentation
2.2. DeepLab
- Atrous convolutions are convolutions with an additional parameter, the “dilation rate”. Whereas a standard convolutional filter applies its coefficients to adjacent pixels, an atrous convolution inserts gaps between the kernel taps. For example, a 3 × 3 kernel with a dilation rate of 2 applies its weights to every other pixel along each axis (a regular grid with one-pixel gaps), so it covers a 5 × 5 region while keeping the 9 weights and the computational cost of a 3 × 3 filter. In general, a k × k kernel with dilation rate r spans k + (k - 1)(r - 1) pixels along each side.
- Atrous Spatial Pyramid Pooling (ASPP) applies several atrous convolutions with different dilation rates in parallel to the same feature map, capturing image information at multiple scales.
- Fully connected conditional random fields (CRFs) are used as a post-processing step to smooth the segmentation map. The model has two terms. The first (unary) term is derived from the softmax probability that the network assigns to each pixel's label. The second is a pairwise “penalty” term that penalizes assigning different labels to pixels that are close together (and, in DeepLab, similar in color). The final labels are the most probable label assignment under this model.
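To make the dilation rate concrete, the following is a minimal pure-Python sketch of a single-channel atrous convolution with zero "same" padding, plus an ASPP-style helper that runs the same input through several rates in parallel. The function names, the kernel shared across branches, and the rates (1, 2, 3) are illustrative assumptions chosen for a tiny 7 × 7 input; this is not DeepLab's implementation (DeepLabv3, for instance, uses much larger rates on downsampled feature maps and learns separate weights per branch).

```python
from typing import List, Sequence

Matrix = List[List[float]]


def effective_kernel_size(k: int, rate: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)


def atrous_conv2d(image: Matrix, kernel: Matrix, rate: int = 1) -> Matrix:
    """Single-channel atrous convolution with zero 'same' padding.

    Written as cross-correlation, as in most deep-learning frameworks.
    Neighbouring kernel taps read input pixels `rate` positions apart, so each
    output pixel needs at most len(kernel) * len(kernel[0]) multiply-adds,
    whatever the rate. Assumes odd kernel sizes.
    """
    h, w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    pad_h, pad_w = rate * (k_h // 2), rate * (k_w // 2)  # keeps output size == input size
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k_h):
                for v in range(k_w):
                    y, x = i + u * rate - pad_h, j + v * rate - pad_w
                    if 0 <= y < h and 0 <= x < w:  # taps outside the image read zero
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


def aspp_branches(image: Matrix, kernel: Matrix, rates: Sequence[int]) -> List[Matrix]:
    """ASPP-style parallel branches: the same input filtered at several rates.

    Simplification (hypothetical helper): one shared kernel and no fusion step.
    Real ASPP learns separate weights per branch and fuses the branch outputs
    (in DeepLabv3, by concatenation followed by a 1x1 convolution).
    """
    return [atrous_conv2d(image, kernel, r) for r in rates]


if __name__ == "__main__":
    impulse = [[0.0] * 7 for _ in range(7)]
    impulse[3][3] = 1.0  # a single bright pixel in the centre
    ones = [[1.0] * 3 for _ in range(3)]
    for rate, fmap in zip((1, 2, 3), aspp_branches(impulse, ones, (1, 2, 3))):
        rows = [r for r in range(7) if any(fmap[r])]
        # rate, predicted extent, observed vertical span of the response
        print(rate, effective_kernel_size(3, rate), max(rows) - min(rows) + 1)
```

On a single bright pixel, every branch spreads the response over 9 output positions, but their span grows with the rate: the demo prints `1 3 3`, `2 5 5` and `3 7 7`, matching k + (k - 1)(r - 1) while the number of weights stays at 9.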
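The two CRF terms described above can be written as the energy of the fully connected CRF used in DeepLab (following Krähenbühl and Koltun). Here x_i is the label of pixel i, P(x_i) is its softmax probability, p_i is its position and I_i its color; w_1, w_2, σ_α, σ_β and σ_γ are hyperparameters. The first Gaussian kernel penalizes different labels on nearby pixels of similar color, and the second enforces smoothness based on position alone. Because exact minimization is intractable, mean-field inference is used in practice.

```latex
E(\mathbf{x}) = \sum_{i} \theta_i(x_i) + \sum_{i<j} \theta_{ij}(x_i, x_j),
\qquad \theta_i(x_i) = -\log P(x_i)

\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\left[
  w_1 \exp\!\left(-\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\alpha^2}
                  -\frac{\lVert I_i - I_j \rVert^2}{2\sigma_\beta^2}\right)
+ w_2 \exp\!\left(-\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\gamma^2}\right)\right],
\qquad \mu(x_i, x_j) = \begin{cases} 1 & x_i \neq x_j \\ 0 & x_i = x_j \end{cases}

\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x}) = \arg\max_{\mathbf{x}} P(\mathbf{x}),
\qquad P(\mathbf{x}) \propto \exp\bigl(-E(\mathbf{x})\bigr)
```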
2.3. Gaussian Blur Algorithm
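Gaussian blur replaces each pixel with a weighted average of its neighbors, where the weights follow a 2-D Gaussian with standard deviation σ. Because the 2-D Gaussian factorizes as G(x, y) = g(x) g(y), the blur can be applied as two 1-D passes, one along rows and one along columns. The sketch below is a minimal pure-Python illustration assuming a single-channel image stored as a list of rows, a default kernel radius of ⌈3σ⌉, and edge replication at the borders; these choices and the function names are illustrative assumptions, not the settings used in this work.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def gaussian_kernel_1d(sigma: float, radius: int) -> List[float]:
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1, so flat regions keep their value


def _blur_1d(values: List[float], kernel: List[float]) -> List[float]:
    """Filter one row or column with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamping the index = edge replication
            acc += weight * values[j]
        out.append(acc)
    return out


def gaussian_blur(image: Matrix, sigma: float, radius: Optional[int] = None) -> Matrix:
    """Separable Gaussian blur of a single-channel image (list of rows)."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))  # about 99.7% of a 1-D Gaussian lies within 3 sigma
    kernel = gaussian_kernel_1d(sigma, radius)
    rows = [_blur_1d(row, kernel) for row in image]             # horizontal pass
    cols = [_blur_1d(list(col), kernel) for col in zip(*rows)]  # vertical pass on columns
    return [list(row) for row in zip(*cols)]                    # transpose back to rows
```

Larger values of σ average over wider neighborhoods and remove more detail; the separable form costs 2(2r + 1) multiply-adds per pixel instead of (2r + 1)² for a direct 2-D kernel of radius r.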
3. Design and Implementation
3.1. System Design
3.2. Analyzer
3.3. Anonymizer
4. Evaluation
4.1. Datasets
4.2. Experiments
4.2.1. UCF101 Action Recognition
4.2.2. CCPD*
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Dataset | Accuracy (Multi-Scale) | Accuracy (Single-Scale) |
---|---|---|
Original UCF101 | 93.5% | 93.5% |
Blurred UCF101 | 88.9% | 40% |
1/2 Blurred UCF101 | 88.9% | 36.2% |
1/4 Blurred UCF101 | 88.9% | 31.1% |
1/8 Blurred UCF101 | 88.9% | 28.6% |

Task | Accuracy |
---|---|
Base Detection Rate | 100% |
Base Recognition Rate | 97.8% |
DeepLabv3+ Detection Rate | 98.3% |
Post-anonymization Detection Rate | 10.7% |
Post-anonymization Recognition Rate | 2.8% |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Z.; Cilloni, T.; Walter, C.; Fleming, C. Multi-Scale, Class-Generic, Privacy-Preserving Video. Electronics 2021, 10, 1172. https://doi.org/10.3390/electronics10101172