Abstract
We propose a progressive hierarchical analysis model for perceiving collective activities. Unlike previous activity recognition work, it not only recognizes the collective activity but also localizes each individual and identifies each individual's action category. First, we perform a temporally consistent person detection procedure for each individual in the collective activity: a person detection network combined with a conditional random field yields the bounding-box sequence of each activity participant. Next, we recognize individual actions with an LSTM that operates on learned spatial features and motion features. Finally, the recognized person-level action category vectors are combined with scene context features and interaction context features to recognize the collective activity. We evaluate the proposed approach on benchmark collective activity datasets, and extensive experiments demonstrate the effectiveness of the progressive hierarchical analysis model.
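As a rough illustration only, the three-stage pipeline outlined in the abstract (per-person feature sequences → LSTM-based individual action classification → fusion with scene and interaction context for collective-activity classification) can be sketched as below. All tensor shapes, the single-layer LSTM, the mean pooling over participants, and the simple concatenation fusion are illustrative assumptions of ours, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a concatenated spatial + motion feature vector."""
    z = W @ x + U @ h + b                       # pre-activations for the 4 gates
    i, f, o, g = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sig(i), sig(f), sig(o)
    c = f * c + i * np.tanh(g)                  # update cell state
    h = o * np.tanh(c)                          # emit hidden state
    return h, c

def person_action_probs(feat_seq, params):
    """Stage 2 (sketch): classify one person's action from their tracked
    feature sequence; returns a softmax action-category vector."""
    W, U, b, Wc = params
    h = np.zeros(U.shape[1])
    c = np.zeros_like(h)
    for x in feat_seq:                          # run the LSTM over the track
        h, c = lstm_step(x, h, c, W, U, b)
    logits = Wc @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

def collective_activity(person_probs, scene_ctx, interact_ctx, W_out):
    """Stage 3 (sketch): pool person-level action vectors, concatenate scene
    and interaction context, and classify the collective activity."""
    pooled = np.mean(person_probs, axis=0)      # pool over all participants
    fused = np.concatenate([pooled, scene_ctx, interact_ctx])
    return int(np.argmax(W_out @ fused))

# Toy dimensions (assumed): 8-d features, 6-d hidden state, 5 action classes,
# 4-d scene context, 4-d interaction context, 3 collective-activity classes.
d, hdim, n_act = 8, 6, 5
params = (rng.standard_normal((4 * hdim, d)),
          rng.standard_normal((4 * hdim, hdim)),
          np.zeros(4 * hdim),
          rng.standard_normal((n_act, hdim)))
people = [rng.standard_normal((10, d)) for _ in range(3)]   # 3 tracked persons
probs = np.stack([person_action_probs(p, params) for p in people])
label = collective_activity(probs,
                            rng.standard_normal(4),
                            rng.standard_normal(4),
                            rng.standard_normal((3, n_act + 8)))
```

In this sketch, stage 1 (person detection with CRF-smoothed bounding-box sequences) is replaced by random feature tracks, since detection is outside the scope of a few lines of code.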
Acknowledgements
This work was supported by the Research Programs of the Henan Science and Technology Department (192102210097, 192102210126, 212102210160, 182102210210), the National Natural Science Foundation of China (61806073) and the Open Project Foundation of the Information Technology Research Base of the Civil Aviation Administration of China (No. CAAC-ITRB-201607).
Author information
Authors and Affiliations
School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, 450046, China
Lishen Pei
School of Intelligent Engineering, Zhengzhou University of Aeronautics, Zhengzhou, 450046, China
Xuezhuan Zhao
College of Information Engineering, HeNan Radio Television University, Zhengzhou, 450008, China
Tao Li & Zheng Zhang
Corresponding author
Correspondence to Xuezhuan Zhao.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Pei, L., Zhao, X., Li, T. et al. A progressive hierarchical analysis model for collective activity recognition. Neural Computing and Applications 34, 12415–12425 (2022). https://doi.org/10.1007/s00521-021-06585-4