Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5304)

Included in the following conference series:

European Conference on Computer Vision

Abstract

This paper presents a probabilistic grammar approach to recognizing complex events in videos. First, a rule induction algorithm learns the event rules from the original motion features. Then, a multi-thread parsing (MTP) algorithm recognizes complex events whose sub-events may occur in parallel, whereas commonly used parsers handle only sequential relations between sub-events. In addition, a Viterbi-like error recovery strategy is embedded in the parsing process to correct errors at large time scales, such as insertion and deletion errors. Extensive experiments are performed on indoor gymnastic exercises and outdoor traffic events. The results indicate that the MTP algorithm recognizes complex events effectively, owing to its strongly discriminative representation and its error recovery strategy.
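
The difference between sequential and parallel temporal relations among sub-events can be illustrated with a minimal Python sketch. This is not the MTP algorithm from the paper: the `SubEvent` type, the frame numbers, and the event labels are hypothetical, chosen only to show what information a purely sequential symbol string discards.

```python
from typing import List, NamedTuple


class SubEvent(NamedTuple):
    """Hypothetical detected sub-event with an inclusive frame interval."""
    label: str
    start: int  # first frame of the sub-event
    end: int    # last frame of the sub-event


def temporal_relation(a: SubEvent, b: SubEvent) -> str:
    """Return 'sequential' if the intervals share no frame, else 'parallel'."""
    if a.end < b.start or b.end < a.start:
        return "sequential"
    return "parallel"


def to_symbol_string(events: List[SubEvent]) -> List[str]:
    """Flatten sub-events into the label sequence a sequential parser would see."""
    ordered = sorted(events, key=lambda e: (e.start, e.end))
    return [e.label for e in ordered]


# Assumed example: both arms rise with overlapping intervals, then lower together.
events = [
    SubEvent("left_arm_up", 0, 40),
    SubEvent("right_arm_up", 10, 50),
    SubEvent("both_arms_down", 60, 90),
]

print(temporal_relation(events[0], events[1]))  # overlapping intervals
print(temporal_relation(events[1], events[2]))  # disjoint intervals
print(to_symbol_string(events))
```

In this sketch the flattened label sequence is the same whether or not the two arm movements overlap, so a parser that consumes only that sequence cannot tell the two cases apart. A representation that keeps the intervals can.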

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Science, Beijing, 100190, China
Zhang Zhang, Kaiqi Huang & Tieniu Tan

Editor information

Editors and Affiliations

Computer Science Department, University of Illinois at Urbana Champaign, 3310 Siebel Hall, IL 61801, Urbana, USA
David Forsyth
Department of Computing, Oxford Brookes University, OX33 1HX, Wheatley, Oxford, UK
Philip Torr
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK
Andrew Zisserman

About this paper

Cite this paper

Zhang, Z., Huang, K., Tan, T. (2008). Multi-thread Parsing for Recognizing Complex Events in Videos. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88690-7_55

DOI: https://doi.org/10.1007/978-3-540-88690-7_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88689-1
Online ISBN: 978-3-540-88690-7
eBook Packages: Computer Science, Computer Science (R0)
