Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8691)
Included in the following conference series: European Conference on Computer Vision (ECCV)
19k Accesses · 124 Citations
Abstract
We consider inferring the future actions of people from a still image or a short video clip. Predicting future actions before they are actually executed is a critical ingredient for enabling us to effectively interact with other humans on a daily basis. However, the challenges are twofold: first, we need to capture the subtle details inherent in human movements that may imply a future action; second, in the social world, predictions should usually be made as quickly as possible, when only limited prior observations are available.
In this paper, we propose hierarchical movemes - a new representation that describes human movements at multiple levels of granularity, ranging from atomic movements (e.g., an open arm) to coarser movements that cover a larger temporal extent. We develop a max-margin learning framework for future action prediction that integrates a collection of moveme detectors in a hierarchical way. We validate our method on two publicly available datasets and show that it achieves very promising performance.
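The scoring scheme described in the abstract can be illustrated with a minimal sketch: responses from moveme detectors at several granularity levels are stacked into a feature vector and combined linearly with per-class weights, max-margin style, and the highest-scoring action is predicted. All function names, detector levels, and weight values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def score_action(detector_responses, weights, bias=0.0):
    """Linear (max-margin-style) score: w . phi(x) + b, where phi(x)
    stacks moveme detector responses across granularity levels."""
    phi = np.concatenate([detector_responses[level]
                          for level in sorted(detector_responses)])
    return float(np.dot(weights, phi) + bias)

def predict(detector_responses, class_weights):
    """Return the candidate future action with the highest score."""
    scores = {action: score_action(detector_responses, w)
              for action, w in class_weights.items()}
    return max(scores, key=scores.get)

# Toy example: two granularity levels (atomic and coarse movemes)
# and two candidate future actions with hand-picked weights.
responses = {"atomic": np.array([0.9, 0.1]), "coarse": np.array([0.7])}
class_weights = {
    "hug": np.array([1.0, -0.5, 0.8]),
    "kick": np.array([-0.3, 1.2, -0.1]),
}
print(predict(responses, class_weights))  # -> hug
```

In the paper's setting the weights would be learned with a max-margin objective; here they are fixed by hand purely to show how multi-granularity detector responses feed a single linear score per class.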
Author information
Authors and Affiliations
Stanford University, USA
Tian Lan, Tsung-Chuan Chen & Silvio Savarese
Editor information
Editors and Affiliations
Department of Computer Science, University of Toronto, 6 King's College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lan, T., Chen, T.-C., Savarese, S. (2014). A Hierarchical Representation for Future Action Prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014. Lecture Notes in Computer Science, vol. 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10577-2
Online ISBN: 978-3-319-10578-9
eBook Packages: Computer Science; Computer Science (R0)