A Hierarchical Representation for Future Action Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8691)

Abstract

We consider inferring the future actions of people from a still image or a short video clip. Predicting future actions before they are actually executed is a critical ingredient for enabling us to effectively interact with other humans on a daily basis. However, the challenges are twofold. First, we need to capture the subtle details inherent in human movements that may imply a future action. Second, predictions usually must be made as quickly as possible in the social world, when only limited prior observations are available.

In this paper, we propose hierarchical movemes, a new representation that describes human movements at multiple levels of granularity, ranging from atomic movements (e.g. an open arm) to coarser movements that cover a larger temporal extent. We develop a max-margin learning framework for future action prediction that integrates a collection of moveme detectors in a hierarchical way. We validate our method on two publicly available datasets and show that it achieves very promising performance.
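To make the idea of max-margin learning over multi-level detector responses concrete, the following is a minimal sketch and not the authors' model. It assumes that detector responses from each hierarchy level are simply concatenated into one feature vector and scored with one linear weight vector per action, trained with a standard multiclass hinge loss. The level names, action labels, response values, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical input: detector responses grouped by hierarchy level,
# from fine (atomic movements) to coarse (longer temporal extent).
Responses = Dict[str, List[float]]
Weights = Dict[str, List[float]]


def stack_levels(responses: Responses, levels: List[str]) -> List[float]:
    """Concatenate detector responses from all levels into one feature vector."""
    return [score for level in levels for score in responses[level]]


def class_scores(weights: Weights, phi: List[float]) -> Dict[str, float]:
    """Linear score w_a . phi for every candidate future action a."""
    return {a: sum(w_i * x_i for w_i, x_i in zip(w, phi)) for a, w in weights.items()}


def hinge_loss(scores: Dict[str, float], true_action: str) -> float:
    """Multiclass max-margin hinge loss: the true action should beat
    the best rival action by a margin of at least 1."""
    rival = max(s for a, s in scores.items() if a != true_action)
    return max(0.0, 1.0 + rival - scores[true_action])


def sgd_step(weights: Weights, phi: List[float], true_action: str, lr: float = 0.1) -> None:
    """One subgradient step on the hinge loss (regularizer omitted for brevity)."""
    scores = class_scores(weights, phi)
    if hinge_loss(scores, true_action) > 0.0:
        rival = max((a for a in scores if a != true_action), key=scores.get)
        weights[true_action] = [w + lr * x for w, x in zip(weights[true_action], phi)]
        weights[rival] = [w - lr * x for w, x in zip(weights[rival], phi)]


# Toy data: two atomic-level responses and one coarse-level response.
LEVELS = ["atomic", "coarse"]
example = {"atomic": [0.9, 0.1], "coarse": [0.7]}
phi = stack_levels(example, LEVELS)

weights = {"hug": [0.0] * len(phi), "handshake": [0.0] * len(phi)}
loss_before = hinge_loss(class_scores(weights, phi), "hug")
for _ in range(20):
    sgd_step(weights, phi, "hug")
final_scores = class_scores(weights, phi)
loss_after = hinge_loss(final_scores, "hug")
predicted = max(final_scores, key=final_scores.get)
```

On this single toy example the loss starts at the margin value and the updates stop once the true action outscores its rival by the required margin. How the levels interact in the actual framework is not specified in the abstract, so plain concatenation here is only a stand-in.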


Author information

Authors and Affiliations

  1. Stanford University, USA

    Tian Lan, Tsung-Chuan Chen & Silvio Savarese


Editor information

Editors and Affiliations

  1. Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada

    David Fleet

  2. Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic

    Tomas Pajdla

  3. Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany

    Bernt Schiele

  4. ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium

    Tinne Tuytelaars

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Lan, T., Chen, TC., Savarese, S. (2014). A Hierarchical Representation for Future Action Prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_45
