Abstract

This paper addresses the problems of modeling the appearance of humans and distinguishing human appearance from the appearance of general scenes. We seek a model of appearance and motion that is generic in that it accounts for the ways in which people's appearance varies and, at the same time, is specific enough to be useful for tracking people in natural scenes. Given a 3D model of the person projected into an image, we model the likelihood of observing various image cues conditioned on the predicted locations and orientations of the limbs. These cues are taken to be steered filter responses corresponding to edges, ridges, and motion-compensated temporal differences. Motivated by work on the statistics of natural scenes, the statistics of these filter responses for human limbs are learned from training images containing hand-labeled limb regions. Similarly, the statistics of the filter responses in general scenes are learned to define a "background" distribution. The likelihood of observing a scene given a predicted pose of a person is computed, for each limb, using the likelihood ratio between the learned foreground (person) and background distributions. Adopting a Bayesian formulation allows cues to be combined in a principled way. Furthermore, the use of learned distributions obviates the need for hand-tuned image noise models and thresholds. The paper provides a detailed analysis of the statistics of how people appear in scenes and connects work on natural image statistics with the Bayesian tracking of people.
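
The per-limb likelihood ratio described in the abstract can be illustrated with a minimal sketch. The Python below is not the authors' implementation: it assumes scalar filter responses in a fixed range, represents the learned foreground and background distributions as smoothed histograms, and treats responses as independent so that their log ratios add. The function names, the bin count, the response range, and the synthetic Gaussian training data are all hypothetical choices made for illustration.

```python
import math
import random


def bin_index(response, n_bins, lo, hi):
    """Map a scalar filter response to a histogram bin, clamping to [lo, hi]."""
    t = (response - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(t * n_bins)))


def learn_histogram(responses, n_bins=16, lo=-1.0, hi=1.0, eps=1e-3):
    """Estimate a discrete distribution over filter responses.

    A small constant eps is added to every bin so that no bin has zero
    probability (an assumption of this sketch, not taken from the paper).
    """
    counts = [eps] * n_bins
    for r in responses:
        counts[bin_index(r, n_bins, lo, hi)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def limb_log_likelihood_ratio(responses, p_fg, p_bg, lo=-1.0, hi=1.0):
    """Sum log(p_fg / p_bg) over the responses sampled on one predicted limb.

    Summing assumes the responses are independent given the pose.
    """
    n_bins = len(p_fg)
    score = 0.0
    for r in responses:
        b = bin_index(r, n_bins, lo, hi)
        score += math.log(p_fg[b]) - math.log(p_bg[b])
    return score


def make_demo_distributions(seed=0, n_train=5000):
    """Learn foreground and background histograms from synthetic responses.

    Real training data would be filter responses from hand-labeled limb
    regions and from general scenes; the Gaussians here are stand-ins.
    """
    rng = random.Random(seed)
    fg_train = [rng.gauss(0.5, 0.15) for _ in range(n_train)]
    bg_train = [rng.gauss(0.0, 0.15) for _ in range(n_train)]
    return learn_histogram(fg_train), learn_histogram(bg_train)


if __name__ == "__main__":
    p_fg, p_bg = make_demo_distributions()
    rng = random.Random(1)
    on_limb = [rng.gauss(0.5, 0.15) for _ in range(200)]
    off_limb = [rng.gauss(0.0, 0.15) for _ in range(200)]
    print("pose on a limb:", limb_log_likelihood_ratio(on_limb, p_fg, p_bg))
    print("pose on background:", limb_log_likelihood_ratio(off_limb, p_fg, p_bg))
```

With these synthetic inputs, a predicted limb that lands on limb-like responses should score above zero and one that lands on background-like responses should score below zero. Under the same independence assumption, scores from several cues (edge, ridge, motion) and several limbs could be combined by adding their log ratios.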



Author information

Authors and Affiliations

Computational Vision and Active Perception Laboratory, Department of Numerical Analysis and Computer Science, KTH, SE-100 44, Stockholm, Sweden
Hedvig Sidenbladh
Department of Computer Science, Brown University, Box 1910, Providence, RI, 02912, USA
Michael J. Black

About this article

Cite this article

Sidenbladh, H., Black, M.J. Learning the Statistics of People in Images and Video. International Journal of Computer Vision 54, 183–209 (2003). https://doi.org/10.1023/A:1023765619733

Issue Date: August 2003
DOI: https://doi.org/10.1023/A:1023765619733
