Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11441)
Abstract
The major challenge for 3D human pose estimation is the ambiguity in regressing 3D poses from 2D. This ambiguity arises from poor exploitation of image cues, especially spatial relations. Previous works use weakly-supervised methods to constrain implausible spatial relations instead of leveraging image cues directly. We follow the weakly-supervised approach and train an end-to-end network that first detects 2D body-joint heatmaps and then constrains 3D regression through these 2D heatmaps. To further exploit the inherent spatial relations, we propose a multi-scale recalibrated approach to regress 3D pose. The recalibration is integrated into the network as an independent module, and its scale factor is varied to capture information at different resolutions. With the additional multi-scale recalibration modules, the spatial information in the pose is better exploited during regression. The whole network is fine-tuned to learn the extra parameters. Quantitative results on the Human3.6M dataset show that the performance surpasses the state of the art. Qualitative results on Human3.6M and the in-the-wild MPII dataset demonstrate the effectiveness and robustness of our approach, which can handle complex situations such as self-occlusion.
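The abstract describes the multi-scale recalibration module only at a high level. The following is a minimal sketch of one possible reading, assuming a squeeze-and-excitation (SE) style design: every channel is average-pooled into an s × s grid for each scale factor s, the pooled statistics pass through a small bottleneck (ReLU, then sigmoid), and the resulting per-channel gates rescale the feature map. The grid-pooling interpretation of the "scale factor", the function names (`pool_grid`, `multi_scale_recalibrate`, `random_matrix`) and the random weights are illustrative assumptions, not the authors' implementation; plain Python lists stand in for tensors to keep the example dependency-free.

```python
import math
import random
from typing import List, Sequence

# Hypothetical feature map layout: [channel][row][col].
FeatureMap = List[List[List[float]]]


def pool_grid(channel: List[List[float]], s: int) -> List[float]:
    """Average-pool one H x W channel into an s x s grid (assumes H and W divisible by s)."""
    h, w = len(channel), len(channel[0])
    bh, bw = h // s, w // s
    cells = []
    for i in range(s):
        for j in range(s):
            block = [channel[r][c]
                     for r in range(i * bh, (i + 1) * bh)
                     for c in range(j * bw, (j + 1) * bw)]
            cells.append(sum(block) / len(block))
    return cells


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def multi_scale_recalibrate(x: FeatureMap, scales: Sequence[int],
                            w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Rescale each channel of x by a gate in (0, 1) built from multi-scale pooled statistics."""
    # Squeeze (assumed multi-scale variant): pool every channel at every scale, concatenate.
    desc = [v for channel in x for s in scales for v in pool_grid(channel, s)]
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving one gate per channel.
    hidden = [max(0.0, sum(a * d for a, d in zip(row, desc))) for row in w1]
    gates = [sigmoid(sum(b * hv for b, hv in zip(row, hidden))) for row in w2]
    # Recalibrate: multiply every activation in channel k by gates[k].
    return [[[v * g for v in row] for row in channel] for channel, g in zip(x, gates)]


def random_matrix(rng: random.Random, rows: int, cols: int) -> List[List[float]]:
    """Hypothetical stand-in for learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Usage: 2 channels of 4 x 4 features, recalibrated at scale factors 1 and 2.
rng = random.Random(0)
C, H, W, scales = 2, 4, 4, (1, 2)
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
desc_len = C * sum(s * s for s in scales)   # 2 * (1 + 4) = 10 pooled values
hidden_units = 3                            # bottleneck width (assumed)
w1 = random_matrix(rng, hidden_units, desc_len)
w2 = random_matrix(rng, C, hidden_units)
y = multi_scale_recalibrate(x, scales, w1, w2)
```

Scale factor 1 reproduces the global average pooling of a standard SE block, while larger factors keep coarse spatial layout in the descriptor, which is one way a module could "capture information in different resolutions". Setting all weights to zero makes every gate sigmoid(0) = 0.5, a convenient sanity check; in a real network the weights would be learned, for example during the fine-tuning step mentioned above.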
Acknowledgments
This work is supported by the Chinese National Nature Science Foundation (61571062) and the 111 project (No. B17007). We would like to thank Rui Zhang for helping with Fig. 3 and Dr. Pingyu Wang for instructive discussions. We also thank the reviewers for their useful comments.
Author information
Authors and Affiliations
Beijing Key Laboratory of Networks System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Ziwei Xie, Hailun Xia & Chunyan Feng
Beijing Laboratory of Advanced Information, Beijing, 100876, China
Hailun Xia & Chunyan Feng
Corresponding author
Correspondence to Hailun Xia.
Editor information
Editors and Affiliations
Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang
Nanjing University, Nanjing, China
Zhi-Hua Zhou
University of Macau, Taipa, Macau, China
Zhiguo Gong
Southeast University, Nanjing, China
Min-Ling Zhang
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Sheng-Jun Huang
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xie, Z., Xia, H., Feng, C. (2019). A Multi-scale Recalibrated Approach for 3D Human Pose Estimation. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11441. Springer, Cham. https://doi.org/10.1007/978-3-030-16142-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16141-5
Online ISBN: 978-3-030-16142-2
eBook Packages: Computer Science, Computer Science (R0)