Evaluating current state of monocular 3D pose models for golf

Authors

Christian Keilstrup Ingwersen, TrackMan A/S & Technical University of Denmark
Janus Nørtoft Jensen, Technical University of Denmark
Morten Rieger Hannemose, Technical University of Denmark
Anders Bjorholm Dahl, Technical University of Denmark

DOI:

Keywords:

Human pose estimation, SMPL, sport, 3D pose, 2D pose, kinematic analysis

Abstract

Monocular 3D human pose estimation has reached impressive performance. State-of-the-art models predict joint locations that can be accurately reprojected into the image, resulting in visually convincing detections. However, our aim is to use the predicted poses in a domain with high-frequency movements, namely video of athletes performing golf swings. Our investigation is based on accurate marker-based motion capture data. For our data, too, the predicted 3D joint locations look convincing when we reproject them into the image. However, by quantitatively comparing the results with the motion capture data, we find significant model errors that are too large for the predictions to be used in any kinematic analysis of the movements. Thus we conclude that the current models cannot be used out of the box for advanced golf analytics.
