
A 3D-CNN and multi-loss video prediction architecture

Published in: Applied Intelligence

Abstract

The achievements of deep learning in computer vision have made video prediction a prominent research focus. The prevailing trend is to pursue ever more sophisticated model architectures in order to improve performance metrics. Video prediction is inherently complex, and most previously proposed models are complex as well. In this paper, we propose a novel, simple video prediction network based on a Three-Dimensional Convolutional Neural Network (3D-CNN) and multiple losses, abbreviated as ML3DVP. Our network is built entirely on 3D-CNNs. Compared with Convolutional Long Short-Term Memory (ConvLSTM), Recurrent Neural Network (RNN), and Generative Adversarial Network (GAN) variants, we start from the most basic network structure to reduce complexity and thereby improve prediction speed. In addition, many current models suffer from quality problems such as insufficient clarity in the predicted frames. To address this, we introduce multiple losses for backpropagation, using the quality metrics Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR) as optimization objectives so that prediction quality improves continuously during training. Evaluations of model complexity, parameter count, and predictive results on four datasets substantiate that the proposed model achieves both structural simplicity and enhanced performance.
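The abstract combines a pixel-level loss with the quality metrics SSIM and PSNR as optimization objectives. The paper's exact loss formulation is not shown on this page; as a rough, hypothetical sketch in plain Python, a multi-loss objective mixing MSE with a single-window SSIM penalty (PSNR included for evaluation) might look like the following. The weights `w_mse` and `w_ssim` are illustrative assumptions, not values from the paper:

```python
import math

def psnr(x, y, max_val=1.0):
    """Peak Signal-to-Noise Ratio between two equal-size images (flat pixel lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def global_ssim(x, y, max_val=1.0):
    """Single-window (global) SSIM; the full SSIM index averages this over local windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # standard stabilizers
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def multi_loss(pred, target, w_mse=1.0, w_ssim=0.5):
    """Combined objective: pixel MSE plus an SSIM penalty (1 - SSIM).

    Weights are hypothetical; SSIM enters as (1 - SSIM) so that
    higher structural similarity lowers the loss.
    """
    mse = sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)
    return w_mse * mse + w_ssim * (1.0 - global_ssim(pred, target))
```

Because SSIM is differentiable, a term like `1 - SSIM` can serve directly as a training loss, whereas PSNR is a monotone function of MSE, so minimizing MSE already maximizes PSNR; a deep-learning framework's autograd would handle the backward pass in practice.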




Availability of Data and Materials

The data associated with this manuscript are deposited in a data repository.

Code Availability

The custom code used in this study is available in the GitHub repository at https://github.com/okayq/ML3DVP.git.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant no. 62476126.


Author information

Authors and Affiliations

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China

    Ziru Qin & Qun Dai

  2. MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, 211106, China

    Ziru Qin & Qun Dai

Authors
  1. Ziru Qin
  2. Qun Dai

Contributions

Author 1 (First Author): Conceptualization, Data Curation, Investigation, Methodology, Software, Validation, Visualization, Writing - Original Draft. Author 2 (Corresponding Author): Project Administration, Supervision, Funding Acquisition, Conceptualization, Methodology, Writing - Review & Editing.

Corresponding author

Correspondence to Qun Dai.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
