Video Summarization Using Denoising Diffusion Probabilistic Model

Authors

  • Zirui Shang, Beijing Institute of Technology
  • Yubo Zhu, Beijing Institute of Technology
  • Hongxi Li, Beijing Institute of Technology
  • Shuo Yang, Shenzhen MSU-BIT University
  • Xinxiao Wu, Beijing Institute of Technology; Shenzhen MSU-BIT University

DOI:

https://doi.org/10.1609/aaai.v39i7.32727

Abstract

Video summarization aims to eliminate visual redundancy while retaining the key parts of a video to construct concise and comprehensive synopses. Most existing methods use discriminative models to predict the importance scores of video frames. However, these methods are susceptible to annotation inconsistency, which arises from the inherent subjectivity of different annotators labeling the same video. In this paper, we introduce a generative framework for video summarization that learns how to generate summaries from a probability distribution perspective, effectively reducing the interference of subjective annotation noise. Specifically, we propose a novel diffusion summarization method based on the Denoising Diffusion Probabilistic Model (DDPM), which learns the probability distribution of the training data through noise prediction and generates summaries by iterative denoising. Compared with discriminative methods, our method is more resistant to subjective annotation noise and less prone to overfitting the training data, and it generalizes well. Moreover, to facilitate training the DDPM with limited data, we employ an unsupervised video summarization model to implement the earlier denoising process. Extensive experiments on various datasets (TVSum, SumMe, and FPVSum) demonstrate the effectiveness of our method.
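
The two DDPM ingredients named in the abstract, noise prediction during training and iterative denoising at generation time, can be illustrated with a minimal sketch. It follows the generic DDPM formulation rather than the authors' implementation: the function names, the linear noise schedule and its default values, and the treatment of a summary as a plain list of per-frame importance scores are all assumptions made for illustration. No trained network is included; the caller supplies the noise prediction, and the unsupervised model that the abstract uses for the earlier denoising steps is not modeled.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: draw x_t from q(x_t | x_0) in closed form.

    x0 is a hypothetical vector of per-frame importance scores.
    Returns the noisy vector and the Gaussian noise that was mixed in.
    """
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(eps_true, eps_pred):
    """Training objective: mean squared error between true and predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)


def denoise_step(xt, t, eps_pred, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1}, given a noise prediction for x_t."""
    beta, a_bar = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - a_bar)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta)
            for x, e in zip(xt, eps_pred)]
    if t == 0:
        return mean  # the final step returns the mean without extra noise
    sigma = math.sqrt(beta)  # one common choice of reverse-process variance
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(10)
    alpha_bars = cumulative_alphas(betas)
    scores = [0.1, 0.9, 0.4]  # toy importance scores, purely illustrative
    noisy, eps = add_noise(scores, 0, alpha_bars, rng)
    # Stand-in for a trained model: reuse the true noise as the prediction.
    print(denoise_step(noisy, 0, eps, betas, alpha_bars, rng))
```

In a real system, `eps_pred` would come from a network conditioned on the video features and the step index, and generation would start from noise and apply `denoise_step` from the last step down to step 0.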

Published

2025-04-11

How to Cite

Shang, Z., Zhu, Y., Li, H., Yang, S., & Wu, X. (2025). Video Summarization Using Denoising Diffusion Probabilistic Model. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 6776-6784. https://doi.org/10.1609/aaai.v39i7.32727

Section

AAAI Technical Track on Computer Vision VI