# LLMVA-GEBC
Winner solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop).
Code for the LOVEU@CVPR2023 Workshop Generic Event Boundary Captioning (GEBC) Challenge. Our proposed method achieved a 76.14 score on the test set and won the challenge.
We propose an effective model, LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) we utilize a pretrained LLM to generate high-quality, human-like captions; (2) to adapt the model to the GEBC task, we use a video Q-former as an adapter and train it while keeping the visual feature extractors and the LLM frozen.
## Environment Preparation

First, create a conda environment:

```
conda env create -f environment.yml
conda activate llmvagebc
```
## Checkpoints

Before using the repository, make sure you have obtained the required pretrained checkpoints. Remember to change the checkpoint path `ckpt` in the config file.
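As a quick sanity check, a minimal sketch like the one below can confirm the checkpoint path resolves before training. The `ckpt` key is mentioned above, but the config file name and the exact YAML layout here are assumptions:

```python
# Sketch: verify the checkpoint path referenced by a training config.
# The "ckpt" key is mentioned in this README; whether it sits at the
# top level of the YAML file is an assumption.
import os
import yaml

cfg_path = "train_configs/my_config.yaml"  # hypothetical config name
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

ckpt_path = cfg.get("ckpt")  # adjust if "ckpt" is nested deeper
assert ckpt_path and os.path.exists(ckpt_path), f"Checkpoint not found: {ckpt_path}"
```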
## Data Preparation

Download the Kinetic-GEBC dataset from https://sites.google.com/view/loveucvpr23/track2.
**For primary visual features:** Use BLIP-2 to extract the primary visual features. We use `feature_extraction.py` to do so. Remember to change `video_dir` and `save_dir` in `train_configs/blip2_feature_extract.yaml` before you run:

```
python feature_extraction.py --cfg-path train_configs/blip2_feature_extract.yaml
```
**For other visual features:** Use CLIP to extract frame-level features and Omnivore to extract clip-level features. We use this pipeline to extract the features. Then, put the extracted features under these three folders:

- `data/features/eva_vit_g_q_former_tokens_12`
- `data/features/clip_fps_15_stride_1_rename`
- `data/features/omnivore_fps_15_len_16_stride_1_rename`

You can also directly download the officially provided features here. But remember to change `q_former_feature_folder`, `other_feat_total_size`, `other_feature_names`, and `other_feature_folders` in the config file.
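As a quick sanity check (a sketch only: the folder name comes from this README, but the assumption that it directly contains `.npy` files is ours), you can inspect one extracted feature file with NumPy:

```python
# Sketch: load one frame-level CLIP feature and print its shape.
import glob

import numpy as np

feat_dir = "data/features/clip_fps_15_stride_1_rename"  # folder from this README
files = sorted(glob.glob(f"{feat_dir}/**/*.npy", recursive=True))
assert files, f"No .npy files found under {feat_dir}"

feat = np.load(files[0])
print(files[0], feat.shape, feat.dtype)
```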
**For region-level features:** Use VinVL to extract region-level features. The region features of a video are saved to multiple `.npy` files, where each file contains the region features of one sampled frame. Merge the feature file paths into `video_to_frame_index.json` in the following format (a sketch of how to generate it follows below):

```
{
    "video_id": [
        "frame_1_feat.npy",
        "frame_2_feat.npy",
        ...
    ],
    ...
}
```

Then put this file under `data/features/`.
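Here is a minimal sketch of how the index could be generated. It assumes the per-frame `.npy` files of each video live in a subfolder named after the video id; that directory layout and the `vinvl_region_features` folder name are assumptions, while the output format matches the one above:

```python
# Sketch: build video_to_frame_index.json from per-frame VinVL features.
# Assumed layout: <region_feat_root>/<video_id>/*.npy, one file per frame.
import json
from pathlib import Path

region_feat_root = Path("data/features/vinvl_region_features")  # hypothetical path

index = {}
for video_dir in sorted(p for p in region_feat_root.iterdir() if p.is_dir()):
    # Sort frame files so the features stay in temporal order.
    index[video_dir.name] = sorted(str(p) for p in video_dir.glob("*.npy"))

with open("data/features/video_to_frame_index.json", "w") as f:
    json.dump(index, f, indent=2)
```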
## Training

First, set the configs in `train_configs/${NAME_OF_YOUR_CONFIG_FILE}.yaml`. Then run the script:

```
CUDA_VISIBLE_DEVICES=${YOUR_GPU_ID} python train.py \
    --cfg-path train_configs/${NAME_OF_YOUR_CONFIG_FILE}.yaml
```

The results can be found in `video_llama/output/`.
## Acknowledgements

We are grateful for the following awesome projects that LLMVA-GEBC builds on:
## Citation

If you find our code useful, please cite the repo as follows:

```
@article{tang2023llmva,
  title={LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning},
  author={Tang, Yunlong and Zhang, Jinrui and Wang, Xiangchen and Wang, Teng and Zheng, Feng},
  journal={arXiv preprint arXiv:2306.10354},
  year={2023}
}
```