
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)

License

BSD-3-Clause (see LICENSE). Separate BSD-3-Clause licenses also apply to bundled components:

LICENSE
LICENSE_Lavis.md
LICENSE_Minigpt4.md
LICENSE_Videollama.md

zjr2000/LLMVA-GEBC

Code for the LOVEU@CVPR2023 Workshop Generic Event Boundary Captioning (GEBC) Challenge. Our proposed method achieved a 76.14 score on the test set and won 1st place in the challenge. The technical report can be found here (arXiv:2306.10354).

Introduction

We propose an effective model, LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) we utilize a pretrained LLM to generate high-quality, human-like captions; (2) to adapt the model to the GEBC task, we use a video Q-Former as an adapter and train it while keeping the visual feature extractors and the LLM frozen.

[Figure: overview of the LLMVA-GEBC architecture]

Environment Preparation

First, you should create a conda environment:

conda env create -f environment.yml
conda activate llmvagebc

Prerequisite Checkpoints

Before using the repository, make sure you have obtained the following checkpoints:

Remember to change the checkpoint paths (ckpt) in the config file.

Data

Download the Kinetic-GEBC dataset from https://sites.google.com/view/loveucvpr23/track2.

For primary visual features: use BLIP-2 to extract them; we use feature_extraction.py to do so. Remember to change video_dir and save_dir in train_configs/blip2_feature_extract.yaml before you run:

python feature_extraction.py --cfg-path train_configs/blip2_feature_extract.yaml
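The relevant part of train_configs/blip2_feature_extract.yaml might look like the following. This is a hypothetical excerpt: only the key names video_dir and save_dir come from this README; the values are placeholders to replace with your own paths.

```yaml
# Hypothetical excerpt of train_configs/blip2_feature_extract.yaml.
# Only the key names video_dir / save_dir are taken from the README;
# the paths below are example values.
video_dir: /path/to/kinetic_gebc/videos              # raw videos to extract features from
save_dir: data/features/eva_vit_g_q_former_tokens_12 # where extracted features are written
```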

For other visual features: use CLIP to extract frame-level features and Omnivore to extract clip-level features. We use this pipeline to extract them.

Then, put the extracted features under these three folders:

data/features/eva_vit_g_q_former_tokens_12
data/features/clip_fps_15_stride_1_rename
data/features/omnivore_fps_15_len_16_stride_1_rename

You can also directly download the officially provided features here. But remember to change q_former_feature_folder, other_feat_total_size, other_feature_names and other_feature_folders in the config file.

Use VinVL to extract region-level features. The region features of a video are saved to multiple .npy files, where each file contains the region features of one sampled frame. Merge the feature file paths into video_to_frame_index.json in the following format:

{
    "video_id": [
        "frame_1_feat.npy",
        "frame_2_feat.npy",
        ...
    ],
    ...
}

Then put this file under data/features/.
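One way to build this file is sketched below, assuming a flat directory of per-frame .npy files whose names start with the video id (a hypothetical naming scheme such as video_id_frame_feat.npy; adapt the parsing to whatever names your extractor actually produces).

```python
import json
import os
from collections import defaultdict

def build_frame_index(feature_dir, out_path):
    """Group per-frame .npy feature files by video id into a JSON index.

    Assumes (hypothetically) file names like <video_id>_<frame>_feat.npy;
    change the video-id parsing below to match your extractor's output.
    """
    index = defaultdict(list)
    for name in sorted(os.listdir(feature_dir)):  # sorted keeps frames in order
        if not name.endswith(".npy"):
            continue
        video_id = name.split("_")[0]  # crude: everything before the first "_"
        index[video_id].append(os.path.join(feature_dir, name))
    with open(out_path, "w") as f:
        json.dump(index, f, indent=4)
    return dict(index)
```

Running build_frame_index("path/to/region_feats", "data/features/video_to_frame_index.json") would then write the index in the format shown above.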

Training and Validation

First, set the configs in train_configs/${NAME_OF_YOUR_CONFIG_FILE}.yaml. Then run the script:

CUDA_VISIBLE_DEVICES=${YOUR_GPU_ID} python train.py \
    --cfg-path train_configs/${NAME_OF_YOUR_CONFIG_FILE}.yaml

The results can be found in video_llama/output/.

Acknowledgement

We are grateful to the following awesome projects that LLMVA-GEBC builds upon:

Citation

If you find our code useful, please cite the repo as follows:

@article{tang2023llmva,
  title={LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning},
  author={Tang, Yunlong and Zhang, Jinrui and Wang, Xiangchen and Wang, Teng and Zheng, Feng},
  journal={arXiv preprint arXiv:2306.10354},
  year={2023}
}

