RecNet

A PyTorch implementation of "Reconstruction Network for Video Captioning", CVPR 2018.


This project tries to implement RecNet, proposed in Reconstruction Network for Video Captioning [1], CVPR 2018.

Environment

  • Ubuntu 16.04
  • CUDA 9.0
  • cuDNN 7.3.1
  • Nvidia Geforce GTX Titan Xp 12GB

Requirements

  • Java 8
  • Python 2.7.12
    • PyTorch 1.0
    • Other python libraries specified in requirements.txt

How to use

Step 1. Setup python virtual environment

$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt

Step 2. Prepare Data

  1. Extract Inception-v4 [2] features from each dataset and place them at <PROJECT ROOT>/<DATASET>/features/<DATASET>_InceptionV4.hdf5 (a loading sketch follows this list). I extracted the Inception-v4 features from here.

    Dataset    Inception-v4
    MSVD       link
    MSR-VTT    link
  2. Split each dataset according to the official splits by running the following:

    (.env) $ python -m splits.MSVD
    (.env) $ python -m splits.MSR-VTT
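
The feature file is a plain HDF5 container, so it is easy to sanity-check before training. Below is a minimal sketch, assuming the file maps each video id to an array of per-frame Inception-v4 features; the exact key layout and feature dimension are assumptions, not something this repo guarantees.

    # Sanity-check the extracted Inception-v4 features (illustrative only;
    # the per-video key layout is an assumption).
    import h5py

    with h5py.File("MSVD/features/MSVD_InceptionV4.hdf5", "r") as f:
        video_ids = list(f.keys())
        print("number of videos:", len(video_ids))
        feats = f[video_ids[0]][()]        # frame-level features for one video
        print(video_ids[0], feats.shape)   # e.g. (num_frames, 1536) for Inception-v4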

Step 3. Prepare Evaluation Codes

Clone the evaluation code from the official coco-caption repo.

(.env) $ git clone https://github.com/tylin/coco-caption.git
(.env) $ mv coco-caption/pycocoevalcap .
(.env) $ rm -rf coco-caption

Step 4. Train

  • Stage 1 (Encoder-Decoder)

    (.env) $ python train.py -c configs.train_stage1
  • Stage 2 (Encoder-Decoder-Reconstructor)

    Set the pretrained_decoder_fpath of TrainConfig in configs/train_stage2.py to the checkpoint path saved at stage 1 (see the sketch at the end of this step), then run

    (.env) $ python train.py -c configs.train_stage2

You can change some hyperparameters by modifying configs/train_stage1.py and configs/train_stage2.py.
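
For reference, the stage-2 edit only points one field at the stage-1 checkpoint. A minimal sketch of the relevant part of configs/train_stage2.py, assuming TrainConfig exposes the field as a class attribute; the checkpoint path is just an example:

    class TrainConfig:
        # Checkpoint of the encoder-decoder trained at stage 1.
        # Example path only; replace with the file saved by your stage-1 run.
        pretrained_decoder_fpath = "checkpoints/stage1/best.ckpt"
        # ... other hyperparameters stay as configured ...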

Step 5. Inference

  1. Set the checkpoint path by changing ckpt_fpath of RunConfig in configs/run.py (see the sketch below).
  2. Run
    (.env) $ python run.py
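
The inference setup mirrors the stage-2 config edit. A minimal sketch of the relevant part of configs/run.py, assuming RunConfig exposes ckpt_fpath as a class attribute; the path is just an example:

    class RunConfig:
        # Checkpoint to evaluate: either one you trained above or a pretrained
        # checkpoint from the Performances section. Example path only.
        ckpt_fpath = "checkpoints/stage2/best.ckpt"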

Performances

NOTE: As the tables below show, our RecNet runs do not outperform SA-LSTM; better hyperparameters remain to be found.

  • MSVD

    Model                   BLEU4  CIDEr  METEOR  ROUGE_L  pretrained
    SA-LSTM                 45.3   76.2   31.9    64.2     -
    RecNet (global)         51.1   79.7   34.0    69.4     -
    RecNet (local)          52.3   80.3   34.1    69.8     -
    (Ours) SA-LSTM          50.9   79.6   33.4    69.6     link
    (Ours) RecNet (global)  49.9   78.7   33.2    69.7     link
    (Ours) RecNet (local)   49.8   79.4   33.2    69.6     link
  • MSR-VTT

    Model                   BLEU4  CIDEr  METEOR  ROUGE_L  pretrained
    SA-LSTM                 36.3   39.9   25.5    58.3     -
    RecNet (global)         38.3   41.7   26.2    59.1     -
    RecNet (local)          39.1   42.7   26.6    59.3     -
    (Ours) SA-LSTM          38.0   40.2   25.6    58.1     link
    (Ours) RecNet (global)  37.4   40.0   25.5    58.0     link
    (Ours) RecNet (local)   37.9   40.9   25.7    58.3     link

References

[1] Wang, Bairui, et al. "Reconstruction Network for Video Captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[2] Szegedy, Christian, et al. "Inception-v4, Inception-ResNet and the impact of residual connections on learning." AAAI. Vol. 4. 2017.
