ZhaoPeiduo/BLIP2-Japanese
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
This project builds upon the LAVIS library's BLIP2 model.
The main idea is to replace the tokenizer and the underlying BERT model in BLIP2's Q-Former with counterparts pretrained on Japanese datasets, and to retrain the updated model on Japanese captioning datasets.
The model has been trained on the COCO dataset with STAIR Captions.
The weights of Blip2_Japanese_qformer trained on STAIR Captions can be obtained from Hugging Face.
Copy the whole folder into the lavis directory and make sure the directory is named pretrained.
In addition, download the bert-base-japanese-whole-word-masking weights and config from the Hugging Face link.
You should now be able to run the example.ipynb notebook.
For directory naming conventions, you can also refer to the .gitignore file.
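For reference, here is a minimal sketch of the kind of call example.ipynb makes through LAVIS. The registry name, model_type, and image path below are assumptions, not this repository's exact code, so check the notebook for the actual values:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder registry name/model_type -- use the values shown in example.ipynb
# for this repository's modified Japanese Q-Former.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2", model_type="pretrain", is_eval=True, device=device
)

raw_image = Image.open("demo.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Stage-1 Q-Former models in LAVIS expose generate() for captioning.
print(model.generate({"image": image}))
```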
Captions generated for the Flickr30k dataset can be found in flickr30k_caption.json; the generation script is in flickr30k_caption_generate.ipynb.
These captions are generated using top-k sampling instead of nucleus sampling.
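If you just want to inspect those captions, a small sketch like the following should do, assuming flickr30k_caption.json holds a list of entries in the {'image': ..., 'caption': [...]} format shown in the examples below:

```python
import json

# Assumption: a JSON list of {'image': <filename>, 'caption': [<caption strings>]}
# entries, matching the examples shown below.
with open("flickr30k_caption.json", encoding="utf-8") as f:
    entries = json.load(f)

for entry in entries[:5]:
    print(entry["image"], "->", " / ".join(entry["caption"]))
```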
Captions generated by the pretrained and finetuned models are shown below:
pretrained: {'image': '1001773457.jpg', 'caption': ['二 匹 の 犬 が 道路 で フリスビー を し て いる']} # "Two dogs are playing frisbee on the road" (no frisbee in the image)
finetuned: {'image': '1001773457.jpg', 'caption': ['二 匹 の 犬 が 道路 で 喧嘩 を し て いる']} # "Two dogs are fighting on the road"
pretrained: {'image': '1001573224.jpg', 'caption': ['6 人 の 女性 が 屋内 で 飛び跳ね て いる']} # "Six women are jumping indoors" (wrong head count)
finetuned: {'image': '1001573224.jpg', 'caption': ['黒い 服 を 着 た 女性 たち が 飛び跳ね て いる']} # "Women in black clothes are jumping"
In general, captions generated by the finetuned model are more accurate.
Refer to the example.ipynb notebook for more details. The idea is to compute the average cosine similarity of the query tokens between the image embeddings and the multimodal embeddings.
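As a rough illustration of that computation (a sketch only, with assumed tensor shapes rather than the notebook's exact code), the score is the mean over query tokens of the cosine similarity between the image-only and multimodal Q-Former outputs:

```python
import torch
import torch.nn.functional as F

def average_query_cosine_similarity(image_embeds: torch.Tensor,
                                    multimodal_embeds: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity across query tokens.

    Both inputs are assumed to be Q-Former outputs of shape
    (num_query_tokens, dim) for one image / image-text pair.
    """
    sims = F.cosine_similarity(image_embeds, multimodal_embeds, dim=-1)
    return sims.mean()

# Hypothetical example with 32 query tokens of dimension 768:
score = average_query_cosine_similarity(torch.randn(32, 768), torch.randn(32, 768))
print(score.item())
```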
The model was trained on a single RTX 4080 (laptop) GPU, so the training config was modified as follows:
In blip2_pretrain.yaml: vit_precision = 'fp16'
In pretrain_stage1.yaml: batch_size = 25
During evaluation you have to change vit_precision back to fp32.
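If you would rather apply these overrides programmatically than edit the YAML files by hand, here is a minimal OmegaConf sketch; the config paths and the batch-size key name are assumed from a stock LAVIS checkout, so adjust them to yours:

```python
from omegaconf import OmegaConf

# Assumed stock LAVIS config locations -- adjust to your checkout.
model_cfg_path = "lavis/configs/models/blip2/blip2_pretrain.yaml"
run_cfg_path = "lavis/projects/blip2/train/pretrain_stage1.yaml"

model_cfg = OmegaConf.load(model_cfg_path)
model_cfg.model.vit_precision = "fp16"  # switch back to "fp32" for evaluation
OmegaConf.save(model_cfg, model_cfg_path)

run_cfg = OmegaConf.load(run_cfg_path)
run_cfg.run.batch_size_train = 25  # key name assumed; the README simply calls it batch_size
OmegaConf.save(run_cfg, run_cfg_path)
```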
The pretrained and finetuned weights may be updated without prior notice, so if you cannot reproduce the results in the example notebook, please re-download the weights and try again.
A simple interface for demo purposes can be found in generator-ui.py. To run the UI:
python generator-ui.py