Document Visual Question Answering
This repo hosts the basic functional code for our approach, entitled HyperDQA, in the Document Visual Question Answering competition hosted as part of the Workshop on Text and Documents in the Deep Learning Era at CVPR 2020. Our approach stands at position 4 on the Leaderboard.
Read more about our approach in this blog post!
- Clone the repository
git clone https://github.com/anisha2102/docvqa.git
- Install libraries
pip install -r requirements.txt
- Download the dataset: the dataset for Task 1 can be downloaded from the Competition Website (Downloads section). It consists of document images and their corresponding OCR transcriptions (a sketch of the expected layout follows this list).
- Download the pretrained model: download the pretrained model for LayoutLM-Base, Uncased from here.
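Concretely, the preprocessing step below expects the pieces from the download arranged roughly like this (a sketch only, using the placeholder names from the commands below; the actual folder names inside the archive may differ):

```
<data-folder>/
├── <data-documents-folder>/   # document page images
├── <data-ocr-folder>/         # OCR transcriptions, one JSON per image
└── train_v1.0.json            # Task 1 question-answer annotations
```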
Preprocess the downloaded data into train and validation JSON files:

```
python create_dataset.py \
    <data-ocr-folder> \
    <data-documents-folder> \
    <path-to-train_v1.0.json> \
    <train-output-json-path> \
    <validation-output-json-path>
```
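For illustration, a hypothetical invocation with the data extracted under `./data` (all paths below are placeholders, not required names):

```
python create_dataset.py \
    ./data/ocr \
    ./data/documents \
    ./data/train_v1.0.json \
    ./data/train_processed.json \
    ./data/val_processed.json
```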
Train the model:

```
CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
    --data_dir <data-folder> \
    --model_type layoutlm \
    --model_name_or_path <pretrained-model-path> \
    --do_lower_case \
    --max_seq_length 512 \
    --do_train \
    --num_train_epochs 15 \
    --logging_steps 500 \
    --evaluate_during_training \
    --save_steps 500 \
    --do_eval \
    --output_dir <data-folder>/<exp-folder> \
    --per_gpu_train_batch_size 8 \
    --overwrite_output_dir \
    --cache_dir <data-folder>/models \
    --skip_match_answers \
    --val_json <validation-output-json-path> \
    --train_json <train-output-json-path>
```

`<pretrained-model-path>` example: `./models/layoutlm-base-uncased`
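After training, evaluation can likely be run on its own by reusing the same script without the training flags. The sketch below is only an assumption pieced together from the flags above; it may need adjusting to the script's actual argument requirements:

```
# Evaluation-only sketch (assumed): point --model_name_or_path at the
# fine-tuned checkpoint written to <exp-folder> and drop --do_train.
CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
    --data_dir <data-folder> \
    --model_type layoutlm \
    --model_name_or_path <data-folder>/<exp-folder> \
    --do_lower_case \
    --max_seq_length 512 \
    --do_eval \
    --output_dir <data-folder>/<exp-folder> \
    --cache_dir <data-folder>/models \
    --skip_match_answers \
    --val_json <validation-output-json-path> \
    --train_json <train-output-json-path>
```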
Download the pytorch_model.bin file from the link below and copy it to the models folder: Google Drive Link
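For example (a minimal sketch, assuming the file was saved to `~/Downloads` and the repository root is the current directory):

```
mkdir -p models
cp ~/Downloads/pytorch_model.bin models/
```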
Try out the demo on a sample datapoint with demo.ipynb
The code and pretrained models are based on LayoutLM and HuggingFace Transformers. Many thanks for their amazing open source contributions.