Document Visual Question Answering
This repo hosts the basic functional code for our approach, entitled HyperDQA, in the Document Visual Question Answering competition hosted as part of the Workshop on Text and Documents in the Deep Learning Era at CVPR 2020. Our approach stands at position 4 on the Leaderboard.
Read more about our approach in this blog post!
- Clone the repository
git clone https://github.com/anisha2102/docvqa.git
- Install libraries
pip install -r requirements.txt
- Download the dataset: the dataset for Task 1 can be downloaded from the Competition Website (Downloads section). It consists of document images and their corresponding OCR transcriptions (a sketch of the expected layout follows this list).
- Download the pretrained model: download the pretrained model for LayoutLM-Base, Uncased from here.
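Concretely, the preprocessing step below expects the pieces from the download arranged roughly like this (a sketch only, using the placeholder names from the commands below; the actual folder names inside the archive may differ):

```
<data-folder>/
├── <data-documents-folder>/   # document page images
├── <data-ocr-folder>/         # OCR transcriptions, one JSON per image
└── train_v1.0.json            # Task 1 question-answer annotations
```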
Preprocess the downloaded data into train and validation JSON files:

```
python create_dataset.py \
    <data-ocr-folder> \
    <data-documents-folder> \
    <path-to-train_v1.0.json> \
    <train-output-json-path> \
    <validation-output-json-path>
```
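For illustration, a hypothetical invocation with the data extracted under `./data` (all paths below are placeholders, not required names):

```
python create_dataset.py \
    ./data/ocr \
    ./data/documents \
    ./data/train_v1.0.json \
    ./data/train_processed.json \
    ./data/val_processed.json
```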
Train the model:

```
CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
    --data_dir <data-folder> \
    --model_type layoutlm \
    --model_name_or_path <pretrained-model-path> \
    --do_lower_case \
    --max_seq_length 512 \
    --do_train \
    --num_train_epochs 15 \
    --logging_steps 500 \
    --evaluate_during_training \
    --save_steps 500 \
    --do_eval \
    --output_dir <data-folder>/<exp-folder> \
    --per_gpu_train_batch_size 8 \
    --overwrite_output_dir \
    --cache_dir <data-folder>/models \
    --skip_match_answers \
    --val_json <validation-output-json-path> \
    --train_json <train-output-json-path>
```

`<pretrained-model-path>` example: `./models/layoutlm-base-uncased`
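After training, evaluation can likely be run on its own by reusing the same script without the training flags. The sketch below is only an assumption pieced together from the flags above; it may need adjusting to the script's actual argument requirements:

```
# Evaluation-only sketch (assumed): point --model_name_or_path at the
# fine-tuned checkpoint written to <exp-folder> and drop --do_train.
CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
    --data_dir <data-folder> \
    --model_type layoutlm \
    --model_name_or_path <data-folder>/<exp-folder> \
    --do_lower_case \
    --max_seq_length 512 \
    --do_eval \
    --output_dir <data-folder>/<exp-folder> \
    --cache_dir <data-folder>/models \
    --skip_match_answers \
    --val_json <validation-output-json-path> \
    --train_json <train-output-json-path>
```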
Download the pytorch_model.bin file from the link below and copy it to the models folder: Google Drive Link
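For example (a minimal sketch, assuming the file was saved to `~/Downloads` and the repository root is the current directory):

```
mkdir -p models
cp ~/Downloads/pytorch_model.bin models/
```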
Try out the demo on a sample datapoint with demo.ipynb
The code and pretrained models are based on LayoutLM and HuggingFace Transformers. Many thanks for their amazing open source contributions.