Multi-Aspect Vision Language Pretraining - CVPR2024


HieuPhan33/CVPR2024_MAVL


Introduction

Welcome to the official implementation of "Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework", accepted at CVPR2024 🎉

Arxiv Version

This work leverages an LLM 🤖 to decompose disease descriptions into a set of visual aspects. Our visual-aspect vision-language pre-training framework, dubbed MAVL, achieves state-of-the-art performance across 7 datasets in zero-shot and low-shot fine-tuning settings for disease classification and segmentation.

📝 Citation

If you find our work useful, please cite our paper.

@inproceedings{phan2024decomposing,
  title={Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework},
  author={Phan, Vu Minh Hieu and Xie, Yutong and Qi, Yuankai and Liu, Lingqiao and Liu, Liyang and Zhang, Bowen and Liao, Zhibin and Wu, Qi and To, Minh-Son and Verjans, Johan W},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11492--11501},
  year={2024}
}

Comparisons with SOTA image-text pre-training models under zero-shot classification on 5 datasets.

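| Method | CheXpert AUC | CheXpert F1 | CheXpert ACC | ChestXray-14 AUC | ChestXray-14 F1 | ChestXray-14 ACC | PadChest-seen AUC | PadChest-seen F1 | PadChest-seen ACC | RSNA Pneumonia AUC | RSNA Pneumonia F1 | RSNA Pneumonia ACC | SIIM-ACR AUC | SIIM-ACR F1 | SIIM-ACR ACC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ConVIRT | 52.10 | 35.61 | 57.43 | 53.15 | 12.38 | 57.88 | 63.72 | 14.56 | 73.47 | 79.21 | 55.67 | 75.08 | 64.25 | 42.87 | 53.42 |
| GLoRIA | 54.84 | 37.86 | 60.70 | 55.92 | 14.20 | 59.47 | 64.09 | 14.83 | 73.86 | 70.37 | 48.19 | 70.54 | 54.71 | 40.39 | 47.15 |
| BioViL | 60.01 | 42.10 | 66.13 | 57.82 | 15.64 | 61.33 | 60.35 | 10.63 | 70.48 | 84.12 | 54.59 | 74.43 | 70.28 | 46.45 | 68.22 |
| BioViL-T | 70.93 | 47.21 | 69.96 | 60.43 | 17.29 | 62.12 | 65.78 | 15.37 | 77.52 | 86.03 | 62.56 | 80.04 | 75.56 | 60.18 | 73.72 |
| CheXzero | 87.90 | 61.90 | 81.17 | 66.99 | 21.99 | 65.38 | 73.24 | 19.53 | 83.49 | 85.13 | 61.49 | 78.34 | 84.60 | 65.97 | 77.34 |
| MedKLIP | 87.97 | 63.67 | 84.32 | 72.33 | 24.18 | 79.40 | 77.87 | 26.63 | 92.44 | 85.94 | 62.57 | 79.97 | 89.79 | 72.73 | 83.99 |
| MAVL (Proposed) | 90.13 | 65.47 | 86.44 | 73.57 | 26.25 | 82.77 | 78.79 | 28.48 | 92.56 | 86.31 | 65.26 | 81.28 | 92.04 | 77.95 | 87.14 |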
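To make the margin over the strongest baseline easier to read, the short sketch below recomputes the per-dataset AUC gap between MAVL and MedKLIP from the zero-shot table. The numbers are copied from the table; the helper name `auc_gains` is ours and is not part of the repository.

```python
# AUC values (%) copied from the zero-shot classification table.
DATASETS = ["CheXpert", "ChestXray-14", "PadChest-seen", "RSNA Pneumonia", "SIIM-ACR"]
AUC = {
    "MedKLIP": [87.97, 72.33, 77.87, 85.94, 89.79],
    "MAVL": [90.13, 73.57, 78.79, 86.31, 92.04],
}


def auc_gains(ours, baseline):
    """Per-dataset AUC difference (ours - baseline), in percentage points."""
    return [round(a - b, 2) for a, b in zip(ours, baseline)]


gains = auc_gains(AUC["MAVL"], AUC["MedKLIP"])
mean_gain = sum(gains) / len(gains)

for name, gain in zip(DATASETS, gains):
    print(f"{name}: +{gain:.2f}")
print(f"mean: +{mean_gain:.2f}")
```

On these five datasets the AUC gap ranges from 0.37 points (RSNA Pneumonia) to 2.25 points (SIIM-ACR), with a mean of about 1.39 points.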

💡 Download Necessary Files

To get started, install the gdown library:

pip install -U --no-cache-dir gdown --pre

Then, run bash download.sh

MIMIC-CXR2 must be downloaded separately from PhysioNet.

🚀 Library Installation

We have pushed a Docker image with the necessary environment. You can create a container directly from it:

docker pull stevephan46/mavl:latest
docker run --runtime=nvidia --name mavl -it -v /your/data/root/folder:/data --shm-size=4g stevephan46/mavl:latest

You may need to reinstall opencv-python, because it conflicts with the Docker environment:

pip install opencv-python==4.2.0.32

If you prefer manual installation over docker, please run the following installation:

pip install -r requirements.txt
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install opencv-python==4.2.0.32
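After a manual installation it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch using only the standard library; the pins are taken from the commands above, while the helper names (`installed_version`, `matches_pin`, `find_mismatches`) are our own and not part of the repository. A pin without a local label (such as `torchaudio==0.12.0`) is treated as matching `0.12.0+cu113`, which mirrors how pip interprets `==`.

```python
from importlib import metadata

# Version pins taken from the manual installation commands.
PINS = {
    "torch": "1.12.0+cu113",
    "torchvision": "0.13.0+cu113",
    "torchaudio": "0.12.0",
    "opencv-python": "4.2.0.32",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(pin, found):
    """Compare versions; ignore the local label (+cu113) when the pin has none."""
    if found is None:
        return False
    if "+" not in pin:
        found = found.split("+", 1)[0]
    return found == pin


def find_mismatches(pins, get_version=installed_version):
    """Map each package that violates its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = get_version(package)
        if not matches_pin(expected, found):
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

The script prints nothing when every pin is satisfied.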

🤖 LLM Disease's Visual Concept Generation

The script that generates the diseases' visual aspects with an LLM (GPT) can be found here.
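For readers who want a feel for what such a decomposition step involves, here is a minimal sketch of building a prompt and parsing a structured reply. It is not the repository's script: the aspect names, the prompt wording, and the function names are all illustrative assumptions, and no LLM is called.

```python
import json

# Illustrative aspect names (assumption); the repository's script may use others.
ASPECTS = ["opacity", "shape", "location", "texture"]


def build_prompt(disease, aspects=ASPECTS):
    """Compose an instruction asking for one short description per aspect, as JSON."""
    keys = ", ".join(f'"{a}"' for a in aspects)
    return (
        f"Describe how {disease} appears on a chest X-ray. "
        f"Reply with a JSON object with the keys {keys}, "
        "each mapped to one short sentence."
    )


def parse_aspects(reply, aspects=ASPECTS):
    """Parse a JSON reply and keep only requested aspects with non-empty text."""
    data = json.loads(reply)
    parsed = {}
    for aspect in aspects:
        text = data.get(aspect)
        if isinstance(text, str) and text.strip():
            parsed[aspect] = text.strip()
    return parsed


# Hand-written placeholder reply, not real model output.
example_reply = json.dumps({
    "opacity": "placeholder opacity description",
    "shape": "placeholder shape description",
    "location": "placeholder location description",
    "texture": "",
})

print(build_prompt("pneumonia"))
print(parse_aspects(example_reply))
```

Dropping empty or missing aspects at parse time keeps downstream code from matching images against blank text.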

🔥 Pre-train:

Our pre-training code is given in Pretrain.

  • Run download.sh to download the necessary files.

  • Modify the paths in the config file configs/MAVL_resnet.yaml, then run python train_MAVL.py to pre-train.

  • Run accelerate launch --multi_gpu --num_processes=4 --num_machines=1 --num_cpu_threads_per_process=8 train_MAVL.py --root /data/2019.MIMIC-CXR-JPG/2.0.0 --config configs/MAVL_resnet.yaml --bs 124 --num_workers 8

Note: The results reported in our paper were obtained by pre-training on 4 x A100 for 60 epochs. Checkpoints are provided here. We found that the later-stage checkpoint (checkpoint_full_46.pth) yields higher zero-shot classification accuracy, while the earlier-stage checkpoint (checkpoint_full_40.pth) yields more stable accuracy on visual grounding.

We also ran a lighter pre-training schedule on 2 x A100 for 40 epochs with mixed-precision training, which achieves similar zero-shot classification results. The checkpoint for this setup is also available here.

accelerate launch --multi_gpu --num_processes=2 --num_machines=1 --num_cpu_threads_per_process=8 --mixed_precision=fp16 train_MAVL.py --root /data/2019.MIMIC-CXR-JPG/2.0.0 --config configs/MAVL_short.yaml --bs 124 --num_workers 8
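The two launch commands differ only in the process count, the config file, and the mixed-precision flag. If you script your runs, a small helper along the following lines can assemble either variant. It is a sketch that uses only the flags shown in this README; the function name `build_pretrain_command` and its parameters are our own.

```python
import shlex


def build_pretrain_command(root, config, num_processes, mixed_precision=None,
                           batch_size=124, num_workers=8):
    """Assemble the `accelerate launch` argument list for pre-training."""
    cmd = [
        "accelerate", "launch", "--multi_gpu",
        f"--num_processes={num_processes}",
        "--num_machines=1",
        "--num_cpu_threads_per_process=8",
    ]
    if mixed_precision:
        # Only the lighter schedule in this README passes this flag (fp16).
        cmd.append(f"--mixed_precision={mixed_precision}")
    cmd += [
        "train_MAVL.py",
        "--root", root,
        "--config", config,
        "--bs", str(batch_size),
        "--num_workers", str(num_workers),
    ]
    return cmd


ROOT = "/data/2019.MIMIC-CXR-JPG/2.0.0"
full = build_pretrain_command(ROOT, "configs/MAVL_resnet.yaml", 4)
short = build_pretrain_command(ROOT, "configs/MAVL_short.yaml", 2, mixed_precision="fp16")

print(shlex.join(full))
print(shlex.join(short))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from inside the Pretrain directory.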

📦 Downstream datasets:

Links to download downstream datasets are:

  • CheXpert.
  • ChestXray-14.
  • PadChest.
  • RSNA - Download images from initial annotations.
  • SIIM.
  • COVIDx-CXR-2 - The official link on Kaggle is down. A publicly available expanded version, COVIDx-CXR4, is released here and includes COVIDx-CXR-2 as a subset. Please use our dataset CSV splits to reproduce the results on the COVIDx-CXR-2 subset.
  • Covid Rural - The official link provides the raw DICOM data. We use the preprocessed data provided here.

🌟 Quick Start:

Check this link to download the MAVL checkpoints. They can be used for all zero-shot and fine-tuning tasks.

  • Zero-Shot Classification:

    We give examples in Sample_Zero-Shot_Classification. Modify the paths, then test our model with python test.py --config configs/dataset_name_mavl.yaml

  • Zero-Shot Grounding:

    We give examples in Sample_Zero-Shot_Grounding. Modify the paths, then test our model with python test.py

  • Finetuning:

    We provide segmentation and classification fine-tuning code in Sample_Finetuning_SIIMACR. Modify the paths, then fine-tune our model with python I1_classification/train_res_ft.py --config configs/dataset_name_mavl.yaml or python I2_segementation/train_res_ft.py --config configs/dataset_name_mavl.yaml
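Every quick-start task asks you to edit paths in a config file first, and a forgotten path usually surfaces only after the model has started loading. The sketch below scans a config for path-like values that do not exist on disk. It assumes flat `key: value` lines and does not handle nested YAML; the key names in the sample are hypothetical placeholders, not the repository's actual config keys.

```python
import os

PATH_SUFFIXES = (".pth", ".csv", ".json", ".npy")


def find_missing_paths(config_text, exists=os.path.exists):
    """Return (key, value) pairs whose value looks like a path that does not exist.

    Assumes flat `key: value` lines; nested YAML is not handled.
    """
    missing = []
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        value = value.strip("'\"")
        looks_like_path = "/" in value or value.endswith(PATH_SUFFIXES)
        if value and looks_like_path and not exists(value):
            missing.append((key, value))
    return missing


# Hypothetical config fragment; key names are placeholders for illustration.
sample = """
test_file: /data/example/test.csv   # hypothetical key
model_path: /data/checkpoints/checkpoint_full_46.pth
batch_size: 64
"""

for key, value in find_missing_paths(sample):
    print(f"{key}: {value} not found")
```

To check a real config, read the file and pass its text, for example `find_missing_paths(open("configs/dataset_name_mavl.yaml").read())`, substituting the config for your dataset.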

🙏 Acknowledgement

Our code is built upon https://github.com/MediaBrain-SJTU/MedKLIP. We thank the authors for open-sourcing their code.

Feel free to reach out if you have any questions or need further assistance!
