[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
Welcome to the GitHub repository for Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification.
Authors:
K. El Khoury*, M. Zanella*, B. Gérin*, T. Godelaine*, B. Macq, S. Mahmoudi, C. De Vleeschouwer, I. Ben Ayed
*Denotes equal contribution
- Paper accepted to ICASSP 2025. [December 20, 2024]
- Paper uploaded on arXiv. [September 1, 2024]
We introduce RS-TransCLIP, a transductive approach inspired by TransCLIP, that enhances Remote Sensing Vision-Language Models without requiring any labels, incurring only a negligible computational cost on the overall inference time.

Figure 1: Top-1 accuracy of RS-TransCLIP, on ViT-L/14 Remote Sensing Vision-Language Models, for zero-shot scene classification across 10 benchmark datasets.
NB: the Python version used is 3.10.12.
Create a virtual environment and activate it:
# Example using the virtualenv package on linux
python3 -m pip install --user virtualenv
python3 -m virtualenv RS-TransCLIP-venv
source RS-TransCLIP-venv/bin/activate.csh
Install PyTorch:
pip3 install torch==2.2.2 torchaudio==2.2.2 torchvision==0.17.2
Clone the GitHub repository and move to the appropriate directory:
git clone https://github.com/elkhouryk/RS-TransCLIP
cd RS-TransCLIP
Install the remaining Python package requirements:
pip3 install -r requirements.txt
You are ready to start! 🎉
10 Remote Sensing Scene Classification datasets are already available for evaluation:
The WHURS19 dataset is already uploaded to the repository for reference and can be used directly.
The following 6 datasets (EuroSAT, OPTIMAL31, PatternNet, RESISC45, RSC11, RSICB256) will be automatically downloaded and formatted from Hugging Face using the run_dataset_download.py script.
# <dataset_name> can take the following values: EuroSAT, OPTIMAL31, PatternNet, RESISC45, RSC11, RSICB256
python3 run_dataset_download.py --dataset_name <dataset_name>
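As a rough illustration of what this step does (the Hugging Face repo id and the "image"/"label" field names below are placeholders, not the script's actual values), a download pass essentially pulls the images and writes them into the per-dataset layout described next:

```python
# Illustrative sketch only -- run_dataset_download.py implements the real logic.
# The Hugging Face repo id ("your-namespace/EuroSAT") and the "image"/"label"
# field names are hypothetical placeholders.
from pathlib import Path
from datasets import load_dataset

dataset_name = "EuroSAT"
out_dir = Path("datasets") / dataset_name / "images"
out_dir.mkdir(parents=True, exist_ok=True)

ds = load_dataset("your-namespace/EuroSAT", split="train")  # hypothetical repo id
class_names = ds.features["label"].names

counters = {}
for example in ds:
    cls = class_names[example["label"]].replace(" ", "")
    counters[cls] = counters.get(cls, 0) + 1
    example["image"].convert("RGB").save(out_dir / f"{cls}_{counters[cls]}.jpg")

# classes.txt lists one class name per line
(Path("datasets") / dataset_name / "classes.txt").write_text("\n".join(class_names))
```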
Dataset directory structure should be as follows:
$datasets/
└── <dataset_name>/
    └── classes.txt
    └── class_changes.txt
    └── images/
        └── <classname>_<id>.jpg
        └── ...
- You must download the AID, MLRSNet and RSICB128 datasets manually from Kaggle and place them in the '/datasets/' directory. You can either format them manually to follow the dataset directory structure listed above and use them for evaluation, OR use the run_dataset_formatting.py script by placing the .zip files from Kaggle in the '/datasets/' directory (a quick sanity check for manual formatting is sketched after the download links below).
# <dataset_name> can take the following values: AID, MLRSNet, RSICB128
python3 run_dataset_formatting.py --dataset_name <dataset_name>
- Download links: AID | RSICB128 | MLRSNet
- NB: On the Kaggle website, click on the download arrow in the center of the page instead of the Download button to preserve the data structure needed by the run_dataset_formatting.py script (see the figure below).
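If you format a Kaggle dataset by hand, a quick check like the following (a minimal sketch that only assumes the directory layout shown above) lets you compare the class names encoded in the image filenames against classes.txt:

```python
# Minimal sanity check for a manually formatted dataset; assumes the
# datasets/<dataset_name>/ layout shown above (classes.txt + images/<classname>_<id>.jpg).
from pathlib import Path

dataset_dir = Path("datasets/AID")  # example: a manually formatted Kaggle dataset

listed = sorted(dataset_dir.joinpath("classes.txt").read_text().splitlines())
found = sorted({p.stem.rsplit("_", 1)[0]  # strip the trailing _<id>
                for p in dataset_dir.joinpath("images").glob("*.jpg")})

print("classes listed in classes.txt:", listed)
print("classes found in filenames:  ", found)
```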
Notes:
- The class_changes.txt file inserts a space between combined class names. For example, the class name "railwaystation" becomes "railway station." This change is applied consistently across all datasets (see the small sketch after these notes).
- The WHURS19 dataset is already uploaded to the repository for reference.
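As a small illustration of the renaming described above (the dictionary below is hypothetical; the actual mapping is read from each dataset's class_changes.txt):

```python
# Hypothetical illustration of the class-name changes described in the notes;
# only the "railwaystation" -> "railway station" example comes from this README.
class_changes = {"railwaystation": "railway station"}

def readable_class_name(raw_name: str) -> str:
    """Return the human-readable class name used when building text prompts."""
    return class_changes.get(raw_name, raw_name)

print(readable_class_name("railwaystation"))  # -> "railway station"
```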
Running RS-TransCLIP consists of three major steps:
- Generating Image and Text Embeddings
- Generating the Average Text Embedding
- Running Transductive Zero-Shot Classification
We consider 10 scene classification datasets (AID, EuroSAT, MLRSNet, OPTIMAL31, PatternNet, RESISC45, RSC11, RSICB128, RSICB256, WHURS19), 4 VLMs (CLIP, GeoRSCLIP, RemoteCLIP, SkyCLIP50) and 4 model architectures (RN50, ViT-B-32, ViT-L-14, ViT-H-14) for our experiments.
To generate Image embeddings for each dataset/VLM/architecture trio:
python3 run_featuregeneration.py --image_fg
To generate Text embeddings for each dataset/VLM/architecture trio:
python3 run_featuregeneration.py --text_fg
All results for each dataset/VLM/architecture trio will be stored as follows:
$results/
└── <dataset_name>/
    └── <model_name>/
        └── <model_architecture>/
            └── images.pt
            └── classes.pt
            └── texts_<prompt1>.pt
            └── ...
            └── texts_<prompt106>.pt
Notes:
- Text feature generation produces 106 individual text embeddings for each VLM/dataset combination; the exhaustive list of all text prompts can be found in run_featuregeneration.py.
- When generating Image embeddings, the run_featuregeneration.py script will also generate the ground truth labels and store them in "classes.pt". These labels will be used for evaluation.
- Please refer to run_featuregeneration.py to control all the respective arguments.
- The embeddings for the WHURS19 dataset are already uploaded to the repository for reference.
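For intuition, here is a rough sketch of what a single dataset/VLM/architecture pass amounts to, using plain open_clip with the OpenAI ViT-L-14 weights. The prompt template, file paths, and label parsing are simplified assumptions; run_featuregeneration.py handles the remote-sensing VLM checkpoints and the full list of 106 prompts.

```python
# Simplified sketch of image/text feature generation (see run_featuregeneration.py
# for the actual implementation and the RS-specific VLM weights).
from pathlib import Path
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model = model.to(device).eval()

dataset_dir = Path("datasets/WHURS19")
classes = dataset_dir.joinpath("classes.txt").read_text().splitlines()

image_feats, labels = [], []
with torch.no_grad():
    for path in sorted(dataset_dir.joinpath("images").glob("*.jpg")):
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        image_feats.append(model.encode_image(image).squeeze(0).cpu())
        labels.append(classes.index(path.stem.rsplit("_", 1)[0]))  # <classname>_<id>.jpg

    # One example prompt template; the repo sweeps 106 of them.
    tokens = tokenizer([f"a satellite photo of a {c}." for c in classes]).to(device)
    text_feats = model.encode_text(tokens).cpu()

torch.save(torch.stack(image_feats), "images.pt")
torch.save(torch.tensor(labels), "classes.pt")
torch.save(text_feats, "texts_exampleprompt.pt")
```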
To generate the Average Text embedding for each dataset/VLM/architecture trio:
python3 run_averageprompt.py
Notes:
- The run_averageprompt.py script will average out all embeddings with the following name structure "texts_*.pt" for each dataset/VLM/architecture trio and create a file called "texts_averageprompt.pt".
- The Average Text embeddings for the WHURS19 dataset are already uploaded to the repository for reference.
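Conceptually, this step is just a mean over the per-prompt text embeddings. A minimal sketch, assuming the results layout shown above (the real script iterates over every dataset/VLM/architecture trio):

```python
# Minimal sketch of prompt averaging (run_averageprompt.py is the real implementation);
# the trio path below is just an example.
from pathlib import Path
import torch

trio_dir = Path("results/WHURS19/CLIP/ViT-L-14")
prompt_files = [f for f in sorted(trio_dir.glob("texts_*.pt"))
                if f.name != "texts_averageprompt.pt"]  # skip a previous output, if any

# Stack the per-prompt (num_classes, dim) embeddings and average them.
average = torch.stack([torch.load(f) for f in prompt_files]).mean(dim=0)
torch.save(average, trio_dir / "texts_averageprompt.pt")
```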
To run Transductive zero-shot classification using RS-TransCLIP:
python3 run_TransCLIP.py
Notes:
- The run_TransCLIP.py script will use the Image embeddings "images.pt", the Average Text embedding "texts_averageprompt.pt" and the class ground truth labels "classes.pt" to run Transductive zero-shot classification using RS-TransCLIP.
- The run_TransCLIP.py script will also generate the Inductive zero-shot classification for performance comparison.
- Both Inductive and Transductive results will be stored in "results/results_averageprompt.csv".
- The results for the WHURS19 dataset are already uploaded to the repository for reference.
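As a point of reference, the inductive baseline reported alongside RS-TransCLIP boils down to cosine similarity between each image embedding and the average text embedding, followed by an argmax over classes. A minimal sketch using the stored files (paths are illustrative; the transductive refinement itself is left to run_TransCLIP.py):

```python
# Sketch of the inductive zero-shot baseline; RS-TransCLIP's transductive step
# (implemented in run_TransCLIP.py) refines these predictions jointly over the
# whole unlabeled test set.
import torch
import torch.nn.functional as F

trio = "results/WHURS19/CLIP/ViT-L-14"
images = F.normalize(torch.load(f"{trio}/images.pt").float(), dim=-1)
texts = F.normalize(torch.load(f"{trio}/texts_averageprompt.pt").float(), dim=-1)
labels = torch.load(f"{trio}/classes.pt")

logits = images @ texts.T          # (num_images, num_classes) cosine similarities
preds = logits.argmax(dim=-1)
top1 = (preds == labels).float().mean().item()
print(f"Inductive zero-shot top-1 accuracy: {100 * top1:.2f}%")
```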

Table 1: Top-1 accuracy for zero-shot scene classification without (white) and with (blue) RS-TransCLIP on 10 RS datasets.
Support our work by citing our paper if you use this repository:
@inproceedings{el2025enhancing,
  title={Enhancing remote sensing vision-language models for zero-shot scene classification},
  author={El Khoury, Karim and Zanella, Maxime and G{\'e}rin, Beno{\^\i}t and Godelaine, Tiffanie and Macq, Beno{\^\i}t and Mahmoudi, Sa{\"\i}d and De Vleeschouwer, Christophe and Ben Ayed, Ismail},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
Please also consider citing the original TransCLIP paper:
@article{zanella2024boosting,
  title={Boosting vision-language models with transduction},
  author={Zanella, Maxime and G{\'e}rin, Beno{\^\i}t and Ben Ayed, Ismail},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={62223--62256},
  year={2024}
}
For more details on transductive inference in VLMs, visit the comprehensive TransCLIP repository.
Feel free to open an issue or pull request if you have any questions or suggestions.
You can also contact us by Email:
karim.elkhoury@uclouvain.be
maxime.zanella@uclouvain.be
benoit.gerin@uclouvain.be
tiffanie.godelaine@uclouvain.be