Improving 2D Feature Representations by 3D-Aware Fine-Tuning


ECCV 2024

Yuanwen Yue1, Anurag Das2, Francis Engelmann1,3, Siyu Tang1, Jan Eric Lenssen2

1ETH Zurich, 2Max Planck Institute for Informatics, 3Google

Open In Colab | Hugging Face Spaces

This is the official repository for the paper Improving 2D Feature Representations by 3D-Aware Fine-Tuning.

Changelog

  • Add Colab Notebook and Hugging Face demo
  • Release ScanNet++ preprocessing code
  • Release feature Gaussian training code
  • Release fine-tuning code
  • Release evaluation code
Table of Contents
  1. Demo
  2. Preparation
  3. Training
  4. Evaluation
  5. Citation

Demo

We provide a Colab Notebook with step-by-step guides for running inference and visualizing the PCA features and K-Means clustering of the original 2D models and our fine-tuned models. We also provide an online Hugging Face demo 🤗 where users can upload their own images and check the visualizations online. Alternatively, to run the demo locally, just run python app.py.
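For a quick look at what the notebook computes, the snippet below extracts DINOv2 patch features for a single image and projects them onto three PCA channels for visualization. It is a minimal sketch, not the notebook's exact code: it loads the stock dinov2_vits14 weights from the facebookresearch/dinov2 torch.hub entry rather than our fine-tuned checkpoints, and the image size and preprocessing are only illustrative.

    # Minimal sketch: PCA visualization of ViT patch features (not the repo's exact pipeline).
    import torch
    from PIL import Image
    from sklearn.decomposition import PCA
    from torchvision import transforms

    model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

    preprocess = transforms.Compose([
        transforms.Resize((518, 518)),  # multiple of the 14-pixel patch size
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ])
    img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        feats = model.forward_features(img)["x_norm_patchtokens"][0]  # (n_patches, C)

    pca = PCA(n_components=3)
    rgb = pca.fit_transform(feats.numpy())                # (n_patches, 3)
    rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0))  # scale each channel to [0, 1]
    rgb = rgb.reshape(518 // 14, 518 // 14, 3)            # back to the patch grid

Repeating the same projection with the fine-tuned weights makes it easy to compare feature maps side by side.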

Preparation

Environment

  • The code has been tested on Linux with Python 3.10.14, PyTorch 2.0.0, and CUDA 11.8.
  • Create an environment and install PyTorch and the other required packages:
    git clone https://github.com/ywyue/FiT3D.git
    cd FiT3D
    conda create -n fit3d python=3.10
    conda activate fit3d
    pip install torch==2.0.0 torchvision==0.15.1 --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt
  • Compile the feature rasterization modules and the knn module for feature lifting:
    cd submodules/diff-feature-gaussian-rasterization
    python setup.py install
    cd ../simple-knn/
    python setup.py install
  • Install mmcv and mmsegmentation, which are required for downstream evaluation. Note that we modified the source code, so please build them from source as follows:
    cd mmcv
    MMCV_WITH_OPS=1 pip install -e . -v
    cd ../mmsegmentation
    pip install -e . -v
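After the installs, a short Python session like the one below should run without errors. It is only a sanity check; the module names of the two compiled CUDA extensions are assumptions inferred from the submodule folder names.

    # Sanity check for the environment.
    import torch
    import mmcv
    import mmseg

    print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
    print("mmcv:", mmcv.__version__, "mmseg:", mmseg.__version__)

    # These imports only succeed if the CUDA extensions built correctly
    # (package names assumed from the submodule folder names).
    import diff_feature_gaussian_rasterization
    import simple_knn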

Data

We train feature Gaussians and fine-tune on ScanNet++ scenes. Preprocessing code and instructions are here. After preprocessing, the ScanNet++ data is expected to be organized as follows:

FiT3D/
└── db/
    └── scannetpp/
        ├── metadata/
        |    ├── nvs_sem_train.txt      # Training set for NVS and semantic tasks with 230 scenes
        |    ├── nvs_sem_val.txt        # Validation set for NVS and semantic tasks with 50 scenes
        |    ├── train_samples.txt      # Training sample list, formatted as sceneID_imageID
        |    ├── val_samples.txt        # Validation sample list, formatted as sceneID_imageID
        |    ├── train_view_info.npy    # Training sample camera info, e.g. projection matrices
        |    └── val_view_info.npy      # Validation sample camera info, e.g. projection matrices
        └── scenes/
            ├── 0a5c013435  # scene id
            ├── ...
            └── 0a7cc12c0e
                ├── images                 # undistorted and downscaled images
                ├── masks                  # undistorted and downscaled anonymized masks
                ├── points3D.txt           # 3D feature points used by COLMAP
                └── transforms_train.json  # camera poses in the format used by Nerfstudio
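Since transforms_train.json follows the Nerfstudio convention, per-scene camera poses can be inspected with a few lines like the sketch below; the field names (frames, file_path, transform_matrix) come from that convention rather than from this repo's code, and the scene id is just an example from the tree above.

    # Minimal sketch: read Nerfstudio-style camera poses for one scene.
    import json
    import numpy as np

    with open("db/scannetpp/scenes/0a5c013435/transforms_train.json") as f:
        meta = json.load(f)

    for frame in meta["frames"]:
        c2w = np.array(frame["transform_matrix"])  # 4x4 camera-to-world matrix
        print(frame["file_path"], c2w.shape)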

For all other evaluation datasets (ScanNet, NYUd, NYUv2, ADE20k, Pascal VOC, KITTI), please follow the download instructions on their official websites.

Training

Stage I: Lifting Features to 3D

Example command to train the feature Gaussians for a single scene:

python train_feat_gaussian.py --run_name=example_feature_gaussian_training \
                    --model_name=dinov2_small \
                    --source_path=db/scannetpp/scenes/0a5c013435 \
                    --low_sem_dim=64

model_name indicates the 2D feature extractor and can be selected from dinov2_small, dinov2_base, dinov2_reg_small, clip_base, mae_base, deit3_base. low_sem_dim is the dimension of the semantic feature vector attached to each Gaussian. Note that it must have the same value as NUM_CHANNELS_FEAT in submodules/diff-feature-gaussian-rasterization/cuda_rasterizer/config.h.
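Since a mismatch only surfaces at runtime, a small hypothetical helper like the one below can check that --low_sem_dim agrees with the compiled rasterizer before launching training; it simply parses the #define named above out of config.h.

    # Hypothetical helper: verify low_sem_dim matches NUM_CHANNELS_FEAT in the rasterizer header.
    import re

    HEADER = "submodules/diff-feature-gaussian-rasterization/cuda_rasterizer/config.h"

    def read_num_channels_feat(path=HEADER):
        with open(path) as f:
            match = re.search(r"#define\s+NUM_CHANNELS_FEAT\s+(\d+)", f.read())
        return int(match.group(1))

    low_sem_dim = 64
    assert read_num_channels_feat() == low_sem_dim, \
        "Rebuild the rasterizer with NUM_CHANNELS_FEAT set to low_sem_dim."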

To generate the commands for training Gaussians for all scenes in ScanNet++, run:

python gen_commands.py --train_fgs_commands_folder=train_fgs_commands --model_name=dinov2_small --low_sem_dim=64

Training commands for all scenes will be stored in train_fgs_commands.
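Assuming gen_commands.py writes one shell script per scene into that folder (this layout is an assumption; check the folder contents after running it), the scripts can be run back-to-back with a small driver:

    # Sketch: run the generated per-scene training commands one after another.
    # Assumes train_fgs_commands/ contains one shell script per scene.
    import glob
    import subprocess

    for script in sorted(glob.glob("train_fgs_commands/*.sh")):
        print("Running", script)
        subprocess.run(["bash", script], check=True)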

After training, we need to write the parameters of all feature Gaussians to a single file, which will be used in the second stage. To do that, run:

python write_feat_gaussian.py

After that, all pretrained Gaussians of the training scenes are stored as pretrained_feat_gaussians_train.pth and all pretrained Gaussians of the validation scenes are stored as pretrained_feat_gaussians_val.pth. Both files will be stored in db/scannetpp/metadata.
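To confirm the export, the consolidated file can be loaded back with torch.load and inspected. The internal structure of the saved object is not documented here, so the sketch below only reports what it finds.

    # Sketch: inspect the consolidated feature-Gaussian file (structure not assumed).
    import torch

    gaussians = torch.load("db/scannetpp/metadata/pretrained_feat_gaussians_train.pth",
                           map_location="cpu")
    if isinstance(gaussians, dict):
        for key, value in list(gaussians.items())[:5]:
            print(key, type(value).__name__, getattr(value, "shape", None))
    else:
        print(type(gaussians))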

Stage II: Fine-Tuning

In this stage, we use the pretrained Gaussians to render features and use those rendered features as targets to fine-tune the 2D feature extractor. To do that, run:

python finetune.py --model_name=dinov2_small \
                   --output_dir=output_finemodel \
                   --job_name=finetuning_dinov2_small \
                   --train_gaussian_list=db/scannetpp/metadata/pretrained_feat_gaussians_train.pth \
                   --val_gaussian_list=db/scannetpp/metadata/pretrained_feat_gaussians_val.pth

model_name indicates the 2D feature extractor and should be consistent with the feature extractor used in the first stage. The default number of fine-tuning epochs is 1, after which the weights of the fine-tuned model are saved in output_dir/date_job_name.
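Conceptually, this stage regresses the 2D extractor's patch features onto the 3D-aware features rendered from the Gaussians. The sketch below illustrates that idea with a plain L2 loss on DINOv2 patch tokens; the actual objective, feature dimensions, and data pipeline in finetune.py may differ, and rendered_target stands in for whatever the Gaussian renderer produces for that view.

    # Conceptual sketch of Stage II (not the repo's exact training loop).
    import torch
    import torch.nn.functional as F

    def finetune_step(model, optimizer, image, rendered_target):
        """One step: match 2D patch features to rendered 3D-aware target features."""
        # forward_features(...)["x_norm_patchtokens"] is DINOv2-specific.
        pred = model.forward_features(image)["x_norm_patchtokens"]  # (B, N, C)
        loss = F.mse_loss(pred, rendered_target)                    # target: (B, N, C)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()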

Evaluation

For visual comparison of PCA features and K-means clustering results, please check our Colab Notebook and the demo app.py.

For quantitative evaluation on downstream tasks, we conduct linear probing evaluation on semantic segmentation and depth estimation.

  • First, download all evaluation datasets and put them in eval_data as follows:
    eval_data/
    ├── scannetpp            # semantic segmentation and depth estimation
    ├── scannet              # semantic segmentation and depth estimation
    ├── nyu                  # depth estimation
    ├── nyuv2                # semantic segmentation
    ├── ADEChallengeData2016 # semantic segmentation
    ├── VOC2012              # semantic segmentation
    └── kitti                # depth estimation

The processed scannetpp dataset with annotations can be downloaded from here. For the other datasets, please download them from their official websites; their annotations are already provided, so little or no preprocessing is needed.

  • Launch the linear probing evaluation. We provide two example scripts: eval_scripts/fit3d/linear_eval_sem.sh for semantic segmentation and eval_scripts/fit3d/linear_eval_depth.sh for depth estimation. Note:

    • Change model and dataset to adapt the script for evaluation with different models and datasets. See the comments in the script.
    • The default linear probing evaluation requires 8 GPUs, with 40K iterations for semantic segmentation and 38400 iterations for depth estimation. If you use fewer GPUs, the number of iterations needs to be increased linearly. For example, if ngpu is set to 4, then 80K iterations are required for semantic segmentation (76800 iterations for depth estimation). The number of iterations (the max_iters parameter) can be modified in the respective config files; see the sketch after this list.
  • To evaluate baseline models (i.e. the original 2D models), see eval_scripts/baseline/linear_eval_sem.sh and eval_scripts/baseline/linear_eval_depth.sh.
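The iteration scaling described above keeps the total number of processed samples constant. The hypothetical helper below computes the adjusted max_iters for a given GPU count.

    # Hypothetical helper: scale max_iters when training with fewer than 8 GPUs,
    # keeping the total number of processed samples constant.
    DEFAULT_GPUS = 8
    DEFAULT_ITERS = {"semseg": 40_000, "depth": 38_400}

    def scaled_max_iters(task: str, ngpu: int) -> int:
        return DEFAULT_ITERS[task] * DEFAULT_GPUS // ngpu

    print(scaled_max_iters("semseg", 4))  # 80000
    print(scaled_max_iters("depth", 4))   # 76800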

Citation

If you find our code or paper useful, please cite:

@inproceedings{yue2024improving,
  title     = {{Improving 2D Feature Representations by 3D-Aware Fine-Tuning}},
  author    = {Yue, Yuanwen and Das, Anurag and Engelmann, Francis and Tang, Siyu and Lenssen, Jan Eric},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024}
}
