This repo contains the PyTorch implementation of our paper:

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool.
- Accepted at ICCV 2021 (Slides).
- 🏆 SOTA for unsupervised semantic segmentation.
- Check out Papers With Code for the Unsupervised Semantic Segmentation benchmark and more details.
- Check out our follow-up work MaskDistill, with improvements of up to +11% mIoU on PASCAL VOC.
Being able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. We make a first attempt to tackle the problem on datasets that have been traditionally utilized for the supervised case (e.g. PASCAL VOC). To achieve this, we introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings. Additionally, we argue about the importance of having a prior that contains information about objects, or their parts, and discuss several possibilities to obtain such a prior in an unsupervised manner. In particular, we adopt a mid-level visual prior to group pixels together and contrast the obtained object mask proposals. For this reason we name the method MaskContrast.
The Python code is compatible with PyTorch version 1.4 (version 1.5 should work as well). Assuming Anaconda, the most important packages can be installed as:
```shell
conda install pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.0 -c pytorch
conda install -c conda-forge opencv           # For image transformations
conda install matplotlib scipy scikit-learn   # For evaluation
conda install pyyaml easydict                 # For using config files
conda install termcolor                       # For colored print statements
```
We refer to the `requirements.txt` file for an overview of the packages in the environment we used to produce our results. The code was run on 2 Tesla V100 GPUs.
The PASCAL VOC dataset will be downloaded automatically when running the code for the first time. The dataset includes the precomputed supervised and unsupervised saliency masks, following the implementation from the paper.
The following files (in the `pretrain/` and `segmentation/` directories) need to be adapted in order to run the code on your own machine:

- Change the file path for the datasets in `data/util/mypath.py`. The PASCAL VOC dataset will be saved to this path.
- Specify the output directory in `configs/env.yml`. All results will be stored under this directory.
The training procedure consists of two steps. First, pixels are grouped together based upon a mid-level visual prior (saliency is used). Then, a pre-training strategy is proposed to contrast the pixel embeddings of the obtained object masks. The code for the pre-training can be found in the `pretrain/` directory and the configuration files are located in the `pretrain/configs/` directory. You can choose to run the model with the masks from the supervised or unsupervised saliency model. For example, run the following command to perform the pre-training step on PASCAL VOC with the supervised saliency model:
```shell
cd pretrain
python main.py --config_env configs/env.yml --config_exp configs/VOCSegmentation_supervised_saliency_model.yml
```
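The contrastive idea behind this step can be illustrated with a simplified NumPy sketch: each pixel embedding is attracted to the mean embedding of its own object mask and repelled from the means of the other masks. This is a hypothetical toy version for intuition only; the actual implementation in `pretrain/` uses PyTorch with a MoCo-style setup and differs in several details.

```python
import numpy as np

def mask_contrast_loss(pixel_emb, mask_ids, temperature=0.5):
    """Toy mask-contrastive objective (NumPy sketch, not the paper's code).

    pixel_emb: (N, D) L2-normalized pixel embeddings.
    mask_ids:  (N,) object-mask index for each pixel.
    """
    masks = np.unique(mask_ids)
    # One prototype per mask: the re-normalized mean of its pixel embeddings.
    protos = np.stack([pixel_emb[mask_ids == m].mean(0) for m in masks])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    # Similarity of every pixel to every prototype.
    logits = pixel_emb @ protos.T / temperature        # (N, num_masks)
    targets = np.searchsorted(masks, mask_ids)         # index of own prototype
    # Cross-entropy over prototypes: own mask is the positive.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(mask_ids)), targets].mean()
```

Minimizing this loss pulls pixels of the same mask together in embedding space while pushing different masks apart, which is the intuition behind contrasting object mask proposals.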
We freeze the weights of the pre-trained model and train a 1 x 1 convolutional layer to predict the class assignments from the generated feature representations. Since the discriminative power of a linear classifier is low, the pixel embeddings need to be informative of the semantic class to solve the task in this way. To train the classifier run the following command:
```shell
cd segmentation
python linear_finetune.py --config_env configs/env.yml --config_exp configs/linear_finetune/linear_finetune_VOCSegmentation_supervised_saliency.yml
```
Note: make sure that the `pretraining` variable in `linear_finetune_VOCSegmentation_supervised_saliency.yml` points to the location of your pre-trained model. You should get the following results:
```
mIoU is 63.95
IoU class background is 90.95
IoU class aeroplane is 83.78
IoU class bicycle is 30.66
IoU class bird is 78.79
IoU class boat is 64.57
IoU class bottle is 67.31
IoU class bus is 84.24
IoU class car is 76.77
IoU class cat is 79.10
IoU class chair is 21.24
IoU class cow is 66.45
IoU class diningtable is 46.63
IoU class dog is 73.25
IoU class horse is 62.61
IoU class motorbike is 69.66
IoU class person is 72.30
IoU class pottedplant is 40.15
IoU class sheep is 74.70
IoU class sofa is 30.43
IoU class train is 74.67
IoU class tvmonitor is 54.66
```
Unsurprisingly, the model has not learned a good representation for every class, since some classes are hard to distinguish, e.g. `chair` or `sofa`.
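The frozen-features evaluation above rests on the fact that a 1 x 1 convolution over a (C, H, W) feature map is simply one linear classifier applied at every pixel. A small NumPy sketch (with hypothetical shapes) makes the equivalence concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((32, 8, 8))   # (C, H, W) frozen pixel embeddings
W = rng.standard_normal((21, 32))         # weights: 21 PASCAL VOC classes
b = np.zeros(21)                          # per-class bias

# A "1 x 1 conv" is a matrix multiply over the channel dimension,
# shared across all spatial positions.
logits = np.einsum('kc,chw->khw', W, feats) + b[:, None, None]
pred = logits.argmax(axis=0)              # per-pixel class assignment
```

Because the classifier is this weak, good per-pixel accuracy is only possible if the frozen embeddings are already (close to) linearly separable by semantic class.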
We visualize a few examples after CRF post-processing below.
The feature representations are clustered with K-means. If the pixel embeddings are disentangled according to the defined class labels, we can match the predicted clusters with the ground-truth classes using the Hungarian matching algorithm.
```shell
cd segmentation
python kmeans.py --config_env configs/env.yml --config_exp configs/kmeans/kmeans_VOCSegmentation_supervised_saliency.yml
```
Remarks: note that we perform the complete K-means fitting on the validation set to save memory, and that the reported results were averaged over 5 different runs. You should get the following results (21 clusters):
```
IoU class background is 88.17
IoU class aeroplane is 77.41
IoU class bicycle is 26.18
IoU class bird is 68.27
IoU class boat is 47.89
IoU class bottle is 56.99
IoU class bus is 80.63
IoU class car is 66.80
IoU class cat is 46.13
IoU class chair is 0.73
IoU class cow is 0.10
IoU class diningtable is 0.57
IoU class dog is 35.93
IoU class horse is 48.68
IoU class motorbike is 60.60
IoU class person is 32.24
IoU class pottedplant is 23.88
IoU class sheep is 36.76
IoU class sofa is 26.85
IoU class train is 69.90
IoU class tvmonitor is 27.56
```
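The Hungarian matching used in this evaluation can be sketched in a few lines with SciPy's `linear_sum_assignment`. This is a simplified illustration, assuming flat arrays of per-pixel cluster predictions and ground-truth labels; the repository's evaluation code is more involved.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_clusters, gt_labels, num_classes=21):
    """Relabel predicted clusters so total overlap with the ground truth
    is maximized (sketch of the cluster-to-class matching step)."""
    # cost[c, k]: number of pixels in predicted cluster c with gt class k
    cost = np.zeros((num_classes, num_classes), dtype=np.int64)
    for c in range(num_classes):
        for k in range(num_classes):
            cost[c, k] = np.sum((pred_clusters == c) & (gt_labels == k))
    # Negate because linear_sum_assignment minimizes; we want max overlap.
    rows, cols = linear_sum_assignment(-cost)
    mapping = dict(zip(rows, cols))
    return np.vectorize(mapping.get)(pred_clusters)
```

After this one-to-one relabeling, the standard per-class IoU can be computed as in the supervised setting.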
We examine our representations on PASCAL through segment retrieval. First, we compute a feature vector for every object mask in the `val` set by averaging the pixel embeddings within the predicted mask. Next, we retrieve the nearest neighbors on the `train_aug` set for each object.
```shell
cd segmentation
python retrieval.py --config_env configs/env.yml --config_exp configs/retrieval/retrieval_VOCSegmentation_unsupervised_saliency.yml
```
| Method | mIoU (7 classes) | mIoU (21 classes) |
|---|---|---|
| MoCo v2 | 48.0 | 39.0 |
| MaskContrast* (unsup sal.) | 53.4 | 43.3 |
| MaskContrast* (sup sal.) | 62.3 | 49.6 |
* Denotes MoCo init.
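The retrieval procedure above (masked average pooling followed by nearest-neighbor search) can be sketched as follows. This is a simplified NumPy illustration with hypothetical shapes; embeddings are assumed L2-normalized so that the dot product equals cosine similarity.

```python
import numpy as np

def mask_embedding(pixel_emb, mask):
    """Average the pixel embeddings inside a boolean object mask and
    L2-normalize, giving one feature vector per segment.

    pixel_emb: (H, W, D) pixel embeddings; mask: (H, W) boolean array.
    """
    v = pixel_emb[mask].mean(axis=0)
    return v / np.linalg.norm(v)

def retrieve(query_vec, gallery_vecs, k=5):
    """Return indices of the k most similar gallery segments (cosine)."""
    sims = gallery_vecs @ query_vec      # dot product of unit vectors
    return np.argsort(-sims)[:k]
```

In the actual evaluation, the query vectors come from `val` masks and the gallery vectors from `train_aug` masks, and retrieval quality is scored by whether the neighbors share the query's class.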
Download the pretrained and linear finetuned models here.
| Dataset | Pixel Grouping Prior | mIoU (LC) | mIoU (K-means) | Download link |
|---|---|---|---|---|
| PASCAL VOC | Supervised Saliency | - | 44.2 | Pretrained Model 🔗 |
| PASCAL VOC | Supervised Saliency | 63.9 (65.5*) | 44.2 | Linear Finetuned 🔗 |
| PASCAL VOC | Unsupervised Saliency | - | 35.0 | Pretrained Model 🔗 |
| PASCAL VOC | Unsupervised Saliency | 58.4 (59.5*) | 35.0 | Linear Finetuned 🔗 |
* Denotes CRF post-processing.
To evaluate and visualize the predictions of the finetuned model, run the following command:
```shell
cd segmentation
python eval.py --config_env configs/env.yml --config_exp configs/linear_finetune/linear_finetune_VOCSegmentation_supervised_saliency.yml --state-dict $PATH_TO_MODEL
```
You can optionally append the `--crf-postprocess` flag.
This code is based on the SCAN and MoCo repositories. If you find this repository useful for your research, please consider citing the following paper(s):
```bibtex
@inproceedings{vangansbeke2020unsupervised,
  title={Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},
  booktitle={International Conference on Computer Vision},
  year={2021}
}
@inproceedings{vangansbeke2020scan,
  title={Scan: Learning to classify images without labels},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Proesmans, Marc and Van Gool, Luc},
  booktitle={European Conference on Computer Vision},
  year={2020}
}
@inproceedings{he2019moco,
  title={Momentum Contrast for Unsupervised Visual Representation Learning},
  author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  booktitle={Conference on Computer Vision and Pattern Recognition},
  year={2019}
}
```
For any enquiries, please contact the main authors.
For an overview on self-supervised learning, have a look at the overview repository.
This software is released under a creative commons license which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here.
This work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).