Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Detic: A Detector with image classes that can use image-level labels to easily train detectors.
Detecting Twenty-thousand Classes using Image-level Supervision,
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
ECCV 2022 (arXiv 2201.02605)
Detects any class given class names (using CLIP); see the sketch after this list.
We train the detector on the ImageNet-21K dataset with 21K classes.
Cross-dataset generalization to OpenImages and Objects365 without finetuning.
State-of-the-art results on Open-vocabulary LVIS and Open-vocabulary COCO.
Works for DETR-style detectors.
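Conceptually, the open-vocabulary classifier is built from CLIP text embeddings of the class names. Below is a minimal sketch of that idea using the open-source `clip` package; the package, the prompt template, and the variable names are assumptions for illustration, not the repository's exact code.

```python
# Sketch: turn class names into CLIP text embeddings that can act as
# classifier weights for an open-vocabulary detector.
# Assumption: the `clip` package (https://github.com/openai/CLIP) is installed.
import torch
import clip

model, _ = clip.load("ViT-B/32", device="cpu")
class_names = ["headphone", "webcam", "paper", "coffee cup"]
tokens = clip.tokenize([f"a photo of a {name}" for name in class_names])

with torch.no_grad():
    text_features = model.encode_text(tokens)

# L2-normalized embeddings, one row per class, usable as classification weights.
classifier = text_features / text_features.norm(dim=-1, keepdim=True)
print(classifier.shape)  # (num_classes, embedding_dim)
```

Because the classifier is just a matrix of text embeddings, swapping in a new vocabulary only requires re-encoding a new list of class names.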
Update April 2022: we released more real-time models here.
Replicate web demo and docker image:
Integrated into Huggingface Spaces 🤗 using Gradio. Try out the web demo:
Run our demo using Colab (no GPU needed):
We use the default detectron2 demo interface. For example, to run our 21K model on a messy desk image (image credit David Fouhey) with the lvis vocabulary, run:
```
mkdir models
wget https://dl.fbaipublicfiles.com/detic/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth -O models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
wget https://eecs.engin.umich.edu/~fouhey/fun/desk/desk.jpg
python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out.jpg --vocabulary lvis --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
```
If set up correctly, the output should look like:
The same model can run with other vocabularies (COCO, OpenImages, or Objects365), or a custom vocabulary. For example:
```
python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out2.jpg --vocabulary custom --custom_vocabulary headphone,webcam,paper,coffe --confidence-threshold 0.3 --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
```
The output should look like:
Note that `headphone`, `paper`, and `coffe` (typo intended) are not LVIS classes. Despite the misspelled class name, our detector can produce a reasonable detection for `coffe`.
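For programmatic use, the released checkpoint can also be driven through detectron2's standard predictor interface. The sketch below assumes Detic and its CenterNet2 dependency are on `PYTHONPATH` and expose config helpers named `add_centernet_config` and `add_detic_config`; treat those import paths as assumptions and check the repository if they differ.

```python
# Minimal sketch of running the demo model from Python via detectron2.
# The Detic/CenterNet2 config helpers below are assumed names; verify against the repo.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from centernet.config import add_centernet_config  # assumed import path
from detic.config import add_detic_config          # assumed import path

cfg = get_cfg()
add_centernet_config(cfg)
add_detic_config(cfg)
cfg.merge_from_file("configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml")
cfg.MODEL.WEIGHTS = "models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3  # same threshold as the CLI example above

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("desk.jpg"))
print(outputs["instances"].pred_classes, outputs["instances"].scores)
```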
Please first prepare datasets, then check our MODEL ZOO to reproduce the results in our paper. We highlight key results below:
Open-vocabulary LVIS

| | mask mAP | mask mAP_novel |
|---|---|---|
| Box-Supervised | 30.2 | 16.4 |
| Detic | 32.4 | 24.9 |

Standard LVIS

| Detector / Backbone | mask mAP | mask mAP_rare |
|---|---|---|
| Box-Supervised CenterNet2-ResNet50 | 31.5 | 25.6 |
| Detic CenterNet2-ResNet50 | 33.2 | 29.7 |
| Box-Supervised CenterNet2-SwinB | 40.7 | 35.9 |
| Detic CenterNet2-SwinB | 41.7 | 41.7 |

| Detector / Backbone | box mAP | box mAP_rare |
|---|---|---|
| Box-Supervised DeformableDETR-ResNet50 | 31.7 | 21.4 |
| Detic DeformableDETR-ResNet50 | 32.5 | 26.2 |

Cross-dataset generalization

| Backbone | Objects365 box mAP | OpenImages box mAP50 |
|---|---|---|
| Box-Supervised SwinB | 19.1 | 46.2 |
| Detic SwinB | 21.4 | 55.2 |
The majority of Detic is licensed under the Apache 2.0 license; however, portions of the project are available under separate license terms: SWIN-Transformer, CLIP, and the TensorFlow Object Detection API are licensed under the MIT license; UniDet is licensed under the Apache 2.0 license; and the LVIS API is licensed under a custom license. If you later add other third-party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.
Detic's wide range of detection capabilities may introduce challenges similar to those of many other visual recognition and open-set recognition methods. As the user can define arbitrary detection classes, class design and semantics may impact the model output.
If you find this project useful for your research, please use the following BibTeX entry.
```
@inproceedings{zhou2022detecting,
  title={Detecting Twenty-thousand Classes using Image-level Supervision},
  author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
  booktitle={ECCV},
  year={2022}
}
```