Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Detic: A Detector with image classes that can use image-level labels to easily train detectors.
Detecting Twenty-thousand Classes using Image-level Supervision,
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
ECCV 2022 (arXiv 2201.02605)
Detects any class given class names (using CLIP); see the sketch after this list.
We train the detector on the ImageNet-21K dataset with 21K classes.
Cross-dataset generalization to OpenImages and Objects365 without finetuning.
State-of-the-art results on Open-vocabulary LVIS and Open-vocabulary COCO.
Works for DETR-style detectors.
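Conceptually, the open-vocabulary classifier is built from CLIP text embeddings of the class names. Below is a minimal sketch of that idea using the open-source `clip` package; the package, the prompt template, and the variable names are assumptions for illustration, not the repository's exact code.

```python
# Sketch: turn class names into CLIP text embeddings that can act as
# classifier weights for an open-vocabulary detector.
# Assumption: the `clip` package (https://github.com/openai/CLIP) is installed.
import torch
import clip

model, _ = clip.load("ViT-B/32", device="cpu")
class_names = ["headphone", "webcam", "paper", "coffee cup"]
tokens = clip.tokenize([f"a photo of a {name}" for name in class_names])

with torch.no_grad():
    text_features = model.encode_text(tokens)

# L2-normalized embeddings, one row per class, usable as classification weights.
classifier = text_features / text_features.norm(dim=-1, keepdim=True)
print(classifier.shape)  # (num_classes, embedding_dim)
```

Because the classifier is just a matrix of text embeddings, swapping in a new vocabulary only requires re-encoding a new list of class names.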
Update April 2022: we released more real-time models here.
Replicate web demo and docker image:
Integrated into Huggingface Spaces 🤗 using Gradio. Try out the web demo:
Run our demo using Colab (no GPU needed):
We use the default detectron2 demo interface. For example, to run our 21K model on a messy desk image (image credit David Fouhey) with the lvis vocabulary, run:
```
mkdir models
wget https://dl.fbaipublicfiles.com/detic/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth -O models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
wget https://eecs.engin.umich.edu/~fouhey/fun/desk/desk.jpg
python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out.jpg --vocabulary lvis --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
```
If set up correctly, the output should look like:
The same model can run with other vocabularies (COCO, OpenImages, or Objects365), or a custom vocabulary. For example:
```
python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out2.jpg --vocabulary custom --custom_vocabulary headphone,webcam,paper,coffe --confidence-threshold 0.3 --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
```
The output should look like:
Note that `headphone`, `paper`, and `coffe` (typo intended) are not LVIS classes. Despite the misspelled class name, our detector can produce a reasonable detection for `coffe`.
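For programmatic use, the released checkpoint can also be driven through detectron2's standard predictor interface. The sketch below assumes Detic and its CenterNet2 dependency are on `PYTHONPATH` and expose config helpers named `add_centernet_config` and `add_detic_config`; treat those import paths as assumptions and check the repository if they differ.

```python
# Minimal sketch of running the demo model from Python via detectron2.
# The Detic/CenterNet2 config helpers below are assumed names; verify against the repo.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from centernet.config import add_centernet_config  # assumed import path
from detic.config import add_detic_config          # assumed import path

cfg = get_cfg()
add_centernet_config(cfg)
add_detic_config(cfg)
cfg.merge_from_file("configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml")
cfg.MODEL.WEIGHTS = "models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3  # same threshold as the CLI example above

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("desk.jpg"))
print(outputs["instances"].pred_classes, outputs["instances"].scores)
```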
Please first prepare datasets, then check our MODEL ZOO to reproduce the results in our paper. We highlight key results below:
Open-vocabulary LVIS

| | mask mAP | mask mAP_novel |
|---|---|---|
| Box-Supervised | 30.2 | 16.4 |
| Detic | 32.4 | 24.9 |

Standard LVIS

| Detector / Backbone | mask mAP | mask mAP_rare |
|---|---|---|
| Box-Supervised CenterNet2-ResNet50 | 31.5 | 25.6 |
| Detic CenterNet2-ResNet50 | 33.2 | 29.7 |
| Box-Supervised CenterNet2-SwinB | 40.7 | 35.9 |
| Detic CenterNet2-SwinB | 41.7 | 41.7 |

| Detector / Backbone | box mAP | box mAP_rare |
|---|---|---|
| Box-Supervised DeformableDETR-ResNet50 | 31.7 | 21.4 |
| Detic DeformableDETR-ResNet50 | 32.5 | 26.2 |

Cross-dataset generalization

| Backbone | Objects365 box mAP | OpenImages box mAP50 |
|---|---|---|
| Box-Supervised SwinB | 19.1 | 46.2 |
| Detic SwinB | 21.4 | 55.2 |
The majority of Detic is licensed under the Apache 2.0 license; however, portions of the project are available under separate license terms: SWIN-Transformer, CLIP, and the TensorFlow Object Detection API are licensed under the MIT license; UniDet is licensed under the Apache 2.0 license; and the LVIS API is licensed under a custom license. If you later add other third-party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.
Detic's wide range of detection capabilities may introduce challenges similar to those of many other visual recognition and open-set recognition methods. As the user can define arbitrary detection classes, class design and semantics may impact the model output.
If you find this project useful for your research, please use the following BibTeX entry.
```
@inproceedings{zhou2022detecting,
  title={Detecting Twenty-thousand Classes using Image-level Supervision},
  author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
  booktitle={ECCV},
  year={2022}
}
```