Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".


facebookresearch/Detic

Detic: A Detector with image classes that can use image-level labels to easily train detectors.

Detecting Twenty-thousand Classes using Image-level Supervision,
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra,
ECCV 2022 (arXiv 2201.02605)

Features

  • Detects any class given class names (using CLIP).

  • We train the detector on the ImageNet-21K dataset with 21K classes.

  • Cross-dataset generalization to OpenImages and Objects365 without finetuning.

  • State-of-the-art results on Open-vocabulary LVIS and Open-vocabulary COCO.

  • Works for DETR-style detectors.
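The first feature relies on using text embeddings of class names (from CLIP) as the classifier's weights, so changing the vocabulary only means encoding a new list of names. Below is a minimal sketch of that scoring step in plain Python. It illustrates the idea under stated assumptions and is not Detic's code. The short vectors are made-up stand-ins for CLIP text embeddings and for a region feature projected into the same space. `cosine` and `classify_region` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_region(region_emb: Vector, class_embs: Dict[str, Vector]) -> Tuple[str, float]:
    """Score one region against every class-name embedding and return the best match.

    class_embs plays the role of the classifier weights: passing a different
    dict (a different vocabulary) changes what can be named, with no retraining.
    """
    scores = {name: cosine(region_emb, emb) for name, emb in class_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy stand-ins for text embeddings of class names (NOT real CLIP outputs).
vocabulary = {
    "headphone": [0.9, 0.1, 0.0],
    "webcam": [0.1, 0.9, 0.1],
    "paper": [0.0, 0.2, 0.9],
}

# Hypothetical toy stand-in for a region feature in the same embedding space.
region = [0.8, 0.2, 0.1]
label, score = classify_region(region, vocabulary)
print(label, round(score, 3))
```

If you pass a different dictionary (for example, one without `"headphone"`), the predicted label changes while the region feature stays the same. This property is what lets one trained detector be queried with different vocabularies.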

Installation

See installation instructions.

Demo

Update April 2022: we released more real-time models here.

Replicate web demo and Docker image: Replicate

Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces

Run our demo using Colab (no GPU needed): Open In Colab

We use the default detectron2 demo interface. For example, to run our 21K model on a messy desk image (image credit: David Fouhey) with the LVIS vocabulary, run:

mkdir models
wget https://dl.fbaipublicfiles.com/detic/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth -O models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
wget https://eecs.engin.umich.edu/~fouhey/fun/desk/desk.jpg
python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out.jpg --vocabulary lvis --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth

If set up correctly, the output should look like:

The same model can run with other vocabularies (COCO, OpenImages, or Objects365), or a custom vocabulary. For example:

python demo.py --config-file configs/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.yaml --input desk.jpg --output out2.jpg --vocabulary custom --custom_vocabulary headphone,webcam,paper,coffe --confidence-threshold 0.3 --opts MODEL.WEIGHTS models/Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth

The output should look like:

Note that headphone, paper and coffe (typo intended) are not LVIS classes. Despite the misspelled class name, our detector can produce a reasonable detection for coffe.
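The custom-vocabulary command above combines two steps. First, the comma-separated `--custom_vocabulary` value becomes a list of class names. Second, detections whose score does not pass `--confidence-threshold` are discarded. The sketch below mirrors those two steps in plain Python. It is not `demo.py`. The `Detection` record, the helper names `parse_vocabulary` and `filter_detections`, and the scores are all hypothetical. The strict `>` comparison is also an assumption about how the threshold is applied.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical detection record: a class name plus a confidence score."""
    label: str
    score: float


def parse_vocabulary(spec: str) -> List[str]:
    """Split a comma-separated vocabulary string into class names.

    Surrounding whitespace is stripped and empty entries (e.g. from a trailing
    comma) are dropped. Misspellings such as 'coffe' are kept verbatim; no
    lookup against a fixed label set happens here.
    """
    return [name.strip() for name in spec.split(",") if name.strip()]


def filter_detections(dets: List[Detection], threshold: float) -> List[Detection]:
    """Keep detections scoring strictly above the threshold (assumed comparison)."""
    return [d for d in dets if d.score > threshold]


vocab = parse_vocabulary("headphone,webcam,paper,coffe")
print(vocab)  # ['headphone', 'webcam', 'paper', 'coffe']

# Made-up scores for illustration only; not actual Detic output.
dets = [
    Detection("headphone", 0.82),
    Detection("webcam", 0.41),
    Detection("paper", 0.27),
    Detection("coffe", 0.35),
]
kept = filter_detections(dets, threshold=0.3)
print([d.label for d in kept])  # ['headphone', 'webcam', 'coffe']
```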

Benchmark evaluation and training

Please first prepare datasets, then check our MODEL ZOO to reproduce results in our paper. We highlight key results below:

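  • Open-vocabulary LVIS

    | Method         | mask mAP | mask mAP_novel |
    |----------------|----------|----------------|
    | Box-Supervised | 30.2     | 16.4           |
    | Detic          | 32.4     | 24.9           |

  • Standard LVIS

    | Method         | Detector / Backbone | mask mAP | mask mAP_rare |
    |----------------|---------------------|----------|---------------|
    | Box-Supervised | CenterNet2-ResNet50 | 31.5     | 25.6          |
    | Detic          | CenterNet2-ResNet50 | 33.2     | 29.7          |
    | Box-Supervised | CenterNet2-SwinB    | 40.7     | 35.9          |
    | Detic          | CenterNet2-SwinB    | 41.7     | 41.7          |

    | Method         | Detector / Backbone     | box mAP | box mAP_rare |
    |----------------|-------------------------|---------|--------------|
    | Box-Supervised | DeformableDETR-ResNet50 | 31.7    | 21.4         |
    | Detic          | DeformableDETR-ResNet50 | 32.5    | 26.2         |

  • Cross-dataset generalization

    | Method         | Backbone | Objects365 box mAP | OpenImages box mAP50 |
    |----------------|----------|--------------------|----------------------|
    | Box-Supervised | SwinB    | 19.1               | 46.2                 |
    | Detic          | SwinB    | 21.4               | 55.2                 |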
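To make the comparison explicit, the short script below subtracts each Box-Supervised number from the matching Detic number. All values are copied from the tables above. The dictionary layout and the `gains` helper are only illustrative.

```python
# Scores copied from the tables above, as (Box-Supervised, Detic) pairs.
results = {
    "OV-LVIS mask mAP_novel": (16.4, 24.9),
    "LVIS mask mAP_rare, CenterNet2-ResNet50": (25.6, 29.7),
    "LVIS mask mAP_rare, CenterNet2-SwinB": (35.9, 41.7),
    "LVIS box mAP_rare, DeformableDETR-ResNet50": (21.4, 26.2),
    "Objects365 box mAP, SwinB": (19.1, 21.4),
    "OpenImages box mAP50, SwinB": (46.2, 55.2),
}


def gains(table):
    """Absolute improvement (in points) of Detic over the box-supervised baseline.

    Rounded to one decimal to undo floating-point noise from the subtraction.
    """
    return {name: round(detic - base, 1) for name, (base, detic) in table.items()}


for name, delta in gains(results).items():
    print(f"{name}: +{delta}")
```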

License

The majority of Detic is licensed under the Apache 2.0 license; however, portions of the project are available under separate license terms: SWIN-Transformer, CLIP, and TensorFlow Object Detection API are licensed under the MIT license; UniDet is licensed under the Apache 2.0 license; and the LVIS API is licensed under a custom license. If you later add other third-party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.

Ethical Considerations

Detic's wide range of detection capabilities may introduce challenges similar to those of many other visual recognition and open-set recognition methods. As the user can define arbitrary detection classes, class design and semantics may impact the model output.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2022detecting,
  title={Detecting Twenty-thousand Classes using Image-level Supervision},
  author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
  booktitle={ECCV},
  year={2022}
}
