Official Implementation of the CVPR 2024 highlight paper: Matching Anything by Segmenting Anything
[Project Page] [ArXiv]
Computer Vision Lab, ETH Zurich
- 2024.09: Updated the TETA repo to make evaluation on the TAO TETA benchmark, the Open-vocabulary MOT benchmark, and the BDD100K MOT and MOTS benchmarks easier!
- 2024.06: MASA code is released!
- 2024.04: MASA is selected as a CVPR highlight!
This is the repository for MASA, a universal instance appearance model for matching any object in any domain. MASA can be added on top of any detection or segmentation model to help it track any object it has detected.
The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings. We propose MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich object segmentation from the Segment Anything Model (SAM), MASA learns instance-level correspondence through exhaustive data transformations. We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection. We further design a universal MASA adapter which can work in tandem with foundational segmentation or detection models and enable them to track any detected objects. These combinations present strong zero-shot tracking ability in complex domains. Extensive tests on multiple challenging MOT and MOTS benchmarks indicate that the proposed method, using only unlabeled static images, achieves even better zero-shot association performance than state-of-the-art methods trained with fully annotated in-domain video sequences.
Method | Base TETA | Base AssocA | Novel TETA | Novel AssocA | model
---|---|---|---|---|---
OVTrack (CVPR23) | 35.5 | 36.9 | 27.8 | 33.6 | -
MASA-R50 🔥 | 46.5 | 43.0 | 41.1 | 42.7 | HF🤗
MASA-Sam-vitB | 47.2 | 44.5 | 41.4 | 42.3 | HF🤗
MASA-Sam-vitH | 47.5 | 45.1 | 40.5 | 40.5 | HF🤗
MASA-Detic | 47.7 | 44.1 | 41.5 | 41.6 | HF🤗
MASA-GroundingDINO 🔥 | 47.3 | 44.7 | 41.9 | 44.0 | HF🤗
- We use the Detic-SwinB as the open-vocabulary detector to provide detections for all our variants.
- MASA-R50: MASA with a ResNet-50 backbone. It is a fast and independent model that does not use the backbone features of other detection or segmentation foundation models, so it can be paired with any detector. It is trained in the same way as the other MASA variants.
Check out our model zoo for more detailed benchmark performance of the different models.
If you want to test our tracker on standard benchmarks, please refer to benchmark_test.md.
If you want to compare with MASA and evaluate your own tracker's results on the TAO TETA benchmark, the Open-vocabulary MOT benchmark, or the BDD100K MOT and MOTS benchmarks, please refer to the TETA repo for quick evaluation.
If you want to train the MASA model, please refer to train.md.
See more results on our project page!
For installation, please refer to INSTALL.md.
First, create a folder named `saved_models` in the root directory of the project. Then, download the following models and put them in the `saved_models` folder.

a). Download the MASA-GroundingDINO weights and put them at `saved_models/masa_models/gdino_masa.pth`.

(Optional) Second, download the demo videos and put them in the `demo` folder. We provide two short videos for testing (minions_rush_out.mp4 and giraffe_short.mp4). You can download more demo videos here.

Finally, create the `demo_outputs` folder in the root directory of the project to save the output videos.
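For reference, here is a minimal shell sketch of the expected layout. It only creates the folders; the checkpoints and videos still have to be downloaded from the links above.

```bash
# Create the folder layout used by the demos, starting from the project root.
mkdir -p saved_models/masa_models saved_models/pretrain_weights demo demo_outputs
# After downloading, the MASA-GroundingDINO checkpoint should sit at:
#   saved_models/masa_models/gdino_masa.pth
# and the demo videos (e.g. minions_rush_out.mp4, giraffe_short.mp4) under demo/.
```

With everything in place, run the first demo: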
```bash
python demo/video_demo_with_text.py demo/minions_rush_out.mp4 --out demo_outputs/minions_rush_out_outputs.mp4 --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py --masa_checkpoint saved_models/masa_models/gdino_masa.pth --texts "yellow_minions" --score-thr 0.2 --unified --show_fps
```
- `--texts`: the object classes you want to track. If there are multiple classes, separate them like this: `"giraffe . lion . zebra"`. Please note that the `--texts` option is currently only available for open-vocabulary detectors.
- `--out`: the output video path.
- `--score-thr`: the confidence threshold for visualizing detected objects.
- `--detector_type`: the detector type. We support `mmdet` and `yolo-world` (soon).
- `--unified`: whether to use the unified model.
- `--no-post`: disable the postprocessing. It is enabled by default; adding this flag turns it off. The postprocessing uses MASA tracking to reduce the jittering effect caused by the detector.
- `--show_fps`: whether to show the FPS.
- `--sam_mask`: whether to visualize the mask results generated by SAM.
- `--fp16`: whether to use fp16 mode.
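As a hedged illustration of combining several of these options, the sketch below tracks two classes in one run and enables fp16 inference. It assumes the MASA-GroundingDINO setup from above; the class names and output filename are only examples.

```bash
# Track two classes at once and run inference in fp16.
# Paths follow the demo setup above; adjust classes and thresholds to your own video.
python demo/video_demo_with_text.py demo/giraffe_short.mp4 \
    --out demo_outputs/giraffe_lion_outputs.mp4 \
    --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py \
    --masa_checkpoint saved_models/masa_models/gdino_masa.pth \
    --texts "giraffe . lion" \
    --score-thr 0.2 --unified --show_fps --fp16
```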
The hyperparameters of the tracker can be found in the corresponding config files, such as `configs/masa-gdino/masa_gdino_swinb_inference.py`. The current values are tuned for the best performance on the demo videos; you can adjust them according to your own videos and needs.
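For example, one way to experiment with the tracker hyperparameters is to copy the config and point the demo at the copy. This is only a sketch; the filename `my_inference.py` is an illustrative name, and all other paths follow the setup above.

```bash
# Copy the inference config, edit the tracker hyperparameters in the copy,
# then run the demo against it.
cp configs/masa-gdino/masa_gdino_swinb_inference.py configs/masa-gdino/my_inference.py
python demo/video_demo_with_text.py demo/minions_rush_out.mp4 \
    --out demo_outputs/minions_rush_out_outputs.mp4 \
    --masa_config configs/masa-gdino/my_inference.py \
    --masa_checkpoint saved_models/masa_models/gdino_masa.pth \
    --texts "yellow_minions" --score-thr 0.2 --unified --show_fps
```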
Download the sora_fish_10s.mp4 and put it in the `demo` folder.
```bash
python demo/video_demo_with_text.py demo/sora_fish_10s.mp4 --out demo_outputs/sora_fish_10s_outputs.mp4 --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py --masa_checkpoint saved_models/masa_models/gdino_masa.pth --texts "fish" --score-thr 0.1 --unified --show_fps
```
a). Download the SAM-H weights and put them at `saved_models/pretrain_weights/sam_vit_h_4b8939.pth`.

b). Download the carton_kangaroo_dance.mp4 and put it in the `demo` folder.
```bash
python demo/video_demo_with_text.py demo/carton_kangaroo_dance.mp4 --out demo_outputs/carton_kangaroo_dance_outputs.mp4 --masa_config configs/masa-gdino/masa_gdino_swinb_inference.py --masa_checkpoint saved_models/masa_models/gdino_masa.pth --texts "kangaroo" --score-thr 0.4 --unified --show_fps --sam_mask
```
You can directly use any detector along with our different MASA variants to track any object.
Here is an example of how to use the MASA adapter with the YoloX detector pretrained on COCO.
Download the YoloX COCO detector weights from here and put them at `saved_models/pretrain_weights/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth`.

Download the MASA-R50 or MASA-GroundingDINO weights and put them in `saved_models/masa_models/`.
```bash
python demo/video_demo_with_text.py demo/giraffe_short.mp4 --out demo_outputs/giraffe_short_outputs.mp4 --det_config projects/mmdet_configs/yolox/yolox_x_8xb8-300e_coco.py --det_checkpoint saved_models/pretrain_weights/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth --masa_config configs/masa-one/masa_r50_plug_and_play.py --masa_checkpoint saved_models/masa_models/masa_r50.pth --score-thr 0.3 --show_fps
```
Here are examples of how to use the MASA adapter with the CO-DETR detector pretrained on COCO.
Download the CO-DETR-R50 COCO detector weights from here and put them at `saved_models/pretrain_weights/co_dino_5scale_lsj_r50_3x_coco-fe5a6829.pth`.

Download the driving_10s.mp4 and put it in the `demo` folder.
```bash
python demo/video_demo_with_text.py demo/driving_10s.mp4 --out demo_outputs/driving_10s_outputs.mp4 --det_config projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py --det_checkpoint saved_models/pretrain_weights/co_dino_5scale_lsj_r50_3x_coco-fe5a6829.pth --masa_config configs/masa-one/masa_r50_plug_and_play.py --masa_checkpoint saved_models/masa_models/masa_r50.pth --score-thr 0.3 --show_fps
```
Download the zebra-drone.mp4 and put it in the `demo` folder.
```bash
python demo/video_demo_with_text.py demo/zebra-drone.mp4 --out demo_outputs/zebra-drone_outputs.mp4 --det_config projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py --det_checkpoint saved_models/pretrain_weights/co_dino_5scale_lsj_r50_3x_coco-fe5a6829.pth --masa_config configs/masa-one/masa_r50_plug_and_play.py --masa_checkpoint saved_models/masa_models/masa_r50.pth --score-thr 0.2 --show_fps
```
Here are some of the things we are working on; please let us know if you have any suggestions or requests:
- [ ] Release the unified model with the YOLO-World detector for fast open-vocabulary tracking.
- [x] Release the training code for turning your own detector into a strong tracker on unlabeled images from your domain.
- [x] Release the plug-and-play MASA model, compatible with any detection and segmentation models.
- [x] Release the benchmark testing on TAO and BDD100K.
- [x] Release the pre-trained unified models from the paper and the inference demo code.
MASA is a universal instance appearance model that can be added on top of any detection and segmentation models to help them track any objects they have detected. However, there are still some limitations:
- MASA cannot track objects that are not detected by the detector.
- MASA cannot fix inconsistent detections from the detector. If the detector produces inconsistent detections across video frames, the tracking results will flicker.
- MASA is trained purely on unlabeled static images and may not work well in scenarios with heavy occlusion and noisy detections. Directly using RoI Align for noisy or occluded objects yields suboptimal features for occlusion handling. We are working on improving the tracking performance in such scenarios.
For questions, please contact Siyuan Li.
```bibtex
@article{masa,
  author  = {Li, Siyuan and Ke, Lei and Danelljan, Martin and Piccinelli, Luigi and Segu, Mattia and Van Gool, Luc and Yu, Fisher},
  title   = {Matching Anything By Segmenting Anything},
  journal = {CVPR},
  year    = {2024},
}
```
The authors would like to thank Bin Yan for his help and discussions. Our code is built on mmdetection, OVTrack, TETA, and yolo-world. If you find our work useful, consider checking out their work as well.