Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future


seanzhuh/Awesome-Open-Vocabulary-Detection-and-Segmentation


Chaoyang Zhu, Long Chen*


News

Please stay tuned; this repo is maintained on a weekly basis.

  • 27/06/2024: NeRF and 3DGS based 3D scene understanding is added.
  • 05/06/2024: The second version of our manuscript has been accepted by TPAMI.

Bibtex

If you find our survey helpful, please consider citing our paper:

@article{survey-ovd-ovs,
  title={A survey on open-vocabulary detection and segmentation: Past, present, and future},
  author={Zhu, Chaoyang and Chen, Long},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

✨ PRs are welcome!

Though we aim to cover every paper, some works may still be missing. We believe the repository should be maintained by the community. Peer review is welcome and highly appreciated; if you are an author and find our records incorrect, don't hesitate to contact me and open a PR.

General Overview

In this survey, we cover two settings (zero-shot and open-vocabulary) and six tasks (object detection, semantic/instance/panoptic segmentation, 3D scene understanding, and video understanding). We build a taxonomy that is universal across these settings and tasks around two questions: whether weak supervision signals are permitted, and how they are used. The weak supervision signals can be image-text pairs or large vision-language models. Below is a general overview of each methodology.

In the current literature, zero-shot and open-vocabulary are often used interchangeably. We highlight their subtle differences by tracing the evolution from the traditional zero-shot setting to the newly formulated open-vocabulary setting.
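As a rough aid for navigating the lists that follow, the taxonomy can be sketched as a small lookup table. This is only an illustration: the grouping mirrors the section headings of this README, and the names TAXONOMY, TASKS, and methodologies_for are hypothetical helpers, not part of the survey or of any library.

```python
# Illustrative sketch only: the grouping mirrors this README's section headings.
# TAXONOMY, TASKS, and methodologies_for are hypothetical names chosen here.

# Methodologies listed under each setting in the sections of this README.
TAXONOMY = {
    "zero-shot": [
        "Visual-Semantic Space Mapping",
        "Novel Visual Feature Synthesis",
    ],
    "open-vocabulary": [
        "Region-Aware Training",
        "Pseudo-Labeling",
        "Knowledge Distillation",
        "Transfer Learning",
    ],
}

# The six tasks named in the overview paragraph.
TASKS = [
    "object detection",
    "semantic segmentation",
    "instance segmentation",
    "panoptic segmentation",
    "3D scene understanding",
    "video understanding",
]


def methodologies_for(setting: str) -> list[str]:
    """Return the methodology names listed under a setting (case-insensitive)."""
    key = setting.strip().lower()
    if key not in TAXONOMY:
        raise KeyError(f"unknown setting: {setting!r}")
    # Return a copy so callers cannot mutate the table.
    return list(TAXONOMY[key])


if __name__ == "__main__":
    for setting in TAXONOMY:
        print(setting, "->", ", ".join(methodologies_for(setting)))
```

Not every task section below contains all four open-vocabulary methodologies; the table only records which methodology names appear under each setting.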


Zero-Shot Object Detection

Visual-Semantic Space Mapping

Venue | Paper Abbr | Paper Title | Project
ECCV'18 | ZSDv1 | Zero-Shot Object Detection | N/A
ACCV'18 & IJCV'20 | ZSDv2 | Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts | N/A
AAAI'20 | CA-ZSR | Context-Aware Zero-Shot Recognition | Code
AAAI'19 | ZSD-TD | Zero-Shot Object Detection with Textual Descriptions | N/A
ACCV'20 | BLC | Background Learnable Cascade for Zero-Shot Object Detection | Code
ICCV'19 | TL-ZSD | Transductive Learning for Zero-Shot Object Detection | N/A
arXiv'23 | SSB | Frustratingly Simple but Effective Zero-shot Detection and Segmentation: Analysis and a Strong Baseline | N/A
WACV'20 | MS-Zero | A Multi-Space Approach to Zero-Shot Object Detection | N/A
TCSVT'19 | ZS-YOLO | Zero Shot Detection | N/A
AAAI'21 | DPIF | Inference Fusion with Associative Semantics for Unseen Object Detection | Code
TPAMI'21 | ContrastZSD | Semantics-Guided Contrastive Network for Zero-Shot Object Detection | N/A
IJCAI'20 | ZSD-CNN | Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space | N/A

Novel Visual Feature Synthesis

Venue | Paper Abbr | Paper Title | Project
CVPR'20 | DELO | Don't Even Look Once: Synthesizing Features for Zero-Shot Detection | N/A
ACCV'20 | SU | Synthesizing the Unseen for Zero-shot Object Detection | Code
AAAI'20 | GTNet | GTNet: Generative Transfer Network for Zero-Shot Object Detection | Code
CVPR'22 | RRFS | Robust Region Feature Synthesizer for Zero-Shot Object Detection | Code

Zero-Shot Segmentation

Zero-Shot Semantic Segmentation

Visual-Semantic Space Mapping

Venue | Paper Abbr | Paper Title | Project
CVPR'20 | SPNet | Semantic Projection Network for Zero- and Few-Label Semantic Segmentation | Code
NeurIPS'20 | ULZSS | Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation | Code
ICCV'21 | JoEm | Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation | Code
ICCVW'19 | VM | Zero-Shot Semantic Segmentation via Variational Mapping | N/A
ICCV'21 | PMOSR | Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation | N/A

Novel Visual Feature Synthesis

Venue | Paper Abbr | Paper Title | Project
NeurIPS'19 | ZS3Net | Zero-Shot Semantic Segmentation | Code
NeurIPS'20 | CSRL | Consistent Structural Relation Learning for Zero-Shot Segmentation | N/A
MM'20 | CaGNet | Context-aware Feature Generation for Zero-shot Semantic Segmentation | Code
ICCV'21 | SIGN | SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation | Code

Zero-Shot Instance Segmentation

Venue | Paper Abbr | Paper Title | Project
CVPR'21 | ZSIS | Zero-Shot Instance Segmentation | Code

Open-Vocabulary Object Detection

Region-Aware Training

Venue | Paper Abbr | Paper Title | Project
CVPR'21 | OVR-CNN | Open-Vocabulary Object Detection Using Captions | Code
GCPR'22 | LocOv | Localized Vision-Language Matching for Open-vocabulary Object Detection | Code
arXiv'23 | MMC-Det | Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection | N/A
NeurIPS'22 | DetCLIP | DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection | N/A
CVPR'23 | DetCLIPv2 | DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment | N/A
CVPR'24 | DetCLIPv3 | DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | N/A
AAAI'24 | WSOVOD | Weakly Supervised Open-Vocabulary Object Detection | Code
CVPR'23 | RO-ViT | Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | N/A
ICCV'23 | CFM-ViT | Contrastive Feature Masking Open-Vocabulary Vision Transformer | N/A
ICCV'23 | DITO | Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection | Code
ICLR'23 | VLDet | Learning Object-Language Alignments for Open-Vocabulary Object Detection | Code
ICCV'23 | GOAT | Open-Vocabulary Object Detection With an Open Corpus | N/A
ECCV'22 | OV-DETR | Open-Vocabulary DETR with Conditional Matching | Code
arXiv'23 | Prompt-OVD | Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection | N/A
CVPR'23 | CORA | CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching | N/A
ICCV'23 | EdaDet | EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment | Code
ICCV'21 | MDETR | MDETR: Modulated Detection for End-to-End Multi-Modal Understanding | Code
ECCV'22 | MAVL | Class-agnostic Object Detection with Multi-modal Transformer | Code
NeurIPS'23 | MQ-Det | Multi-modal Queried Object Detection in the Wild | Code
CVPR'24 | YOLO-World | Real-Time Open-Vocabulary Object Detection | Code
MM'23 | SGDN | Open-Vocabulary Object Detection via Scene Graph Discovery | N/A
CVPR'24 | USE | USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation | N/A
ICLR'25 | CCKT-Det | Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection | Code

Pseudo-Labeling

Venue | Paper Abbr | Paper Title | Project
CVPR'22 | RegionCLIP | RegionCLIP: Region-based Language-Image Pretraining | Code
ECCV'22 | VL-PLM | Exploiting Unlabeled Data with Vision and Language Models for Object Detection | Code
CVPR'22 | GLIP | Grounded Language-Image Pre-training | Code
NeurIPS'22 | GLIPv2 | GLIPv2: Unifying Localization and VL Understanding | Code
arXiv'23 | Grounding-DINO | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code
ECCV'22 | PromptDet | PromptDet: Towards Open-vocabulary Detection using Uncurated Images | Code
arXiv'23 | SAS-Det | Taming Self-Training for Open-Vocabulary Object Detection | Code
ECCV'22 | PB-OVD | Open Vocabulary Object Detection with Pseudo Bounding-Box Labels | Code
AAAI'24 | CLIM | CLIM: Contrastive Language-Image Mosaic for Region Representation | Code
arXiv'22 | VTP-OVD | Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection | N/A
AAAI'24 | ProxyDet | ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection | Code
NeurIPS'23 | CoDet | CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection | Code
ECCV'22 | Detic | Detecting Twenty-thousand Classes using Image-level Supervision | Code
ICML'23 | MMC | Multi-Modal Classifiers for Open-Vocabulary Object Detection | Code
arXiv'23 | 3Ways | Three ways to improve feature alignment for open vocabulary detection | N/A
arXiv'23 | PLAC | Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | N/A
arXiv'23 | PCL | Open-Vocabulary Object Detection using Pseudo Caption Labels | N/A
NeurIPS'23 | OWLv2 | Scaling Open-Vocabulary Object Detection | Code

Knowledge Distillation

Venue | Paper Abbr | Paper Title | Project
ICLR'22 | ViLD | Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | Code
ICDMW'22 | ZSD-YOLO | Zero-shot Object Detection Through Vision-Language Embedding Alignment | Code
WACV'24 | LP-OVOD | LP-OVOD: Open-Vocabulary Object Detection by Linear Probing | Code
arXiv'23 | EZSD | Efficient Feature Distillation for Zero-shot Annotation Object Detection | Code
AAAI'24 | SIC-CADS | Simple Image-level Classification Improves Open-vocabulary Object Detection | Code
CVPR'23 | BARON | Aligning Bag of Regions for Open-Vocabulary Object Detection | Code
CVPR'23 | OADP | Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | Code
arXiv'23 | GridCLIP | GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning | N/A
NeurIPS'22 | RKDWTF | Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | Code
ICCV'23 | DK-DETR | Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection | Code
CVPR'22 | HierKD | Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation | Code
CVPR'22 | DetPro | Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model | Code
arXiv'23 | CLIPSelf | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code
CVPR'24 | SAMP | Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection | N/A
IJCV'24 | OV-DAR | OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition | N/A
CVPR'24 | LBP | Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection | N/A

Transfer Learning

Venue | Paper Abbr | Paper Title | Project
ECCV'22 | OWL-ViT | Simple Open-Vocabulary Object Detection with Vision Transformers | Code
CVPR'23 | UniDetector | Detecting Everything in the Open World: Towards Universal Object Detection | Code
ICLR'23 | F-VLM | F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models | Code
CVPR'23 | ScaleDet | ScaleDet: A Scalable Multi-Dataset Object Detector | N/A
ICCV'23 | OpenSeed | A Simple Framework for Open-Vocabulary Segmentation and Detection | Code
arXiv'23 | DRR | What Makes Good Open-Vocabulary Detector: A Disassembling Perspective | N/A
arXiv'23 | Sambor | Boosting Segment Anything Model Towards Open-Vocabulary Learning | Code
AAAI'25 | LAE-DINO | Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community | Code

Open-Vocabulary Segmentation

Open-Vocabulary Semantic Segmentation

Region-Aware Training

Venue | Paper Abbr | Paper Title | Project
ECCV'22 | OpenSeg | Scaling Open-Vocabulary Image Segmentation with Image-Level Labels | N/A
arXiv'23 | SILC | SILC: Improving Vision Language Pretraining with Self-Distillation | N/A
CVPR'22 | GroupViT | GroupViT: Semantic Segmentation Emerges from Text Supervision | Code
ECCV'22 | ViL-Seg | Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding | N/A
ICML'23 | SegCLIP | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | Code
CVPR'23 | OVSegmentor | Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision | Code
CVPR'23 | PACL | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | N/A
CVPR'23 | TCL | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | Code
ECCV'22 | SimSeg | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | Code

Pseudo-Labeling

Venue | Paper Abbr | Paper Title | Project
ECCV'22 | TTD | Open-Vocabulary Semantic Segmentation Using Test-Time Distillation | N/A

Knowledge Distillation

Venue | Paper Abbr | Paper Title | Project
arXiv'23 | GKC | Global Knowledge Calibration for Fast Open-Vocabulary Segmentation | N/A
arXiv'23 | SAM-CLIP | SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding | N/A
ICCV'23 | ZeroSeg | Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only | Code

Transfer Learning

Venue | Paper Abbr | Paper Title | Project
ICLR'22 | LSeg | Language-driven Semantic Segmentation | Code
CVPR'23 | SAZS | Delving Into Shape-Aware Zero-Shot Semantic Segmentation | Code
MM'23 | CEL | Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation | N/A
CVPR'22 | ZegFormer | Decoupling Zero-Shot Semantic Segmentation | Code
NeurIPS'22 | ReCo | ReCo: Retrieve and Co-segment for Zero-shot Transfer | Project
arXiv'23 | SCAN | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | N/A
ECCV'22 | ZSSeg | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | Code
ECCV'22 | MaskCLIP | Extract Free Dense Labels from CLIP | Code
arXiv'23 | CLIP-DINOiser | CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation | Code
PRCV'23 | MVP-SEG | MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation | N/A
arXiv'23 | OVDiff | Diffusion Models for Zero-Shot Open-Vocabulary Segmentation | Project
WACV'24 | FOSSIL | FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval | N/A
NeurIPS'23 | POMP | Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | Code
NeurIPS'23 | AttrSeg | AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | N/A
arXiv'23 | PnP-OVSS | Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models | Code
arXiv'23 | TagAlign | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | Project
arXiv'23 | SelfSeg | Auto-Vocabulary Semantic Segmentation | N/A
CVPR'22 | DenseCLIP | DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | Code
CVPR'23 | OVSeg | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | Code
arXiv'23 | CAT-Seg | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | Code
arXiv'23 | SED | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | Code
NeurIPS'23 | MAFT | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | Code
arXiv'23 | TagCLIP | TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation | N/A
CVPR'23 | ZegCLIP | ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation | Code
CVPR'22 | CLIPSeg | Image Segmentation Using Text and Image Prompts | Code
CVPR'23 | SAN | Side Adapter Network for Open-Vocabulary Semantic Segmentation | Code
arXiv'23 | CLIP Surgery | CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks | Code
arXiv'23 | CaR | CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor | Project
arXiv'24 | Cascade-CLIP | Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation | Code
arXiv'24 | OpenDAS | OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation | Project
arXiv'24 | H-CLIP | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | N/A

Open-Vocabulary Instance Segmentation

Region-Aware Training

Venue | Paper Abbr | Paper Title | Project
ICCV'23 | CGG | Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code
CVPR'23 | D2Zero | Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation | Code

Pseudo-Labeling

Venue | Paper Abbr | Paper Title | Project
CVPR'23 | XPM | Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling | Code
CVPR'23 | Mask-free OVIS | Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations | Code
arXiv'23 | MosaicFusion | MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation | Code

Knowledge Distillation

Venue | Paper Abbr | Paper Title | Project
arXiv'24 | OV-SAM | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Code

Open-Vocabulary Panoptic Segmentation

Region-Aware Training

Venue | Paper Abbr | Paper Title | Project
arXiv'24 | Uni-OVSeg | Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | Code
CVPR'23 | X-Decoder | Generalized Decoding for Pixel, Image, and Language | Code
CVPR'24 | APE | Aligning and Prompting Everything All at Once for Universal Visual Perception | Code

Knowledge Distillation

Venue | Paper Abbr | Paper Title | Project
CVPR'23 | PADing | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | Code

Transfer Learning

Venue | Paper Abbr | Paper Title | Project
NeurIPS'23 | FC-CLIP | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | Code
CVPR'23 | FreeSeg | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | Project
arXiv'24 | PosSAM | PosSAM: Panoptic Open-vocabulary Segment Anything | Project
ICCV'23 | MasQCLIP | MasQCLIP for Open-Vocabulary Universal Image Segmentation | Project
CVPR'24 | OMG-Seg | OMG-Seg: Is One Model Good Enough For All Segmentation? | Code
arXiv'23 | Semantic-SAM | Semantic-SAM: Segment and Recognize Anything at Any Granularity | Code
CVPR'23 | ODISE | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | Code
NeurIPS'23 | HIPIE | Hierarchical Open-vocabulary Universal Image Segmentation | Code
ICML'23 | MaskCLIP | Open-Vocabulary Universal Image Segmentation with MaskCLIP | Project
ICCV'23 | OPSNet | Open-vocabulary Panoptic Segmentation with Embedding Modulation | N/A

Open-Vocabulary 3D Scene Understanding

Open-Vocabulary 3D Detection

Venue | Paper Abbr | Paper Title | Project
CVPR'23 | OV-3DET | Open-Vocabulary Point-Cloud Object Detection without 3D Annotation | Code
AAAI'24 | FM-OV3D | FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection | Code
arXiv'23 | OpenSight | OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection | N/A
NeurIPS'23 | CoDA | CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Code
arXiv'23 | L3Det | Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection | N/A

Open-Vocabulary 3D Segmentation

Open-Vocabulary 3D Semantic Segmentation

Venue | Paper Abbr | Paper Title | Project
arXiv'21 | SeCondPoint | Language-Level Semantics Conditioned 3D Point Cloud Segmentation | N/A
3DV'21 | 3DGenZ | Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds | Code
CVPR'23 | OpenScene | OpenScene: 3D Scene Understanding with Open Vocabularies | Project
CVPR'23 | PLA | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | Code
arXiv'23 | RegionPLC | RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding | Project

Open-Vocabulary 3D Instance Segmentation

Venue | Paper Abbr | Paper Title | Project
NeurIPS'23 | OpenMask3D | OpenMask3D: Open-Vocabulary 3D Instance Segmentation | Project
CVPR'24 | MaskClustering | MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation | Project
arXiv'23 | OpenIns3D | OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation | Project
arXiv'23 | Open3DIS | Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance | Project
arXiv'24 | OpenSU3D | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Project
arXiv'24 | Search3D | Search3D: Hierarchical Open-Vocabulary 3D Segmentation | N/A

NeRF and 3DGS based

NeRF (Neural Radiance Field) and 3DGS (3D Gaussian Splatting) are popular approaches to novel view synthesis of a whole scene. The methods below exploit the multi-view consistency inherent in the 3D model either to help 2D image segmentation or to perform 3D semantic segmentation directly over points (voxels or Gaussians) in the scene.

Venue | Paper Abbr | Paper Title | Project
ICCV'21 | Semantic-NeRF | In-Place Scene Labelling and Understanding With Implicit Scene Representation | Code
NeurIPS'22 | FFD | Decomposing NeRF for Editing via Feature Field Distillation | Code
arXiv'23 | Gaussian Grouping | Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Code
ICCV'23 | LERF | LERF: Language Embedded Radiance Fields | Project
NeurIPS'23 | 3DOVS | Weakly Supervised 3D Open-vocabulary Segmentation | Code
arXiv'24 | OpenGaussian | OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding | Project
arXiv'24 | OV-NeRF | OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding | Code
arXiv'24 | Semantic Gaussians | Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting | Project
arXiv'24 | FMGS | FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding | Project
CVPR'24 | LEGaussians | Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding | Code
CVPR'24 | LangSplat | LangSplat: 3D Language Gaussian Splatting | Project
CVPR'24 | Feature 3DGS | Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields | Code

Open-Vocabulary Video Understanding

Open-Vocabulary Video Instance Segmentation

Venue | Paper Abbr | Paper Title | Project
ICCV'23 | OV2Seg | Towards Open-Vocabulary Video Instance Segmentation | Code
arXiv'23 | OpenVIS | OpenVIS: Open-vocabulary Video Instance Segmentation | Code
arXiv'24 | BriVIS | Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation | Code
