[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"

google-deepmind/magiclens

This repository contains an implementation of MagicLens. The code uses JAX and Flax. Note that the current implementation does not yet support training. Refer to the website for dataset examples.

Abstract

We introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. The core thesis of MagicLens is that text instructions can enable retrieving images with richer relations beyond visual similarity. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., inside view of), and we can bring those implicit relations explicit by synthesizing instructions via large multimodal models (LMMs) and large language models (LLMs). Trained on 36.7M (query image, instruction, target image) triplets with rich semantic relations mined from the web, MagicLens achieves comparable or better results on eight benchmarks of various image retrieval tasks than prior state-of-the-art (SOTA) methods. Remarkably, it outperforms previous SOTA but with a 50× smaller model size on multiple benchmarks. Additional human analyses on a 1.4M-image unseen corpus further demonstrate the diversity of search intents supported by MagicLens.
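To make the triplet format concrete, here is a minimal sketch of one (query image, instruction, target image) example and of ranking candidates by embedding similarity. The field names, file names, toy embeddings, and the use of cosine similarity are all assumptions for illustration; they are not taken from this repository's code.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Triplet:
    """One training example; the field names are hypothetical."""
    query_image: str   # path or ID of the query image
    instruction: str   # open-ended text instruction
    target_image: str  # path or ID of the image that satisfies the instruction


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query_emb: Sequence[float],
                    index: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(index, key=lambda name: cosine(query_emb, index[name]), reverse=True)


# Toy data: the embeddings below are made up, not model outputs.
example = Triplet("charger_front.jpg", "inside view of", "charger_inside.jpg")
toy_index = {
    "charger_inside.jpg": [0.9, 0.1],
    "unrelated.jpg": [0.0, 1.0],
}
ranking = rank_candidates([1.0, 0.0], toy_index)
print(ranking)
```

In a real pipeline the query embedding would come from encoding the query image together with the instruction, and the index from encoding each candidate image.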

Setup

    conda create --name magic_lens python=3.9
    conda activate magic_lens
    git clone https://github.com/google-research/scenic.git
    cd scenic
    pip install .
    pip install -r scenic/projects/baselines/clip/requirements.txt
    # you may need to install corresponding GPU version of jax following https://jax.readthedocs.io/en/latest/installation.html
    # e.g.,
    # # CUDA 12 installation
    # Note: wheels only available on linux.
    # pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
    # # CUDA 11 installation
    # Note: wheels only available on linux.
    # pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
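After installation, it can help to confirm which backend JAX actually picked up. The following is a small sanity-check sketch, not part of this repository; the helper name `describe_jax_backend` is made up for illustration.

```python
import importlib.util


def describe_jax_backend() -> str:
    """Return a one-line summary of the JAX install, or a hint if it is unusable."""
    if importlib.util.find_spec("jax") is None:
        return "jax is not installed in this environment"
    try:
        import jax  # imported lazily so a missing install is reported, not raised

        devices = jax.devices()
        backend = jax.default_backend()  # "cpu", "gpu", or "tpu"
    except Exception as err:  # e.g. a mismatched jaxlib or CUDA runtime
        return f"jax is installed but failed to initialize: {err}"
    return f"jax {jax.__version__}, backend={backend}, devices={len(devices)}"


print(describe_jax_backend())
```

If the reported backend is `cpu` on a GPU machine, the GPU build of JAX is probably not the one installed.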

Model Download

Download model via:

    cd .. # in main folder `magiclens`
    # you may need to use `gcloud auth login` for access, any gmail account should work.
    gsutil cp -R gs://gresearch/magiclens/models ./
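A quick way to confirm that the download finished is to list the checkpoint files that ended up in `./models`. This is a sketch under the assumption that checkpoints are `.pkl` files placed directly in that folder (the inference command in this README uses `./models/magic_lens_clip_large.pkl`); the helper name is hypothetical.

```python
from pathlib import Path
from typing import Dict


def list_checkpoints(models_dir: str = "./models") -> Dict[str, float]:
    """Map each .pkl file directly under models_dir to its size in MiB."""
    root = Path(models_dir)
    if not root.is_dir():
        return {}  # nothing downloaded yet, or wrong working directory
    return {p.name: p.stat().st_size / 2**20 for p in sorted(root.glob("*.pkl"))}


for name, size_mib in list_checkpoints().items():
    print(f"{name}: {size_mib:.1f} MiB")
```

An empty listing usually means the script was run from a different directory than the one the models were copied into.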

Or via Google Drive.

Data Preparation

Please follow the instructions in each dataset folder under ./data. Currently we have successfully tested FIQ, CIRCO, and DTIN.

Inference

    python inference.py \
    --model_size large \
    --model_path ./models/magic_lens_clip_large.pkl \
    --dataset circo
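When scripting evaluations, the same command can be assembled from Python. The sketch below only builds and prints the argument list using the three flags shown above; the wrapper function is hypothetical, and no flag values beyond those in this README are assumed.

```python
import shlex
import sys
from typing import List


def build_inference_command(model_size: str, model_path: str, dataset: str) -> List[str]:
    """Build the argv list for inference.py using the flags shown in this README."""
    return [
        sys.executable, "inference.py",
        "--model_size", model_size,
        "--model_path", model_path,
        "--dataset", dataset,
    ]


cmd = build_inference_command("large", "./models/magic_lens_clip_large.pkl", "circo")
print(shlex.join(cmd))
# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a model path contains spaces.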

Due to the weight conversion, the performance may be slightly different:

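In CIRCO

| Model | map@5 | map@10 | map@25 | map@50 |
|---|---|---|---|---|
| Prior SOTA | 26.8 | 27.6 | 30.0 | 31.0 |
| Base (original) | 23.1 | 23.8 | 25.8 | 26.7 |
| Base (converted) | 22.3 | 23.2 | 25.0 | 26.0 |
| Large (original) | 29.6 | 30.8 | 33.4 | 34.4 |
| Large (converted) | 29.5 | 30.8 | 33.2 | 34.3 |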
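To quantify how much the weight conversion costs, the gap between the original and converted checkpoints can be computed directly from the CIRCO numbers above. The snippet is plain arithmetic on the reported values; the variable and function names are only illustrative.

```python
from typing import Dict

CUTOFFS = ("map@5", "map@10", "map@25", "map@50")

# CIRCO results copied from the table above.
RESULTS = {
    "Base (original)": (23.1, 23.8, 25.8, 26.7),
    "Base (converted)": (22.3, 23.2, 25.0, 26.0),
    "Large (original)": (29.6, 30.8, 33.4, 34.4),
    "Large (converted)": (29.5, 30.8, 33.2, 34.3),
}


def conversion_gap(size: str) -> Dict[str, float]:
    """Original minus converted score at each cutoff, rounded to one decimal."""
    original = RESULTS[f"{size} (original)"]
    converted = RESULTS[f"{size} (converted)"]
    return {k: round(o - c, 1) for k, o, c in zip(CUTOFFS, original, converted)}


for size in ("Base", "Large"):
    print(size, conversion_gap(size))
```

The Base model loses between 0.6 and 0.8 points after conversion, while the Large model loses at most 0.2.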

Citing this work

If you use this work, please cite it with the following BibTeX entry:

    @inproceedings{zhang2024magiclens,
      title = {{M}agic{L}ens: Self-Supervised Image Retrieval with Open-Ended Instructions},
      author = {Zhang, Kai and Luan, Yi and Hu, Hexiang and Lee, Kenton and Qiao, Siyuan and Chen, Wenhu and Su, Yu and Chang, Ming-Wei},
      booktitle = {Proceedings of the 41st International Conference on Machine Learning},
      pages = {59403--59420},
      year = {2024},
      editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
      volume = {235},
      series = {Proceedings of Machine Learning Research},
      month = {21--27 Jul},
      publisher = {PMLR},
      url = {https://proceedings.mlr.press/v235/zhang24an.html}
    }

License and disclaimer

Copyright 2024 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.
