# MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions [ICML'24 Oral]
This repo contains the implementation of MagicLens. The code uses JAX and Flax. Note that the current implementation does not yet support training. Refer to the website for dataset examples.
We introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. The core thesis of MagicLens is that text instructions can enable retrieving images with richer relations beyond visual similarity. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., "inside view of"), and we can make those implicit relations explicit by synthesizing instructions via large multimodal models (LMMs) and large language models (LLMs). Trained on 36.7M (query image, instruction, target image) triplets with rich semantic relations mined from the web, MagicLens achieves comparable or better results than prior state-of-the-art (SOTA) methods on eight benchmarks covering various image retrieval tasks. Remarkably, it outperforms the previous SOTA on multiple benchmarks with a 50× smaller model size. Additional human analyses on a 1.4M-image unseen corpus further demonstrate the diversity of search intents supported by MagicLens.
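To make the retrieval setup concrete, here is a minimal sketch of instruction-conditioned retrieval over pre-computed embeddings. This is not this repo's API: the function and variable names are hypothetical, and we simply assume MagicLens produces one joint embedding per (query image, instruction) pair and one embedding per candidate target image, with ranking by similarity.

```python
import numpy as np

def retrieve(query_embedding: np.ndarray,   # (d,) joint image+instruction embedding
             index_embeddings: np.ndarray,  # (N, d) target-image embeddings
             k: int = 5) -> np.ndarray:
    """Return indices of the top-k most similar target images.

    Assumes all embeddings are L2-normalized, so the dot product is
    cosine similarity.
    """
    scores = index_embeddings @ query_embedding
    return np.argsort(-scores)[:k]

# Random vectors stand in for real MagicLens embeddings in this sketch.
rng = np.random.default_rng(0)
d, n = 768, 10_000
index = rng.normal(size=(n, d))
index /= np.linalg.norm(index, axis=1, keepdims=True)
query = rng.normal(size=d)
query /= np.linalg.norm(query)
print(retrieve(query, index, k=5))
```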
Set up the environment via:

```bash
conda create --name magic_lens python=3.9
conda activate magic_lens
git clone https://github.com/google-research/scenic.git
cd scenic
pip install .
pip install -r scenic/projects/baselines/clip/requirements.txt

# You may need to install the corresponding GPU version of jax following
# https://jax.readthedocs.io/en/latest/installation.html, e.g.:
#
# # CUDA 12 installation
# # Note: wheels only available on linux.
# pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
#
# # CUDA 11 installation
# # Note: wheels only available on linux.
# pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
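As an optional sanity check after installation, you can confirm that JAX sees your accelerator and can run a trivial op:

```python
import jax
import jax.numpy as jnp

# Should list your accelerator(s) (e.g. CUDA devices); falls back to CPU
# if only the CPU wheel is installed.
print(jax.devices())

# Run a trivial op to confirm the backend works end to end.
print(jnp.ones(3) + 1)
```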
Download the model via:
```bash
cd ..  # back to the main folder `magiclens`
# You may need to run `gcloud auth login` for access; any Gmail account should work.
gsutil cp -R gs://gresearch/magiclens/models ./
```
Or via Google Drive.
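The downloaded checkpoints are pickle files (see the `--model_path` in the inference command below). A minimal sketch to confirm a checkpoint loads, assuming it unpickles to a (nested) dict of parameters, which is an assumption about the file layout rather than documented behavior:

```python
import pickle

# Path matches the inference command below; adjust for the base model if needed.
with open("./models/magic_lens_clip_large.pkl", "rb") as f:
    params = pickle.load(f)

# Inspect the top-level structure without assuming exact key names.
if isinstance(params, dict):
    print(list(params.keys())[:10])
else:
    print(type(params))
```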
Please follow the instructions in each dataset folder under `./data`. Currently, we have successfully tested FIQ, CIRCO, and DTIN:
```bash
python inference.py \
  --model_size large \
  --model_path ./models/magic_lens_clip_large.pkl \
  --dataset circo
```
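To evaluate several benchmarks in one run, a small driver script like the following can wrap the command above. The dataset flag values other than `circo` are assumptions based on the tested datasets listed above; check `inference.py` and `./data` for the exact accepted names.

```python
import subprocess

MODEL_PATH = "./models/magic_lens_clip_large.pkl"
# Names other than "circo" are assumptions; verify against inference.py.
DATASETS = ["circo", "fiq", "dtin"]

for dataset in DATASETS:
    subprocess.run(
        [
            "python", "inference.py",
            "--model_size", "large",
            "--model_path", MODEL_PATH,
            "--dataset", dataset,
        ],
        check=True,
    )
```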
Due to the weight conversion, performance may differ slightly from the originally reported numbers:
In CIRCO:

| Model | mAP@5 | mAP@10 | mAP@25 | mAP@50 |
|---|---|---|---|---|
| Prior SOTA | 26.8 | 27.6 | 30.0 | 31.0 |
| Base (original) | 23.1 | 23.8 | 25.8 | 26.7 |
| Base (converted) | 22.3 | 23.2 | 25.0 | 26.0 |
| Large (original) | 29.6 | 30.8 | 33.4 | 34.4 |
| Large (converted) | 29.5 | 30.8 | 33.2 | 34.3 |
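As a quick sanity check on the size of the conversion gap, the per-metric deltas for the large model can be computed directly from the table above:

```python
# mAP values from the CIRCO table above (large model).
original  = {"mAP@5": 29.6, "mAP@10": 30.8, "mAP@25": 33.4, "mAP@50": 34.4}
converted = {"mAP@5": 29.5, "mAP@10": 30.8, "mAP@25": 33.2, "mAP@50": 34.3}

for metric in original:
    delta = converted[metric] - original[metric]
    print(f"{metric}: {delta:+.1f}")  # at most 0.2 points of mAP
```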
If you find this work useful, please cite:
```bibtex
@inproceedings{zhang2024magiclens,
  title     = {{M}agic{L}ens: Self-Supervised Image Retrieval with Open-Ended Instructions},
  author    = {Zhang, Kai and Luan, Yi and Hu, Hexiang and Lee, Kenton and Qiao, Siyuan and Chen, Wenhu and Su, Yu and Chang, Ming-Wei},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {59403--59420},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v235/zhang24an.html}
}
```
Copyright 2024 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.