# MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions [ICML'24 Oral]
This repo contains the implementation of MagicLens. The code uses JAX and Flax. Note that the current implementation does not yet support training. Refer to the website for dataset examples.
We introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. The core thesis of MagicLens is that text instructions can enable retrieving images with richer relations beyond visual similarity. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., "inside view of"), and we can make those implicit relations explicit by synthesizing instructions via large multimodal models (LMMs) and large language models (LLMs). Trained on 36.7M (query image, instruction, target image) triplets with rich semantic relations mined from the web, MagicLens achieves comparable or better results than prior state-of-the-art (SOTA) methods on eight benchmarks covering various image retrieval tasks. Remarkably, it outperforms the previous SOTA on multiple benchmarks with a 50× smaller model size. Additional human analyses on a 1.4M-image unseen corpus further demonstrate the diversity of search intents supported by MagicLens.
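To make the retrieval setup concrete, here is a minimal sketch of instruction-conditioned retrieval over pre-computed embeddings. This is not this repo's API: the function and variable names are hypothetical, and we simply assume MagicLens produces one joint embedding per (query image, instruction) pair and one embedding per candidate target image, with ranking by similarity.

```python
import numpy as np

def retrieve(query_embedding: np.ndarray,   # (d,) joint image+instruction embedding
             index_embeddings: np.ndarray,  # (N, d) target-image embeddings
             k: int = 5) -> np.ndarray:
    """Return indices of the top-k most similar target images.

    Assumes all embeddings are L2-normalized, so the dot product is
    cosine similarity.
    """
    scores = index_embeddings @ query_embedding
    return np.argsort(-scores)[:k]

# Random vectors stand in for real MagicLens embeddings in this sketch.
rng = np.random.default_rng(0)
d, n = 768, 10_000
index = rng.normal(size=(n, d))
index /= np.linalg.norm(index, axis=1, keepdims=True)
query = rng.normal(size=d)
query /= np.linalg.norm(query)
print(retrieve(query, index, k=5))
```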
Set up the environment via:

```bash
conda create --name magic_lens python=3.9
conda activate magic_lens
git clone https://github.com/google-research/scenic.git
cd scenic
pip install .
pip install -r scenic/projects/baselines/clip/requirements.txt

# You may need to install the corresponding GPU version of jax following
# https://jax.readthedocs.io/en/latest/installation.html, e.g.:
#
# # CUDA 12 installation
# # Note: wheels only available on linux.
# pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
#
# # CUDA 11 installation
# # Note: wheels only available on linux.
# pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
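As an optional sanity check after installation, you can confirm that JAX sees your accelerator and can run a trivial op:

```python
import jax
import jax.numpy as jnp

# Should list your accelerator(s) (e.g. CUDA devices); falls back to CPU
# if only the CPU wheel is installed.
print(jax.devices())

# Run a trivial op to confirm the backend works end to end.
print(jnp.ones(3) + 1)
```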
Download the model via:
```bash
cd ..  # back to the main folder `magiclens`
# You may need to run `gcloud auth login` for access; any Gmail account should work.
gsutil cp -R gs://gresearch/magiclens/models ./
```
Or via Google Drive.
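The downloaded checkpoints are pickle files (see the `--model_path` in the inference command below). A minimal sketch to confirm a checkpoint loads, assuming it unpickles to a (nested) dict of parameters, which is an assumption about the file layout rather than documented behavior:

```python
import pickle

# Path matches the inference command below; adjust for the base model if needed.
with open("./models/magic_lens_clip_large.pkl", "rb") as f:
    params = pickle.load(f)

# Inspect the top-level structure without assuming exact key names.
if isinstance(params, dict):
    print(list(params.keys())[:10])
else:
    print(type(params))
```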
Please follow the instructions in each dataset folder under `./data`. Currently, we have successfully tested FIQ, CIRCO, and DTIN:
```bash
python inference.py \
  --model_size large \
  --model_path ./models/magic_lens_clip_large.pkl \
  --dataset circo
```
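To evaluate several benchmarks in one run, a small driver script like the following can wrap the command above. The dataset flag values other than `circo` are assumptions based on the tested datasets listed above; check `inference.py` and `./data` for the exact accepted names.

```python
import subprocess

MODEL_PATH = "./models/magic_lens_clip_large.pkl"
# Names other than "circo" are assumptions; verify against inference.py.
DATASETS = ["circo", "fiq", "dtin"]

for dataset in DATASETS:
    subprocess.run(
        [
            "python", "inference.py",
            "--model_size", "large",
            "--model_path", MODEL_PATH,
            "--dataset", dataset,
        ],
        check=True,
    )
```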
Due to the weight conversion, performance may differ slightly from the originally reported numbers:
In CIRCO:

| Model | mAP@5 | mAP@10 | mAP@25 | mAP@50 |
|---|---|---|---|---|
| Prior SOTA | 26.8 | 27.6 | 30.0 | 31.0 |
| Base (original) | 23.1 | 23.8 | 25.8 | 26.7 |
| Base (converted) | 22.3 | 23.2 | 25.0 | 26.0 |
| Large (original) | 29.6 | 30.8 | 33.4 | 34.4 |
| Large (converted) | 29.5 | 30.8 | 33.2 | 34.3 |
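As a quick sanity check on the size of the conversion gap, the per-metric deltas for the large model can be computed directly from the table above:

```python
# mAP values from the CIRCO table above (large model).
original  = {"mAP@5": 29.6, "mAP@10": 30.8, "mAP@25": 33.4, "mAP@50": 34.4}
converted = {"mAP@5": 29.5, "mAP@10": 30.8, "mAP@25": 33.2, "mAP@50": 34.3}

for metric in original:
    delta = converted[metric] - original[metric]
    print(f"{metric}: {delta:+.1f}")  # at most 0.2 points of mAP
```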
If you find this work useful, please cite:
```bibtex
@inproceedings{zhang2024magiclens,
  title     = {{M}agic{L}ens: Self-Supervised Image Retrieval with Open-Ended Instructions},
  author    = {Zhang, Kai and Luan, Yi and Hu, Hexiang and Lee, Kenton and Qiao, Siyuan and Chen, Wenhu and Su, Yu and Chang, Ming-Wei},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {59403--59420},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v235/zhang24an.html}
}
```
Copyright 2024 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.