[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"

This repo contains the official PyTorch and JAX implementations of CLIPA from our paper: An Inverse Scaling Law for CLIP Training

Overview of the Inverse Scaling Law: larger image/text encoders enable training with fewer image/text tokens while maintaining competitive performance.

📰 News

[2023.10.4] We have achieved a successful scale-up of our model to bigG/14, attaining an impressive 83.0% zero-shot top-1 accuracy on the ImageNet-1K dataset. For the detailed training configuration, please refer to the t5x branch. Additionally, you can access the pre-trained and fine-tuned weights for both the JAX and PyTorch versions on Google Drive.

[2023.9.21] Our paper has been accepted by NeurIPS 2023!

[2023.6.16] We release CLIPA-v2. Compared to the prior best publicly available CLIP model, CLIPA-v2 can be trained significantly faster and yields stronger performance. Our best model is H/14 at 336x336 resolution trained on DataComp-1B, reaching 81.8% zero-shot ImageNet accuracy with an estimated training cost of <$15k!

[Note] All of our CLIPA-v2 models were trained on TPUs using our JAX codebase. We followed the same pre-training process as CLIPA-v1, but with a more efficient fine-tuning strategy. To replicate our results, we provide the training configuration (e.g., the H/14 model in this folder here), along with the pre-trained weights, configuration, and logs, which can be found here.
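
As a minimal sketch of how the released PyTorch weights could be used for zero-shot classification, assuming they follow OpenCLIP conventions (the PyTorch repo is built on OpenCLIP); the model tag and checkpoint path below are placeholders, not the exact released names:

```python
# Zero-shot classification sketch. Assumes the released PyTorch weights follow
# OpenCLIP conventions; 'ViT-H-14-CLIPA-336' and the checkpoint path are
# placeholders -- substitute the actual model tag and downloaded weight file.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-H-14-CLIPA-336', pretrained='/path/to/clipa_v2_weights.pt')
tokenizer = open_clip.get_tokenizer('ViT-H-14-CLIPA-336')
model.eval()

image = preprocess(Image.open('example.jpg')).unsqueeze(0)
text = tokenizer(['a photo of a cat', 'a photo of a dog'])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then score captions by cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)
```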

| model | data | schedule | GPU hours | estimated cost | zero-shot IN-1K | model weight |
|---|---|---|---|---|---|---|
| H/14 | LAION-2B | 12.8B@84 + 512M@224 + 128M@336 | 8640 | $13613 | 79.1 | PyTorch / JAX |
| L/14 | DataCOMP-1B | 12.8B@84 + 512M@224 + 128M@336 | 4520 | $7124 | 80.3 | PyTorch / JAX |
| H/14 | DataCOMP-1B | 12.8B@84 + 512M@224 + 128M@336 | 8640 | $13613 | 81.8 | PyTorch / JAX |
| bigG/14 | DataCOMP-1B | 12.8B@84 + 512M@224 + 128M@336 | 23742 | $39056 | 83.0 | PyTorch / JAX |

Our CLIPA-v2 GPU hours are estimated using an 8x A100 80GB GPU machine on Google Cloud. The corresponding training cost is estimated based on the cloud pricing of 80GB A100 GPUs.
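
To make the schedule notation concrete: "12.8B@84 + 512M@224 + 128M@336" means 12.8B samples seen at 84x84 input resolution, followed by 512M at 224x224 and 128M at 336x336. The sketch below is a rough back-of-the-envelope estimate (not part of the released code) that assumes a vision transformer with 14x14 patches and ignores the text tower; it shows why pushing most samples to the low resolution keeps the cost down:

```python
# Back-of-the-envelope: image tokens per sample at each stage of the
# "12.8B@84 + 512M@224 + 128M@336" schedule, assuming /14 patches.
# Illustrative only: counts the vision tower's patch tokens, nothing else.
PATCH = 14
schedule = [(12.8e9, 84), (512e6, 224), (128e6, 336)]  # (samples, resolution)

total = 0.0
for samples, res in schedule:
    tokens = (res // PATCH) ** 2          # patch tokens per image
    total += samples * tokens
    print(f"{res}x{res}: {tokens} tokens/image, {samples * tokens:.2e} token passes")

# Compare with seeing all ~13.4B samples at the standard 224x224 resolution.
baseline = sum(s for s, _ in schedule) * (224 // PATCH) ** 2
print(f"schedule needs ~{total / baseline:.0%} of an all-224 run's image-token passes")
```

Under these assumptions the three stages contribute roughly 4.6e11, 1.3e11, and 0.7e11 image-token passes in total, about a fifth of an all-224x224 run over the same number of samples.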

Introduction

CLIP, the first foundation model that connects images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we showcase that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law.
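
As a rough illustration of what "reducing the image/text token length" can mean in practice (a minimal sketch under simple assumptions, not the repo's implementation; the paper compares several reduction strategies, e.g. resizing vs. masking images and truncating vs. masking text), resizing shortens the ViT patch-token sequence and truncation shortens the caption-token sequence:

```python
# Illustrative token-length reduction for CLIP training (not the repo's code).
# Image side: lowering the input resolution shrinks the patch-token sequence.
# Text side: truncating captions shrinks the text-token sequence.
import torch
import torch.nn.functional as F

def image_tokens(images: torch.Tensor, resolution: int, patch: int = 14) -> torch.Tensor:
    """Resize a batch of images and unfold it into (B, N, 3*patch*patch) tokens."""
    images = F.interpolate(images, size=(resolution, resolution),
                           mode='bilinear', align_corners=False)
    tokens = F.unfold(images, kernel_size=patch, stride=patch)  # (B, 3*p*p, N)
    return tokens.transpose(1, 2)                               # (B, N, 3*p*p)

def truncate_text(token_ids: torch.Tensor, max_len: int = 16) -> torch.Tensor:
    """Keep only the first max_len text tokens of each caption."""
    return token_ids[:, :max_len]

imgs = torch.randn(4, 3, 224, 224)        # dummy image batch
print(image_tokens(imgs, 224).shape[1])   # 256 image tokens at 224x224
print(image_tokens(imgs, 84).shape[1])    # 36 image tokens at 84x84

caps = torch.randint(0, 49408, (4, 77))   # dummy CLIP-style caption ids
print(truncate_text(caps, 16).shape[1])   # 16 text tokens instead of 77
```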

As a result of this finding, we are able to successfully train CLIP even using academic resources. For example, on an A100 eight-GPU server, our CLIP models achieve zero-shot top-1 ImageNet accuracies of 63.2% in about 2 days, 67.8% in about 3 days, and 69.3% in about 4 days. By reducing the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics.

TPU Usage

Our experiments are conducted on both GPUs and TPUs, and both the JAX and PyTorch implementations support TPU training. But how do you gain access to and set up TPU machines? Check this brief doc. In a nutshell, you can access TPU machines on Google Cloud for free!

License

This project is under the Apache 2.0 License.

Acknowledgement

The JAX repo is built on big vision, and the PyTorch repo is built on OpenCLIP. We've also borrowed some code from TIMM and MAE. Many thanks to the awesome works from the open-source community!

We are also very grateful that this work is supported by a gift from Open Philanthropy, the TPU Research Cloud (TRC) program, and the Google Cloud Research Credits program.

Citation

```bibtex
@inproceedings{li2023clipa,
  title={An Inverse Scaling Law for CLIP Training},
  author={Xianhang Li and Zeyu Wang and Cihang Xie},
  booktitle={NeurIPS},
  year={2023},
}

@article{li2023clipav2,
  title={CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy},
  author={Xianhang Li and Zeyu Wang and Cihang Xie},
  journal={arXiv preprint arXiv:2306.15658},
  year={2023},
}
```

Contact

If you have any questions, please feel free to raise an issue or contact us directly: Xianhang Li: xli421@ucsc.edu; Zeyu Wang: zwang615@ucsc.edu
