[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
UCSC-VLAA/CLIPA
This repo contains the official PyTorch and JAX implementations of CLIPA from our paper: An Inverse Scaling Law for CLIP Training
Overview of the Inverse Scaling Law: larger image/text encoders enable training with fewer image/text tokens while maintaining competitive performance.
[2023.10.4] We have successfully scaled our model up to bigG/14, reaching 83.0% zero-shot top-1 accuracy on ImageNet-1K. For the detailed training configuration, please refer to the t5x branch. The pre-trained and fine-tuned weights for both the JAX and PyTorch versions are available on Google Drive.
[2023.9.21] Our paper was accepted by NeurIPS 2023!
[2023.6.16] We release CLIPA-v2. Compared to the prior best publicly available CLIP model, our CLIPA-v2 can be trained significantly faster and yields stronger performance. Our best model is H/14@336x336 on DataComp-1B, with a zero-shot ImageNet-1K accuracy of 81.8% and an estimated training cost of <$15k!
[Note] All of our CLIPA-v2 models were trained on TPUs using our JAX codebase. We followed the same pre-training process as CLIPA-v1, but with a more efficient fine-tuning strategy. To help replicate our results, we provide the training configuration (e.g., the H-14 model in this folder), along with the pre-trained weights, configuration, and logs, which can be found here.
| Model | Data | Schedule | GPU Hours | Estimated Cost | Zero-shot IN-1K | Model Weights |
|---|---|---|---|---|---|---|
| H/14 | LAION-2B | 12.8B@84 + 512M@224 + 128M@336 | 8640 | $13613 | 79.1 | PyTorch / JAX |
| L/14 | DataComp-1B | 12.8B@84 + 512M@224 + 128M@336 | 4520 | $7124 | 80.3 | PyTorch / JAX |
| H/14 | DataComp-1B | 12.8B@84 + 512M@224 + 128M@336 | 8640 | $13613 | 81.8 | PyTorch / JAX |
| bigG/14 | DataComp-1B | 12.8B@84 + 512M@224 + 128M@336 | 23742 | $39056 | 83.0 | PyTorch / JAX |
GPU hours for CLIPA-v2 are estimated on a Google Cloud machine with eight 80GB A100 GPUs. The corresponding training cost is estimated from cloud pricing for the 80GB A100.
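As a rough sanity check on the table above, the sketch below derives the per-GPU-hour rate implied by each row and the wall-clock days the GPU hours would take on a single eight-GPU machine. The hourly rate is not stated in this README; it is only back-computed from the table, and the function names are hypothetical helpers introduced for illustration.

```python
# Figures copied from the CLIPA-v2 table: (GPU hours, estimated cost in USD).
TABLE = {
    "H/14 (LAION-2B)": (8640, 13613),
    "L/14 (DataComp-1B)": (4520, 7124),
    "H/14 (DataComp-1B)": (8640, 13613),
    "bigG/14 (DataComp-1B)": (23742, 39056),
}


def implied_hourly_rate(gpu_hours: int, cost_usd: int) -> float:
    """USD per GPU hour implied by one table row (an inference, not a quoted price)."""
    return cost_usd / gpu_hours


def days_on_one_machine(gpu_hours: int, gpus_per_machine: int = 8) -> float:
    """Wall-clock days if all GPU hours ran on a single machine with the given GPU count."""
    return gpu_hours / gpus_per_machine / 24


for name, (hours, cost) in TABLE.items():
    rate = implied_hourly_rate(hours, cost)
    days = days_on_one_machine(hours)
    print(f"{name}: ~${rate:.2f} per GPU hour, ~{days:.1f} days on one 8-GPU machine")
```

The implied rate differs slightly between rows, so treat these numbers as estimates rather than a fixed price.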
CLIP, the first foundation model that connects images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we showcase that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law.
As a result of this finding, we are able to successfully train CLIP even by using academic resources. For example, on an A100 eight-GPU server, our CLIP models achieve zero-shot top-1 ImageNet accuracies of 63.2% in about 2 days, 67.8% in about 3 days, and 69.3% in about 4 days. By reducing the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics.
Our experiments are conducted on both GPUs and TPUs. Both the JAX and PyTorch implementations enable TPU training. But how do you gain access to and set up TPU machines? Check this brief doc. In a nutshell, you can access TPU machines on Google Cloud for free!
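To make the effect of shorter image sequences concrete, here is a minimal sketch that counts ViT patch tokens per image at the resolutions in the CLIPA-v2 schedule. It assumes that `N@R` in the schedule column means N samples seen at R×R resolution, that the `/14` in the model names is the patch size, and that resizing is the token-reduction strategy; it counts image patch tokens only (no class token, no text tokens). The helper names are hypothetical.

```python
def num_patch_tokens(resolution: int, patch_size: int = 14) -> int:
    """Number of non-overlapping square patches in a resolution x resolution image."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    side = resolution // patch_size
    return side * side


# (samples seen, image resolution), read off the schedule column under the
# assumption stated above.
SCHEDULE = [
    (12_800_000_000, 84),
    (512_000_000, 224),
    (128_000_000, 336),
]


def total_patch_tokens(schedule) -> int:
    """Total image patch tokens processed over a multi-stage schedule."""
    return sum(samples * num_patch_tokens(res) for samples, res in schedule)


# Hypothetical baseline for comparison: the same number of samples, all at 224x224.
total_samples = sum(samples for samples, _ in SCHEDULE)
baseline = total_samples * num_patch_tokens(224)

for _, res in SCHEDULE:
    print(f"{res}x{res}: {num_patch_tokens(res)} patch tokens per image")
print(f"schedule / baseline token ratio: {total_patch_tokens(SCHEDULE) / baseline:.3f}")
```

Because self-attention cost grows faster than linearly with sequence length, the token ratio understates the compute savings; it is meant only as an intuition aid.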
This project is under the Apache 2.0 License.
The JAX repo is built on big vision, and the PyTorch repo is built on OpenCLIP. We've also borrowed some code from TIMM and MAE. Many thanks to the open-source community for these awesome works!
We are also very grateful that this work is supported by a gift from Open Philanthropy, the TPU Research Cloud (TRC) program, and the Google Cloud Research Credits program.
@inproceedings{li2023clipa,
  title={An Inverse Scaling Law for CLIP Training},
  author={Xianhang Li and Zeyu Wang and Cihang Xie},
  booktitle={NeurIPS},
  year={2023},
}

@article{li2023clipav2,
  title={CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy},
  author={Xianhang Li and Zeyu Wang and Cihang Xie},
  journal={arXiv preprint arXiv:2306.15658},
  year={2023},
}

If you have any questions, please feel free to raise an issue or contact us directly: Xianhang Li: xli421@ucsc.edu; Zeyu Wang: zwang615@ucsc.edu

