UCSC-VLAA/CLIPA

[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"

An Inverse Scaling Law for CLIP Training

This repo contains the official PyTorch and JAX implementations of CLIPA from our paper: An Inverse Scaling Law for CLIP Training

Overview of the Inverse Scaling Law: larger image/text encoders enable training with fewer image/text tokens while maintaining competitive performance

📰 News

[2023.10.4] We have scaled our model up to bigG/14, which attains 83.0% zero-shot top-1 accuracy on ImageNet-1K. For the detailed training configuration, please refer to the t5x branch. The pre-trained and fine-tuned weights for both the JAX and PyTorch versions are available on Google Drive.

[2023.9.21] Our paper has been accepted by NeurIPS 2023!

[2023.6.16] We release CLIPA-v2. Compared to the prior best publicly available CLIP model, CLIPA-v2 can be trained significantly faster and yields stronger performance. Our best model is H/14@336x336 on DataComp-1B, with an accuracy of 81.8% and an estimated training cost of less than $15k!

[Note] All of our CLIPA-v2 models were trained on TPUs using our JAX codebase. We followed the same pre-training process as CLIPA-v1, but with a more efficient fine-tuning strategy. To help replicate our results, we provide the training configuration (e.g., the H-14 model in this folder here), along with the pre-trained weights, configuration, and logs, which can be found here.

Model	Data	Schedule	GPU Hours	Estimated Cost	Zero-shot IN-1K (%)	Model Weights
H/14	LAION-2B	12.8B@84 + 512M@224 + 128M@336	8640	$13613	79.1	PyTorch / JAX
L/14	DataComp-1B	12.8B@84 + 512M@224 + 128M@336	4520	$7124	80.3	PyTorch / JAX
H/14	DataComp-1B	12.8B@84 + 512M@224 + 128M@336	8640	$13613	81.8	PyTorch / JAX
bigG/14	DataComp-1B	12.8B@84 + 512M@224 + 128M@336	23742	$39056	83.0	PyTorch / JAX

The GPU hours for CLIPA-v2 are estimated on an 8-GPU A100 80GB machine on Google Cloud. The corresponding training cost is estimated from cloud pricing for the 80GB A100.
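To relate the GPU Hours and Estimated Cost columns, the following minimal sketch divides one by the other and converts GPU hours into wall-clock days on a single 8-GPU machine. The numbers are copied from the table above; the function names are hypothetical, and the resulting hourly rate is only what the table implies, not a quoted cloud price.

```python
# (model, data, GPU hours, estimated cost in USD), copied from the table above.
ROWS = [
    ("H/14", "LAION-2B", 8640, 13613),
    ("L/14", "DataComp-1B", 4520, 7124),
    ("H/14", "DataComp-1B", 8640, 13613),
    ("bigG/14", "DataComp-1B", 23742, 39056),
]

# The note above states that GPU hours are estimated on an 8-GPU A100 machine.
GPUS_PER_MACHINE = 8


def implied_rate(gpu_hours, cost_usd):
    """USD per GPU-hour implied by one table row (hypothetical helper)."""
    return cost_usd / gpu_hours


def machine_days(gpu_hours, gpus=GPUS_PER_MACHINE):
    """Wall-clock days if all GPU hours run on one machine with `gpus` GPUs."""
    return gpu_hours / gpus / 24


for model, data, hours, cost in ROWS:
    print(
        f"{model:8s} {data:12s} "
        f"${implied_rate(hours, cost):.2f}/GPU-hour, "
        f"{machine_days(hours):.1f} days on one 8-GPU machine"
    )
```

Under these assumptions, the H/14 schedule of 8640 GPU hours corresponds to 45 days on one 8-GPU machine. The L/14 and H/14 rows imply roughly $1.58 per GPU-hour, while the bigG/14 row implies a slightly higher rate of about $1.65; the table does not explain the difference.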

Introduction

CLIP, the first foundation model that connects images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we show that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law.

As a result of this finding, we are able to train CLIP successfully even with academic resources. For example, on an eight-GPU A100 server, our CLIP models achieve zero-shot top-1 ImageNet accuracies of 63.2% in about 2 days, 67.8% in about 3 days, and 69.3% in about 4 days. By lowering the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics.
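To make "shorter sequence length of image tokens" concrete, the sketch below counts patch tokens at the resolutions listed in the CLIPA-v2 Schedule column. It rests on assumptions not spelled out in this README: that "N@R" means N training samples seen at R x R pixels, that the "/14" in model names denotes 14 x 14 pixel patches, and that any class token is ignored. The helper names are hypothetical and do not come from the codebase.

```python
import re

# Assumption: the "/14" in H/14, L/14 and bigG/14 is the patch size in pixels.
PATCH_SIZE = 14

# Multipliers for the sample-count suffixes used in the Schedule column.
SUFFIX = {"M": 10**6, "B": 10**9}


def parse_schedule(schedule):
    """Split '12.8B@84 + 512M@224' into [(samples, resolution), ...]."""
    stages = []
    for part in schedule.split("+"):
        match = re.fullmatch(r"\s*([\d.]+)([MB])@(\d+)\s*", part)
        if match is None:
            raise ValueError(f"unrecognized stage: {part!r}")
        count, suffix, resolution = match.groups()
        samples = round(float(count) * SUFFIX[suffix])
        stages.append((samples, int(resolution)))
    return stages


def patch_tokens(resolution, patch_size=PATCH_SIZE):
    """Number of non-overlapping square patches in a square image."""
    return (resolution // patch_size) ** 2


for samples, resolution in parse_schedule("12.8B@84 + 512M@224 + 128M@336"):
    print(
        f"{samples:>14,} samples at {resolution}px "
        f"-> {patch_tokens(resolution)} patch tokens per image"
    )
```

Under these assumptions, a /14 model sees 36 patch tokens per image at 84 px, compared with 256 at 224 px and 576 at 336 px, so the bulk of the schedule runs on sequences about seven times shorter than at 224 px.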

TPU Usage

Our experiments are conducted on both GPUs and TPUs. Both the JAX and PyTorch implementations support TPU training. To learn how to gain access to and set up TPU machines, check this brief doc. In a nutshell, you can access TPU machines on Google Cloud for free!

License

This project is under the Apache 2.0 License.

Acknowledgement

The JAX repo is built on big vision, and the PyTorch repo is built on OpenCLIP. We have also borrowed some code from TIMM and MAE. Many thanks to the open-source community for these awesome works!

We are also very grateful that this work is supported by a gift from Open Philanthropy, the TPU Research Cloud (TRC) program, and the Google Cloud Research Credits program.

Citation

@inproceedings{li2023clipa,
  title={An Inverse Scaling Law for CLIP Training},
  author={Xianhang Li and Zeyu Wang and Cihang Xie},
  booktitle={NeurIPS},
  year={2023},
}

@article{li2023clipav2,
  title={CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy},
  author={Xianhang Li and Zeyu Wang and Cihang Xie},
  journal={arXiv preprint arXiv:2306.15658},
  year={2023},
}

Contact

If you have any questions, please feel free to raise an issue or contact us directly: Xianhang Li: xli421@ucsc.edu; Zeyu Wang: zwang615@ucsc.edu
