tencent-ailab/IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Introduction

We present IP-Adapter, an effective and lightweight adapter that adds image prompt capability to pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve comparable or even better performance than a fine-tuned image prompt model. IP-Adapter generalizes not only to other custom models fine-tuned from the same base model, but also to controllable generation with existing controllable tools. Moreover, the image prompt works well together with the text prompt to accomplish multimodal image generation.

*[Figure: IP-Adapter architecture]*

Release

  • [2024/01/19] 🔥 Add IP-Adapter-FaceID-Portrait; more information can be found here.
  • [2024/01/17] 🔥 Add an experimental version of IP-Adapter-FaceID-PlusV2 for SDXL; more information can be found here.
  • [2024/01/04] 🔥 Add an experimental version of IP-Adapter-FaceID for SDXL; more information can be found here.
  • [2023/12/29] 🔥 Add an experimental version of IP-Adapter-FaceID-PlusV2; more information can be found here.
  • [2023/12/27] 🔥 Add an experimental version of IP-Adapter-FaceID-Plus; more information can be found here.
  • [2023/12/20] 🔥 Add an experimental version of IP-Adapter-FaceID; more information can be found here.
  • [2023/11/22] IP-Adapter is available in Diffusers thanks to the Diffusers team.
  • [2023/11/10] 🔥 Add an updated version of IP-Adapter-Face. The demo is here.
  • [2023/11/05] 🔥 Add a text-to-image demo with IP-Adapter and Kandinsky 2.2 Prior.
  • [2023/11/02] Support safetensors.
  • [2023/9/08] 🔥 Update a new version of IP-Adapter with SDXL_1.0. More information can be found here.
  • [2023/9/05] 🔥🔥🔥 IP-Adapter is supported in WebUI and ComfyUI (or ComfyUI_IPAdapter_plus).
  • [2023/8/30] 🔥 Add an IP-Adapter with a face image as prompt. The demo is here.
  • [2023/8/29] 🔥 Release the training code.
  • [2023/8/23] 🔥 Add code and models of IP-Adapter with fine-grained features. The demo is here.
  • [2023/8/18] 🔥 Add code and models for SDXL 1.0. The demo is here.
  • [2023/8/16] 🔥 We release the code and models.

Installation

```sh
# install latest diffusers
pip install diffusers==0.22.1

# install ip-adapter
pip install git+https://github.com/tencent-ailab/IP-Adapter.git

# download the models
cd IP-Adapter
git lfs install
git clone https://huggingface.co/h94/IP-Adapter
mv IP-Adapter/models models
mv IP-Adapter/sdxl_models sdxl_models

# then you can use the notebook
```

Download Models

You can download the models from here. To run the demo, you should also download the following models:

How to Use

SD_1.5

  • ip_adapter_demo: image variations, image-to-image, and inpainting with image prompt.

*[Figures: image variations, image-to-image, inpainting, structural conditions, multi-prompts, ip_adapter_plus image variations, ip_adapter_plus multi-prompts, ip_adapter_plus face]*

Best Practice

  • If you only use the image prompt, you can set scale=1.0 and text_prompt="" (or some generic text prompt, e.g. "best quality"; you can also use any negative text prompt). If you lower the scale, more diverse images can be generated, but they may be less consistent with the image prompt.
  • For multimodal prompts, you can adjust the scale to get the best results. In most cases, scale=0.5 gives good results. For SD 1.5, we recommend using community models to generate good images.
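The scale parameter weights the image branch of IP-Adapter's decoupled cross-attention: the adapter adds separate key/value projections for the image features, and the image-attention output is added to the text-attention output multiplied by scale. A minimal sketch of this mechanism (the function name and tensor shapes below are illustrative, not the repository's actual classes; requires torch >= 2.0 for scaled_dot_product_attention):

```python
import torch
import torch.nn.functional as F

def decoupled_cross_attention(q, text_k, text_v, image_k, image_v, scale):
    """IP-Adapter decoupled cross-attention:
    output = Attn(q, text) + scale * Attn(q, image)."""
    text_out = F.scaled_dot_product_attention(q, text_k, text_v)
    image_out = F.scaled_dot_product_attention(q, image_k, image_v)
    return text_out + scale * image_out

torch.manual_seed(0)
q = torch.randn(2, 16, 64)                                        # latent (query) tokens
text_k, text_v = torch.randn(2, 77, 64), torch.randn(2, 77, 64)   # text features
image_k, image_v = torch.randn(2, 4, 64), torch.randn(2, 4, 64)   # image features
out = decoupled_cross_attention(q, text_k, text_v, image_k, image_v, scale=1.0)
```

With scale=0.0 the image branch vanishes and the layer reduces to ordinary text cross-attention, which is why lowering scale trades image-prompt consistency for diversity.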

IP-Adapter for non-square images

As the image is center cropped by the default CLIP image processor, IP-Adapter works best for square images. For non-square images, information outside the center crop is lost. As a workaround, you can simply resize non-square images to 224x224; the comparison is as follows:
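For example, with Pillow you can squash a non-square image to CLIP's 224x224 input size yourself, so no content is cropped away (a sketch with a hypothetical input image; the distortion is usually acceptable to the image encoder):

```python
from PIL import Image

# hypothetical 16:9 input; in practice this is your prompt image
img = Image.new("RGB", (640, 360), color="gray")

# resize (squash) to CLIP's input size instead of center cropping,
# keeping content outside the center square at the cost of distortion
resized = img.resize((224, 224))
```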

SDXL_1.0

The comparison of IP-Adapter_XL with Reimagine XL is shown as follows:

*[Figure: sdxl_demo]*

Improvements in the new version (2023.9.8):

  • Switch to CLIP-ViT-H: we trained the new IP-Adapter with OpenCLIP-ViT-H-14 instead of OpenCLIP-ViT-bigG-14. Although ViT-bigG is much larger than ViT-H, our experimental results did not find a significant difference, and the smaller model reduces memory usage in the inference phase.
  • A faster and better training recipe: in our previous version, training directly at a resolution of 1024x1024 proved to be highly inefficient. In the new version, we have implemented a more effective two-stage training strategy: first, we pre-train at a resolution of 512x512; then, we employ a multi-scale strategy for fine-tuning. (Maybe this training strategy can also be used to speed up the training of ControlNet.)

How to Train

For training, you should install accelerate and prepare your own dataset as a JSON file.

```sh
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
  tutorial_train.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
  --image_encoder_path="{image_encoder_path}" \
  --data_json_file="{data.json}" \
  --data_root_path="{image_path}" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=8 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="{output_dir}" \
  --save_steps=10000
```
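The file passed to --data_json_file is a JSON list of image/caption pairs, with image paths resolved relative to --data_root_path. A minimal sketch of building such a file (the field names "image_file" and "text" are an assumption based on this repository's training script; verify them against the tutorial_train.py you run):

```python
import json

# hypothetical caption data; replace with your own images and texts
dataset = [
    {"image_file": "0001.png", "text": "a dog running on the beach"},
    {"image_file": "0002.png", "text": "a red vintage car"},
]

with open("data.json", "w") as f:
    json.dump(dataset, f)
```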

Once training is complete, you can convert the weights with the following code:

```python
import torch

ckpt = "checkpoint-50000/pytorch_model.bin"
sd = torch.load(ckpt, map_location="cpu")
image_proj_sd = {}
ip_sd = {}
for k in sd:
    if k.startswith("unet"):
        pass
    elif k.startswith("image_proj_model"):
        image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
    elif k.startswith("adapter_modules"):
        ip_sd[k.replace("adapter_modules.", "")] = sd[k]

torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "ip_adapter.bin")
```
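As a quick sanity check, the same split can be exercised on a dummy checkpoint: UNet weights are dropped, and the two adapter prefixes are stripped into separate dicts (plain values stand in for tensors here; the key names are illustrative):

```python
# dummy checkpoint with the three key prefixes the conversion expects
sd = {
    "unet.down_blocks.0.weight": 0,
    "image_proj_model.proj.weight": 1,
    "adapter_modules.0.to_k_ip.weight": 2,
}

# strip the module prefixes, mirroring the conversion loop above
image_proj_sd = {k.replace("image_proj_model.", ""): v
                 for k, v in sd.items() if k.startswith("image_proj_model")}
ip_sd = {k.replace("adapter_modules.", ""): v
         for k, v in sd.items() if k.startswith("adapter_modules")}

converted = {"image_proj": image_proj_sd, "ip_adapter": ip_sd}
```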

Third-party Usage

Disclaimer

This project strives to positively impact the domain of AI-driven image generation. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it in a responsible manner. The developers do not assume any responsibility for potential misuse by users.

Citation

If you find IP-Adapter useful for your research and applications, please cite using this BibTeX:

```
@article{ye2023ip-adapter,
  title={IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models},
  author={Ye, Hu and Zhang, Jun and Liu, Sibo and Han, Xiao and Yang, Wei},
  booktitle={arXiv preprint arxiv:2308.06721},
  year={2023}
}
```
