
jina-ai/dalle-flow


🌊 A Human-in-the-Loop workflow for creating HD images from text

grpcs://dalle-flow.dev.jina.ai


A Human-in-the-loop workflow for creating HD images from text

DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. First, it leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt. The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.

DALL·E Flow is built with Jina in a client-server architecture, which gives it high scalability, non-blocking streaming, and a modern Pythonic interface. Clients can interact with the server via gRPC/Websocket/HTTP with TLS.

Why Human-in-the-loop? Generative art is a creative process. While recent advances of DALL·E unleash people's creativity, having a single-prompt-single-output UX/UI locks the imagination to a single possibility, which is bad no matter how fine this single result is. DALL·E Flow is an alternative to the one-liner, formalizing generative art as an iterative procedure.

Usage

DALL·E Flow uses a client-server architecture.

Updates

🌟 2022/10/27 RealESRGAN upscalers have been added.
⚠️ 2022/10/26 To use CLIP-as-service available at grpcs://api.clip.jina.ai:2096 (requires jina >= v3.11.0), you first need to get an access token from here. See Use the CLIP-as-service for more details.
🌟 2022/9/25 Automated CLIP-based segmentation from a prompt has been added.
🌟 2022/8/17 Text to image for Stable Diffusion has been added. To use it, you will need to agree to their ToS, download the weights, then enable the flag in Docker or flow_parser.py.
⚠️ 2022/8/8 Started using CLIP-as-service as an external executor. Now you can easily deploy your own CLIP executor if you want. There is a small breaking change as a result of this improvement, so please reopen the notebook in Google Colab.
⚠️ 2022/7/6 Demo server migrated to AWS EKS for better availability and robustness; the server URL is now grpcs://dalle-flow.dev.jina.ai. All connections now use TLS encryption, so please reopen the notebook in Google Colab.
⚠️ 2022/6/25 Unexpected downtime between 6/25 0:00 - 12:00 CET due to running out of GPU quota. The new server now has 2 GPUs, and a health check has been added to the client notebook.
2022/6/3 Reduced the default number of images to 2 per pathway, 4 for diffusion.
🐳 2022/6/21 A prebuilt image is now available on Docker Hub! This image can be run out-of-the-box on CUDA 11.6. Fixed an upstream bug in CLIP-as-service.
⚠️ 2022/5/23 Fixed an upstream bug in CLIP-as-service. This bug made the 2nd diffusion step irrelevant to the given texts. The new Dockerfile proved to be reproducible on an AWS EC2 p2.8xlarge instance.
2022/5/13b Removed TLS, as Cloudflare gives a 100s timeout, making DALL·E Flow unusable. Please reopen the notebook in Google Colab!
🔐 2022/5/13 New Mega checkpoint! All connections now use TLS. Please reopen the notebook in Google Colab!
🐳 2022/5/10 A Dockerfile is added! Now you can easily deploy your own DALL·E Flow. New Mega checkpoint! Smaller memory footprint: the whole Flow can now fit into one GPU with 21GB memory.
🌟 2022/5/7 New Mega checkpoint & multiple optimizations on GLID3: smaller memory footprint, use ViT-L/14@336px from CLIP-as-service, steps 100->200.
🌟 2022/5/6 DALL·E Flow just got updated! Please reopen the notebook in Google Colab!
- Revised the first step: 16 candidates are generated, 8 from DALL·E Mega, 8 from GLID3-XL; then ranked by CLIP-as-service.
- Improved the flow efficiency: the overall speed, including diffusion and upscaling, is much faster now!

Gallery

Client

Using the client is super easy. The following steps are best run in a Jupyter notebook or Google Colab.

You will need to install DocArray and Jina first:

pip install "docarray[common]>=0.13.5" jina

We have provided a demo server for you to play with:

⚠️ Due to the massive number of requests, our server may be slow to respond. Still, we are very confident about keeping the uptime high. You can also deploy your own server by following the instructions here.

server_url = 'grpcs://dalle-flow.dev.jina.ai'
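The scheme of the server URL tells you both the protocol and whether TLS is in use. As a rough illustration, the URL can be inspected with the standard library. The helper `describe_server_url` and its scheme table are hypothetical and not part of Jina or DocArray; the trailing-`s` convention is an assumption based on the demo URL above.

```python
from urllib.parse import urlparse

# Hypothetical lookup table: URL scheme -> (protocol, uses TLS).
# Assumption: a trailing "s" (grpcs, wss, https) means TLS is enabled.
SCHEMES = {
    'grpc': ('grpc', False),
    'grpcs': ('grpc', True),
    'ws': ('websocket', False),
    'wss': ('websocket', True),
    'http': ('http', False),
    'https': ('http', True),
}


def describe_server_url(url):
    """Return (protocol, tls, host, port) for a server URL."""
    parsed = urlparse(url)
    if parsed.scheme not in SCHEMES:
        raise ValueError(f'unsupported scheme: {parsed.scheme!r}')
    protocol, tls = SCHEMES[parsed.scheme]
    # port is None when the URL does not specify one explicitly.
    return protocol, tls, parsed.hostname, parsed.port


print(describe_server_url('grpcs://dalle-flow.dev.jina.ai'))
```

A self-hosted server without TLS would be described the same way, for example with `describe_server_url('grpc://localhost:51005')`.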

Step 1: Generate via DALL·E Mega

Now let's define the prompt:

prompt = 'an oil painting of a humanoid robot playing chess in the style of Matisse'

Let's submit it to the server and visualize the results:

from docarray import Document

doc = Document(text=prompt).post(server_url, parameters={'num_images': 8})
da = doc.matches

da.plot_image_sprites(fig_size=(10, 10), show_index=True)

Here we generate 24 candidates: 8 from DALL·E Mega, 8 from GLID3 XL, and 8 from Stable Diffusion, as defined by num_images. This takes about 2 minutes. You can use a smaller value if that is too long for you.
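The relationship between num_images and the number of candidates is simple arithmetic: each enabled pathway contributes num_images candidates. A minimal sketch, in which the function and the pathway labels are made up for illustration and are not server identifiers:

```python
def total_candidates(num_images,
                     pathways=('dalle-mega', 'glid3-xl', 'stable-diffusion')):
    """Each enabled generation pathway contributes `num_images` candidates."""
    return num_images * len(pathways)


print(total_candidates(8))  # 24, as in the example above
print(total_candidates(2, pathways=('dalle-mega', 'glid3-xl')))  # 4
```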

Step 2: Select and refine via GLID3 XL

The 24 candidates are sorted by CLIP-as-service, with index-0 as the best candidate judged by CLIP. Of course, you may think differently. Notice the number in the top-left corner? Select the one you like the most and get a better view:

fav_id = 3
fav = da[fav_id]
fav.embedding = doc.embedding
fav.display()
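Conceptually, ranking candidates against a prompt amounts to sorting them by how similar their embeddings are to the prompt's embedding. The sketch below illustrates that idea with cosine similarity on toy vectors. It is not the CLIP-as-service implementation, and the function names and vectors are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_candidates(prompt_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar."""
    scores = [cosine(prompt_vec, c) for c in candidate_vecs]
    return sorted(range(len(candidate_vecs)),
                  key=lambda i: scores[i], reverse=True)


# Toy 2-D "embeddings"; real CLIP embeddings have hundreds of dimensions.
prompt_vec = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
print(rank_candidates(prompt_vec, candidates))  # [1, 2, 0]
```

The first index in the result plays the role of index-0 in the sprite grid: the candidate the model considers closest to the prompt, which you are free to overrule.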

Now let's submit the selected candidate to the server for diffusion.

diffused = fav.post(f'{server_url}', parameters={'skip_rate': 0.5, 'num_images': 36}, target_executor='diffusion').matches

diffused.plot_image_sprites(fig_size=(10, 10), show_index=True)

This will give 36 images based on the selected image. You may allow the model to improvise more by giving skip_rate a near-zero value, or force it to stay close to the given image with a near-one value. The whole procedure takes about 2 minutes.
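One way to build intuition for skip_rate is to treat it as the fraction of the denoising schedule that is skipped when diffusion starts from the selected image. This is an assumption about how the parameter behaves, not a description of the server code, and `steps_to_run` is a hypothetical helper.

```python
def steps_to_run(total_steps, skip_rate):
    """Number of denoising steps executed for a given skip_rate.

    Assumption: skip_rate is the fraction of the schedule that is skipped
    when diffusion starts from the selected image.
    """
    if not 0.0 <= skip_rate <= 1.0:
        raise ValueError('skip_rate must be between 0 and 1')
    skipped = int(total_steps * skip_rate)
    return total_steps - skipped


# total_steps=200 is only an example value.
for rate in (0.0, 0.5, 1.0):
    print(rate, steps_to_run(200, rate))
```

Under this reading, a low skip_rate leaves more steps for the model to reshape the image, while a high skip_rate leaves few steps and so keeps the result close to the input.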

Step 3: Select and upscale via SwinIR

Select the image you like the most, and give it a closer look:

dfav_id = 34
fav = diffused[dfav_id]
fav.display()

Finally, submit to the server for the last step: upscaling to 1024 x 1024px.

fav = fav.post(f'{server_url}/upscale')
fav.display()

That's it! It is the one. If not satisfied, please repeat the procedure.

Btw, DocArray is a powerful and easy-to-use data structure for unstructured data. It is super productive for data scientists who work in cross-/multi-modal domains. To learn more about DocArray, please check out the docs.

Server

You can host your own server by following the instructions below.

Hardware requirements

DALL·E Flow needs one GPU with 21GB VRAM at its peak. All services are squeezed into this one GPU. This includes (roughly):

DALLE ~9GB
GLID Diffusion ~6GB
Stable Diffusion ~8GB (batch_size=4 in config.yml, 512x512)
SwinIR ~3GB
CLIP ViT-L/14-336px ~3GB

The following tricks can be used to further reduce VRAM usage:

SwinIR can be moved to CPU (-3GB)
CLIP can be delegated to the free CLIP-as-service server (-3GB)
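The figures above can be added up to estimate the VRAM needed for a given set of services. The sketch below uses the approximate numbers from the list; the dictionary keys and the helper are hypothetical names. Note that the 21GB figure appears to match the default set without Stable Diffusion (9 + 6 + 3 + 3), which is an inference from the numbers rather than something stated explicitly.

```python
# Approximate VRAM figures (GB) taken from the list above.
VRAM_GB = {
    'dalle': 9,
    'glid_diffusion': 6,
    'stable_diffusion': 8,
    'swinir': 3,
    'clip': 3,
}


def vram_needed(enabled):
    """Sum the approximate VRAM of the services that run on the GPU."""
    return sum(VRAM_GB[name] for name in enabled)


# Stable Diffusion is not enabled by default.
default = ['dalle', 'glid_diffusion', 'swinir', 'clip']
print(vram_needed(default))                          # 21
print(vram_needed(['dalle', 'glid_diffusion']))      # 15: SwinIR on CPU, CLIP delegated
print(vram_needed(default + ['stable_diffusion']))   # 29
```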

It requires at least 50GB free space on the hard drive, mostly for downloading pretrained models.
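Before starting the downloads, it can be worth checking the free disk space. A minimal sketch using the standard library, in which the helper name and the default path are illustrative:

```python
import shutil


def has_free_space(path='.', required_gb=50):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb


print(has_free_space('.'))
```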

High-speed internet is required. A slow or unstable connection may cause frustrating timeouts when downloading models.

A CPU-only environment is not tested and likely won't work. Google Colab is likely to run out of memory and hence also won't work.

Server architecture

If you have installed Jina, the above flowchart can be generated via:

# pip install jina
jina export flowchart flow.yml flow.svg

Stable Diffusion weights

If you want to use Stable Diffusion, you will first need to register an account on the Huggingface website and agree to the terms and conditions for the model. After logging in, you can find the version of the model required by going here:

CompVis / sd-v1-5-inpainting.ckpt

Under the Download the Weights section, click the link for sd-v1-x.ckpt. The latest weights at the time of writing are sd-v1-5.ckpt.

DOCKER USERS: Put this file into a folder named ldm/stable-diffusion-v1 and rename it model.ckpt. Follow the instructions below carefully because SD is not enabled by default.

NATIVE USERS: Put this file into dalle/stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt after finishing the rest of the steps under "Run natively". Follow the instructions below carefully because SD is not enabled by default.
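Because the file name and location matter, a quick check that the weights are where they are expected can save a failed start. The sketch below encodes the two layouts described above; `find_weights` is a hypothetical helper and the root folder passed in is up to you.

```python
from pathlib import Path


def find_weights(root, native=True):
    """Return the expected model.ckpt path and whether it exists."""
    if native:
        # Native layout: <root>/stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt
        ckpt = (Path(root) / 'stable-diffusion' / 'models' / 'ldm'
                / 'stable-diffusion-v1' / 'model.ckpt')
    else:
        # Docker layout: <root>/ldm/stable-diffusion-v1/model.ckpt
        ckpt = Path(root) / 'ldm' / 'stable-diffusion-v1' / 'model.ckpt'
    return ckpt, ckpt.is_file()


path, ok = find_weights('dalle')
print(path, 'found' if ok else 'missing')
```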

Run in Docker

Prebuilt image

We have provided a prebuilt Docker image that can be pulled directly.

docker pull jinaai/dalle-flow:latest

Build it yourself

We have provided a Dockerfile which allows you to run a server out of the box.

Our Dockerfile uses CUDA 11.6 as the base image; you may want to adjust it according to your system.

git clone https://github.com/jina-ai/dalle-flow.git
cd dalle-flow

docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .

The build takes about 10 minutes with average internet speed and results in an 18GB Docker image.

Run container

To run it, simply do:

docker run -p 51005:51005 \
  -it \
  -v $HOME/.cache:/home/dalle/.cache \
  --gpus all \
  jinaai/dalle-flow

Alternatively, you may also run with some workflows enabled or disabled to prevent out-of-memory crashes. To do that, pass one of these environment variables:

DISABLE_DALLE_MEGA
DISABLE_GLID3XL
DISABLE_SWINIR
ENABLE_STABLE_DIFFUSION
ENABLE_CLIPSEG
ENABLE_REALESRGAN

For example, if you would like to disable GLID3XL workflows, run:

docker run -e DISABLE_GLID3XL='1' \
  -p 51005:51005 \
  -it \
  -v $HOME/.cache:/home/dalle/.cache \
  --gpus all \
  jinaai/dalle-flow

The first run will take ~10 minutes with average internet speed.
-v $HOME/.cache:/home/dalle/.cache avoids repeated model downloading on every docker run.
The first part of -p 51005:51005 is your host public port. Make sure people can access this port if you are serving publicly. The second part is the port defined in flow.yml.
If you want to use Stable Diffusion, it must be enabled manually with the ENABLE_STABLE_DIFFUSION flag.
If you want to use clipseg, it must be enabled manually with the ENABLE_CLIPSEG flag.
If you want to use RealESRGAN, it must be enabled manually with the ENABLE_REALESRGAN flag.
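The flags and port mapping described above combine mechanically into a docker run command. As a sketch, the helper below (a hypothetical convenience function, not part of this repository) assembles the argument list from a set of flags and a host port, and rejects flag names that are not in the list above.

```python
# Flags listed above; each is switched on by setting it to '1'.
KNOWN_FLAGS = {
    'DISABLE_DALLE_MEGA', 'DISABLE_GLID3XL', 'DISABLE_SWINIR',
    'ENABLE_STABLE_DIFFUSION', 'ENABLE_CLIPSEG', 'ENABLE_REALESRGAN',
}


def docker_run_command(flags=(), host_port=51005, cache_dir='$HOME/.cache'):
    """Build the `docker run` argument list for the given flags."""
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f'unknown flags: {sorted(unknown)}')
    cmd = ['docker', 'run']
    for flag in flags:
        cmd += ['-e', f'{flag}=1']
    cmd += [
        '-p', f'{host_port}:51005',  # host port : port defined in flow.yml
        '-it',
        '-v', f'{cache_dir}:/home/dalle/.cache',
        '--gpus', 'all',
        'jinaai/dalle-flow',
    ]
    return cmd


# Print a command line that can be pasted into a shell.
print(' '.join(docker_run_command(['DISABLE_GLID3XL'], host_port=8080)))
```

Only the host side of the port mapping changes here; the container side stays at 51005 because that is the port the Flow listens on.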

Special instructions for Stable Diffusion and Docker

Stable Diffusion can only be enabled if you have downloaded the weights and made them available as a volume, while also setting the environment flag (ENABLE_STABLE_DIFFUSION) for SD.

You should have previously put the weights into a folder named ldm/stable-diffusion-v1 and named them model.ckpt. Replace YOUR_MODEL_PATH/ldm below with the path on your own system to mount the weights into the Docker container.

docker run -e ENABLE_STABLE_DIFFUSION="1" \
  -e DISABLE_DALLE_MEGA="1" \
  -e DISABLE_GLID3XL="1" \
  -p 51005:51005 \
  -it \
  -v YOUR_MODEL_PATH/ldm:/dalle/stable-diffusion/models/ldm/ \
  -v $HOME/.cache:/home/dalle/.cache \
  --gpus all \
  jinaai/dalle-flow

You should see a screen like the following once it is running:

Note that unlike running natively, running inside Docker may give a less vivid progress bar, color logs, and prints. This is due to the limitations of the terminal in a Docker container. It does not affect actual usage.

Run natively

Running natively requires some manual steps, but it is often easier to debug.

Clone repos

mkdir dalle && cd dalle
git clone https://github.com/jina-ai/dalle-flow.git
git clone https://github.com/jina-ai/SwinIR.git
git clone --branch v0.0.15 https://github.com/AmericanPresidentJimmyCarter/stable-diffusion.git
git clone https://github.com/CompVis/latent-diffusion.git
git clone https://github.com/jina-ai/glid-3-xl.git
git clone https://github.com/timojl/clipseg.git

You should have the following folder structure:

dalle/
 |
 |-- Real-ESRGAN/
 |-- SwinIR/
 |-- clipseg/
 |-- dalle-flow/
 |-- glid-3-xl/
 |-- latent-diffusion/
 |-- stable-diffusion/
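To confirm the layout before installing anything, the expected folders can be checked programmatically. A minimal sketch, where `missing_repos` is a hypothetical helper and the folder names are copied from the tree above:

```python
from pathlib import Path

# Folder names copied from the tree above.
EXPECTED = ['Real-ESRGAN', 'SwinIR', 'clipseg', 'dalle-flow',
            'glid-3-xl', 'latent-diffusion', 'stable-diffusion']


def missing_repos(root='dalle'):
    """Return the expected sub-folders that are absent under `root`."""
    return [name for name in EXPECTED if not (Path(root) / name).is_dir()]


print(missing_repos('dalle'))
```

An empty list means every folder in the tree is present.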

Install auxiliary repos

cd dalle-flow
python3 -m virtualenv env
source env/bin/activate && cd -
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
pip install numpy tqdm pytorch_lightning einops numpy omegaconf
pip install https://github.com/crowsonkb/k-diffusion/archive/master.zip
pip install git+https://github.com/AmericanPresidentJimmyCarter/stable-diffusion.git@v0.0.15
pip install basicsr facexlib gfpgan
pip install realesrgan
pip install https://github.com/AmericanPresidentJimmyCarter/xformers-builds/raw/master/cu116/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl && \
cd latent-diffusion && pip install -e . && cd -
cd stable-diffusion && pip install -e . && cd -
cd SwinIR && pip install -e . && cd -
cd glid-3-xl && pip install -e . && cd -
cd clipseg && pip install -e . && cd -

There are a couple of models we need to download for GLID-3-XL if you are using it:

cd glid-3-xl
wget https://dall-3.com/models/glid-3-xl/bert.pt
wget https://dall-3.com/models/glid-3-xl/kl-f8.pt
wget https://dall-3.com/models/glid-3-xl/finetune.pt
cd -

Both clipseg and RealESRGAN require you to set a correct cache folder path, typically something like $HOME/.

Install flow

cd dalle-flow
pip install -r requirements.txt
pip install jax~=0.3.24

Start the server

Now that you are under dalle-flow/, run the following command:

# Optionally disable some generative models with the following flags when
# using flow_parser.py:
# --disable-dalle-mega
# --disable-glid3xl
# --disable-swinir
# --enable-stable-diffusion
python flow_parser.py
jina flow --uses flow.tmp.yml

You should see this screen immediately:

On the first start, it will take ~8 minutes to download the DALL·E Mega model and other necessary models. Subsequent runs should take only ~1 minute to reach the success message.

When everything is ready, you will see:

Congrats! Now you should be able to run the client.
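Since startup can take several minutes, a script that launches the client automatically may want to wait until the server port accepts connections. The sketch below polls a TCP port with the standard library. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host='localhost', port=51005, timeout=600.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For a local server started as described above, `wait_for_port('localhost', 51005)` would return True once the port is reachable and False after ten minutes otherwise.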

You can modify and extend the server flow as you like, e.g. changing the model, adding persistence, or even auto-posting to Instagram/OpenSea. With Jina and DocArray, you can easily make DALL·E Flow cloud-native and ready for production.

Use the CLIP-as-service

To reduce VRAM usage, you can use CLIP-as-service as an external executor, freely available at grpcs://api.clip.jina.ai:2096.
First, make sure you have created an access token from the console website, or via the CLI as follows:

jina auth token create <name of PAT> -e <expiration days>

Then, you need to change the executor-related configs (host, port, external, tls and grpc_metadata) in flow.yml.

...
  - name: clip_encoder
    uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
    host: 'api.clip.jina.ai'
    port: 2096
    tls: true
    external: true
    grpc_metadata:
      authorization: "<your access token>"
    needs: [gateway]
...
  - name: rerank
    uses: jinahub+docker://CLIPTorchEncoder/latest-gpu
    host: 'api.clip.jina.ai'
    port: 2096
    uses_requests:
      '/': rank
    tls: true
    external: true
    grpc_metadata:
      authorization: "<your access token>"
    needs: [dalle, diffusion]
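The same five settings are applied to both executors, so the change can be thought of as a small patch to each executor entry. The sketch below expresses that patch on plain dictionaries. It is an illustration of the config change, not the code in flow_parser.py, and `use_external_clip` is a hypothetical helper.

```python
def use_external_clip(executor, token, host='api.clip.jina.ai', port=2096):
    """Return a copy of an executor config pointing at an external CLIP server."""
    patched = dict(executor)  # leave the original entry untouched
    patched.update({
        'host': host,
        'port': port,
        'tls': True,
        'external': True,
        'grpc_metadata': {'authorization': token},
    })
    return patched


clip_encoder = {
    'name': 'clip_encoder',
    'uses': 'jinahub+docker://CLIPTorchEncoder/latest-gpu',
    'needs': ['gateway'],
}
print(use_external_clip(clip_encoder, '<your access token>'))
```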

You can also use flow_parser.py to automatically generate and run the flow with CLIP-as-service as an external executor:

python flow_parser.py --cas-token "<your access token>"
jina flow --uses flow.tmp.yml

⚠️ grpc_metadata is only available in Jina v3.11.0 and later. If you are using an older version, please upgrade to the latest version.

Now, you can use the free CLIP-as-service in your flow.

Support

To extend DALL·E Flow you will need to get familiar with Jina and DocArray.
Join our Discord community and chat with other community members about ideas.
Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel

Join Us

DALL·E Flow is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.
