lucidrains/deep-daze


mist over green hills

shattered plates on the grass

cosmic love and attention

a time traveler in the crowd

life during the plague

meditative peace in a sunlit forest

a man painting a completely red image

a psychedelic experience on LSD

What is this?

Simple command line tool for text to image generation using OpenAI's CLIP and Siren. Credit goes to Ryan Murdock for the discovery of this technique (and for coming up with the great name)!

Original notebook: Open In Colab

New simplified notebook: Open In Colab

This will require that you have an Nvidia GPU or AMD GPU

  • Recommended: 16GB VRAM
  • Minimum Requirements: 4GB VRAM (Using VERY LOW settings, see usage instructions below)
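Not part of deep-daze itself, but a quick way to check how much VRAM PyTorch can see before choosing settings is sketched below (assumes a CUDA-enabled PyTorch install, which deep-daze already depends on):

    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        # total_memory is reported in bytes; convert to GiB
        print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
    else:
        print("No CUDA device detected")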

Install

$ pip install deep-daze

Windows Install

Presuming Python is installed:

  • Open command prompt and navigate to the directory of your current version of Python
  pip install deep-daze

Examples

$ imagine"a house in the forest"

For Windows:

  • Open command prompt as administrator
  imagine"a house in the forest"

That's it.

If you have enough memory, you can get better quality by adding a --deeper flag

$ imagine"shattered plates on the ground" --deeper

Advanced

In true deep learning fashion, more layers will yield better results. The default is 16, but it can be increased to 32 depending on your resources.

$ imagine"stranger in strange lands" --num-layers 32

Usage

CLI

NAME
    imagine

SYNOPSIS
    imagine TEXT <flags>

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase less than 77 tokens which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
        User-created custom CLIP encoding. If used, replaces any text or image that was used.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --story_separator=STORY_SEPARATOR
        Default: None
        Only used if create_story is True. Defines a separator like '.' that splits the text into groups for each epoch. The separator needs to be in the text, otherwise it will be ignored.
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss of n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
        Whether or not to save images generated before training Siren is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended, e.g. `%y%m%d-%H%M%S-my_phrase_here`.
    --start_image_path=START_IMAGE_PATH
        Default: None
        The generator is trained first on a starting image before being steered towards the textual input.
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image.
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_HIDDEN
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
    --save_gif=SAVE_GIF
        Default: False
        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.
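For example, several of these flags can be combined in a single call; the prompt and values below are purely illustrative, not recommendations from the flag listing:

$ imagine "a lighthouse in a storm" --num_layers 24 --epochs 10 --save_every 50 --save_progress True --open_folder False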

Priming

First devised and shared by Mario Klingemann, this technique allows you to prime the generator network with a starting image before it is steered towards the text.

Simply specify the path to the image you wish to use, and optionally the number of initial training steps.

$ imagine'a clear night sky filled with stars' --start_image_path ./cloudy-night-sky.jpg

Primed starting image

Then trained with the prompt "A pizza with green pepper".
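The same priming run can also be sketched through the Python API; the keyword names below are assumed to mirror the CLI flags of the same name:

    from deep_daze import Imagine

    imagine = Imagine(
        text = 'a clear night sky filled with stars',
        start_image_path = './cloudy-night-sky.jpg',  # assumed keyword, mirrors --start_image_path
        start_image_train_iters = 50,                 # default from the flag listing
    )
    imagine()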

Optimize for the interpretation of an image

We can also feed in an image as an optimization goal, instead of only priming the generator network. Deepdaze will then render its own interpretation of that image:

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

Original image:

The network's interpretation:

Original image:

The network's interpretation:
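A minimal Python sketch of the same run, assuming the Imagine constructor accepts an img keyword mirroring the --img flag:

    from deep_daze import Imagine

    imagine = Imagine(
        img = 'samples/Autumn_1875_Frederic_Edwin_Church.jpg',  # assumed keyword, mirrors --img
    )
    imagine()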

Optimize for text and image combined

$ imagine"A psychedelic experience." --img samples/hot-dog.jpg

The network's interpretation:

New: Create a story

The regular mode for texts only allows 77 tokens. If you want to visualize a full story/paragraph/song/poem, set create_story to True.

Given the poem “Stopping by Woods On a Snowy Evening” by Robert Frost -"Whose woods these are I think I know. His house is in the village though; He will not see me stopping here To watch his woods fill up with snow. My little horse must think it queer To stop without a farmhouse near Between the woods and frozen lake The darkest evening of the year. He gives his harness bells a shake To ask if there is some mistake. The only other sound’s the sweep Of easy wind and downy flake. The woods are lovely, dark and deep, But I have promises to keep, And miles to go before I sleep, And miles to go before I sleep.".

We get:

Whose_woods_these_are_I_think_I_know._His_house_is_in_the_village_though._He_.mp4
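A hedged Python sketch of a story run follows; the keyword names are assumed to mirror the CLI flags, and only the first stanza is used here for brevity:

    from deep_daze import Imagine

    poem = (
        "Whose woods these are I think I know. "
        "His house is in the village though; "
        "He will not see me stopping here "
        "To watch his woods fill up with snow."
    )

    imagine = Imagine(
        text = poem,
        create_story = True,        # assumed keyword, mirrors --create_story
        story_words_per_epoch = 5,  # default from the flag listing
        save_progress = True,       # needed to visualize the story transitions
    )
    imagine()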

Python

Invoke deep_daze.Imagine in Python

from deep_daze import Imagine

imagine = Imagine(
    text = 'cosmic love and attention',
    num_layers = 24,
)
imagine()

Save progress every fourth iteration

Save images in the format insert_text_here.00001.png, insert_text_here.00002.png, ... up to (total_iterations / save_every).

imagine = Imagine(
    text = text,
    save_every = 4,
    save_progress = True
)

Prepend current timestamp on each image.

Creates files with both the timestamp and the sequence number.

e.g. 210129-043928_328751_insert_text_here.00001.png, 210129-043928_512351_insert_text_here.00002.png, ...

imagine = Imagine(
    text = text,
    save_every = 4,
    save_progress = True,
    save_date_time = True,
)

High GPU memory usage

If you have at least 16 GiB of VRAM available, you should be able to run these settings with some wiggle room.

imagine = Imagine(
    text = text,
    num_layers = 42,
    batch_size = 64,
    gradient_accumulate_every = 1,
)

Average GPU memory usage

imagine = Imagine(
    text = text,
    num_layers = 24,
    batch_size = 16,
    gradient_accumulate_every = 2
)

Very low GPU memory usage (less than 4 GiB)

If you are desperate to run this on a card with less than 8 GiB of VRAM, you can lower the image_width.

imagine = Imagine(
    text = text,
    image_width = 256,
    num_layers = 16,
    batch_size = 1,
    gradient_accumulate_every = 16  # Increase gradient_accumulate_every to correct for loss in low batch sizes
)

VRAM and speed benchmarks:

These experiments were conducted with an RTX 2060 Super and a Ryzen 7 3700X. We first list the parameters (bs = batch size), then the memory usage, and in some cases the training iterations per second:

For an image resolution of 512:

  • bs 1, num_layers 22: 7.96 GB
  • bs 2, num_layers 20: 7.5 GB
  • bs 16, num_layers 16: 6.5 GB

For an image resolution of 256:

  • bs 8, num_layers 48: 5.3 GB
  • bs 16, num_layers 48: 5.46 GB - 2.0 it/s
  • bs 32, num_layers 48: 5.92 GB - 1.67 it/s
  • bs 8, num_layers 44: 5 GB - 2.39 it/s
  • bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
  • bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
  • bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

@NotNANtoN recommends a batch size of 32 with 44 layers and training 1-8 epochs.
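Translated into the Python API, that recommendation would look roughly like the sketch below (keyword names assumed to mirror the CLI flags; image_width 256 matches the benchmark runs above, and the epoch count is one point within the suggested 1-8 range):

    from deep_daze import Imagine

    imagine = Imagine(
        text = text,
        image_width = 256,
        num_layers = 44,
        batch_size = 32,
        gradient_accumulate_every = 1,
        epochs = 8,
    )
    imagine()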

Where is this going?

This is just a teaser. We will be able to generate images, sound, anything at will, with natural language. The holodeck is about to become real in our lifetimes.

Please join the replication efforts for DALL-E, in Pytorch or Mesh Tensorflow, if you are interested in furthering this technology.

Alternatives

Big Sleep - CLIP and the generator from Big GAN

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}

@misc{sitzmann2020implicit,
    title   = {Implicit Neural Representations with Periodic Activation Functions},
    author  = {Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein},
    year    = {2020},
    eprint  = {2006.09661},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
