Add Align Your Steps to available schedulers #15751


Conversation

@LoganBooker (Contributor)

Implements the Align Your Steps noise schedule as described here: https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/howto.html. This includes the sigmas for SDXL and SD 1.5, as well as the recommended log-linear interpolation for using other step counts.

Description

According to the original work (https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/), AYS can provide better image quality than schedulers such as Karras and Exponential at low step counts (~10). This does appear to bear out in limited testing, as seen below, though in some cases (such as the tower) it's debatable. It's certainly not a panacea; you'll still want to use at least 15 sampling steps for more consistent, coherent images.

Note that I've used 11 steps in the examples below to account for the zero appended to the sigmas, which is consistent with the other schedulers. The alternative would be to truncate or replace the final sigma with zero, but that doesn't seem correct.
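To make the step accounting concrete, here is a minimal sketch of that convention (the sigma values are NVIDIA's published 10-step SD 1.5 schedule, the same list used in the code later in this thread):

```python
import numpy as np

# NVIDIA's published 10-step AYS schedule for SD 1.5: 11 sigma values.
AYS_SD15 = [14.615, 6.475, 3.861, 2.697, 1.886, 1.396, 0.963, 0.652, 0.399, 0.152, 0.029]

# Appending a final zero, consistent with the other schedulers, yields
# 12 sigmas and therefore 11 sampling steps rather than 10.
sigmas = np.append(AYS_SD15, 0.0)
print(len(sigmas) - 1)  # 11
```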

Screenshots/videos: horsemoon_small, tower_small, girl_small (image attachments)

@AG-w

What's the difference between this and #15608?

I see this pull request uses the quick start guide numbers, while drhead implemented Theorem 3.1 from the paper.

@LoganBooker (Contributor, Author)

@AG-w I think the main difference is that this implements the schedule as recommended by the authors. My understanding from reading the material is that the provided schedules are the ones already optimized using the techniques described in the paper (https://arxiv.org/pdf/2404.14507); the section "B.1. Practical Implementation Details" explains this in more detail.

Happy to be corrected if I've misinterpreted or missed anything.

* Consistent with implementations in k-diffusion.
* Makes this compatible with AUTOMATIC1111#15823
@v0xie (Contributor) commented May 21, 2024 (edited)

Just wanted to put this out there: https://arxiv.org/abs/2405.11326

It's a new method, "GITS", that purports to beat AYS in generation speed and sampling quality.

These are the sigmas I was able to get from model_wrap.sigmas for the recommended timesteps:
Timesteps: [999, 783, 632, 483, 350, 233, 133, 67, 33, 17, 0]
Sigmas: [14.615, 4.734, 2.567, 1.529, 0.987, 0.652, 0.418, 0.268, 0.179, 0.127, 0.029]

I'm not sure they're correct, because they didn't change when I loaded an SDXL model.
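For anyone wanting to reproduce the lookup, a rough sketch follows. The sigma table is rebuilt from Stable Diffusion's standard "scaled linear" beta schedule, which (as an assumption) mirrors what k-diffusion's DiscreteSchedule wrapper exposes as model_wrap.sigmas during sampling:

```python
import torch

# Rebuild the discrete sigma table from SD's "scaled linear" beta schedule
# (assumption: this matches what k-diffusion's DiscreteSchedule stores as
# model_wrap.sigmas).
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
sigma_table = ((1 - alphas_cumprod) / alphas_cumprod) ** 0.5

# GITS's recommended 10-step timesteps, descending; indexing the ascending
# sigma table with them yields a descending sigma schedule.
timesteps = torch.tensor([999, 783, 632, 483, 350, 233, 133, 67, 33, 17, 0])
print(sigma_table[timesteps])  # roughly the values quoted above
```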


@AG-w commented May 24, 2024 (edited)

> I'm not sure they're correct, because they didn't change when I loaded an SDXL model.

What if you calculate the scale between the SD 1.5 and SDXL sigmas in AYS, then apply that scale to GITS so you get an SDXL version of those sigmas?

Something like `sigma * (sdxl_ays_sigma / sd15_ays_sigma)`.

Using this approach I generated a result for SDXL; it needs testing though:

[14.615, 4.617, 2.507, 1.236, 0.702, 0.402, 0.240, 0.156, 0.104, 0.094, 0.029]
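A small sketch of that scaling, for anyone who wants to check the arithmetic. The AYS lists are the SD 1.5/SDXL values from NVIDIA's how-to page (they also appear in the code below), and the GITS sigmas are the ones quoted above; applied element-wise, the ratio reproduces the list above:

```python
import numpy as np

sd15_ays = np.array([14.615, 6.475, 3.861, 2.697, 1.886, 1.396, 0.963, 0.652, 0.399, 0.152, 0.029])
sdxl_ays = np.array([14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.029])
gits_sd15 = np.array([14.615, 4.734, 2.567, 1.529, 0.987, 0.652, 0.418, 0.268, 0.179, 0.127, 0.029])

# All three lists happen to have 11 entries, so the per-index SDXL/SD 1.5
# ratio can be applied element-wise.
gits_sdxl = gits_sd15 * (sdxl_ays / sd15_ays)
print(np.round(gits_sdxl, 3))
# [14.615  4.617  2.507  1.236  0.702  0.402  0.24   0.156  0.104  0.094  0.029]
```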

@Koitenshin commented Jun 2, 2024 (edited)

@LoganBooker I had to type in my own sigmas for 32 steps, which leads me to a feature request for this scheduler: can someone better than me modify the code to use `script_callbacks.CFGDenoiserParams` in a loop, pulling the `total_sampling_steps` variable from `CFGDenoiserParams` and automatically scaling the schedule down to 0? I can't share my results, but they are amazing.

The following is an edit: I took the time to run some tests and uploaded the results to Imgur.

First prompt is from here: https://prompthero.com/prompt/cf5ed5a0881
Second prompt is from here: https://prompthero.com/prompt/1107ce59578
Third prompt is from here: https://prompthero.com/prompt/cef4653ee67

Here is a link to the four grids for side-by-side comparison. I used multiple samplers (DPM++ 2S a, DPM2, Euler, and Heun) across the images so you can see the differences; the 11-sigma schedule only performs really well under Heun with complex prompts.

https://imgur.com/a/NQLCD4M

As you can see, the sigmas should be stretched over the number of steps you use for better prompt coherence.

As for the testing: you will not be able to replicate my results. I'm using a lot of custom forked and edited code that I haven't uploaded to a repo yet, along with 2K generation using a 64K resized seed. I'd use a higher resized seed, but 64K already maxes out the 8 GB of VRAM on my 3060 Ti. I'm also using an SD 1.5-based model for these results; SDXL will have to wait until my setup plays nice with it.

You can test the sigmas yourselves.

```python
import numpy as np
import torch

from modules import shared


def ays_11_sigmas(n, sigma_min, sigma_max, device='cpu'):
    # https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/howto.html
    def loglinear_interp(t_steps, num_steps):
        """
        Performs log-linear interpolation of a given array of decreasing numbers.
        """
        xs = np.linspace(0, 1, len(t_steps))
        ys = np.log(t_steps[::-1])

        new_xs = np.linspace(0, 1, num_steps)
        new_ys = np.interp(new_xs, xs, ys)

        interped_ys = np.exp(new_ys)[::-1].copy()
        return interped_ys

    if shared.sd_model.is_sdxl:
        sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.029]
    else:
        # Default SD 1.5 sigmas.
        sigmas = [14.615, 6.475, 3.861, 2.697, 1.886, 1.396, 0.963, 0.652, 0.399, 0.152, 0.029]
    # sigmas = [14.615, 14.158, 13.702, 13.245, 12.788, 12.331, 11.875, 11.418, 10.961, 10.505, 10.048, 9.591, 9.134, 8.678, 8.221, 7.764, 7.308, 6.851, 6.394, 5.937, 5.481, 5.024, 4.567, 4.110, 3.654, 3.197, 2.740, 2.284, 1.827, 1.370, 0.913, 0.457, 0]  # 32-step sigmas

    if n != len(sigmas):
        sigmas = np.append(loglinear_interp(sigmas, n), [0.0])
    else:
        sigmas.append(0.0)

    return torch.FloatTensor(sigmas).to(device)


def ays_32_sigmas(n, sigma_min, sigma_max, device='cpu'):
    # https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/howto.html
    def loglinear_interp(t_steps, num_steps):
        """
        Performs log-linear interpolation of a given array of decreasing numbers.
        """
        xs = np.linspace(0, 1, len(t_steps))
        ys = np.log(t_steps[::-1])

        new_xs = np.linspace(0, 1, num_steps)
        new_ys = np.interp(new_xs, xs, ys)

        interped_ys = np.exp(new_ys)[::-1].copy()
        return interped_ys

    if shared.sd_model.is_sdxl:
        sigmas = [14.615, 11.1491618, 8.50522127, 6.48827151, 5.43707402, 4.60398619, 3.89854704, 3.27407457, 2.74396527, 2.29968659, 1.95448514, 1.67108715, 1.42878152, 1.23181009, 1.06789649, 0.92579443, 0.80290886, 0.69660121, 0.60436903, 0.52852552, 0.46773344, 0.41393379, 0.36258186, 0.31008517, 0.26518925, 0.22326461, 0.17653877, 0.13959192, 0.10587381, 0.05519369, 0.02877334, 0.015]
    else:
        # Default SD 1.5 sigmas.
        # sigmas = [14.615, 6.475, 3.861, 2.697, 1.886, 1.396, 0.963, 0.652, 0.399, 0.152, 0.029]
        sigmas = [14.615, 11.23951352, 8.64363081, 6.64729424, 5.57250862, 4.71648546, 3.99196065, 3.5195609, 3.13490466, 2.79228788, 2.48773628, 2.21663865, 1.97508351, 1.7793172, 1.61475335, 1.46540953, 1.314849, 1.16642497, 1.03475547, 0.91573744, 0.80748169, 0.71202361, 0.621739, 0.53065202, 0.4529096, 0.37491455, 0.27461819, 0.2011529, 0.14105873, 0.06682881, 0.03166121, 0.015]

    if n != len(sigmas):
        sigmas = np.append(loglinear_interp(sigmas, n), [0.0])
    else:
        sigmas.append(0.0)

    return torch.FloatTensor(sigmas).to(device)
```

Don't forget to add the following lines to the bottom of the scheduler list:

```python
Scheduler('align_your_steps_11', 'Align Your Steps 11', ays_11_sigmas),
Scheduler('align_your_steps_32', 'Align Your Steps 32', ays_32_sigmas),
```
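(For context: in current webui the scheduler list these entries go into appears to live in modules/sd_schedulers.py, which is the file this PR touches; adding entries there is presumably what makes the new schedules selectable in the UI.)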

@AG-w commented Jun 3, 2024

> As you can see, the sigmas should be stretched over the number of steps you use for better prompt coherence. […] You can test the sigmas yourselves.

Maybe you should explain what you have modified, since all we know is that you changed the sigmas?

@Koitenshin commented Jun 3, 2024 (edited)

> > As you can see, the sigmas should be stretched over the number of steps you use for better prompt coherence. […] You can test the sigmas yourselves.
>
> Maybe you should explain what you have modified, since all we know is that you changed the sigmas?

It wouldn't matter. I just did a 512x512 comparison with none of the "bells & whistles" I normally use. The difference is still apparent, just harder to see at such a small scale; you'll probably have to zoom in a bit. It's most apparent with the ancestral sampler. You can find the image at the bottom of the same Imgur link: https://imgur.com/a/NQLCD4M

I even tossed in the Restart and UniPC samplers (default options for UniPC, since I haven't tested that one much).

EDIT: SDXL testing won't be happening. I keep getting garbled images and pixelation even with the normal samplers. I guess 8 GB of VRAM is enough to get it running, but not running well, and it happens with and without --medvram-sdxl.

2nd EDIT: SDXL testing did happen after all, thanks to Forge and NeverOOM. The 32-step sigmas are more accurate than the 11 sigmas stretched over 32 steps. Forge's samplers and schedulers are a pain in the butt to edit because they're not separated like A1111's. I updated the code post above with my new sigmas; I got tired of issues and just used NVIDIA's code to generate them.

AUTOMATIC1111 merged commit 46bcfbe into AUTOMATIC1111:dev on Jun 8, 2024.
Panchovix pushed a commit to Panchovix/stable-diffusion-webui-reForge referencing this pull request on Jul 3, 2024.
Panchovix pushed a commit to Panchovix/stable-diffusion-webui referencing this pull request on Jul 3, 2024.

Reviewers

@missionfloyd left review comments
@AUTOMATIC1111 approved these changes


6 participants

@LoganBooker, @AG-w, @v0xie, @Koitenshin, @missionfloyd, @AUTOMATIC1111
