feat: add more caching methods #1066


Open

rmatif wants to merge 12 commits into leejet:master from rmatif:add-ucache

Conversation

@rmatif (Contributor)

Adding an experimental variant of an EasyCache-like feature for UNet models. I came up with the name "ucache" (if someone has a better suggestion, I'd take it). For now it uses a step-level skipping mechanism. I want to make it per-block to provide more granularity and control, but the current UNet implementation doesn't allow that, and the static nature of ggml graphs makes it difficult to capture precise UNet blocks. I have found the results good enough to make this a first iteration.

Threshold may vary with the sampler + scheduler combo

`./build/bin/sd -m models/model.safetensors --cfg-scale 7 -p "a cute cat sitting on a red pillow" --steps 20 -H 1024 -W 1024 -s 42 --ucache 1,0.15,0.95 --diffusion-fa`
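The step-level skipping mechanism can be sketched roughly as below. This is an illustrative reconstruction, not the actual PR code; the struct and member names (`StepSkipper`, `last_change`, etc.) are hypothetical. The EasyCache-style idea is to accumulate an estimate of how much the model output is drifting and reuse the cached output while the accumulated drift stays below the threshold:

```cpp
#include <cassert>

// Hypothetical sketch of an EasyCache-style step-skip decision (not the
// actual PR code). The sampler asks should_skip() before each UNet call;
// while the accumulated output drift stays below `threshold`, the cached
// output from the last full evaluation is reused instead of running the UNet.
struct StepSkipper {
    float threshold;          // e.g. 0.15: skip while accumulated drift < this
    float start;              // fraction of steps before which we never skip
    float end;                // fraction of steps after which we never skip
    float accumulated = 0.0f; // drift accumulated since the last full step
    float last_change = 0.0f; // relative output change seen at that full step

    bool should_skip(int step, int total_steps) {
        float t = (float)step / (float)total_steps;
        if (t < start || t > end) return false;   // outside the active window
        accumulated += last_change;               // predicted drift if we skip
        if (accumulated < threshold) return true; // drift small: reuse cache
        accumulated = 0.0f;                       // run the UNet fully, reset
        return false;
    }
};
```

In a real implementation `last_change` would be re-measured from actual model outputs at every full evaluation; here it is a fixed stand-in just to show the accumulate/reset control flow.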

20 Steps:

| Baseline | 5/20 | 7/20 | 8/20 |
|---|---|---|---|
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) |

(comparison images not reproduced)

30 Steps:

| Baseline | 10/30 | 11/30 | 12/30 | 14/30 | 15/30 |
|---|---|---|---|---|---|
| 0 skipped (1x) | 10 skipped (1.5x) | 11 skipped (~1.58x) | 12 skipped (~1.67x) | 14 skipped (~1.88x) | 15 skipped (2x) |

Supersedes #705

@wbruna (Contributor)

This also has a nice side-effect on some low-CFG distilled models: the skipped steps help avoid the "overcooked" effect when using too many steps.

> I came up with the name "ucache" (if someone has a better suggestion, I'd take it).

Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

@rmatif (Contributor, Author)

> Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

I have thought about that, but I'm planning to make more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

> For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

If the threshold was the same across samplers/schedulers, I'd say yes, but I feel it's a hacky way to do arbitrary scaling depending on that. Plus it's not only different but sometimes not very sensitive: different values will get you similar skipped-step counts. We lack granularity inside a single step; I'll work on unifying this.

@wbruna (Contributor)

> I have thought about that, but I'm planning to make more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

My suggestion is from a usability side, not development. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or an sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields and leave it to the user to figure out what she needs, or always fill both and tolerate the warning messages.

In reality, Koboldcpp would avoid that anyway by patching stable-diffusion.cpp (each patch adds maintenance overhead, but it's better than the alternative). Command-line users like sd.cpp-webui, or users that don't patch the library, don't have that option.

Even if the cache types used completely different parameters, they have defaults, so a simple flag/checkbox "turn the default cache on" would still be useful, and easy to be supported and used.

And it'd be zero change for the EasyCache implementation: just reuse its parameter struct. The code won't mind a few extra fields, if they are needed.

> If the threshold was the same across samplers/schedulers, I'd say yes, but I feel it's a hacky way to do arbitrary scaling depending on that. Plus it's not only different but sometimes not very sensitive: different values will get you similar skipped-step counts. We lack granularity inside a single step; I'll work on unifying this.

Again, my comment was about unifying the default value from a user's POV (essentially a flat ×5 on the input for UNet, so that 0.2 is always the default). I agree that doesn't matter much if you intend to have different defaults depending on model version/sampler/scheduler.


@Green-Sky (Contributor)

FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.


Regarding a unified command argument, maybe something like `--latent-cache` or `--prediction-cache` or something.


@leejet (Owner)

Maybe we should use --cache-mode to control the caching method (and disable caching if it’s not configured), and use --cache-option to configure the cache parameters?


@rmatif (Contributor, Author)

> Maybe we should use --cache-mode to control the caching method (and disable caching if it’s not configured), and use --cache-option to configure the cache parameters?

Something like that? Or do you want to expose those on the C API?

> fyi I had to go down to 0.05 or 0.025 to get stable results with 36 steps.

On which sampler/scheduler?

> My suggestion is from a usability side, not development. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields, and leave it to the user to figure out what she needs, or always fill both, and tolerate the warning messages.

That's a valid point, and I think it's a goal worth reaching. The fact that the app isn't aware of the model is true for a lot of options here. Since this is experimental and I'll be iterating on it, I think it's fine to keep it manual for now, and once it's good enough, have only one cache option in sd.cpp for every model.

@rmatif (Contributor, Author)

Added some tweaks; now it can sometimes accidentally add a nice aesthetic pattern compared to the baseline.

20 Steps:

| Baseline | 5/20 | 6/20 | 7/20 | 8/20 | 9/20 | 10/20 |
|---|---|---|---|---|---|---|
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 6 steps skipped (~1.43x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) | 9 steps skipped (~1.82x) | 10 steps skipped (2x) |

30 Steps:

| Baseline | 8/30 | 10/30 | 12/30 | 13/30 | 14/30 | 15/30 | 16/30 |
|---|---|---|---|---|---|---|---|
| 0 steps skipped (1x) | 8 steps skipped (~1.36x) | 10 steps skipped (1.5x) | 12 steps skipped (~1.67x) | 13 steps skipped (~1.76x) | 14 steps skipped (~1.88x) | 15 steps skipped (2x) | 16 steps skipped (~2.14x) |

@Green-Sky (Contributor)

> > fyi I had to go down to 0.05 or 0.025 to get stable results with 36 steps.
>
> On which sampler/scheduler?

I was using `--cfg-scale 8 --steps 36 --scheduler karras --sampling-method dpm++2m` with cyberrealisticxl.

Will test the new code later and play some more with the params.


@leejet (Owner)

> Something like that? Or do you want to expose those on the C API?

The command arguments are just as I expected.

By the way, maybe we can put all the cache-related parameters into a single struct in the API? Then when we add more cache methods later, we won’t need a separate struct for each one — and it seems like multiple cache methods can’t be active at the same time anyway.
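A single shared struct along those lines might look like the sketch below. The type and field names are hypothetical (the fields mirror the `--cache-option` keys used later in this thread, threshold/start/end/decay/relative/reset), not the actual sd.cpp API:

```cpp
#include <cassert>

// Hypothetical sketch of a unified cache parameter struct (illustrative
// names, not the actual sd.cpp API). One struct covers every cache method;
// `mode` selects which one is active, and fields a method doesn't use are
// simply ignored by it.
enum sd_cache_mode_t {
    SD_CACHE_NONE = 0, // caching disabled (default when not configured)
    SD_CACHE_EASYCACHE,
    SD_CACHE_UCACHE,
    SD_CACHE_DIT,
};

struct sd_cache_params_t {
    sd_cache_mode_t mode = SD_CACHE_NONE;
    float threshold      = 1.0f;  // skip threshold; meaning depends on mode
    float start          = 0.15f; // first fraction of steps eligible to skip
    float end            = 0.95f; // last fraction of steps eligible to skip
    float decay          = 1.0f;  // decay applied to the accumulated error
    bool  relative       = true;  // compare changes relative to output norm
    bool  reset          = true;  // reset the accumulator after a full step
};
```

A frontend would then only need one "cache" field plus a mode selector, regardless of model version, which is the usability point raised above.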


@rmatif changed the title from "feat: add ucache" to "feat: add more caching methods" on Dec 12, 2025
@rmatif (Contributor, Author)

Decided to port cache-dit.

| Baseline | Slow | Medium | Fast | Ultra |
|---|---|---|---|---|
| 1x | 2x | ~2.73x | 3x | 3x |

@rmatif force-pushed the add-ucache branch 2 times, most recently from e047457 to 9af991d on December 15, 2025 at 17:03
@rmatif (Contributor, Author)

More results

`./build/bin/sd-cli -m /workspace/ucache/stable-diffusion.cpp/models/2395247.safetensors --cfg-scale 7 -p "a cute cat sitting on a red pillow" --steps 20 -H 1024 -W 1024 -s 42 --cache-mode ucache --cache-option "threshold=1,start=0.15,end=0.95,decay=1,relative=1,reset=1" --diffusion-fa --sampling-method dpm++2m --scheduler karras`
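As a side note, a `--cache-option` string in this key=value form is easy to split generically. The sketch below is only an illustration of such a parser, not the code the PR actually uses:

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <sstream>
#include <string>

// Illustrative parser for a "--cache-option" style string such as
// "threshold=1,start=0.15,end=0.95" (not the PR's actual implementation).
// Entries without '=' are silently skipped; values are parsed as floats.
std::map<std::string, float> parse_cache_options(const std::string& s) {
    std::map<std::string, float> opts;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) {
        size_t eq = item.find('=');
        if (eq == std::string::npos) continue; // ignore malformed entries
        opts[item.substr(0, eq)] = std::stof(item.substr(eq + 1));
    }
    return opts;
}
```

Parsing into a generic key/value map keeps the CLI oblivious to which keys a given cache mode understands, so new modes can add options without touching the argument parser.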

20 Steps:

| Baseline | 6/20 | 7/20 | 8/20 | 9/20 | 10/20 | 11/20 |
|---|---|---|---|---|---|---|
| 0 steps skipped (1x) | 6 steps skipped (~1.43x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) | 9 steps skipped (~1.82x) | 10 steps skipped (2x) | 11 steps skipped (~2.22x) |

30 Steps:

| Baseline | 10/30 | 11/30 | 12/30 | 13/30 | 14/30 | 15/30 | 16/30 | 17/30 | 18/30 | 19/30 | 20/30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 skipped (1x) | 10 skipped (1.5x) | 11 skipped (~1.58x) | 12 skipped (~1.67x) | 13 skipped (~1.76x) | 14 skipped (~1.88x) | 15 skipped (2x) | 16 skipped (~2.14x) | 17 skipped (~2.31x) | 18 skipped (2.5x) | 19 skipped (~2.73x) | 20 skipped (3x) |

@leejet I believe this is ready for review



Reviewers

@leejet left review comments
@wbruna left review comments


4 participants

@rmatif, @wbruna, @Green-Sky, @leejet
