feat: add more caching methods #1066


Open

rmatif wants to merge 12 commits into leejet:master from rmatif:add-ucache

Conversation

@rmatif (Contributor)

Adding an experimental variant of an EasyCache-like feature for UNet models. I came up with the name "ucache" (if someone has a better suggestion, I'd take it). For now it uses a step-level skipping mechanism. I want to make it per-block to provide more granularity and control, but the current UNet implementation doesn't allow that, and the static nature of ggml graphs makes it difficult to capture precise UNet blocks. I have found the results good enough to make this a first iteration.

Threshold may vary with the sampler + scheduler combo

`./build/bin/sd -m models/model.safetensors --cfg-scale 7 -p "a cute cat sitting on a red pillow" --steps 20 -H 1024 -W 1024 -s 42 --ucache 1,0.15,0.95 --diffusion-fa`
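The step-level skipping mechanism can be sketched roughly as below. This is an illustrative reconstruction, not the actual PR code; the struct and member names (`StepSkipper`, `last_change`, etc.) are hypothetical. The EasyCache-style idea is to accumulate an estimate of how much the model output is drifting and reuse the cached output while the accumulated drift stays below the threshold:

```cpp
#include <cassert>

// Hypothetical sketch of an EasyCache-style step-skip decision (not the
// actual PR code). The sampler asks should_skip() before each UNet call;
// while the accumulated output drift stays below `threshold`, the cached
// output from the last full evaluation is reused instead of running the UNet.
struct StepSkipper {
    float threshold;          // e.g. 0.15: skip while accumulated drift < this
    float start;              // fraction of steps before which we never skip
    float end;                // fraction of steps after which we never skip
    float accumulated = 0.0f; // drift accumulated since the last full step
    float last_change = 0.0f; // relative output change seen at that full step

    bool should_skip(int step, int total_steps) {
        float t = (float)step / (float)total_steps;
        if (t < start || t > end) return false;   // outside the active window
        accumulated += last_change;               // predicted drift if we skip
        if (accumulated < threshold) return true; // drift small: reuse cache
        accumulated = 0.0f;                       // run the UNet fully, reset
        return false;
    }
};
```

In a real implementation `last_change` would be re-measured from actual model outputs at every full evaluation; here it is a fixed stand-in just to show the accumulate/reset control flow.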

20 Steps:

| Baseline | 5/20 | 7/20 | 8/20 |
|---|---|---|---|
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) |

(comparison images not reproduced)

30 Steps:

| Baseline | 10/30 | 11/30 | 12/30 | 14/30 | 15/30 |
|---|---|---|---|---|---|
| 0 skipped (1x) | 10 skipped (1.5x) | 11 skipped (~1.58x) | 12 skipped (~1.67x) | 14 skipped (~1.88x) | 15 skipped (2x) |

Supersedes #705

@wbruna (Contributor)

This also has a nice side-effect on some low-CFG distilled models: the skipped steps help avoid the "overcooked" effect when using too many steps.

> I came up with the name "ucache" (if someone has a better suggestion, I'd take it).

Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

@rmatif (Contributor, Author)

> Since EasyCache itself doesn't work with UNet, and ucache uses a similar algorithm, I'd suggest reusing the same command-line parameters and parameter struct, to make it simpler to use both for the command line and frontends.

I have thought about that, but I'm planning to make more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

> For the same reason, perhaps a scaling factor could be applied to the threshold, to make similar values have similar behavior (at least for the default value)?

If the threshold was the same across samplers/schedulers, I'd say yes, but I feel it's a hacky way to do arbitrary scaling depending on that. Plus it's not only different but sometimes not very sensitive: different values will get you similar skipped-step counts. We lack granularity inside a single step; I'll work on unifying this.

@wbruna (Contributor)

> I have thought about that, but I'm planning to make more changes to make it depth-aware, so it will diverge from the original EasyCache implementation. Since the latter is working well, I wanted to leave it unchanged. I'm still reusing the EasyCache hooks, though, to avoid some duplication.

My suggestion is from a usability side, not development. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or an sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields and leave it to the user to figure out what she needs, or always fill both and tolerate the warning messages.

In reality, Koboldcpp would avoid that anyway by patching stable-diffusion.cpp (each patch adds maintenance overhead, but it's better than the alternative). Command-line users like sd.cpp-webui, or users that don't patch the library, don't have that option.

Even if the cache types used completely different parameters, they have defaults, so a simple flag/checkbox "turn the default cache on" would still be useful, and easy to be supported and used.

And it'd be zero change for the EasyCache implementation: just reuse its parameter struct. The code won't mind a few extra fields, if they are needed.

> If the threshold was the same across samplers/schedulers, I'd say yes, but I feel it's a hacky way to do arbitrary scaling depending on that. Plus it's not only different but sometimes not very sensitive: different values will get you similar skipped-step counts. We lack granularity inside a single step; I'll work on unifying this.

Again, my comment was about unifying the default value from a user's POV (essentially a flat ×5 on the input for UNet, so that 0.2 is always the default). I agree that doesn't matter much if you intend to have different defaults depending on model version/sampler/scheduler.


@Green-Sky (Contributor)

FYI, I had to go down to 0.05 or 0.025 to get stable results with 36 steps.


Regarding a unified command argument, maybe something like `--latent-cache` or `--prediction-cache` or something.


@leejet (Owner)

Maybe we should use --cache-mode to control the caching method (and disable caching if it’s not configured), and use --cache-option to configure the cache parameters?


@rmatif (Contributor, Author)

> Maybe we should use --cache-mode to control the caching method (and disable caching if it’s not configured), and use --cache-option to configure the cache parameters?

Something like that? Or do you want to expose those on the C API?

> fyi I had to go down to 0.05 or 0.025 to get stable results with 36 steps.

On which sampler/scheduler?

> My suggestion is from a usability side, not development. If I need to specify "turn on the cache implementation" in, say, a Koboldcpp config file or sd.cpp-webui field, it's much easier if I don't have separate fields for each model version. Especially because the model version isn't really available at that point (there's no reliable way to figure it out outside sd.cpp code - not even main.cpp has that information). I'd need to either duplicate the fields, and leave it to the user to figure out what she needs, or always fill both, and tolerate the warning messages.

That's a valid point, and I think it's a goal worth reaching. The fact that the app isn't aware of the model is true for a lot of options here. Since this is experimental and I'll be iterating on it, I think it's fine to keep it manual for now, and once it's good enough, have only one cache option in sd.cpp for every model.

@rmatif (Contributor, Author)

Added some tweaks; now it can sometimes accidentally add a nice aesthetic pattern compared to the baseline.

20 Steps:

| Baseline | 5/20 | 6/20 | 7/20 | 8/20 | 9/20 | 10/20 |
|---|---|---|---|---|---|---|
| 0 steps skipped (1x) | 5 steps skipped (~1.33x) | 6 steps skipped (~1.43x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) | 9 steps skipped (~1.82x) | 10 steps skipped (2x) |

30 Steps:

| Baseline | 8/30 | 10/30 | 12/30 | 13/30 | 14/30 | 15/30 | 16/30 |
|---|---|---|---|---|---|---|---|
| 0 steps skipped (1x) | 8 steps skipped (~1.36x) | 10 steps skipped (1.5x) | 12 steps skipped (~1.67x) | 13 steps skipped (~1.76x) | 14 steps skipped (~1.88x) | 15 steps skipped (2x) | 16 steps skipped (~2.14x) |

@Green-Sky (Contributor)

> > fyi I had to go down to 0.05 or 0.025 to get stable results with 36 steps.
>
> On which sampler/scheduler?

I was using `--cfg-scale 8 --steps 36 --scheduler karras --sampling-method dpm++2m` with cyberrealisticxl.

Will test the new code later and play some more with the params.


@leejet (Owner)

> Something like that? Or do you want to expose those on the C API?

The command arguments are just as I expected.

By the way, maybe we can put all the cache-related parameters into a single struct in the API? Then when we add more cache methods later, we won’t need a separate struct for each one — and it seems like multiple cache methods can’t be active at the same time anyway.
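A single shared struct along those lines might look like the sketch below. The type and field names are hypothetical (the fields mirror the `--cache-option` keys used later in this thread, threshold/start/end/decay/relative/reset), not the actual sd.cpp API:

```cpp
#include <cassert>

// Hypothetical sketch of a unified cache parameter struct (illustrative
// names, not the actual sd.cpp API). One struct covers every cache method;
// `mode` selects which one is active, and fields a method doesn't use are
// simply ignored by it.
enum sd_cache_mode_t {
    SD_CACHE_NONE = 0, // caching disabled (default when not configured)
    SD_CACHE_EASYCACHE,
    SD_CACHE_UCACHE,
    SD_CACHE_DIT,
};

struct sd_cache_params_t {
    sd_cache_mode_t mode = SD_CACHE_NONE;
    float threshold      = 1.0f;  // skip threshold; meaning depends on mode
    float start          = 0.15f; // first fraction of steps eligible to skip
    float end            = 0.95f; // last fraction of steps eligible to skip
    float decay          = 1.0f;  // decay applied to the accumulated error
    bool  relative       = true;  // compare changes relative to output norm
    bool  reset          = true;  // reset the accumulator after a full step
};
```

A frontend would then only need one "cache" field plus a mode selector, regardless of model version, which is the usability point raised above.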


@rmatif changed the title from "feat: add ucache" to "feat: add more caching methods" on Dec 12, 2025
@rmatif (Contributor, Author)

Decided to port cache-dit.

| Baseline | Slow | Medium | Fast | Ultra |
|---|---|---|---|---|
| 1x | 2x | ~2.73x | 3x | 3x |

@rmatif force-pushed the add-ucache branch 2 times, most recently from e047457 to 9af991d on December 15, 2025 at 17:03
@rmatif (Contributor, Author)

More results

`./build/bin/sd-cli -m /workspace/ucache/stable-diffusion.cpp/models/2395247.safetensors --cfg-scale 7 -p "a cute cat sitting on a red pillow" --steps 20 -H 1024 -W 1024 -s 42 --cache-mode ucache --cache-option "threshold=1,start=0.15,end=0.95,decay=1,relative=1,reset=1" --diffusion-fa --sampling-method dpm++2m --scheduler karras`
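As a side note, a `--cache-option` string in this key=value form is easy to split generically. The sketch below is only an illustration of such a parser, not the code the PR actually uses:

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <sstream>
#include <string>

// Illustrative parser for a "--cache-option" style string such as
// "threshold=1,start=0.15,end=0.95" (not the PR's actual implementation).
// Entries without '=' are silently skipped; values are parsed as floats.
std::map<std::string, float> parse_cache_options(const std::string& s) {
    std::map<std::string, float> opts;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) {
        size_t eq = item.find('=');
        if (eq == std::string::npos) continue; // ignore malformed entries
        opts[item.substr(0, eq)] = std::stof(item.substr(eq + 1));
    }
    return opts;
}
```

Parsing into a generic key/value map keeps the CLI oblivious to which keys a given cache mode understands, so new modes can add options without touching the argument parser.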

20 Steps:

| Baseline | 6/20 | 7/20 | 8/20 | 9/20 | 10/20 | 11/20 |
|---|---|---|---|---|---|---|
| 0 steps skipped (1x) | 6 steps skipped (~1.43x) | 7 steps skipped (~1.54x) | 8 steps skipped (~1.67x) | 9 steps skipped (~1.82x) | 10 steps skipped (2x) | 11 steps skipped (~2.22x) |

30 Steps:

| Baseline | 10/30 | 11/30 | 12/30 | 13/30 | 14/30 | 15/30 | 16/30 | 17/30 | 18/30 | 19/30 | 20/30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 skipped (1x) | 10 skipped (1.5x) | 11 skipped (~1.58x) | 12 skipped (~1.67x) | 13 skipped (~1.76x) | 14 skipped (~1.88x) | 15 skipped (2x) | 16 skipped (~2.14x) | 17 skipped (~2.31x) | 18 skipped (2.5x) | 19 skipped (~2.73x) | 20 skipped (3x) |

@leejet I believe this is ready for review



Reviewers

@leejet left review comments
@wbruna left review comments


4 participants

@rmatif, @wbruna, @Green-Sky, @leejet
