Add imatrix support #633

Open · stduhpf wants to merge 15 commits into leejet:master from stduhpf:imatrix

Conversation

@stduhpf (Contributor) commented Mar 23, 2025 (edited)

Adds support for llama.cpp-style importance matrices (see https://github.com/ggml-org/llama.cpp/blob/master/examples/imatrix/README.md and ggml-org/llama.cpp#4861) to improve the quality of quantized models.

Models generated with an imatrix are backward compatible with previous releases.

Usage:

To train an imatrix:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat
This will generate an image and train the imatrix while doing so (you can use -b to generate multiple images at once).

To keep training an existing imatrix:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat --imat-in imatrix.dat

You can load multiple imatrix files at once; they will be merged into the output:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat --imat-in imatrix.dat --imat-in imatrix2.dat

Quantize with imatrix:
sd.exe -M convert [same exact parameters as normal quantization] --imat-in imatrix.dat
(again, you can use multiple imatrix files)
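
For context, a llama.cpp-style imatrix is just a per-input-channel statistic of the activations seen while generating: for every matmul, the mean of the squared inputs feeding each weight column is accumulated, and the quantizer later uses it to weight its rounding error. A minimal sketch of that idea (illustrative only; the names and layout are assumptions, not the actual imatrix.cpp code):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ImatrixEntry {
    std::vector<float> sum_sq;  // running sum of x[j]^2 per input channel
    int ncall = 0;              // how many activation rows were accumulated
};

static std::map<std::string, ImatrixEntry> g_imatrix;

// x points to n_rows activation vectors of length n_in that feed the weight
// tensor `name` (e.g. "model.diffusion_model....to_q.weight").
void accumulate_imatrix(const std::string& name, const float* x, size_t n_in, size_t n_rows) {
    ImatrixEntry& e = g_imatrix[name];
    if (e.sum_sq.empty()) e.sum_sq.assign(n_in, 0.0f);
    for (size_t r = 0; r < n_rows; ++r) {
        for (size_t j = 0; j < n_in; ++j) {
            const float v = x[r * n_in + j];
            e.sum_sq[j] += v * v;
        }
        ++e.ncall;
    }
}
```

Merging several imatrix files (as in the commands above) then amounts to combining these per-tensor sums and counts.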

Examples

"simple" imatrix trained on a batch of 32 image generations (512x512) with the dreamshaper_8LCM (f16, 8 steps) model and empty prompts. (because of the model's bias, it was mostly calibrated on portraits of asian women):

"better" imatrix trained on 504 generations using diverse prompst and aspect ratios, using the same model.

| iq3_xxs static * | iq3_xxs with simple imatrix | iq3_xxs with better imatrix | fp16 |
| --- | --- | --- | --- |
| xxs_stat (no prompt) | xxs_imat (no prompt) | xxs_imatv2 (no prompt) | full (no prompt) |
| xxs_stat a cute cat playing with yarn | xxs_imat a cute cat playing with yarn | xxs_imatv2 a cute cat playing with yarn | full a cute cat playing with yarn |
| xxs_stat a girl wearing a funny hat | xxs_imat a girl wearing a funny hat | xxs_imatv2 a girl wearing a funny hat | full a girl wearing a funny hat |

* "static" means that the importance matrix is not active (all ones), which is how quantization behaves on the master branch.
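
Roughly speaking, the imatrix turns the quantizer's per-block objective into a weighted least-squares problem (a sketch of the idea, not the exact kernel used by every quant type):

```math
\min_{q}\;\sum_{j} w_j\,(x_j - q_j)^2,
\qquad
w_j \;\approx\; \frac{1}{N}\sum_{n=1}^{N} a_{n,j}^2
```

where the x_j are the original weights of a block, the q_j their quantized values, and the a_{n,j} the calibration activations that multiply column j. With all w_j = 1 this reduces to the static behaviour described above.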

iq2_xs seems completely broken even with imatrix for this model, but the effect is still noticeable. With iq4, the static quant is already pretty good, so the difference in quality isn't obvious (both using the "better" imatrix here).

| iq2_xs static | iq2_xs imatrix | iq4_nl static | iq4_nl imatrix |
| --- | --- | --- | --- |
| 2xs_static a girl wearing a funny hat | 2xs_imatv2 a girl wearing a funny hat | 4nl_static a girl wearing a funny hat | 4nl_imatv2 a girl wearing a funny hat |

Interesting observation: for the "girl wearing a funny hat" prompt, static quants put her in a city like the original fp16 model does, while the quants calibrated with the "better" imatrix put her in a forest. This is most likely due to a bias in the calibration dataset, which contained some samples of girls with forest backgrounds and none with city backgrounds.

You can find these models and the imatrices used here: https://huggingface.co/stduhpf/dreamshaper-8LCM-im-GGUF-sdcpp

You can find examples with other models in the discussion.

@Green-Sky (Contributor)

@stduhpf Thank you for working on this :)

Do you think transformer-based models work better with importance matrices, like ggml quants generally do? (e.g. Flux)

@stduhpf (Contributor, Author) commented Mar 23, 2025 (edited)

@Green-Sky I have no idea. I'm not sure it would work right now; I've only tested sd1.5 so far because it's so much faster.

@stduhpf (Contributor, Author) commented Mar 23, 2025 (edited)

I don't understand why the CI's linker is unable to find log_printf(). It works just fine on my machine, which is also Windows, and I'm also using CMake... It's probably because I'm not using -DSD_BUILD_SHARED_LIBS=ON.

@Green-Sky (Contributor)

> I don't understand why the CI's linker is unable to find log_printf(). It works just fine on my machine, which is also Windows, and I'm also using CMake... It's probably because I'm not using -DSD_BUILD_SHARED_LIBS=ON.

Maybe imatrix.hpp should just not be a header only lib ^^

@stduhpf (Contributor, Author) commented Mar 23, 2025 (edited)

@Green-Sky I'm doing some tests with sd3; it seems to be doing something, but cooking an imatrix for larger un-distilled models takes ages compared to something like sd1.5 LCM.

Now that I think about it, applying imatrix to Flux (or any model with a standalone diffusion model) will be tricky: the imatrix uses the names that the weights have at runtime, but when quantizing, the names are not prefixed like they are at runtime.
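
(For illustration only — a hypothetical helper, not code from this PR — the mismatch is just a name prefix, so quantizing a standalone diffusion model needs a lookup along these lines:)

```cpp
#include <string>

// Hypothetical: map a tensor name as it appears in a standalone diffusion-model
// file to the runtime name stored in the imatrix (which is prefixed with
// "model.diffusion_model.").
std::string imatrix_key_for(const std::string& tensor_name) {
    const std::string prefix = "model.diffusion_model.";
    if (tensor_name.compare(0, prefix.size(), prefix) == 0) {
        return tensor_name;       // already a runtime name
    }
    return prefix + tensor_name;  // standalone file: add the prefix
}
```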

@idostyle (Contributor)

Nice job stduhpf.

> Do you think transformer-based models work better with importance matrices, like ggml quants generally do? (e.g. Flux)

Flux also seems to struggle with the lower-bit i-quants: https://huggingface.co/Eviation/flux-imatrix

@stduhpf (Contributor, Author) commented Mar 24, 2025 (edited)

Results from my sd3 2B experiments, using a basic imatrix trained on only a dozen generations.

K-quants

| sd3_medium_incl_clips_t5xxl q3_K static | sd3_medium_incl_clips_t5xxl q3_K imatrix | sd3_medium_incl_clips_t5xxl q4_K static | sd3_medium_incl_clips_t5xxl q4_K imatrix |
| --- | --- | --- | --- |
| output | output | output | output |

| sd3_medium_incl_clips_t5xxl q5_K static | sd3_medium_incl_clips_t5xxl q5_K imatrix | sd3_medium_incl_clips_t5xxl q6_K static | sd3_medium_incl_clips_t5xxl q6_K imatrix |
| --- | --- | --- | --- |
| output | output | output | output |

i-quants

| sd3_medium_incl_clips_t5xxl iq3_xxs static | sd3_medium_incl_clips_t5xxl iq3_xxs imatrix | sd3_medium_incl_clips_t5xxl iq3_s static | sd3_medium_incl_clips_t5xxl iq3_s imatrix |
| --- | --- | --- | --- |
| output | output | output | output |

| sd3_medium_incl_clips_t5xxl iq4_xs static | sd3_medium_incl_clips_t5xxl iq4_xs imatrix | sd3_medium_incl_clips_t5xxl iq4_nl static | sd3_medium_incl_clips_t5xxl iq4_nl imatrix |
| --- | --- | --- | --- |
| output copy 11 | output copy 10 | output | output |

Ground truth

| sd3_medium_incl_clips_t5xxl fp16 |
| --- |
| output |

(all images generated with the same settings; only the quantization changes)

@stduhpf (Contributor, Author) commented Mar 28, 2025 (edited)

OK, I found a satisfactory way to apply imatrix to Flux. (Also, training the imatrix with quantized models seems to work just fine.)

| Flux.1 schnell q2_k static | Flux.1 schnell q2_k imatrix | Flux.1 schnell q2_k imatrix trained on Flux dev |
| --- | --- | --- |
| schnell-q2k-static | schnell-q2k-imatrix | output |

| Flux.1 dev q2_k static | Flux.1 dev q2_k imatrix | Flux.1 dev q2_k imatrix trained on Flux schnell |
| --- | --- | --- |
| output | output | output |

(imatrix trained on 10 generations using static q4_k (schnell) or iq4_nl (dev) model)

@Green-Sky (Contributor)

Looks great.

Did you tune it with the same number of sampling steps? Optimising for your own use case is probably best for lower quants.

@stduhpf (Contributor, Author) commented Mar 28, 2025 (edited)

> Did you tune it with the same number of sampling steps? Optimising for your own use case is probably best for lower quants.

For the schnell one, I trained it with only 4 steps, at different resolutions. My PC is currently cooking a Flux dev imatrix using a varying step count (from 16 to 40). Maybe I'll try to make one with a fixed step count to compare against afterwards.


@stduhpf force-pushed the imatrix branch 2 times, most recently from 4ec74a9 to 24d8fd7 on March 29, 2025 18:03
@stduhpf marked this pull request as ready for review on March 29, 2025 19:04
@stduhpf (Contributor, Author)

I feel like this is pretty much ready now.


@stduhpf changed the title from "Imatrix: first implementation attempt" to "Add imatrix support" on Mar 29, 2025
@stduhpf force-pushed the imatrix branch 2 times, most recently from 7379982 to 2dc5dfb on March 29, 2025 22:54
@Green-Sky (Contributor)

I am trying this right now. I am no expert on how the importance data flows into the quantization, but does it make sense to sample using a quant, just to recreate the same quant with the importance data?

You showed that using a higher quant to generate the imat works, but using the same quant would be interesting...

@stduhpf (Contributor, Author)

> I am trying this right now. I am no expert on how the importance data flows into the quantization, but does it make sense to sample using a quant, just to recreate the same quant with the importance data?

I think it would work. As long as the original quant is "good enough" to generate coherent images, the activations should already be representative of the ideal activations, and therefore the imatrix shouldn't be too different from the one trained on the full precision model, with the same kind of improvements.

@Green-Sky (Contributor)

Thanks, good to know. This all reminds me very much of PGO, where you usually stack profiles to get the last 1-2% of performance. 😄

I am doing q5_k right now, and the image is very coherent indeed.

printf(" --type [TYPE] weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K)\n");
printf(" If not specified, the default is the type of the weight file\n");
printf(" --imat-out [PATH] If set, compute the imatrix for this run and save it to the provided path");
printf(" --imat-in [PATH] Use imatrix for quantization.");
Review comment (Contributor):

both new options miss a new line.
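
Presumably the fix is just to append the missing `\n` to both format strings, e.g.:

```cpp
printf(" --imat-out [PATH] If set, compute the imatrix for this run and save it to the provided path\n");
printf(" --imat-in [PATH] Use imatrix for quantization.\n");
```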

imatrix.cpp (outdated)
```cpp
    return false;
}

// Recreate the state as expected by save_imatrix(), and corerct for weighted sum.
```
Review comment (Contributor):

corerct -> correct

Reply (Contributor, Author):

I don't know, I just copy-pasted that part of the code, maybe the typo is important

Reply (Contributor):

😁

@Green-Sky (Contributor) commented Mar 31, 2025 (edited)

Not sure I did anything wrong, but using imats produced by the same quant seems to produce the same model file. So it either does not work, or I did something wrong.

```
$ result/bin/sd -M convert -m models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat
loading imatrix from 'flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat'
[INFO ] model.cpp:918  - load models/flux.1-lite-8B.safetensors using safetensors format
[INFO ] model.cpp:2003 - model tensors mem size: 5562.48MB
  |==================================================| 516/516 - 24.39it/s
[INFO ] model.cpp:2038 - load tensors done
[INFO ] model.cpp:2039 - trying to save tensors to models/flux.1-lite-8B-q5_k-igo.gguf
convert 'models/flux.1-lite-8B.safetensors'/'' to 'models/flux.1-lite-8B-q5_k-igo.gguf' success

ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k.gguf
ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k-igo.gguf
```

edit: `strings` shows that the imat contains data for the diffusion model:

```
...model.diffusion_model.single_blocks.35.modulation.lin.weightH
model.diffusion_model.double_blocks.5.img_mlp.2.weightH
...
```

edit2: and the imats are different

```
b481d4c6e8903ac4a1e612a8e9b5dc8afc4b2bb31d1fea2a2a404e9bd565416a  flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat
ab385c84e8bd4002a1579350a7bdd01a96581900922cf192bc47012224038ebe  flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat
```

edit3: tried to do an optimized q4_k, same issue, so something is fundamentally broken with the flux prune/distill/dedistill I am using.
https://huggingface.co/Freepik/flux.1-lite-8B
https://huggingface.co/Green-Sky/flux.1-lite-8B-GGUF/tree/main/base

@stduhpf (Contributor, Author) commented Mar 31, 2025 (edited)

> Not sure I did anything wrong, but using imats produced by the same quant seems to produce the same model file. So it either does not work, or I did something wrong.

Try with `result/bin/sd -M convert --diffusion-model models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat` (you can also include the VAE/text encoders)

And then run it with `-m models/flux.1-lite-8B-q5_k-igo.gguf` instead of `--diffusion-model models/flux.1-lite-8B-q5_k-igo.gguf`

@Green-Sky (Contributor) commented Mar 31, 2025 (edited)

Another issue. When I use flash attention, it breaks the imat collection after a varying number of images (using sd_turbo here).

```
[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
```
Details:

```
$ result/bin/sd -m models/sd_turbo-f16-q8_0.gguf --cfg-scale 1 --steps 8 --schedule karras -p "a lovely cat" --imat-out sd_turbo.imat -b 32 -s -1 --diffusion-fa
IMPORTANT: imatrix file sd_turbo.imat already exists, but wasn't found in the imatrix inputs.
sd_turbo.imat will get overwritten!
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
[INFO ] stable-diffusion.cpp:197  - loading model from 'models/sd_turbo-f16-q8_0.gguf'
[INFO ] model.cpp:915  - load models/sd_turbo-f16-q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:244  - Version: SD 2.x
[INFO ] stable-diffusion.cpp:277  - Weight type:                 q8_0
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     q8_0
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             q8_0
[INFO ] stable-diffusion.cpp:328  - Using flash attention in the diffusion model
  |>                                                 | 3/1323 - 0.00it/s
[INFO ] model.cpp:1915 - unknown tensor 'cond_stage_model.transformer.text_model.text_projection | q8_0 | 2 [1024, 1024, 1, 1, 1]' in model file
  |==================================================| 1323/1323 - 1000.00it/s
[INFO ] stable-diffusion.cpp:503  - total params memory size = 2006.07MB (VRAM 2006.07MB, RAM 0.00MB): clip 500.53MB(VRAM), unet 1411.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522  - loading model from 'models/sd_turbo-f16-q8_0.gguf' completed, taking 0.73s
[INFO ] stable-diffusion.cpp:556  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:566  - running with Karras schedule
[INFO ] stable-diffusion.cpp:690  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 51 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/32 - seed 832414162
  |==================================================| 8/8 - 3.60it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.30s
...
[INFO ] stable-diffusion.cpp:1439 - generating image: 9/32 - seed 832414170
  |============>                                     | 2/8 - 3.65it/s
[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
```

update: it happened after a much longer time without flash attention too.

```
[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
```
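
(For context, the check that produces this warning is presumably something like the sketch below — illustrative only, not the PR's actual imatrix.cpp code — i.e. the collected activations are tested for non-finite values before being added to the running sums:)

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Sketch: reject a batch of activations that contains non-finite values
// instead of letting an inf/NaN poison the accumulated statistics.
bool accumulate_checked(const float* x, size_t n, float* sum_sq, const char* tensor_name) {
    for (size_t j = 0; j < n; ++j) {
        if (!std::isfinite(x[j])) {
            std::fprintf(stderr, "[WARN ] inf detected in %s\n", tensor_name);
            return false;  // skip this call, keep the imatrix usable
        }
    }
    for (size_t j = 0; j < n; ++j) {
        sum_sq[j] += x[j] * x[j];
    }
    return true;
}
```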
@Green-Sky (Contributor) commented Mar 31, 2025 (edited)

> Try with `result/bin/sd -M convert --diffusion-model models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat` (you can also include the VAE/text encoders)
>
> And then run it with `-m models/flux.1-lite-8B-q5_k-igo.gguf` instead of `--diffusion-model models/flux.1-lite-8B-q5_k-igo.gguf`

This seems to have worked. Not a fan of the tensor renaming though.

| base q5_k | q5_k with q5_k imat | diff |
| --- | --- | --- |
| flux 1-lite-8B-q5_k-anime-base-s545345362 | flux 1-lite-8B-q5_k-anime-igo-q5_k-s545345362 | flux 1-lite-8B-q5_k-anime-igo-q5_k-diff-to-q5_k-s545345362 |

(they were obviously identical before)

| base q4_k | q4_k with q5_k imat | diff |
| --- | --- | --- |
| flux 1-lite-8B-q4_k-anime-base-s545345362 | flux 1-lite-8B-q4_k-anime-igo-q5_k-s545345362 | image |

It looks like the imat from q5_k made q4_k stray more. I don't have the full-size model image for this example (it's too expensive ngl), but to me this looks worse. However, the importance-guided quant seems to have less dither noise, so it got better somewhere...


I was trying to measure the visual quality difference of the quants, and I saw and remembered that Flux specifically shows dither-like patterns when you go lower with the quants. So I tried to measure that with GIMP: I first applied a high-pass filter (at 0.5 std and 4 contrast) and then used the histogram plot.

| base q5_k | q5_k with q5_k imat |
| --- | --- |
| image | image |

Base is spread out a little more, so this should mean there is indeed more high frequency noise, but this is just a single sample AND a highly experimental and somewhat subjective analysis 😅

@stduhpf (Contributor, Author) commented Mar 31, 2025 (edited)

> This seems to have worked. Not a fan of the tensor renaming though.

Yes, this is a bit annoying for Flux models. I thought of adding a way to extract the diffusion model (or other components like the VAE or text encoders) from the model file, but I feel like this is getting a bit out of scope for this PR. (Something like `sd.exe -M extract -m "models/flux.1-lite-8B-q5_k-igo.gguf" --diffusion-model "models/flux.1-lite-8B-q5_k-igo.gguf"` maybe, or like `sd.exe -M extract -m "models/flux.1-lite-8B-q5_k-igo.gguf" -o "models/flux.1-lite-8B-q5_k-igo.gguf" -p "model.diffusion_model"`.)
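
(Purely as an illustration of what such an extract mode might do — a hypothetical helper, not part of this PR — it would mostly be a prefix filter over the tensor names:)

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical: keep only the tensors whose name starts with the requested
// prefix, and strip the prefix so the result looks like a standalone
// component file. Returns {new_name, old_name} pairs.
std::vector<std::pair<std::string, std::string>> filter_by_prefix(
        const std::vector<std::string>& tensor_names, const std::string& prefix) {
    std::vector<std::pair<std::string, std::string>> kept;
    for (const std::string& name : tensor_names) {
        if (name.compare(0, prefix.size(), prefix) == 0) {
            kept.emplace_back(name.substr(prefix.size()), name);
        }
    }
    return kept;
}
```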

@stduhpf (Contributor, Author)

@Green-Sky I've had a similar problem with sd3 q4_k (with fp16 imatrix). For some reason the outputs of the q4_k model seem to stray further from the full-precision model when using imatrix, but it still seems to minimise artifacts. (I think q6_k shows a similar behavior to a lesser extent.)

@Green-Sky (Contributor)

> @Green-Sky I've had a similar problem with sd3 q4_k (with fp16 imatrix). For some reason the outputs of the q4_k model seem to stray further from the full-precision model when using imatrix, but it still seems to minimise artifacts. (I think q6_k shows a similar behavior to a lesser extent.)

Interesting. Here is q2_k, which shows the same, but much more noticeable, behavior.

| q2_k | q2_k with q5_k imat |
| --- | --- |
| flux 1-lite-8B-q2_k-anime-base-s545345362 | flux 1-lite-8B-q2_k-anime-igo-q5_k-s545345362 |

@Green-Sky (Contributor)

| q3_k | q3_k with q5_k imat |
| --- | --- |
| flux 1-lite-8B-q3_k-anime-base-s545345362 | flux 1-lite-8B-q3_k-anime-igo-q5_k-s545345362 |

The imat version might look more coherent, but it also looks more blurry. So overall, for qx_k quants, imat seems to act a bit like a blur, at least a little.

I will run some q8_0 imat profiling on cpu tomorrow to get a better source of truth.

@Green-Sky (Contributor) commented Apr 1, 2025 (edited)

> I will run some q8_0 imat profiling on cpu tomorrow to get a better source of truth.

I actually can't, it crashes.

```
result/bin/sd --diffusion-model models/flux.1-lite-8B-q8_0.gguf --clip_l models/flux-extra/clip_l-q8_0.gguf --t5xxl models/flux-extra/t5xxl_q8_0.gguf --vae models/flux-extra/ae-f16.gguf -v --color -t 8 --cfg-scale 1 --sampling-method euler --steps 24 --guidance 3.3 -W 1024 -H 768 -p "a lovely cat" --imat-out imat.dat
...
[DEBUG] conditioner.hpp:1170 - computing condition graph completed, taking 9611 ms
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 9615 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:808  - Sample
[DEBUG] ggml_extend.hpp:1121 - flux compute buffer size: 1662.42 MB(RAM)
Segmentation fault (core dumped)
```

The reason seems to be a null buffer here:
https://github.com/stduhpf/stable-diffusion.cpp/blob/71eed146cd78ab771761888169cab7d82d90a5bb/imatrix.cpp#L54

update: it happens with any model type (q8_0, f16, q5_k tested)

stacktrace:

```
#0  ggml_backend_buffer_get_type (buffer=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:175
#1  0x00000000005430f5 in ggml_backend_buffer_is_host (buffer=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:158
#2  0x0000000000507363 in IMatrixCollector::collect_imatrix (this=this@entry=0x8c3aa0 <imatrix_collector>, t=0x6153370, ask=false, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/imatrix.cpp:54
#3  0x0000000000436254 in collect_imatrix (t=<optimized out>, ask=<optimized out>, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/model.cpp:2127
#4  0x000000000048b25b in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) (this=this@entry=0xc7ba58, get_graph=..., n_threads=n_threads@entry=8, free_compute_buffer_immediately=free_compute_buffer_immediately@entry=false, output=output@entry=0x7fffffffb160, output_ctx=output_ctx@entry=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml_extend.hpp:1263
#5  0x000000000048b912 in Flux::FluxRunner::compute (this=this@entry=0xc7ba58, n_threads=n_threads@entry=8, x=<optimized out>, x@entry=0x7ffc9e66c0d0, timesteps=<optimized out>, timesteps@entry=0x7ffc9e96c790, context=<optimized out>, context@entry=0x7ffc9e4ebbc0, c_concat=<optimized out>, c_concat@entry=0x0, y=<optimized out>, guidance=<optimized out>, output=0x7fffffffb160, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/flux.hpp:970
#6  0x0000000000494901 in FluxModel::compute (this=this@entry=0xc7ba50, n_threads=8, x=0x7ffc9e66c0d0, timesteps=timesteps@entry=0x7ffc9e96c790, context=0x7ffc9e4ebbc0, c_concat=0x0, y=0x7ffc9dcea500, guidance=0x7ffc9e96c950, num_video_frames=-1, controls=std::vector of length 0, capacity 0, control_strength=control_strength@entry=0.899999976, output=0x7fffffffb160, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/diffusion_model.hpp:178
#7  0x0000000000496b6d in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const (__closure=0x965cb0, input=0x7ffc9e5abf20, sigma=<optimized out>, step=1)...
```

update2: same for sd_turbo

```
#0  ggml_backend_buffer_get_type (buffer=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:175
#1  0x00000000005430f5 in ggml_backend_buffer_is_host (buffer=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:158
#2  0x0000000000507363 in IMatrixCollector::collect_imatrix (this=this@entry=0x8c3aa0 <imatrix_collector>, t=0x32925d0, ask=false, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/imatrix.cpp:54
#3  0x0000000000436254 in collect_imatrix (t=<optimized out>, ask=<optimized out>, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/model.cpp:2127
#4  0x000000000048b25b in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) (this=this@entry=0xb58ad8, get_graph=..., n_threads=n_threads@entry=12, free_compute_buffer_immediately=free_compute_buffer_immediately@entry=false, output=output@entry=0x7fffffffb3b8, output_ctx=output_ctx@entry=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml_extend.hpp:1263
#5  0x000000000048b4c4 in UNetModelRunner::compute (this=this@entry=0xb58ad8, n_threads=n_threads@entry=12, x=<optimized out>, x@entry=0x2716b30, timesteps=<optimized out>, timesteps@entry=0x2719290, context=<optimized out>, context@entry=0x27170e0, c_concat=<optimized out>, c_concat@entry=0x0, y=<optimized out>, num_video_frames=<optimized out>, controls=std::vector of length 0, capacity 0, control_strength=<optimized out>, control_strength@entry=0, output=0x7fffffffb3b8, output_ctx=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/unet.hpp:615
#6  0x00000000004964dc in UNetModel::compute (this=0xb58ad0, n_threads=12, x=0x2716b30, timesteps=0x2719290, context=0x27170e0, c_concat=0x0, y=0x0, guidance=0x0, num_video_frames=-1, controls=std::vector of length 0, capacity 0, control_strength=0, output=0x7fffffffb3b8, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/diffusion_model.hpp:78
#7  0x0000000000485dba in StableDiffusionGGML::is_using_v_parameterization_for_sd2 (this=this@entry=0x9655d0, work_ctx=work_ctx@entry=0xcb00a0, is_inpaint=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:621
#8  0x00000000004e6a9b in StableDiffusionGGML::load_from_file (this=this@entry=0x9655d0, model_path="models/sd_turbo-f16-q8_0.gguf", clip_l_path="", clip_g_path="", t5xxl_path="", diffusion_model_path="", vae_path=..., control_net_path=..., embeddings_path=..., id_embeddings_path=..., taesd_path=..., vae_tiling_=<optimized out>, wtype=<optimized out>, schedule=<optimized out>, clip_on_cpu=<optimized out>, control_net_cpu=<optimized out>, vae_on_cpu=<optimized out>, diffusion_flash_attn=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:527
#9  0x00000000004760ad in new_sd_ctx (model_path_c_str=<optimized out>, clip_l_path_c_str=0x7fffffffbb58 "", clip_g_path_c_str=0x7fffffffbb78 "", t5xxl_path_c_str=0x7fffffffbb98 "", diffusion_model_path_c_str=0x7fffffffbbb8 "", vae_path_c_str=0x7fffffffbbd8 "", taesd_path_c_str=0x7fffffffbbf8 "", control_net_path_c_str=0x7fffffffbc38 "", lora_model_dir_c_str=0x7fffffffbcc0 "", embed_dir_c_str=0x7fffffffbc58 "", id_embed_dir_c_str=0x7fffffffbc78 "", vae_decode_only=true, vae_tiling=false, free_params_immediately=true, n_threads=12, wtype=SD_TYPE_COUNT, rng_type=CUDA_RNG, s=KARRAS, keep_clip_on_cpu=false, keep_control_net_cpu=false, keep_vae_on_cpu=false, diffusion_flash_attn=false) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:1159
#10 0x0000000000420451 in main (argc=<optimized out>, argv=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/examples/cli/main.cpp:926
```
@stduhpf (Contributor, Author)

@Green-Sky It should be fixed now.
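
(For reference: the crash was `ggml_backend_buffer_is_host()` being called with a null `buffer`, so the fix presumably boils down to a guard of roughly this shape — a sketch, not the actual commit:)

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// Sketch only: check whether a tensor has a backend buffer at all before asking
// whether that buffer is host-resident, instead of dereferencing a null pointer
// inside ggml_backend_buffer_is_host().
static bool tensor_buffer_is_host(const struct ggml_tensor * t) {
    return t->buffer != nullptr && ggml_backend_buffer_is_host(t->buffer);
}
```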

@Green-Sky (Contributor) commented Apr 1, 2025 (edited)

@stduhpf works, thanks. Running some at an incredible 541.82 s/it.

But it looks like CPU inference is broken for that model..., so the imat might be of questionable quality.
(probably not related to this PR)

update: I did a big oopsy and forgot to add `--cfg-scale 1 --sampling-method euler`

@Green-Sky (Contributor)

I ran the freshly generated f16 imatrix file through ggml-org/llama.cpp#12718:

```
$ result/bin/llama-imatrix --show-statistics --in-file ../stable-diffusion.cpp/flux.1-lite-8B-f16-768x768-g3.5-s28-b1-anime1.dat
Computing statistics for ../stable-diffusion.cpp/flux.1-lite-8B-f16-768x768-g3.5-s28-b1-anime1.dat (444 tensors)
 Layer      Tensor      μ(Importance Scores)   Contribution
================================================================================
    -           wo            83.69              5.9814 %
    -            0            82.23              5.8774 %
    -         proj            64.58              4.6155 %
    -            2            61.22              4.3756 %
    -         proj            61.04              4.3627 %
    -         proj            45.07              3.2214 %
    -           wo            35.81              2.5595 %
    -      linear2            32.47              2.3209 %
    -         proj            31.14              2.2260 %
    -         proj            30.86              2.2058 %
    -     in_layer            28.69              2.0506 %
    -         proj            28.63              2.0465 %
    -       img_in            21.10              1.5079 %
    -         proj            20.19              1.4431 %
    ...
```
                        k           0.02              0.0018 %    -                             v           0.02              0.0018 %    -                             v           0.02              0.0017 %    -                             k           0.02              0.0017 %    -                             q           0.02              0.0017 %    -                           fc2           0.02              0.0016 %    -                      out_proj           0.02              0.0016 %    -                             q           0.02              0.0015 %    -                             k           0.02              0.0015 %    -                             v           0.02              0.0015 %    -                          wi_1           0.02              0.0014 %    -                          wi_0           0.02              0.0014 %    -                          wi_1           0.02              0.0012 %    -                          wi_0           0.02              0.0012 %    -                           fc2           0.02              0.0011 %    -                           fc2           0.02              0.0011 %    -                             k           0.02              0.0011 %    -                             q           0.02              0.0011 %    -                             v           0.02              0.0011 %    -                      out_proj           0.02              0.0011 %    -                          wi_0           0.02              0.0011 %    -                          wi_1           0.02              0.0011 %    -                          wi_0           0.01              0.0011 %    -                          wi_1           0.01              0.0011 %    -                          wi_0           0.01              0.0010 %    -                          wi_1           0.01              0.0010 %    -                          wi_0           0.01              0.0010 %    -                          wi_1           0.01              0.0010 %    -                           fc2           0.01              0.0010 %    -                             k           0.01              0.0010 %    -                             q           0.01              0.0010 %    -                             v           0.01              0.0010 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                          wi_0           0.01              0.0009 %    -                          wi_1           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                          wi_0  
         0.01              0.0009 %    -                          wi_1           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                          wi_0           0.01              0.0009 %    -                          wi_1           0.01              0.0009 %    -                           fc2           0.01              0.0008 %    -                          wi_0           0.01              0.0008 %    -                          wi_1           0.01              0.0008 %    -                      out_proj           0.01              0.0007 %    -                             v           0.01              0.0007 %    -                             q           0.01              0.0007 %    -                             k           0.01              0.0007 %    -                      out_proj           0.01              0.0007 %    -                             q           0.01              0.0007 %    -                             k           0.01              0.0007 %    -                             v           0.01              0.0007 %    -                           fc2           0.01              0.0007 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                           fc2           0.01              0.0006 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                      out_proj           0.01              0.0006 %    -                          wi_0           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                          wi_0           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                          wi_0           0.01              0.0005 %    -                          wi_0           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                             q           0.01              
0.0005 %    -                             v           0.01              0.0005 %    -                             k           0.01              0.0005 %    -                           fc2           0.01              0.0004 %    -                           fc2           0.00              0.0003 %    -                           fc2           0.00              0.0001 %

somewhat unreadable.


@idostyle
Contributor

llama-imatrix --show-statistics assumes that layer naming follows "blk.%d" instead of the single_blocks.%d / double_blocks.%d naming in flux and flux-lite. Would have to adjust process_tensor_name in that PR accordingly.
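For illustration, a rough sketch of the kind of adjustment this would need. The actual process_tensor_name in that PR may differ in signature and behaviour; only the blk. prefix exists upstream, so the flux prefixes below are the hypothetical addition.

```cpp
// Sketch only: group per-tensor statistics by the name that follows the block
// prefix and index, so flux-style single_blocks.%d / double_blocks.%d tensors
// are handled like blk.%d ones. The real function in the PR may differ.
#include <string>
#include <vector>

static std::string process_tensor_name(const std::string & name) {
    static const std::vector<std::string> block_prefixes = {
        "blk.",            // llama.cpp-style naming
        "single_blocks.",  // flux / flux-lite (hypothetical addition)
        "double_blocks.",  // flux / flux-lite (hypothetical addition)
    };
    for (const auto & prefix : block_prefixes) {
        const auto start = name.find(prefix);
        if (start == std::string::npos) {
            continue;
        }
        // skip the numeric block index, keep the per-tensor suffix for grouping
        const auto after_index = name.find('.', start + prefix.size());
        if (after_index != std::string::npos) {
            return name.substr(after_index + 1);
        }
    }
    return name; // tensors outside any block are returned unchanged
}

// e.g. process_tensor_name("double_blocks.3.img_attn.qkv.weight")
//      -> "img_attn.qkv.weight"
```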


@EAddario

EAddario commented Apr 3, 2025

I haven't had much of an opportunity to play with T2I models yet, but if someone can point me to a sample model and an imatrix file, I'm happy to make the necessary changes.

@SA-j00u

SA-j00u commented Jul 15, 2025

Maybe you could also release these imatrices for different well-known models in some repo, for users who can't train them on the full model themselves, with names like model_PartOfSHA256_steps.dat, so nobody has to waste terabytes.

@Green-Sky
Contributor

Green-Sky commented Jul 15, 2025

Maybe you could also release these imatrices for different well-known models in some repo, for users who can't train them on the full model themselves, with names like model_PartOfSHA256_steps.dat, so nobody has to waste terabytes.

Before we do that, ggml-org/llama.cpp#9400 looks like it will be merged very soon, so it might be worth waiting for that.

edit: also, it would be nice if someone who actually has enough VRAM could do the imats (:
