Add imatrix support #633

Open · stduhpf wants to merge 15 commits into leejet:master from stduhpf:imatrix

Conversation

@stduhpf (Contributor) commented Mar 23, 2025 (edited)

Adds support for llama.cpp-style importance matrices (see https://github.com/ggml-org/llama.cpp/blob/master/examples/imatrix/README.md and ggml-org/llama.cpp#4861) to improve the quality of quantized models.

Models generated with an imatrix are backward compatible with previous releases.

Usage:

To train an imatrix:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat
This will generate an image and train the imatrix while doing so (you can use -b to generate multiple images at once).

To keep training an existing imatrix:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat --imat-in imatrix.dat

You can load multiple imatrix files at once; they will be merged into the output:
sd.exe [same exact parameters as normal generation] --imat-out imatrix.dat --imat-in imatrix.dat --imat-in imatrix2.dat

Quantize with imatrix:
sd.exe -M convert [same exact parameters as normal quantization] --imat-in imatrix.dat
(again, you can use multiple imatrix files)
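
For context, a llama.cpp-style imatrix is just a per-input-channel statistic of the activations seen while generating: for every matmul, the mean of the squared inputs feeding each weight column is accumulated, and the quantizer later uses it to weight its rounding error. A minimal sketch of that idea (illustrative only; the names and layout are assumptions, not the actual imatrix.cpp code):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ImatrixEntry {
    std::vector<float> sum_sq;  // running sum of x[j]^2 per input channel
    int ncall = 0;              // how many activation rows were accumulated
};

static std::map<std::string, ImatrixEntry> g_imatrix;

// x points to n_rows activation vectors of length n_in that feed the weight
// tensor `name` (e.g. "model.diffusion_model....to_q.weight").
void accumulate_imatrix(const std::string& name, const float* x, size_t n_in, size_t n_rows) {
    ImatrixEntry& e = g_imatrix[name];
    if (e.sum_sq.empty()) e.sum_sq.assign(n_in, 0.0f);
    for (size_t r = 0; r < n_rows; ++r) {
        for (size_t j = 0; j < n_in; ++j) {
            const float v = x[r * n_in + j];
            e.sum_sq[j] += v * v;
        }
        ++e.ncall;
    }
}
```

Merging several imatrix files (as in the commands above) then amounts to combining these per-tensor sums and counts.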

Examples

"simple" imatrix trained on a batch of 32 image generations (512x512) with the dreamshaper_8LCM (f16, 8 steps) model and empty prompts. (because of the model's bias, it was mostly calibrated on portraits of asian women):

"better" imatrix trained on 504 generations using diverse prompst and aspect ratios, using the same model.

| iq3_xxs static * | iq3_xxs with simple imatrix | iq3_xxs with better imatrix | fp16 |
| --- | --- | --- | --- |
| xxs_stat (no prompt) | xxs_imat (no prompt) | xxs_imatv2 (no prompt) | full (no prompt) |
| xxs_stat a cute cat playing with yarn | xxs_imat a cute cat playing with yarn | xxs_imatv2 a cute cat playing with yarn | full a cute cat playing with yarn |
| xxs_stat a girl wearing a funny hat | xxs_imat a girl wearing a funny hat | xxs_imatv2 a girl wearing a funny hat | full a girl wearing a funny hat |

* "static" means that the importance matrix is not active (all ones), which is how quantization behaves on the master branch.
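
Roughly speaking, the imatrix turns the quantizer's per-block objective into a weighted least-squares problem (a sketch of the idea, not the exact kernel used by every quant type):

```math
\min_{q}\;\sum_{j} w_j\,(x_j - q_j)^2,
\qquad
w_j \;\approx\; \frac{1}{N}\sum_{n=1}^{N} a_{n,j}^2
```

where the x_j are the original weights of a block, the q_j their quantized values, and the a_{n,j} the calibration activations that multiply column j. With all w_j = 1 this reduces to the static behaviour described above.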

iq2_xs seems completely broken even with imatrix for this model, but the effect is still noticeable. With iq4, the static quant is already pretty good, so the difference in quality isn't obvious (both using the "better" imatrix here).

| iq2_xs static | iq2_xs imatrix | iq4_nl static | iq4_nl imatrix |
| --- | --- | --- | --- |
| 2xs_static a girl wearing a funny hat | 2xs_imatv2 a girl wearing a funny hat | 4nl_static a girl wearing a funny hat | 4nl_imatv2 a girl wearing a funny hat |

Interesting observation: for the "girl wearing a funny hat" prompt, static quants put her in a city like the original fp16 model does, while the quants calibrated with the "better" imatrix put her in a forest. This is most likely due to a bias in the calibration dataset, which contained some samples of girls with forest backgrounds and none with city backgrounds.

You can find these models and the imatrices used here: https://huggingface.co/stduhpf/dreamshaper-8LCM-im-GGUF-sdcpp

You can find examples with other models in the discussion.

@Green-Sky (Contributor)

@stduhpf Thank you for working on this :)

Do you think transformer-based models work better with importance matrices, like ggml quants generally do? (e.g. Flux)

@stduhpf (Contributor, Author) commented Mar 23, 2025 (edited)

@Green-Sky I have no idea. I'm not sure it would work right now; I've only tested sd1.5 so far because it's so much faster.

@stduhpf (Contributor, Author) commented Mar 23, 2025 (edited)

I don't understand why the CI's linker is unable to find log_printf(). It works just fine on my machine, which is also Windows, and I'm also using CMake... It's probably because I'm not using -DSD_BUILD_SHARED_LIBS=ON.

@Green-Sky (Contributor)

> I don't understand why the CI's linker is unable to find log_printf(). It works just fine on my machine, which is also Windows, and I'm also using CMake... It's probably because I'm not using -DSD_BUILD_SHARED_LIBS=ON.

Maybe imatrix.hpp should just not be a header only lib ^^

@stduhpf (Contributor, Author) commented Mar 23, 2025 (edited)

@Green-Sky I'm doing some tests with sd3; it seems to be doing something, but cooking an imatrix for larger un-distilled models takes ages compared to something like sd1.5 LCM.

Now that I think about it, applying imatrix to Flux (or any model with a standalone diffusion model) will be tricky: the imatrix uses the names that the weights have at runtime, but when quantizing, the names are not prefixed like they are at runtime.
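
(For illustration only — a hypothetical helper, not code from this PR — the mismatch is just a name prefix, so quantizing a standalone diffusion model needs a lookup along these lines:)

```cpp
#include <string>

// Hypothetical: map a tensor name as it appears in a standalone diffusion-model
// file to the runtime name stored in the imatrix (which is prefixed with
// "model.diffusion_model.").
std::string imatrix_key_for(const std::string& tensor_name) {
    const std::string prefix = "model.diffusion_model.";
    if (tensor_name.compare(0, prefix.size(), prefix) == 0) {
        return tensor_name;       // already a runtime name
    }
    return prefix + tensor_name;  // standalone file: add the prefix
}
```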

@idostyle (Contributor)

Nice job stduhpf.

> Do you think transformer-based models work better with importance matrices, like ggml quants generally do? (e.g. Flux)

Flux also seems to struggle with the lower-bit i-quants: https://huggingface.co/Eviation/flux-imatrix

@stduhpf (Contributor, Author) commented Mar 24, 2025 (edited)

Results from my sd3 2B experiments, using a basic imatrix trained on only a dozen generations.

K-quants

| sd3_medium_incl_clips_t5xxl q3_K static | sd3_medium_incl_clips_t5xxl q3_K imatrix | sd3_medium_incl_clips_t5xxl q4_K static | sd3_medium_incl_clips_t5xxl q4_K imatrix |
| --- | --- | --- | --- |
| output | output | output | output |

| sd3_medium_incl_clips_t5xxl q5_K static | sd3_medium_incl_clips_t5xxl q5_K imatrix | sd3_medium_incl_clips_t5xxl q6_K static | sd3_medium_incl_clips_t5xxl q6_K imatrix |
| --- | --- | --- | --- |
| output | output | output | output |

i-quants

| sd3_medium_incl_clips_t5xxl iq3_xxs static | sd3_medium_incl_clips_t5xxl iq3_xxs imatrix | sd3_medium_incl_clips_t5xxl iq3_s static | sd3_medium_incl_clips_t5xxl iq3_s imatrix |
| --- | --- | --- | --- |
| output | output | output | output |

| sd3_medium_incl_clips_t5xxl iq4_xs static | sd3_medium_incl_clips_t5xxl iq4_xs imatrix | sd3_medium_incl_clips_t5xxl iq4_nl static | sd3_medium_incl_clips_t5xxl iq4_nl imatrix |
| --- | --- | --- | --- |
| output copy 11 | output copy 10 | output | output |

Ground truth

| sd3_medium_incl_clips_t5xxl fp16 |
| --- |
| output |

(all images generated with the same settings; only the quantization changes)

@stduhpf (Contributor, Author) commented Mar 28, 2025 (edited)

OK, I found a satisfactory way to apply imatrix to Flux. (Also, training the imatrix with quantized models seems to work just fine.)

| Flux.1 schnell q2_k static | Flux.1 schnell q2_k imatrix | Flux.1 schnell q2_k imatrix trained on Flux dev |
| --- | --- | --- |
| schnell-q2k-static | schnell-q2k-imatrix | output |

| Flux.1 dev q2_k static | Flux.1 dev q2_k imatrix | Flux.1 dev q2_k imatrix trained on Flux schnell |
| --- | --- | --- |
| output | output | output |

(imatrix trained on 10 generations using static q4_k (schnell) or iq4_nl (dev) model)

@Green-Sky (Contributor)

Looks great.

Did you tune it with the same number of sampling steps? Optimising for your own use case is probably best for lower quants.

@stduhpf (Contributor, Author) commented Mar 28, 2025 (edited)

> Did you tune it with the same number of sampling steps? Optimising for your own use case is probably best for lower quants.

For the schnell one, I trained it with only 4 steps, at different resolutions. My PC is currently cooking a Flux dev imatrix using a varying step count (from 16 to 40). Maybe I'll try to make one with a fixed step count to compare against afterwards.


@stduhpf force-pushed the imatrix branch 2 times, most recently from 4ec74a9 to 24d8fd7 on March 29, 2025 18:03
@stduhpf marked this pull request as ready for review on March 29, 2025 19:04
@stduhpf (Contributor, Author)

I feel like this is pretty much ready now.


@stduhpf changed the title from "Imatrix: first implementation attempt" to "Add imatrix support" on Mar 29, 2025
@stduhpf force-pushed the imatrix branch 2 times, most recently from 7379982 to 2dc5dfb on March 29, 2025 22:54
@Green-Sky (Contributor)

I am trying this right now. I am no expert on how the importance data flows into the quantization, but does it make sense to sample using a quant, just to recreate the same quant with the importance data?

You showed that using a higher quant to generate the imat works, but using the same quant would be interesting...

@stduhpf (Contributor, Author)

> I am trying this right now. I am no expert on how the importance data flows into the quantization, but does it make sense to sample using a quant, just to recreate the same quant with the importance data?

I think it would work. As long as the original quant is "good enough" to generate coherent images, the activations should already be representative of the ideal activations, and therefore the imatrix shouldn't be too different from the one trained on the full precision model, with the same kind of improvements.

@Green-Sky (Contributor)

Thanks, good to know. This all reminds me very much of PGO, where you usually stack profiles to get the last 1-2% of performance. 😄

I am doing q5_k right now, and the image is very coherent indeed.

printf(" --type [TYPE] weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K)\n");
printf(" If not specified, the default is the type of the weight file\n");
printf(" --imat-out [PATH] If set, compute the imatrix for this run and save it to the provided path");
printf(" --imat-in [PATH] Use imatrix for quantization.");
Review comment (Contributor):

both new options miss a new line.
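
Presumably the fix is just to append the missing `\n` to both format strings, e.g.:

```cpp
printf(" --imat-out [PATH] If set, compute the imatrix for this run and save it to the provided path\n");
printf(" --imat-in [PATH] Use imatrix for quantization.\n");
```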

imatrix.cpp (outdated)
```cpp
    return false;
}

// Recreate the state as expected by save_imatrix(), and corerct for weighted sum.
```
Review comment (Contributor):

corerct -> correct

Reply (Contributor, Author):

I don't know, I just copy-pasted that part of the code, maybe the typo is important

Reply (Contributor):

😁

@Green-Sky (Contributor) commented Mar 31, 2025 (edited)

Not sure I did anything wrong, but using imats produced by the same quant seems to produce the same model file. So it either does not work, or I did something wrong.

```
$ result/bin/sd -M convert -m models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat
loading imatrix from 'flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat'
loading imatrix from 'flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat'
[INFO ] model.cpp:918  - load models/flux.1-lite-8B.safetensors using safetensors format
[INFO ] model.cpp:2003 - model tensors mem size: 5562.48MB
  |==================================================| 516/516 - 24.39it/s
[INFO ] model.cpp:2038 - load tensors done
[INFO ] model.cpp:2039 - trying to save tensors to models/flux.1-lite-8B-q5_k-igo.gguf
convert 'models/flux.1-lite-8B.safetensors'/'' to 'models/flux.1-lite-8B-q5_k-igo.gguf' success

ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k.gguf
ba1a721718a7431f79a3266b127999bae515ed2b3f0cb835558b9db7d0bb9890  models/flux.1-lite-8B-q5_k-igo.gguf
```

edit: `strings` shows that the imat contains data for the diffusion model:

```
...model.diffusion_model.single_blocks.35.modulation.lin.weightH
model.diffusion_model.double_blocks.5.img_mlp.2.weightH
...
```

edit2: and the imats are different

```
b481d4c6e8903ac4a1e612a8e9b5dc8afc4b2bb31d1fea2a2a404e9bd565416a  flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat
ab385c84e8bd4002a1579350a7bdd01a96581900922cf192bc47012224038ebe  flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat
```

edit3: tried to do an optimized q4_k, same issue, so something is fundamentally broken with the flux prune/distill/dedistill I am using.
https://huggingface.co/Freepik/flux.1-lite-8B
https://huggingface.co/Green-Sky/flux.1-lite-8B-GGUF/tree/main/base

@stduhpf (Contributor, Author) commented Mar 31, 2025 (edited)

> Not sure I did anything wrong, but using imats produced by the same quant seems to produce the same model file. So it either does not work, or I did something wrong.

Try with `result/bin/sd -M convert --diffusion-model models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat` (you can also include the VAE/text encoders)

And then run it with `-m models/flux.1-lite-8B-q5_k-igo.gguf` instead of `--diffusion-model models/flux.1-lite-8B-q5_k-igo.gguf`

@Green-Sky (Contributor) commented Mar 31, 2025 (edited)

Another issue. When I use flash attention, it breaks the imat collection after a varying number of images (using sd_turbo here).

```
[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
```
Details:

```
$ result/bin/sd -m models/sd_turbo-f16-q8_0.gguf --cfg-scale 1 --steps 8 --schedule karras -p "a lovely cat" --imat-out sd_turbo.imat -b 32 -s -1 --diffusion-fa
IMPORTANT: imatrix file sd_turbo.imat already exists, but wasn't found in the imatrix inputs.
sd_turbo.imat will get overwritten!
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
[INFO ] stable-diffusion.cpp:197  - loading model from 'models/sd_turbo-f16-q8_0.gguf'
[INFO ] model.cpp:915  - load models/sd_turbo-f16-q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:244  - Version: SD 2.x
[INFO ] stable-diffusion.cpp:277  - Weight type:                 q8_0
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     q8_0
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             q8_0
[INFO ] stable-diffusion.cpp:328  - Using flash attention in the diffusion model
  |>                                                 | 3/1323 - 0.00it/s
[INFO ] model.cpp:1915 - unknown tensor 'cond_stage_model.transformer.text_model.text_projection | q8_0 | 2 [1024, 1024, 1, 1, 1]' in model file
  |==================================================| 1323/1323 - 1000.00it/s
[INFO ] stable-diffusion.cpp:503  - total params memory size = 2006.07MB (VRAM 2006.07MB, RAM 0.00MB): clip 500.53MB(VRAM), unet 1411.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522  - loading model from 'models/sd_turbo-f16-q8_0.gguf' completed, taking 0.73s
[INFO ] stable-diffusion.cpp:556  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:566  - running with Karras schedule
[INFO ] stable-diffusion.cpp:690  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 51 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/32 - seed 832414162
  |==================================================| 8/8 - 3.60it/s
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 2.30s
...
[INFO ] stable-diffusion.cpp:1439 - generating image: 9/32 - seed 832414170
  |============>                                     | 2/8 - 3.65it/s
[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
```

update: it happened after a much longer time without flash attention too.

```
[WARN ] imatrix.cpp:140  - inf detected in model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn1.to_q.weight
```
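
(For context, the check that produces this warning is presumably something like the sketch below — illustrative only, not the PR's actual imatrix.cpp code — i.e. the collected activations are tested for non-finite values before being added to the running sums:)

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Sketch: reject a batch of activations that contains non-finite values
// instead of letting an inf/NaN poison the accumulated statistics.
bool accumulate_checked(const float* x, size_t n, float* sum_sq, const char* tensor_name) {
    for (size_t j = 0; j < n; ++j) {
        if (!std::isfinite(x[j])) {
            std::fprintf(stderr, "[WARN ] inf detected in %s\n", tensor_name);
            return false;  // skip this call, keep the imatrix usable
        }
    }
    for (size_t j = 0; j < n; ++j) {
        sum_sq[j] += x[j] * x[j];
    }
    return true;
}
```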
@Green-Sky (Contributor) commented Mar 31, 2025 (edited)

> Try with `result/bin/sd -M convert --diffusion-model models/flux.1-lite-8B.safetensors --type q5_K -o models/flux.1-lite-8B-q5_k-igo.gguf --imat-in flux.1-lite-8B-q5_k-1024x768-g3.3-s24-b3-asia1.dat --imat-in flux.1-lite-8B-q5_k-512x768-g3.4-s30-b3-nordic1.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.2-s30-b3-nordic2.dat --imat-in flux.1-lite-8B-q5_k-768x768-g3.5-s28-b3-anime1.dat` (you can also include the VAE/text encoders)
>
> And then run it with `-m models/flux.1-lite-8B-q5_k-igo.gguf` instead of `--diffusion-model models/flux.1-lite-8B-q5_k-igo.gguf`

This seems to have worked. Not a fan of the tensor renaming though.

| base q5_k | q5_k with q5_k imat | diff |
| --- | --- | --- |
| flux 1-lite-8B-q5_k-anime-base-s545345362 | flux 1-lite-8B-q5_k-anime-igo-q5_k-s545345362 | flux 1-lite-8B-q5_k-anime-igo-q5_k-diff-to-q5_k-s545345362 |

(they were obviously identical before)

| base q4_k | q4_k with q5_k imat | diff |
| --- | --- | --- |
| flux 1-lite-8B-q4_k-anime-base-s545345362 | flux 1-lite-8B-q4_k-anime-igo-q5_k-s545345362 | image |

It looks like the imat from q5_k made q4_k stray more. I don't have the full-size model image for this example (it's too expensive ngl), but to me this looks worse. However, the importance-guided quant seems to have less dither noise, so it got better somewhere...


I was trying to measure the visual quality difference of the quants, and I saw and remembered that Flux specifically shows dither-like patterns when you go lower with the quants. So I tried to measure that with GIMP: I first applied a high-pass filter (at 0.5 std and 4 contrast) and then used the histogram plot.

| base q5_k | q5_k with q5_k imat |
| --- | --- |
| image | image |

Base is spread out a little more, so this should mean there is indeed more high frequency noise, but this is just a single sample AND a highly experimental and somewhat subjective analysis 😅

@stduhpf (Contributor, Author) commented Mar 31, 2025 (edited)

> This seems to have worked. Not a fan of the tensor renaming though.

Yes, this is a bit annoying for Flux models. I thought of adding a way to extract the diffusion model (or other components like the VAE or text encoders) from the model file, but I feel like this is getting a bit out of scope for this PR. (Something like `sd.exe -M extract -m "models/flux.1-lite-8B-q5_k-igo.gguf" --diffusion-model "models/flux.1-lite-8B-q5_k-igo.gguf"` maybe, or like `sd.exe -M extract -m "models/flux.1-lite-8B-q5_k-igo.gguf" -o "models/flux.1-lite-8B-q5_k-igo.gguf" -p "model.diffusion_model"`.)
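
(Purely as an illustration of what such an extract mode might do — a hypothetical helper, not part of this PR — it would mostly be a prefix filter over the tensor names:)

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical: keep only the tensors whose name starts with the requested
// prefix, and strip the prefix so the result looks like a standalone
// component file. Returns {new_name, old_name} pairs.
std::vector<std::pair<std::string, std::string>> filter_by_prefix(
        const std::vector<std::string>& tensor_names, const std::string& prefix) {
    std::vector<std::pair<std::string, std::string>> kept;
    for (const std::string& name : tensor_names) {
        if (name.compare(0, prefix.size(), prefix) == 0) {
            kept.emplace_back(name.substr(prefix.size()), name);
        }
    }
    return kept;
}
```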

@stduhpf (Contributor, Author)

@Green-Sky I've had a similar problem with sd3 q4_k (with fp16 imatrix). For some reason the outputs of the q4_k model seem to stray further from the full-precision model when using imatrix, but it still seems to minimise artifacts. (I think q6_k shows a similar behavior to a lesser extent.)

@Green-Sky (Contributor)

> @Green-Sky I've had a similar problem with sd3 q4_k (with fp16 imatrix). For some reason the outputs of the q4_k model seem to stray further from the full-precision model when using imatrix, but it still seems to minimise artifacts. (I think q6_k shows a similar behavior to a lesser extent.)

Interesting. Here is q2_k, which shows the same, but much more noticeable, behavior.

| q2_k | q2_k with q5_k imat |
| --- | --- |
| flux 1-lite-8B-q2_k-anime-base-s545345362 | flux 1-lite-8B-q2_k-anime-igo-q5_k-s545345362 |

@Green-Sky (Contributor)

| q3_k | q3_k with q5_k imat |
| --- | --- |
| flux 1-lite-8B-q3_k-anime-base-s545345362 | flux 1-lite-8B-q3_k-anime-igo-q5_k-s545345362 |

The imat version might look more coherent, but it also looks more blurry. So overall, for qx_k quants, imat seems to act a bit like a blur, at least a little.

I will run some q8_0 imat profiling on cpu tomorrow to get a better source of truth.

@Green-Sky (Contributor) commented Apr 1, 2025 (edited)

> I will run some q8_0 imat profiling on cpu tomorrow to get a better source of truth.

I actually can't, it crashes.

```
result/bin/sd --diffusion-model models/flux.1-lite-8B-q8_0.gguf --clip_l models/flux-extra/clip_l-q8_0.gguf --t5xxl models/flux-extra/t5xxl_q8_0.gguf --vae models/flux-extra/ae-f16.gguf -v --color -t 8 --cfg-scale 1 --sampling-method euler --steps 24 --guidance 3.3 -W 1024 -H 768 -p "a lovely cat" --imat-out imat.dat
...
[DEBUG] conditioner.hpp:1170 - computing condition graph completed, taking 9611 ms
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 9615 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:808  - Sample
[DEBUG] ggml_extend.hpp:1121 - flux compute buffer size: 1662.42 MB(RAM)
Segmentation fault (core dumped)
```

The reason seems to be a null buffer here:
https://github.com/stduhpf/stable-diffusion.cpp/blob/71eed146cd78ab771761888169cab7d82d90a5bb/imatrix.cpp#L54

update: it happens with any model type (q8_0, f16, q5_k tested)

stacktrace:

```
#0  ggml_backend_buffer_get_type (buffer=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:175
#1  0x00000000005430f5 in ggml_backend_buffer_is_host (buffer=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:158
#2  0x0000000000507363 in IMatrixCollector::collect_imatrix (this=this@entry=0x8c3aa0 <imatrix_collector>, t=0x6153370, ask=false, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/imatrix.cpp:54
#3  0x0000000000436254 in collect_imatrix (t=<optimized out>, ask=<optimized out>, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/model.cpp:2127
#4  0x000000000048b25b in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) (this=this@entry=0xc7ba58, get_graph=..., n_threads=n_threads@entry=8, free_compute_buffer_immediately=free_compute_buffer_immediately@entry=false, output=output@entry=0x7fffffffb160, output_ctx=output_ctx@entry=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml_extend.hpp:1263
#5  0x000000000048b912 in Flux::FluxRunner::compute (this=this@entry=0xc7ba58, n_threads=n_threads@entry=8, x=<optimized out>, x@entry=0x7ffc9e66c0d0, timesteps=<optimized out>, timesteps@entry=0x7ffc9e96c790, context=<optimized out>, context@entry=0x7ffc9e4ebbc0, c_concat=<optimized out>, c_concat@entry=0x0, y=<optimized out>, guidance=<optimized out>, output=0x7fffffffb160, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/flux.hpp:970
#6  0x0000000000494901 in FluxModel::compute (this=this@entry=0xc7ba50, n_threads=8, x=0x7ffc9e66c0d0, timesteps=timesteps@entry=0x7ffc9e96c790, context=0x7ffc9e4ebbc0, c_concat=0x0, y=0x7ffc9dcea500, guidance=0x7ffc9e96c950, num_video_frames=-1, controls=std::vector of length 0, capacity 0, control_strength=control_strength@entry=0.899999976, output=0x7fffffffb160, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/diffusion_model.hpp:178
#7  0x0000000000496b6d in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const (__closure=0x965cb0, input=0x7ffc9e5abf20, sigma=<optimized out>, step=1)...
```

update2: same for sd_turbo

```
#0  ggml_backend_buffer_get_type (buffer=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:175
#1  0x00000000005430f5 in ggml_backend_buffer_is_host (buffer=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml/src/ggml-backend.cpp:158
#2  0x0000000000507363 in IMatrixCollector::collect_imatrix (this=this@entry=0x8c3aa0 <imatrix_collector>, t=0x32925d0, ask=false, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/imatrix.cpp:54
#3  0x0000000000436254 in collect_imatrix (t=<optimized out>, ask=<optimized out>, user_data=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/model.cpp:2127
#4  0x000000000048b25b in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) (this=this@entry=0xb58ad8, get_graph=..., n_threads=n_threads@entry=12, free_compute_buffer_immediately=free_compute_buffer_immediately@entry=false, output=output@entry=0x7fffffffb3b8, output_ctx=output_ctx@entry=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/ggml_extend.hpp:1263
#5  0x000000000048b4c4 in UNetModelRunner::compute (this=this@entry=0xb58ad8, n_threads=n_threads@entry=12, x=<optimized out>, x@entry=0x2716b30, timesteps=<optimized out>, timesteps@entry=0x2719290, context=<optimized out>, context@entry=0x27170e0, c_concat=<optimized out>, c_concat@entry=0x0, y=<optimized out>, num_video_frames=<optimized out>, controls=std::vector of length 0, capacity 0, control_strength=<optimized out>, control_strength@entry=0, output=0x7fffffffb3b8, output_ctx=0x0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/unet.hpp:615
#6  0x00000000004964dc in UNetModel::compute (this=0xb58ad0, n_threads=12, x=0x2716b30, timesteps=0x2719290, context=0x27170e0, c_concat=0x0, y=0x0, guidance=0x0, num_video_frames=-1, controls=std::vector of length 0, capacity 0, control_strength=0, output=0x7fffffffb3b8, output_ctx=0x0, skip_layers=std::vector of length 0, capacity 0) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/diffusion_model.hpp:78
#7  0x0000000000485dba in StableDiffusionGGML::is_using_v_parameterization_for_sd2 (this=this@entry=0x9655d0, work_ctx=work_ctx@entry=0xcb00a0, is_inpaint=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:621
#8  0x00000000004e6a9b in StableDiffusionGGML::load_from_file (this=this@entry=0x9655d0, model_path="models/sd_turbo-f16-q8_0.gguf", clip_l_path="", clip_g_path="", t5xxl_path="", diffusion_model_path="", vae_path=..., control_net_path=..., embeddings_path=..., id_embeddings_path=..., taesd_path=..., vae_tiling_=<optimized out>, wtype=<optimized out>, schedule=<optimized out>, clip_on_cpu=<optimized out>, control_net_cpu=<optimized out>, vae_on_cpu=<optimized out>, diffusion_flash_attn=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:527
#9  0x00000000004760ad in new_sd_ctx (model_path_c_str=<optimized out>, clip_l_path_c_str=0x7fffffffbb58 "", clip_g_path_c_str=0x7fffffffbb78 "", t5xxl_path_c_str=0x7fffffffbb98 "", diffusion_model_path_c_str=0x7fffffffbbb8 "", vae_path_c_str=0x7fffffffbbd8 "", taesd_path_c_str=0x7fffffffbbf8 "", control_net_path_c_str=0x7fffffffbc38 "", lora_model_dir_c_str=0x7fffffffbcc0 "", embed_dir_c_str=0x7fffffffbc58 "", id_embed_dir_c_str=0x7fffffffbc78 "", vae_decode_only=true, vae_tiling=false, free_params_immediately=true, n_threads=12, wtype=SD_TYPE_COUNT, rng_type=CUDA_RNG, s=KARRAS, keep_clip_on_cpu=false, keep_control_net_cpu=false, keep_vae_on_cpu=false, diffusion_flash_attn=false) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/stable-diffusion.cpp:1159
#10 0x0000000000420451 in main (argc=<optimized out>, argv=<optimized out>) at /build/bm2mq7d2xnqz2yd5h8hkqazrwlp15q5h-source/examples/cli/main.cpp:926
```
@stduhpf (Contributor, Author)

@Green-Sky It should be fixed now.
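
(For reference: the crash was `ggml_backend_buffer_is_host()` being called with a null `buffer`, so the fix presumably boils down to a guard of roughly this shape — a sketch, not the actual commit:)

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// Sketch only: check whether a tensor has a backend buffer at all before asking
// whether that buffer is host-resident, instead of dereferencing a null pointer
// inside ggml_backend_buffer_is_host().
static bool tensor_buffer_is_host(const struct ggml_tensor * t) {
    return t->buffer != nullptr && ggml_backend_buffer_is_host(t->buffer);
}
```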

@Green-Sky (Contributor) commented Apr 1, 2025 (edited)

@stduhpf works, thanks. Running some at an incredible 541.82 s/it.

But it looks like CPU inference is broken for that model..., so the imat might be of questionable quality.
(probably not related to this PR)

update: I did a big oopsy and forgot to add `--cfg-scale 1 --sampling-method euler`

@Green-Sky (Contributor)

I ran the freshly generated f16 imatrix file through ggml-org/llama.cpp#12718:

```
$ result/bin/llama-imatrix --show-statistics --in-file ../stable-diffusion.cpp/flux.1-lite-8B-f16-768x768-g3.5-s28-b1-anime1.dat
Computing statistics for ../stable-diffusion.cpp/flux.1-lite-8B-f16-768x768-g3.5-s28-b1-anime1.dat (444 tensors)
 Layer      Tensor      μ(Importance Scores)   Contribution
================================================================================
    -           wo            83.69              5.9814 %
    -            0            82.23              5.8774 %
    -         proj            64.58              4.6155 %
    -            2            61.22              4.3756 %
    -         proj            61.04              4.3627 %
    -         proj            45.07              3.2214 %
    -           wo            35.81              2.5595 %
    -      linear2            32.47              2.3209 %
    -         proj            31.14              2.2260 %
    -         proj            30.86              2.2058 %
    -     in_layer            28.69              2.0506 %
    -         proj            28.63              2.0465 %
    -       img_in            21.10              1.5079 %
    -         proj            20.19              1.4431 %
    ...
```
                        k           0.02              0.0018 %    -                             v           0.02              0.0018 %    -                             v           0.02              0.0017 %    -                             k           0.02              0.0017 %    -                             q           0.02              0.0017 %    -                           fc2           0.02              0.0016 %    -                      out_proj           0.02              0.0016 %    -                             q           0.02              0.0015 %    -                             k           0.02              0.0015 %    -                             v           0.02              0.0015 %    -                          wi_1           0.02              0.0014 %    -                          wi_0           0.02              0.0014 %    -                          wi_1           0.02              0.0012 %    -                          wi_0           0.02              0.0012 %    -                           fc2           0.02              0.0011 %    -                           fc2           0.02              0.0011 %    -                             k           0.02              0.0011 %    -                             q           0.02              0.0011 %    -                             v           0.02              0.0011 %    -                      out_proj           0.02              0.0011 %    -                          wi_0           0.02              0.0011 %    -                          wi_1           0.02              0.0011 %    -                          wi_0           0.01              0.0011 %    -                          wi_1           0.01              0.0011 %    -                          wi_0           0.01              0.0010 %    -                          wi_1           0.01              0.0010 %    -                          wi_0           0.01              0.0010 %    -                          wi_1           0.01              0.0010 %    -                           fc2           0.01              0.0010 %    -                             k           0.01              0.0010 %    -                             q           0.01              0.0010 %    -                             v           0.01              0.0010 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                          wi_0           0.01              0.0009 %    -                          wi_1           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                          wi_0  
         0.01              0.0009 %    -                          wi_1           0.01              0.0009 %    -                             k           0.01              0.0009 %    -                             q           0.01              0.0009 %    -                             v           0.01              0.0009 %    -                          wi_0           0.01              0.0009 %    -                          wi_1           0.01              0.0009 %    -                           fc2           0.01              0.0008 %    -                          wi_0           0.01              0.0008 %    -                          wi_1           0.01              0.0008 %    -                      out_proj           0.01              0.0007 %    -                             v           0.01              0.0007 %    -                             q           0.01              0.0007 %    -                             k           0.01              0.0007 %    -                      out_proj           0.01              0.0007 %    -                             q           0.01              0.0007 %    -                             k           0.01              0.0007 %    -                             v           0.01              0.0007 %    -                           fc2           0.01              0.0007 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                           fc2           0.01              0.0006 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                             v           0.01              0.0006 %    -                             q           0.01              0.0006 %    -                             k           0.01              0.0006 %    -                          wi_0           0.01              0.0006 %    -                          wi_1           0.01              0.0006 %    -                      out_proj           0.01              0.0006 %    -                          wi_0           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                          wi_0           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                          wi_0           0.01              0.0005 %    -                          wi_0           0.01              0.0005 %    -                          wi_1           0.01              0.0005 %    -                             q           0.01              
0.0005 %    -                             v           0.01              0.0005 %    -                             k           0.01              0.0005 %    -                           fc2           0.01              0.0004 %    -                           fc2           0.00              0.0003 %    -                           fc2           0.00              0.0001 %

somewhat unreadable.


@idostyle
Contributor

llama-imatrix --show-statistics assumes that layer naming follows "blk.%d" instead of the single_blocks.%d / double_blocks.%d naming in flux and flux-lite. Would have to adjust process_tensor_name in that PR accordingly.
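For illustration, a rough sketch of the kind of adjustment this would need. The actual process_tensor_name in that PR may differ in signature and behaviour; only the blk. prefix exists upstream, so the flux prefixes below are the hypothetical addition.

```cpp
// Sketch only: group per-tensor statistics by the name that follows the block
// prefix and index, so flux-style single_blocks.%d / double_blocks.%d tensors
// are handled like blk.%d ones. The real function in the PR may differ.
#include <string>
#include <vector>

static std::string process_tensor_name(const std::string & name) {
    static const std::vector<std::string> block_prefixes = {
        "blk.",            // llama.cpp-style naming
        "single_blocks.",  // flux / flux-lite (hypothetical addition)
        "double_blocks.",  // flux / flux-lite (hypothetical addition)
    };
    for (const auto & prefix : block_prefixes) {
        const auto start = name.find(prefix);
        if (start == std::string::npos) {
            continue;
        }
        // skip the numeric block index, keep the per-tensor suffix for grouping
        const auto after_index = name.find('.', start + prefix.size());
        if (after_index != std::string::npos) {
            return name.substr(after_index + 1);
        }
    }
    return name; // tensors outside any block are returned unchanged
}

// e.g. process_tensor_name("double_blocks.3.img_attn.qkv.weight")
//      -> "img_attn.qkv.weight"
```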


@EAddario

EAddario commented Apr 3, 2025

I haven't had much of an opportunity to play with T2I models yet, but if someone can point me to a sample model and an imatrix file, I'm happy to make the necessary changes.

@SA-j00u

SA-j00u commented Jul 15, 2025

Maybe you could also release these imatrices for different well-known models in some repo, for users who can't train them on the full model themselves, with names like model_PartOfSHA256_steps.dat, so nobody has to waste terabytes.

@Green-Sky
Contributor

Green-Sky commented Jul 15, 2025

Maybe you could also release these imatrices for different well-known models in some repo, for users who can't train them on the full model themselves, with names like model_PartOfSHA256_steps.dat, so nobody has to waste terabytes.

Before we do that, ggml-org/llama.cpp#9400 looks like it will be merged very soon, so it might be worth waiting for that.

edit: also, it would be nice if someone who actually has enough VRAM could do the imats (:
