sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024

Test models (converted to bfl format) can be found there:

Inference for models in diffusers format still seems to be broken.

wbruna (Contributor) commented Dec 5, 2025

That does look a bit like a circuit board...

stduhpf (Contributor, Author) commented Dec 6, 2025

TODO for when image generation works

stduhpf (Contributor, Author) commented Dec 6, 2025

I can't figure out what I'm doing wrong. I think it is supposed to work just like Flux1, but with different PE indices and the Qwen text encoder... Maybe I'm missing an important detail, but I can't find it.

stduhpf (Contributor, Author) commented Dec 7, 2025 (edited)

I tried using my SplitAttention thing on a Flux model converted to diffusers format, and

I guess I found what is not working. I will try converting LongCat to Flux format and see if it works.

stduhpf (Contributor, Author) commented Dec 8, 2025 (edited)

I think I got it?

With the padding fixed, but with diffusers format:

stduhpf (Contributor, Author) commented Dec 8, 2025 (edited)

With the character-level tokenization trick:

This might need testing to make sure the current implementation supports languages that don't use the Latin alphabet. Also, for now it's applied only to text wrapped in single quotes (').

stduhpf marked this pull request as ready for review

December 8, 2025 01:29

stduhpf (Contributor, Author) commented Dec 8, 2025

Oh no, why are there so many conflicts now?

stduhpf (Contributor, Author) commented Dec 8, 2025 (edited)

Using ' as a quote delimiter was a bad idea because it's the same symbol used for apostrophes. I will change it to detect " instead.
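To illustrate the delimiter problem, here is a minimal sketch of a delimiter-based splitter. The function `split_quoted` and its return type are hypothetical and not taken from the PR; it only shows how a toggle-on-delimiter scan behaves, under the assumption that quotes are not nested or escaped.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the PR): split a prompt into segments and
// flag the ones enclosed by `delim` as literal text.
// Each pair is (segment text, is_quoted).
std::vector<std::pair<std::string, bool>> split_quoted(const std::string& prompt, char delim) {
    std::vector<std::pair<std::string, bool>> segments;
    std::string current;
    bool in_quote = false;
    for (char c : prompt) {
        if (c == delim) {
            // Close the running segment and flip the quoted state.
            if (!current.empty()) {
                segments.emplace_back(current, in_quote);
                current.clear();
            }
            in_quote = !in_quote;
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) {
        // Assumption: an unterminated quote is treated as plain text.
        segments.emplace_back(current, false);
    }
    return segments;
}
```

With `"` as the delimiter, `Change the text to say "I'm a long one"` yields one plain segment and one quoted segment containing the apostrophe intact. With `'` as the delimiter, a prompt such as `a cat's sign saying 'hello'` toggles on the apostrophe, so `s sign saying ` is wrongly flagged as quoted and `hello` is not.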

stduhpf (Contributor, Author) commented Dec 8, 2025 (edited)

Somehow not fully working yet, but it's definitely able to see it's supposed to be a cat holding a sign, maybe because of the vision model
sd.exe --diffusion-model ..\ComfyUI\models\unet\longcat_edit_bfl_format-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.5 --sampling-method euler -v --offload-to-cpu --preview proj --steps 50 --vae-tile-size 128 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --color --seed 0 -r .\assets\flux\flux1-dev-q8_0.png --llm_vision ..\ComfyUI\models\clip_vision\Qwen2.5-VL-7B-Instruct.mmproj-f16.gguf -p "Change the text to say \"I'm a long one\""

[Images: ref (reference input) and out (generated output)]

(Also I made the change so it now needs double quotes around literal text)

stduhpf (Contributor, Author) commented Dec 8, 2025 (edited)

Somehow couldn't get it to remove the original text, but there it goes

stduhpf changed the title ~~Wip: Longcat-Image support~~ Longcat-Image / Longcat-Image-Edit support

Dec 8, 2025

stduhpf changed the title ~~Longcat-Image / Longcat-Image-Edit support~~ feat: Longcat-Image / Longcat-Image-Edit support

Dec 8, 2025

leejet reviewed

Dec 9, 2025

stable-diffusion.cpp (outdated, resolved)

vae.hpp (outdated, resolved)

Rocky-Lee-001 commented Dec 10, 2025

May I ask which comfyui node is used to load this GGUF model?

stduhpf (Contributor, Author) commented Dec 12, 2025

Now supports UTF-8 encoding properly for the quoted text. (Also, quote characters are no longer excluded from the prompt after being parsed, which seems to help a bit, especially with longer text.)
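As a rough illustration of splitting by UTF-8 characters rather than by individual `char`, the sketch below groups bytes into code points using the lead byte. The names `utf8_len` and `split_utf8_chars` are hypothetical, and the handling of malformed input (fall back to one byte) is an assumption, not the PR's behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence that starts with `lead`.
// Invalid lead bytes fall back to 1 so the caller always advances.
static std::size_t utf8_len(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;
}

// Hypothetical helper: split `text` into one string per UTF-8 character.
// Any quote characters in `text` are kept as their own entries.
std::vector<std::string> split_utf8_chars(const std::string& text) {
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t n = utf8_len(static_cast<unsigned char>(text[i]));
        if (i + n > text.size()) {
            n = text.size() - i;  // truncated sequence at end of input
        }
        chars.push_back(text.substr(i, n));
        i += n;
    }
    return chars;
}
```

Splitting byte by byte would cut a multi-byte character such as `é` (two bytes) or `猫` (three bytes) into fragments that are not valid text on their own, which is why grouping by lead byte matters for non-Latin scripts.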

stduhpf added 12 commits

December 12, 2025 02:55

Support LongCat Image model (4249294)
temp fix cuda error on quant concat for splitlinear (52ef50a)
pre-patchify (7ba7feb)
longcat rope ids (1241323)
Fix diffusers_style detection (203d053)
Flux: simplify when patch_size is 1 (37c5e3e)
correct rope offset for image tokens (a907fe2)
stuff / Fix token length (fc8d85e)
Split quoted text into character-level tokens (9f225e4)
remove debug logs / support longcat-image-edit (c044a40)
Fix base rope offset for ref images / Split quotes by utf8 characters rather than individual char (fd032bc)
patch size consistent with Flux1 (196bb89)

stduhpf force-pushed the longcat branch from c31128b to 196bb89

December 12, 2025 02:10

stduhpf (Contributor, Author) commented Dec 12, 2025

> May I ask which comfyui node is used to load this GGUF model?

@Rocky-Lee-001 I don't think LongCat-Image is natively supported by ComfyUI yet. You could give https://github.com/sooxt98/comfyui_longcat_image a try; maybe it works well with the GGUF node for ComfyUI?

leejet reviewed

Dec 13, 2025

ggml_extend.hpp

		}
		};

		class SplitLinear : public Linear {

leejet (Owner), Dec 13, 2025

If this part has no effect, I think we can remove the related code. In fact, even if it does have some effect, additional work is required to handle it when LoRA uses QKV format, so I wouldn’t really recommend this approach.

stduhpf (Contributor, Author), Dec 13, 2025

This is used when loading Flux diffusion models with the diffusers naming convention, which has the QKV matrices split into individual linear layers rather than one big linear layer. For some reason it is not quite working; I'm not sure why.
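For context, the identity that a split-QKV layer relies on can be sketched as follows: stacking the rows of separate Q, K and V weights gives a fused weight whose output is the concatenation of the three separate outputs. This is a toy illustration only; `Matrix`, `linear` and `fuse_qkv` are hypothetical names, biases are omitted, and it says nothing about how `SplitLinear` is actually implemented in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major weight: [out_features][in_features].
using Matrix = std::vector<std::vector<float>>;

// y = W x for a single input vector (no bias).
std::vector<float> linear(const Matrix& w, const std::vector<float>& x) {
    std::vector<float> y(w.size(), 0.0f);
    for (std::size_t o = 0; o < w.size(); ++o) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            y[o] += w[o][i] * x[i];
        }
    }
    return y;
}

// Stack the rows of the Q, K and V weights into one fused weight.
Matrix fuse_qkv(const Matrix& wq, const Matrix& wk, const Matrix& wv) {
    Matrix fused;
    fused.insert(fused.end(), wq.begin(), wq.end());
    fused.insert(fused.end(), wk.begin(), wk.end());
    fused.insert(fused.end(), wv.begin(), wv.end());
    return fused;
}
```

If a split implementation and the fused one disagree on the same weights, candidates worth checking are the concatenation order and axis, and whether each sub-layer's bias is applied to the matching slice.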

leejet (Owner) commented Dec 13, 2025

I’m not sure whether I did something wrong on my end, but I got a strange image.

.\bin\Release\sd-cli.exe --diffusion-model  ..\models\longcat_bfl_format-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ae.sft  --llm ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct-Q8_0.gguf -p 'a lovely cat' --cfg-scale 5.0 -v --offload-to-cpu --diffusion-fa

stduhpf (Contributor, Author) commented Dec 13, 2025 (edited)

@leejet that's strange. I can reproduce it with the same prompt (even with the Q8_0 model), but I haven't gotten anything like this in my earlier testing. Maybe there's a linear layer that could use scaling?

Does not seem related to seed.

It's a combination of short prompts + low resolution that seems to cause it.

leejet (Owner) commented Dec 15, 2025

I think there are still some differences between the implementation in this PR and the official pipeline—for example, the way token padding is handled and whether masks are used. I suspect these differences might be what caused the strange blocky artifacts in the generated images. I tried to fix it, but didn’t succeed; now it’s producing completely black images.

https://github.com/leejet/stable-diffusion.cpp/tree/longcat-fix

Yurchikian commented Dec 16, 2025 (edited)

I've tried this code and found that short prompts lead to these blocky, modern-art-like generations, while long descriptive prompts produce higher-quality results.

For example, ~/projects/longcat.cpp/build/bin/sd --diffusion-model ~/Drive/AI/ComfyUI/models/unet/longcat_bfl_format-Q4_K_M.gguf --diffusion-fa --vae ~/Drive/AI/ComfyUI/models/vae/ae.sft --sampling-method euler --cfg-scale 3.0 --steps 20 --llm ~/Drive/AI/ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q3_K_M.gguf --color -W 512 -H 512 --seed 128 -o output_$(date +%Y-%m-%d_%H-%M-%S).png --vae-tiling -p "Orange cat":

Repeating "Orange Cat" several times:

~/projects/longcat.cpp/build/bin/sd --diffusion-model ~/Drive/AI/ComfyUI/models/unet/longcat_bfl_format-Q4_K_M.gguf --diffusion-fa --vae ~/Drive/AI/ComfyUI/models/vae/ae.sft --sampling-method euler --cfg-scale 3.0 --steps 20 --llm ~/Drive/AI/ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q3_K_M.gguf --color -W 512 -H 512 --seed 128 -o output_$(date +%Y-%m-%d_%H-%M-%S).png --vae-tiling -p "This is a portrait photograph of an Orange Cat sitting on the table by the window. We can see the room interior, fancy 50s style Art-Deco ornaments on the wallpaper. The weather outside is fine, the sun is shining. Wintertime snow on the ground, people playing with snowballs. The cat is large and shaped like a sphere. Whiskers are long and fangs are sharp. Phone snapshot, photograph."

Increasing the step count reduces the artifacts even more. However, the face still seems a bit blocky; I'm not sure whether that would also be true for the original workflow, since I cannot run the non-GGUF model on my GPU. Also, increasing the resolution does not help with those artifacts.

Hope my observations are helpful.

P.S. I've also checked out @leejet's longcat-fix branch. It did indeed generate black images until I added --clip-on-cpu; then I could get an image with just the "Orange Cat" prompt!

feat: Longcat-Image / Longcat-Image-Edit support #1053
