feat: Longcat-Image / Longcat-Image-Edit support #1053
base:master
wbruna commented Dec 5, 2025
That does look a bit like a circuit board...
stduhpf commented Dec 6, 2025
I can't figure out what I'm doing wrong. I think it is supposed to work just like Flux1, but with different PE indices and the Qwen text encoder... Maybe I'm missing an important detail, but I can't find it.
stduhpf commented Dec 7, 2025 • edited
stduhpf commented Dec 8, 2025 • edited
stduhpf commented Dec 8, 2025 • edited
stduhpf commented Dec 8, 2025
Oh no, why are there so many conflicts now?
stduhpf commented Dec 8, 2025 • edited
Using |
stduhpf commented Dec 8, 2025 • edited
stduhpf commented Dec 8, 2025 • edited
Rocky-Lee-001 commented Dec 10, 2025
May I ask which ComfyUI node is used to load this GGUF model?
stduhpf commented Dec 12, 2025
Now supports UTF-8 encoding properly for the quoted text. (Quote characters are also no longer excluded from the prompt after being parsed; this seems to help a bit, especially with longer text.)
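How the PR itself parses quoted text is not shown in this thread. As a minimal sketch of the idea, assuming quoted spans are delimited by ASCII double quotes and are later processed one character at a time, the following splits a prompt into plain and quoted segments (keeping the quote characters) and splits a UTF-8 string into whole code points instead of bytes. The names `split_quoted`, `split_utf8`, and `Segment` are hypothetical and do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence that starts with `lead`.
// Returns 1 for ASCII and for invalid lead bytes, so malformed input still advances.
static size_t utf8_char_len(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;
}

// Split a UTF-8 string into one std::string per code point.
std::vector<std::string> split_utf8(const std::string& s) {
    std::vector<std::string> out;
    size_t i = 0;
    while (i < s.size()) {
        size_t n = utf8_char_len(static_cast<unsigned char>(s[i]));
        if (i + n > s.size()) n = s.size() - i;  // clamp a truncated trailing sequence
        assert(n >= 1 && i + n <= s.size());
        out.push_back(s.substr(i, n));
        i += n;
    }
    return out;
}

// Hypothetical segment type: a run of prompt text and whether it was quoted.
struct Segment {
    std::string text;
    bool quoted;
};

// Split a prompt on ASCII double quotes. The quote characters stay inside
// the quoted segment. Scanning bytes is safe here because the byte 0x22
// never occurs inside a multi-byte UTF-8 sequence.
std::vector<Segment> split_quoted(const std::string& prompt) {
    std::vector<Segment> segs;
    std::string cur;
    bool in_quote = false;
    for (char c : prompt) {
        if (c != '"') {
            cur += c;
        } else if (!in_quote) {
            if (!cur.empty()) segs.push_back({cur, false});
            cur = "\"";
            in_quote = true;
        } else {
            cur += '"';
            segs.push_back({cur, true});
            cur.clear();
            in_quote = false;
        }
    }
    // An unterminated quote is kept as plain text.
    if (!cur.empty()) segs.push_back({cur, false});
    return segs;
}
```

Splitting on code points rather than bytes matters for non-ASCII quoted text: a CJK character occupies three bytes in UTF-8, and treating each byte as a character would produce invalid fragments.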
remove debug logs
Fix base rope offset for ref images
stduhpf commented Dec 12, 2025
@Rocky-Lee-001 I don't think LongCat-Image is natively supported by ComfyUI yet. You could give https://github.com/sooxt98/comfyui_longcat_image a try; maybe it works well with the GGUF node for ComfyUI?
    }
};
class SplitLinear : public Linear {
If this part has no effect, I think we can remove the related code. In fact, even if it does have some effect, additional work is required to handle it when LoRA uses QKV format, so I wouldn’t really recommend this approach.
This is used when loading Flux diffusion models with the diffusers naming convention, which has the QKV matrices split into individual linear layers rather than one big linear layer. For some reason it is not quite working; not sure why.
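For reference, the relationship between the two layouts can be sketched in plain C++ (not ggml, and not the PR's `SplitLinear` code). The sketch assumes bias-free layers and that the fused weight stacks q, k, and v along the output dimension in that order; `fuse_qkv` and `split_qkv_forward` are hypothetical names. Under those assumptions the fused layer and the three split layers produce the same output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // [out_features][in_features]

// y = W x for a bias-free linear layer.
std::vector<float> linear(const Matrix& w, const std::vector<float>& x) {
    std::vector<float> y(w.size(), 0.0f);
    for (size_t o = 0; o < w.size(); ++o) {
        assert(w[o].size() == x.size());
        for (size_t i = 0; i < x.size(); ++i) {
            y[o] += w[o][i] * x[i];
        }
    }
    return y;
}

// Stack the q, k and v weights along the output dimension (assumed order: q, k, v).
Matrix fuse_qkv(const Matrix& wq, const Matrix& wk, const Matrix& wv) {
    Matrix fused;
    fused.insert(fused.end(), wq.begin(), wq.end());
    fused.insert(fused.end(), wk.begin(), wk.end());
    fused.insert(fused.end(), wv.begin(), wv.end());
    return fused;
}

// Run the three split layers and concatenate their outputs in q, k, v order.
std::vector<float> split_qkv_forward(const Matrix& wq, const Matrix& wk,
                                     const Matrix& wv, const std::vector<float>& x) {
    std::vector<float> out = linear(wq, x);
    std::vector<float> k = linear(wk, x);
    std::vector<float> v = linear(wv, x);
    out.insert(out.end(), k.begin(), k.end());
    out.insert(out.end(), v.begin(), v.end());
    return out;
}
```

If a split-weight path disagrees with the fused path, comparing `linear(fuse_qkv(wq, wk, wv), x)` against `split_qkv_forward(wq, wk, wv, x)` on a small input is one way to check whether the concatenation order or a missing bias is responsible.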
leejet commented Dec 13, 2025
I’m not sure whether I did something wrong on my end, but I got a strange image.
stduhpf commented Dec 13, 2025 • edited
@leejet that's strange. I can reproduce it with the same prompt (even with the Q8_0 model), but I haven't gotten anything like this in my earlier testing. Maybe there's a linear layer that could use scaling? It does not seem related to the seed. It's a combination of short prompts and low resolution that seems to cause it.
leejet commented Dec 15, 2025
I think there are still some differences between the implementation in this PR and the official pipeline, for example the way token padding is handled and whether masks are used. I suspect these differences might be what caused the strange blocky artifacts in the generated images. I tried to fix it, but didn’t succeed; now it’s producing completely black images. https://github.com/leejet/stable-diffusion.cpp/tree/longcat-fix
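To make the padding and mask question concrete, here is a generic sketch in plain C++. It is not taken from this PR or from the official pipeline, and which scheme either of them uses is not established in this thread. It assumes right-padding to a fixed length with a hypothetical `pad_id`, and a mask that gives padded positions zero attention weight. `pad_tokens` and `masked_softmax` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Token ids padded to a fixed length, plus a mask (1 = real token, 0 = padding).
struct PaddedTokens {
    std::vector<int> ids;
    std::vector<int> mask;
};

// Right-pad `tokens` to `max_len` with `pad_id` (a hypothetical value here).
PaddedTokens pad_tokens(const std::vector<int>& tokens, size_t max_len, int pad_id) {
    assert(tokens.size() <= max_len);
    PaddedTokens out;
    out.ids = tokens;
    out.ids.resize(max_len, pad_id);
    out.mask.assign(max_len, 0);
    for (size_t i = 0; i < tokens.size(); ++i) {
        out.mask[i] = 1;
    }
    return out;
}

// Softmax over attention scores where masked-out positions get zero weight.
std::vector<float> masked_softmax(const std::vector<float>& scores,
                                  const std::vector<int>& mask) {
    assert(scores.size() == mask.size());
    std::vector<float> weights(scores.size(), 0.0f);
    float max_score = -INFINITY;
    for (size_t i = 0; i < scores.size(); ++i) {
        if (mask[i]) max_score = std::max(max_score, scores[i]);
    }
    if (max_score == -INFINITY) return weights;  // nothing to attend to
    float sum = 0.0f;
    for (size_t i = 0; i < scores.size(); ++i) {
        if (mask[i]) {
            weights[i] = std::exp(scores[i] - max_score);  // subtract max for stability
            sum += weights[i];
        }
    }
    for (float& w : weights) w /= sum;
    return weights;
}
```

Without the mask, padded positions receive nonzero attention weight. A short prompt has proportionally more padding than a long one, so if padding handling is the cause, it would be consistent with the artifacts appearing mainly on short prompts.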
Yurchikian commented Dec 16, 2025 • edited
I've tried this code and found that short prompts lead to these blocky, modern-art-like generations, while long descriptive prompts produce higher-quality results. For example, repeating "Orange Cat" several times:
Increasing the step count reduces the artifacts even more. However, the face still seems a bit blocky; I'm not sure whether that would also be true of the original workflow, since I cannot run the non-GGUF model on my GPU. Increasing the resolution does not help with those artifacts. I hope these observations are helpful. P.S. I've also checked out @leejet
for #1052
sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024
Test models (converted to bfl format) can be found there:
Inference for models in diffusers format still seems to be broken.