feat: Longcat-Image / Longcat-Image-Edit support #1053

Open
stduhpf wants to merge 12 commits into leejet:master from stduhpf:longcat

Conversation

@stduhpf
Contributor

stduhpf commented Dec 5, 2025 (edited)

For #1052

sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024

[image: output]

Test models (converted to bfl format) can be found here:

Inference for models in diffusers format still seems to be broken.

Green-Sky and JohnLoveJoy reacted with thumbs up emoji
Green-Sky, fszontagh, wbruna, and iwr-redmond reacted with laugh emoji
wbruna reacted with hooray emoji
JohnClaw reacted with heart emoji
fszontagh reacted with eyes emoji

@wbruna
Contributor

That does look a bit like a circuit board...

stduhpf reacted with laugh emoji

@stduhpf
Contributor, Author

TODO for when image generation works:
[image]

JohnClaw and Green-Sky reacted with thumbs up emoji

@stduhpf
Contributor, Author

I can't figure out what I'm doing wrong. I think it is supposed to work just like Flux1, but with different PE indices and the Qwen text encoder. Maybe I'm missing an important detail, but I can't find it.

JohnClaw reacted with eyes emoji

@stduhpf
Contributor, Author

stduhpf commented Dec 7, 2025 (edited)

I tried using my SplitAttention thing on a Flux model converted to diffusers format, and
[image: output]
I guess I found what is not working. I will try converting LongCat to Flux format and see if it works.

JohnClaw reacted with heart emoji

@stduhpf
Contributor, Author

stduhpf commented Dec 8, 2025 (edited)

I think I got it?
[image: output]

With the padding fixed, but with diffusers format:
[image: output]

JohnClaw reacted with heart emoji

@stduhpf
Contributor, Author

stduhpf commented Dec 8, 2025 (edited)

With the character-level tokenization trick:
[image: output]

This might need testing to make sure the current implementation supports languages that don't use the Latin alphabet. Also, for now it's applied only to text wrapped in single quotes (').

JohnClaw, fszontagh, and Green-Sky reacted with heart emoji

stduhpf marked this pull request as ready for review December 8, 2025 01:29

@stduhpf
Contributor, Author

Oh no, why are there so many conflicts now?

JohnClaw, Green-Sky, and lastrosade reacted with eyes emoji

@stduhpf
Contributor, Author

stduhpf commented Dec 8, 2025 (edited)

Using ' as a quote delimiter was a bad idea because it's the same symbol used for apostrophes. I will change it to detect " instead.
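
A minimal sketch of why the delimiter matters, assuming a simple parser that toggles between "plain" and "quoted" at every delimiter. The PR's actual parsing code is not shown in this thread; `PromptSegment` and `split_quoted` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <vector>

// One piece of a prompt: plain text, or text found between two delimiters.
struct PromptSegment {
    std::string text;
    bool quoted;
};

// Hypothetical helper (not the PR's code): split `prompt` at every `delim`,
// flipping the quoted state each time. The delimiters themselves are dropped.
std::vector<PromptSegment> split_quoted(const std::string& prompt, char delim) {
    std::vector<PromptSegment> segments;
    std::string current;
    bool in_quote = false;
    for (char c : prompt) {
        if (c == delim) {
            if (!current.empty()) {
                segments.push_back({current, in_quote});
                current.clear();
            }
            in_quote = !in_quote;
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) {
        // Trailing text is plain; an unmatched opening delimiter is ignored.
        segments.push_back({current, false});
    }
    return segments;
}
```

With `'` as the delimiter, the apostrophe in `It's a sign that says 'hi'` opens a quoted span by accident, so `s a sign that says ` ends up marked as quoted and `hi` does not. With `"` as the delimiter, `Change the text to say "I'm a long one"` yields a single quoted segment, `I'm a long one`.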

JohnClaw reacted with thumbs up emoji

@stduhpf
Contributor, Author

stduhpf commented Dec 8, 2025 (edited)

Somehow it's not fully working yet, but it's definitely able to see that it's supposed to be a cat holding a sign, maybe because of the vision model:
sd.exe --diffusion-model ..\ComfyUI\models\unet\longcat_edit_bfl_format-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.5 --sampling-method euler -v --offload-to-cpu --preview proj --steps 50 --vae-tile-size 128 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --color --seed 0 -r .\assets\flux\flux1-dev-q8_0.png --llm_vision ..\ComfyUI\models\clip_vision\Qwen2.5-VL-7B-Instruct.mmproj-f16.gguf -p "Change the text to say \"I'm a long one\""

ref: [image: flux1-dev-q8_0]
out: [image: output]

(Also, I made the change, so literal text now needs to be wrapped in double quotes.)

JohnClaw reacted with heart emoji

@stduhpf
Contributor, Author

stduhpf commented Dec 8, 2025 (edited)

[image: output]

Somehow I couldn't get it to remove the original text, but there it goes.

JohnClaw, shengkaixuan, and fszontagh reacted with heart emoji

stduhpf changed the title from "Wip: Longcat-Image support" to "Longcat-Image / Longcat-Image-Edit support" Dec 8, 2025
stduhpf changed the title from "Longcat-Image / Longcat-Image-Edit support" to "feat: Longcat-Image / Longcat-Image-Edit support" Dec 8, 2025

@Rocky-Lee-001

May I ask which ComfyUI node is used to load this GGUF model?

@stduhpf
Contributor, Author

Now supports UTF-8 encoding properly for the quoted text. (Also, quote characters are no longer excluded from the prompt after being parsed; this seems to help a bit, especially with longer text.)
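
Character-level handling of quoted text has to split on UTF-8 code points rather than on bytes, otherwise multi-byte characters get cut apart. A minimal sketch of that idea, assuming well-formed UTF-8 input; `split_utf8_chars` is a hypothetical helper and not the code from this PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return one std::string per UTF-8 code point.
// The sequence length is read from the lead byte. A byte that is not a
// valid lead byte is passed through as a single-byte piece.
std::vector<std::string> split_utf8_chars(const std::string& s) {
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;
        if ((lead & 0xE0) == 0xC0) {
            len = 2;  // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3;  // 1110xxxx
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4;  // 11110xxx
        }
        if (i + len > s.size()) {
            len = s.size() - i;  // truncated sequence at the end of input
        }
        chars.push_back(s.substr(i, len));
        i += len;
    }
    return chars;
}
```

Note that this splits by code point, not by grapheme cluster, so a character built from several code points (for example a base letter plus a combining accent) would still come out as several pieces.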

JohnClaw reacted with heart emoji

@stduhpf
Contributor, Author

> May I ask which ComfyUI node is used to load this GGUF model?

@Rocky-Lee-001 I don't think LongCat-Image is natively supported by ComfyUI yet. You could give https://github.com/sooxt98/comfyui_longcat_image a try; maybe it works well with the GGUF node for ComfyUI?

JohnClaw reacted with thumbs up emoji

}
};

class SplitLinear : public Linear {
Owner

If this part has no effect, I think we can remove the related code. In fact, even if it does have some effect, additional work is required to handle it when LoRA uses QKV format, so I wouldn’t really recommend this approach.

JohnClaw reacted with thumbs up emoji
Contributor, Author

This is used when loading Flux diffusion models that follow the diffusers naming convention, which has the qkv matrices split into individual linear layers rather than one big linear layer. For some reason it is not quite working, and I'm not sure why.
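
To illustrate the two layouts: a fused layer produces q, k and v with a single weight matrix, while the diffusers convention stores three separate matrices. The sketch below assumes bias-free layers and a q, k, v row order in the fused weight (both are assumptions for illustration), and shows that stacking the split weights gives the same numbers as applying them one by one. `Matrix`, `linear` and `fuse_qkv` are hypothetical names, not the PR's `SplitLinear` implementation.

```cpp
#include <cstddef>
#include <vector>

// Row-major weight: weights[out_feature][in_feature].
using Matrix = std::vector<std::vector<float>>;

// y = W * x, without bias.
std::vector<float> linear(const Matrix& w, const std::vector<float>& x) {
    std::vector<float> y(w.size(), 0.0f);
    for (std::size_t r = 0; r < w.size(); ++r) {
        for (std::size_t c = 0; c < x.size(); ++c) {
            y[r] += w[r][c] * x[c];
        }
    }
    return y;
}

// Stack the rows of the three split weights into one fused weight.
// Assumed order: q rows first, then k rows, then v rows.
Matrix fuse_qkv(const Matrix& wq, const Matrix& wk, const Matrix& wv) {
    Matrix fused;
    fused.insert(fused.end(), wq.begin(), wq.end());
    fused.insert(fused.end(), wk.begin(), wk.end());
    fused.insert(fused.end(), wv.begin(), wv.end());
    return fused;
}
```

Under these assumptions, the output of `linear(fuse_qkv(wq, wk, wv), x)` is the concatenation of `linear(wq, x)`, `linear(wk, x)` and `linear(wv, x)`, so a mismatch between the two paths would point at ordering, naming, or bias handling rather than at the math itself.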

JohnClaw reacted with thumbs up emoji
@leejet
Owner

I’m not sure whether I did something wrong on my end, but I got a strange image.

.\bin\Release\sd-cli.exe --diffusion-model  ..\models\longcat_bfl_format-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ae.sft  --llm ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct-Q8_0.gguf -p 'a lovely cat' --cfg-scale 5.0 -v --offload-to-cpu --diffusion-fa
[image: output]

stduhpf and JohnClaw reacted with eyes emoji

@stduhpf
Contributor, Author

stduhpf commented Dec 13, 2025 (edited)

@leejet that's strange. I can reproduce it with the same prompt (even with the Q8_0 model), but I haven't gotten anything like this in my earlier testing. Maybe there's a linear layer that could use scaling?

It does not seem to be related to the seed.

It's a combination of short prompts and low resolution that seems to cause it.

JohnClaw reacted with thumbs up emoji

@leejet
Owner

I think there are still some differences between the implementation in this PR and the official pipeline, for example in the way token padding is handled and whether masks are used. I suspect these differences might be what caused the strange blocky artifacts in the generated images. I tried to fix it, but didn't succeed; now it's producing completely black images.
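
As a rough illustration of the kind of difference meant here, the sketch below pads a token sequence to a fixed length and builds the matching mask. The fixed length, the pad id, and the names `PaddedTokens` and `pad_tokens` are assumptions made for illustration; this thread does not show what the official pipeline actually does.

```cpp
#include <cstddef>
#include <vector>

// Token ids padded to a fixed length, plus a mask marking the real tokens.
struct PaddedTokens {
    std::vector<int> ids;
    std::vector<int> mask;  // 1 = real token, 0 = padding
};

// Hypothetical helper: pad (or truncate) `tokens` to exactly `max_len`
// entries, filling unused positions with `pad_id`.
PaddedTokens pad_tokens(const std::vector<int>& tokens,
                        std::size_t max_len,
                        int pad_id) {
    PaddedTokens out;
    out.ids.assign(max_len, pad_id);
    out.mask.assign(max_len, 0);
    const std::size_t n = tokens.size() < max_len ? tokens.size() : max_len;
    for (std::size_t i = 0; i < n; ++i) {
        out.ids[i] = tokens[i];
        out.mask[i] = 1;
    }
    return out;
}
```

With a short prompt, most positions are padding, so whether those positions are masked out could plausibly matter more than it does for a long prompt. That is a guess, not something confirmed in this thread.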

https://github.com/leejet/stable-diffusion.cpp/tree/longcat-fix

stduhpf and JohnClaw reacted with thumbs up emoji
stduhpf and JohnClaw reacted with eyes emoji

@Yurchikian

Yurchikian commented Dec 16, 2025 (edited)

I've tried this code and found that short prompts lead to these blocky, modern-art-like generations, while long descriptive prompts produce higher-quality results.

For example:
~/projects/longcat.cpp/build/bin/sd --diffusion-model ~/Drive/AI/ComfyUI/models/unet/longcat_bfl_format-Q4_K_M.gguf --diffusion-fa --vae ~/Drive/AI/ComfyUI/models/vae/ae.sft --sampling-method euler --cfg-scale 3.0 --steps 20 --llm ~/Drive/AI/ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q3_K_M.gguf --color -W 512 -H 512 --seed 128 -o output_$(date +%Y-%m-%d_%H-%M-%S).png --vae-tiling -p "Orange cat"
[image: output_2025-12-16_22-24-08]

Repeating "Orange Cat" several times:
[image: output_2025-12-16_22-25-39]

~/projects/longcat.cpp/build/bin/sd --diffusion-model ~/Drive/AI/ComfyUI/models/unet/longcat_bfl_format-Q4_K_M.gguf --diffusion-fa --vae ~/Drive/AI/ComfyUI/models/vae/ae.sft --sampling-method euler --cfg-scale 3.0 --steps 20 --llm ~/Drive/AI/ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q3_K_M.gguf --color -W 512 -H 512 --seed 128 -o output_$(date +%Y-%m-%d_%H-%M-%S).png --vae-tiling -p "This is a portrait photograph of an Orange Cat sitting on the table by the window. We can see the room interior, fancy 50s style Art-Deco ornaments on the wallpaper. The weather outside is fine, the sun is shining. Wintertime snow on the ground, people playing with snowballs. The cat is large and shaped like a sphere. Whiskers are long and fangs are sharp. Phone snapshot, photograph."
[image: output_2025-12-16_22-28-59]

Increasing the step count would reduce the artifacts even more. However, the face still seems a bit blocky. I'm not sure whether that would also be true for the original workflow, since I cannot run the non-GGUF model on my GPU. Also, increasing the resolution would not help with those artifacts.

I hope my observations are helpful.

P.S. I've also checked out @leejet's longcat-fix branch. It did indeed generate black images until I added --clip-on-cpu; then I could get an image with just the "Orange Cat" prompt!
[image: output_2025-12-16_23-29-39]

stduhpf and JohnClaw reacted with thumbs up emoji


Reviewers

leejet left review comments


5 participants

@stduhpf @wbruna @Rocky-Lee-001 @leejet @Yurchikian
