feat: introduce GGMLBlock and implement SVD (Broken) #159


Merged
leejet merged 6 commits into master from svd on Feb 24, 2024

Conversation

@leejet (Owner)

In the past few weeks during my free time, I've been working on implementing this PR. In this PR, I introduced the GGMLBlock, making it easier to implement neural networks. In most cases, it's straightforward to implement the corresponding GGMLBlock from nn.Module. I have implemented the majority of the building blocks for SVD and the SVD pipeline, except for VScalingWithEDMcNoise, which is also relatively simple to implement.

However, I've started to feel fatigued. ggml's batch inference implementation for certain operators has issues, and although I've addressed some problems in this branch https://github.com/leejet/ggml/tree/batch-inference, it's not entirely resolved. Furthermore, there are situations where NaN occurs in the implementation of some operators, and these issues also need to be fixed. If I have time in the future, I'll continue addressing these issues with ggml. However, for now, I'll be allocating my free time to other tasks, as I've already invested a considerable amount of effort in implementing this PR over the past few weeks. Perhaps I'll merge this PR first, even though the SVD support is broken. This is because the PR introduces GGMLBlock, which makes it convenient to use ggml for implementing neural networks. The test results for batch inference are documented in the comments of the test functions in unet.hpp/vae.hpp; take a look if you're interested.

@Amin456789 commented Jan 27, 2024 (edited)
This is amazing news, thank you so much for your hard work, leejet. Can't wait for you guys to fix SVD so I can try it here.

On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried to convert SDXL Turbo to q5_1 and it didn't generate images (I mentioned it in the safetensors issue). Could you please fix it? It would be very useful; converting the SVD model to, say, q4_1 would be very fast. Converting on the fly works, but converting to gguf doesn't.


@FSSRepo (Contributor)

Doing batch inference will only be worthwhile when you have a lot of VRAM. I think now we will be able to perform a single UNet computation in which a batch of conditionals c = [c, uc] is used.

@FSSRepo (Contributor) commented Jan 27, 2024 (edited)

I was planning to refactor Stable Diffusion to have an API similar to llama.cpp and also support offloading, computing sd, and controlnet on the GPU with low VRAM. However, upon reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.


@Cyberhan123 (Contributor)

> I was planning to refactor Stable Diffusion to have an API similar to llama.cpp and also support offloading, computing sd, and controlnet on the GPU with low VRAM. However, upon reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.

@FSSRepo Basically this is the PR I'm implementing: #157
I split the loading logic of clip, vae, and unet, and then added the set_options API. I think we did the same thing.

@Cyberhan123 (Contributor) commented Jan 28, 2024 (edited)

It is understandable that problems will arise; ggml is a library built mainly to support llama. However, for me, ggml has several advantages over PyTorch that cannot be ignored: it is small (the PyTorch CUDA dependency is about 1 GB), it supports quantization very well, and it supports ROCm on Windows.

@Cyberhan123 (Contributor)

GGMLBlock is very educational and this implementation is great.


@leejet (Owner, Author)

> Doing batch inference will only be worthwhile when you have a lot of VRAM. I think now we will be able to perform a single UNet computation in which a batch of conditionals c = [c, uc] is used.

For SVD, batch inference is a must: ne3 is actually batch_size * num_video_frames.

@leejet (Owner, Author)

> On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried to convert SDXL Turbo to q5_1 and it didn't generate images (I mentioned it in the safetensors issue). Could you please fix it? It would be very useful; converting the SVD model to, say, q4_1 would be very fast. Converting on the fly works, but converting to gguf doesn't.

did you use the fp16-fix vae?

@Cyberhan123 (Contributor) commented Jan 28, 2024 (edited)

> > On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried to convert SDXL Turbo to q5_1 and it didn't generate images (I mentioned it in the safetensors issue). Could you please fix it? It would be very useful; converting the SVD model to, say, q4_1 would be very fast. Converting on the fly works, but converting to gguf doesn't.
>
> did you use the fp16-fix vae?

I found a problem when using it just now, regarding seeds. If the seed is 42, the generated pictures are correct.
But if the seed is random, the pictures are often generated very strangely. I don't know much about the behavior in PyTorch.

seed 42: [image]

random seed: [image]


@Cyberhan123 (Contributor)

What shocks me is that for a 768x768 image (sdxl-turbo) on a 7900 XTX, a single sampling takes only 0.35 s. It seems that the performance bottleneck lies in the decoding operation.

@Amin456789 commented Jan 28, 2024 (edited)

I'm using taesdxl and it works great. With taesdxl I don't need to use the vae fp16 fix, and the generated images look great with 1 step and the LCM sampler for SDXL Turbo. However, what I meant was converting models to q4_1 gguf files with the -m convert command, to get smaller models; that gave me errors after generating an image (the same as in the converting-safetensors issue, if I remember correctly).
Converting on the fly and making images works great, but -m convert --type with a quantization like q4_1 somehow corrupts the model (SDXL Turbo), I think.

@leejet (Owner, Author)

> But if the seeds are random, the pictures will often be generated very strangely

@Cyberhan123 I got the same result in sd-webui using seed 297003140.


@leejet (Owner, Author)

I will merge this PR even though the SVD support is broken. This is because the PR introduces GGMLBlock, which makes it convenient to use ggml for implementing neural networks. I have other changes that rely on GGMLBlock, such as adding support for stable cascade. I will try to fix the SVD issue later if I have time.


leejet merged commit b636886 into master on Feb 24, 2024
rmatif pushed a commit to rmatif/stable-diffusion.cpp that referenced this pull request on Apr 8, 2025
leejet deleted the svd branch on September 16, 2025
