feat: introduce GGMLBlock and implement SVD (Broken) #159


Merged
leejet merged 6 commits into master from svd on Feb 24, 2024

Conversation

@leejet (Owner)

In the past few weeks during my free time, I've been working on implementing this PR. In this PR, I introduced the GGMLBlock, making it easier to implement neural networks. In most cases, it's straightforward to implement the corresponding GGMLBlock from nn.Module. I have implemented the majority of the building blocks for SVD and the SVD pipeline, except for VScalingWithEDMcNoise, which is also relatively simple to implement.

However, I've started to feel fatigued. ggml's batch inference implementation for certain operators has issues, and although I've addressed some problems in this branch https://github.com/leejet/ggml/tree/batch-inference, it's not entirely resolved. Furthermore, there are situations where NaN occurs in the implementation of some operators, and these issues also need to be fixed. If I have time in the future, I'll continue addressing these issues with ggml. However, for now, I'll be allocating my free time to other tasks, as I've already invested a considerable amount of effort in implementing this PR over the past few weeks. Perhaps I'll merge this PR first, even though the SVD support is broken. This is because the PR introduces GGMLBlock, which makes it convenient to use ggml for implementing neural networks. The test results for batch inference are documented in the comments of the test functions in unet.hpp/vae.hpp; take a look if you're interested.

@Amin456789 commented Jan 27, 2024 (edited)
This is amazing news, thank you so much for your hard work, leejet. Can't wait for you guys to fix SVD so I can try it here.

On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried to convert SDXL Turbo to q5_1 and it didn't generate images (I mentioned it in the safetensors issue). Could you please fix it? It would be very useful; converting the SVD model to, say, q4_1 would be very fast. Converting on the fly works, but converting to gguf doesn't.


@FSSRepo (Contributor)

Doing batch inference will only be worthwhile when you have a lot of VRAM. I think now we will be able to perform a single UNet computation in which a batch of conditionals c = [c, uc] is used.

@FSSRepo (Contributor) commented Jan 27, 2024 (edited)

I was planning to refactor Stable Diffusion to have an API similar to llama.cpp and also support offloading, computing sd, and controlnet on the GPU with low VRAM. However, upon reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.


@Cyberhan123 (Contributor)

> I was planning to refactor Stable Diffusion to have an API similar to llama.cpp and also support offloading, computing sd, and controlnet on the GPU with low VRAM. However, upon reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.

@FSSRepo Basically this is the PR I'm implementing: #157
I split the loading logic of clip, vae, and unet, and then added the set_options API. I think we did the same thing.

@Cyberhan123 (Contributor) commented Jan 28, 2024 (edited)

It is understandable that problems will arise; ggml is a library built mainly to support llama. However, for me, ggml has several advantages over PyTorch that cannot be ignored: it is small (the PyTorch CUDA dependency is about 1 GB), it supports quantization very well, and it supports ROCm on Windows.

@Cyberhan123 (Contributor)

GGMLBlock is very educational and this implementation is great.


@leejet (Owner, Author)

> Doing batch inference will only be worthwhile when you have a lot of VRAM. I think now we will be able to perform a single UNet computation in which a batch of conditionals c = [c, uc] is used.

For SVD, batch inference is a must: ne3 is actually batch_size * num_video_frames.

@leejet (Owner, Author)

> On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried to convert SDXL Turbo to q5_1 and it didn't generate images (I mentioned it in the safetensors issue). Could you please fix it? It would be very useful; converting the SVD model to, say, q4_1 would be very fast. Converting on the fly works, but converting to gguf doesn't.

did you use the fp16-fix vae?

@Cyberhan123 (Contributor) commented Jan 28, 2024 (edited)

> > On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried to convert SDXL Turbo to q5_1 and it didn't generate images (I mentioned it in the safetensors issue). Could you please fix it? It would be very useful; converting the SVD model to, say, q4_1 would be very fast. Converting on the fly works, but converting to gguf doesn't.
>
> did you use the fp16-fix vae?

I found a problem when using it just now, regarding seeds. If the seed is 42, the generated pictures are correct.
But if the seed is random, the pictures are often generated very strangely. I don't know much about the behavior in PyTorch.

seed 42: [image]

random seed: [image]


@Cyberhan123 (Contributor)

What shocks me is that for a 768x768 image (sdxl-turbo) on a 7900 XTX, a single sampling takes only 0.35 s. It seems that the performance bottleneck lies in the decoding operation.

@Amin456789 commented Jan 28, 2024 (edited)

I'm using taesdxl and it works great. With taesdxl I don't need to use the vae fp16 fix, and the generated images look great with 1 step and the LCM sampler for SDXL Turbo. However, what I meant was converting models to q4_1 gguf files with the -m convert command, to get smaller models; that gave me errors after generating an image (the same as in the converting-safetensors issue, if I remember correctly).
Converting on the fly and making images works great, but -m convert --type with a quantization like q4_1 somehow corrupts the model (SDXL Turbo), I think.

@leejet (Owner, Author)

> But if the seeds are random, the pictures will often be generated very strangely

@Cyberhan123 I got the same result in sd-webui using seed 297003140.


@leejet (Owner, Author)

I will merge this PR even though the SVD support is broken. This is because the PR introduces GGMLBlock, which makes it convenient to use ggml for implementing neural networks. I have other changes that rely on GGMLBlock, such as adding support for stable cascade. I will try to fix the SVD issue later if I have time.


leejet merged commit b636886 into master on Feb 24, 2024
rmatif pushed a commit to rmatif/stable-diffusion.cpp that referenced this pull request on Apr 8, 2025
leejet deleted the svd branch on September 16, 2025
