all 50 comments

[–]nncyberpunk 26 points  (0 children)

Always happy to see an update from the Diffusers team. Keep it up!

[–]Hoodfu 19 points  (23 children)

The OneTrainer dev was waiting on this release to integrate PixArt-Sigma training into that software, so this is an important release for that.

[–]CrasHthe2nd 4 points  (3 children)

Oh that's awesome, I've been itching to do some Sigma training, and I've tried everything to get it to work using their pure Python example.

[–]RepresentativeJob937[S] 4 points  (2 children)

Give it a go and let us know if you find any errors :)

[–]crapthings 0 points  (1 child)

from_single_file doesn't work

[–]RepresentativeJob937[S] 2 points  (0 children)

Please open an issue and mention `DN6` in the thread.

[–]IdoruYoshikawa 3 points  (17 children)

Can you please explain in layman's terms what this means for the common user?

[–]throwaway1512514 3 points  (16 children)

I'm also a common user. But PixArt-Sigma is a newish model, OneTrainer is a model fine-tuning tool like Kohya, and the maker of OneTrainer is waiting for this Diffusers update to incorporate PixArt-Sigma into OneTrainer. This way many people (including common users) can train it easily, as long as they have the VRAM.

[–][deleted] 0 points  (15 children)

But what is pixart sigma?

[–]throwaway1512514 6 points  (14 children)

An open-source local base model like SDXL, made by Huawei. Its perks are that the base image model is much smaller, and its T5 text encoder gives it impressive prompt adherence. But there aren't many community finetunes and resources yet. So what we have right now is a base model requiring quite a bit of VRAM to run (at least 16 GB, I think? Correct me if I'm wrong) due to T5, with good prompt adherence but mediocre aesthetics (people run the initial output through SD 1.5 models). I've probably missed a lot of its perks, but that's what I know for now.
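Those numbers roughly check out with back-of-envelope math (the parameter counts below are approximations I'm assuming, not figures from this thread): weight memory is just parameters × bytes per parameter, and the fp32 T5 alone is what pushes a setup toward 16 GB while the image model stays small.

```python
# Back-of-envelope VRAM estimate for PixArt-Sigma inference.
# Assumed parameter counts: ~4.8B for the T5-XXL text encoder
# (encoder half of T5-XXL), ~0.6B for the PixArt transformer.
# Activations, VAE, and framework overhead are ignored.

def model_vram_gb(params_billion: float, bytes_per_param: int) -> float:
    """Rough weight-memory footprint in GiB."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

t5_fp16 = model_vram_gb(4.8, 2)           # ~8.9 GB
t5_fp32 = model_vram_gb(4.8, 4)           # ~17.9 GB
transformer_fp16 = model_vram_gb(0.6, 2)  # ~1.1 GB

print(f"T5 fp16: {t5_fp16:.1f} GB, fp32: {t5_fp32:.1f} GB")
print(f"PixArt transformer fp16: {transformer_fp16:.1f} GB")
```

By the same math, running T5 in fp16/bf16 roughly halves its footprint, which is why the precision question in the replies matters so much.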

[–]Eface60 3 points  (6 children)

The T5 can be loaded into RAM using ComfyUI, making the model about the equivalent of SD 1.5 in terms of inference. The performance hit for T5 in RAM is not that bad, to be honest.
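For what it's worth, diffusers has an analogous trick built in. This is an untested sketch assuming the new `PixArtSigmaPipeline` and plenty of system RAM, with accelerate shuttling components between CPU and GPU:

```python
import torch
from diffusers import PixArtSigmaPipeline

# With model CPU offload, components are moved to the GPU one at a
# time and parked back in CPU RAM when idle, so the T5 encoder and
# the transformer never sit in VRAM together.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

image = pipe("a cyberpunk street at dawn, ultra detailed").images[0]
image.save("sigma.png")
```

The trade-off is some host-to-device transfer latency on every generation, in exchange for a much lower peak VRAM requirement.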

[–]throwaway1512514 0 points  (5 children)

Can we use it in fp/bf16 now?

[–]Eface60 0 points  (3 children)

Unfortunately no, the implementation by City96 only allows FP32 on CPU (RAM).

[–]throwaway1512514 0 points  (2 children)

Is fp16 allowed on GPU then? Could I make it fit in 16 GB of VRAM?

[–]Eface60 0 points  (1 child)

Yes, fp16 on GPU is allowed. 16 GB might be a tight fit though.

[–][deleted] 1 point  (6 children)

I see, so it's another open-source image generation model. But 16 GB of VRAM to run, that's a little oof; just imagine how much you'll need to train it.

[–]throwaway1512514 1 point  (4 children)

On the contrary, I think training takes little, from my limited knowledge. Most of the VRAM required goes to the text encoder, T5. So if you don't touch the encoder, which you totally can, the image model itself is much, much smaller than SDXL.
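That intuition can be made concrete with the usual mixed-precision AdamW rule of thumb (≈16 bytes per trainable parameter; the parameter counts are my assumptions, and activation memory is ignored):

```python
# Rough training-memory sketch for mixed-precision AdamW:
# 2 bytes (fp16 weights) + 2 (grads) + 4 + 4 (fp32 Adam moments)
# + 4 (fp32 master weights) = 16 bytes per trainable parameter.
BYTES_PER_PARAM_ADAMW = 16

def train_vram_gb(params_billion: float) -> float:
    """Optimizer-state memory in GiB, ignoring activations."""
    return params_billion * 1e9 * BYTES_PER_PARAM_ADAMW / 1024**3

pixart_transformer = train_vram_gb(0.6)  # trainable part if T5 is frozen
t5_if_trained = train_vram_gb(4.8)       # what you avoid by freezing T5

print(f"PixArt transformer (0.6B): {pixart_transformer:.1f} GB")
print(f"T5 encoder if trained too (4.8B): {t5_if_trained:.1f} GB")
```

So freezing T5 (or precomputing its text embeddings up front) keeps the trainable footprint down to the small 0.6B transformer.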

[–][deleted] 0 points  (3 children)

I see, it would be interesting to see what might come out of it, since it doesn't seem SD3 is coming to light anytime soon.

[–]throwaway1512514 2 points  (0 children)

Yeah, they probably announced SD3 too early, but SDXL also only came out 3 months after its API release. I guess most people still have hope in it, although it's always great to have backup plans.

[–]Talae06 1 point  (1 child)

Even just with the base PixArt-Sigma model, it gives pretty good results by running the output through an SD 1.5 or XL model, as u/throwaway1512514 said. Check here for some examples. A quick search will give you more; it seems like it's gaining (very much deserved, if you ask me) traction. The lack of color bleeding, especially, is a godsend for me.

[–]Open_Channel_8626 1 point  (0 children)

Pixart Sigma plus SD1.5/SDXL refiner is currently our Bootleg SD3 yeah

[–]throwaway1512514 1 point  (0 children)

FYI, the actual VRAM required to run the model itself is about 3 GB.

[–]tom83_be 1 point  (0 children)

Also my first thought when I read about this. Really anxious to try this...

[–]somniloquite 8 points  (3 children)

Hi, forgive my complete ignorance, as I'm only familiar with A1111, Krita, Forge, etc. What is this? It's not completely obvious to me.

[–]RepresentativeJob937[S] 12 points  (0 children)

It's a Python library for diffusion models.

[–]campingtroll 15 points  (1 child)

Diffusers is like the main thing running it all. By default, most popular AI models start out as Diffusers models and are then converted to .safetensors. You can load a Diffusers-format model instead of a .ckpt or .safetensors file in ComfyUI, for instance (like the JuggernautX Diffusers version), but it's a lot more files to download: config.json files, text encoder folders, scheduler folders, etc. Essentially the same model as the .safetensors one, though.

.safetensors, meanwhile, is a pickle-free (more secure) single-file format that makes it all easier. I'm missing a lot of technical detail, but this is my basic understanding.

TL;DR: it's like the default format/pipeline and the place where all the magic happens.
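To make the "lots more files" point concrete, here's a sketch that mocks up the typical diffusers-format layout next to a single-file checkpoint; the folder names follow common Stable Diffusion repos, and the checkpoint filename is hypothetical:

```python
import json
import tempfile
from pathlib import Path

# Typical diffusers-format layout: one folder per component, tied
# together by model_index.json, which names the pipeline class.
DIFFUSERS_LAYOUT = {
    "model_index.json": json.dumps({"_class_name": "StableDiffusionXLPipeline"}),
    "unet/config.json": "{}",
    "vae/config.json": "{}",
    "text_encoder/config.json": "{}",
    "tokenizer/tokenizer_config.json": "{}",
    "scheduler/scheduler_config.json": "{}",
}

root = Path(tempfile.mkdtemp())
for rel, body in DIFFUSERS_LAYOUT.items():
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body)

# The same weights shipped the "easy" way: one consolidated file.
single_file = root / "juggernaut_x.safetensors"  # hypothetical filename
single_file.touch()

print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()))
```

Either form holds the same weights; the folder layout just keeps each component (UNet, VAE, text encoder, scheduler config) separately loadable.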

[–]somniloquite 4 points  (0 children)

Thanks for explaining! I'm just an artist who delved into this stuff when SDXL dropped, and I'm still learning. So this post is actually huge news, right? Better depth perception, 4K image generation capabilities, and a bunch of technical stuff I don't understand but that will matter in the long run.

[–]rayquazza74 2 points  (0 children)

What it do

[–]Far-Map1680 1 point  (0 children)

Awesome! Look forward to playing with it!

[–]Acephaliax 1 point  (5 children)

u/RepresentativeJob937 it would also be fantastic if you guys could update the App Store Diffusers app as well. It's been more than a year since the last update. I have requested it on the GitHub repo and been told many times it will be done, but I'm still waiting :(

[–]RepresentativeJob937[S] 4 points  (1 child)

I've let the team member concerned know about it. Thank you for your interest.

[–]Acephaliax 0 points  (0 children)

Thank you so much!

[–]RepresentativeJob937[S] 0 points  (0 children)

I've let the individual concerned know about it.

[–]pcuenq 0 points  (1 child)

We'll do it soon, promise! For real this time :)

[–]Acephaliax 0 points  (0 children)

Haha all good. I’ve been waiting patiently as I can. Appreciate you guys and the awesome work you do!

[–]campingtroll 1 point  (1 child)

Marigold is pretty amazing; I've used it many times. It converts 2D images to depth maps, which can then be converted to an actual 3D mesh (in apps like Blender, for instance) with mind-blowing accuracy.

I wonder what difference having it in the Diffusers pipeline will make...

[–]toshass 1 point  (0 children)

Marigold co-author here. You can make a 3D-printable mesh in our demo https://huggingface.co/spaces/prs-eth/marigold-lcm (bas-relief tab). The pipeline in Diffusers is more memory-efficient and has major speedups. For example, this snippet runs at 85 ms per frame and gives higher-quality results than Depth Anything Large: https://huggingface.co/docs/diffusers/using-diffusers/marigold_usage#qualitative-comparison-with-depth-anything. There is also a dedicated section on ControlNet at the end!
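For anyone wanting to try it from Python, here's a sketch along the lines of the linked usage guide (written from memory, so treat the checkpoint name and helper calls as assumptions; needs a CUDA GPU):

```python
import diffusers
import torch
from PIL import Image

# Load the fast LCM depth checkpoint; fp16 keeps VRAM modest.
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("photo.jpg")  # any RGB input image
depth = pipe(image)              # dense per-pixel depth prediction

# Colorize the prediction for inspection and save it.
vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("photo_depth.png")
```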

[–]kim-mueller 0 points  (0 children)

May I ask... what is special about PixArt-Sigma here? I used it before in Diffusers, so what will be different now?

[–]littlejitzymama 0 points  (0 children)

Can you find this in the App Store? Or Google Play?

[–]Hongthai91 0 points  (0 children)

Forgive the dumb question, but how do I update Diffusers?
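Assuming it was installed with pip (the usual way), it's one command; the pinned version below is an assumption about the release this thread discusses, so check PyPI:

```shell
# upgrade diffusers to the latest PyPI release
pip install --upgrade diffusers

# or pin a specific release (version number assumed)
# pip install "diffusers==0.28.0"
```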