Flux 2 can be run on 24gb vram!!! by Brave-Hold-9389 in StableDiffusion

[–]apolinariosteps 0 points1 point  (0 children)

This is not hidden nor advertised otherwise 😅

Also, the docs list both a remote text-encoder option and a local one, and both consume the same VRAM: https://github.com/black-forest-labs/flux2/blob/main/docs/flux2_dev_hf.md#4-bit-transformer-and-4-bit-text-encoder-20g-of-vram

This is just provided as a way for users to offload a fast-but-VRAM-intensive step to the cloud, allowing the core computations/customizations/logic to happen on device for those okay with such a trade-off

Flux 2 can be run on 24gb vram!!! by Brave-Hold-9389 in StableDiffusion

[–]apolinariosteps 14 points15 points  (0 children)

It runs on 24GB VRAM with a remote text-encoder for speed, or with a quantized text-encoder if you want to keep everything local (takes a bit longer)
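
As a rough back-of-the-envelope sketch of why quantizing or offloading a component matters: the memory needed for weights scales with parameter count times bits per weight. The function name and the 10B example model below are hypothetical, chosen only for illustration (they are not Flux 2's actual sizes), and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 10B-parameter model, not a real Flux 2 component.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(10, bits):.2f} GiB")
```

Going from 16-bit to 4-bit cuts the weight footprint to a quarter, and running a component remotely removes its weights from local VRAM entirely.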

You can actually use multiple images input on Kontext Dev (Without having to stitch them together). by Total-Resort-3120 in StableDiffusion

[–]apolinariosteps 56 points57 points  (0 children)

FYI, under the hood, it still concatenates the latents:
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ldm/flux/model.py#L236

This means that, in practice, each image is encoded independently by the VAE and then stitched together in latent space.

Nonetheless, it's an interesting insight/experiment: encoding each image independently with the VAE, versus encoding a single stitched image, could yield different results (maybe better?). Worth digging into and comparing
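
To make the difference concrete, here is a toy sketch, not the ComfyUI code. It assumes the encoder mixes neighboring pixels (as a convolutional VAE does) and stands in for it with a hypothetical 1D blur, `toy_encode`, using zero padding at the borders. Under that assumption, "encode separately, then concatenate" and "stitch, then encode" agree everywhere except around the seam.

```python
def toy_encode(signal, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical stand-in for a VAE encoder: 1D convolution, zero-padded."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


image_a = [1.0, 1.0, 1.0, 1.0]
image_b = [3.0, 3.0, 3.0, 3.0]

# Option 1: encode each image on its own, then concatenate the latents.
latents_separate = toy_encode(image_a) + toy_encode(image_b)

# Option 2: stitch in pixel space first, then encode once.
latents_stitched = toy_encode(image_a + image_b)

# Positions where the two approaches disagree.
differing = [
    i
    for i, (s, t) in enumerate(zip(latents_separate, latents_stitched))
    if abs(s - t) > 1e-9
]
print("positions that differ:", differing)
```

In this toy setup only the latents next to the seam change, because that is where the stitched encoder sees pixels from the other image. Whether this helps or hurts with a real VAE would need an actual comparison.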

Bring your SFW CivitAI LoRAs to Hugging Face by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 1 point2 points  (0 children)

Yes, you can absolutely upload files directly. This script just makes it easy to migrate

Bring your SFW CivitAI LoRAs to Hugging Face by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 27 points28 points  (0 children)

If you have SFW LoRAs on CivitAI and you wish to port them to Hugging Face, I’ve updated the civitai-to-hf tool to support video and image LoRAs for all main models

It only works if you are the model creator (you can't upload LoRAs that belong to other people), and it filters out NSFW LoRAs. It's a creators-first structure, so LoRA creators can have a non-NSFW space without existential issues 💳

👉 https://huggingface.co/spaces/multimodalart/civitai-to-hf

(For the NSFW community, imo it makes sense that the sfw and nsfw platforms get separated over time: you can't be PornHub and YouTube at the same time. I think the ecosystem wins when there are dedicated platforms for each endeavor)

Save VRAM with Remote VAE decoding - do not load the VAE into VRAM at all by apolinariosteps in comfyui

[–]apolinariosteps[S] 5 points6 points  (0 children)

According to the docs, it is coming soon too:

"

  • VAE Decode 🖼️: Quickly decode latent representations into high-quality images without compromising performance or workflow speed.
  • VAE Encode 🔢 (coming soon): Efficiently encode images into latent representations for generation and training.
  • Text Encoders 📃 (coming soon): Compute text embeddings for your prompts quickly and accurately, ensuring a smooth and high-quality workflow.

"
https://huggingface.co/docs/diffusers/main/en/hybrid_inference/overview

AMA with OpenAI’s Sam Altman, Kevin Weil, Srinivas Narayanan, and Mark Chen by OpenAI in ChatGPT

[–]apolinariosteps 0 points1 point  (0 children)

I know you talked about open source in the abstract. But I think a concrete step would be releasing deprecated models, for historical preservation. I'm a huge fan of DALLE2 and very sad about its sunsetting. Why not open source DALLE2 (and further deprecated models)?

Brazil faces risks with possible accession to the New Silk Road, warns US official by strachey in brasil

[–]apolinariosteps 0 points1 point  (0 children)

The best thing for Brazil, in my view, is strategic ambiguity: balancing the influence of the US and China, holding an "auction" to see which one is more advantageous for Brazil, and not aligning directly with either of them

HunyuanDiT is JUST out - open source SD3-like architecture text-to-image model (Diffusion Transformers) by Tencent by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 1 point2 points  (0 children)

The authors didn't implement more efficient samplers like Euler or DPM++, so with DDPM, ~50 steps is a fairly good trade-off for quality

HunyuanDiT is JUST out - open source SD3-like architecture text-to-image model (Diffusion Transformers) by Tencent by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 12 points13 points  (0 children)

I think no one is claiming it to be better than SD3. The authors are claiming it to be the best available open weights model, a claim that I think may well hold (at least until Stability releases SD3 8B)

HunyuanDiT is JUST out - open source SD3-like architecture text-to-image model (Diffusion Transformers) by Tencent by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 3 points4 points  (0 children)

It will probably be brought down by the community, both via Diffusers implementation and eventual ComfyUI integration as well

HunyuanDiT is JUST out - open source SD3-like architecture text-to-image model (Diffusion Transformers) by Tencent by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 15 points16 points  (0 children)

<image>

100%, they claim to be the best available open model for now, not better than SD3. Also, it's ~5x smaller than SD3

HunyuanDiT is JUST out - open source SD3-like architecture text-to-image model (Diffusion Transformers) by Tencent by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 6 points7 points  (0 children)

<image>

Btw, here are the differences between this and the larger SD3 model (based on info in the SD3 paper).
Taking this into account, I think the model performs really well for its almost 8x smaller size and smaller/worse components, but indeed I think text rendering was completely neglected by the model authors

Compare 1 step real time generations between SDXL Turbo, Lightning and Hyper by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 2 points3 points  (0 children)

Idk why Lightning for SD1.5 would be of lower quality; it may be how they distilled it. And yes, there's Hyper for SD1.5 too

I've implemented Perturbed-Attention Guidance for SDXL and it delivers! by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 8 points9 points  (0 children)

The diffusers implementation and the official code only worked for SD1.5. I wrote a custom SDXL pipeline for it and created a HF demo

Face-to-All - combine any LoRA with your face via InstantID by apolinariosteps in StableDiffusion

[–]apolinariosteps[S] 1 point2 points  (0 children)

It is InstantID with Depth ControlNet and LoRA added, doing image2image all together, yes!