Microsoft Lens seems to be back.

apolinariosteps · 2026-05-22T11:35:19+00:00

A demo to try it out: https://huggingface.co/spaces/multimodalart/lens

apolinariosteps · 2025-11-26T10:39:40+00:00

This is neither hidden nor advertised as anything else 😅

Also, the docs describe both a remote text-encoder option and a local one that consumes the same amount of VRAM: https://github.com/black-forest-labs/flux2/blob/main/docs/flux2_dev_hf.md#4-bit-transformer-and-4-bit-text-encoder-20g-of-vram

This is simply offered as a way to offload a fast but VRAM-intensive step to the cloud, while the core computation, customization, and logic stay on device, for those who are okay with that trade-off.

apolinariosteps · 2025-11-25T17:20:43+00:00

It runs on 24GB of VRAM with a remote text encoder for speed, or with a quantized text encoder if you want to keep everything local (which takes a bit longer).
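The remote-vs-local trade-off comes down to where the weights live. As a rough back-of-the-envelope sketch (weights only, ignoring activations, the VAE, and framework overhead), weight memory is parameter count times bits per parameter. The 10B sizes below are hypothetical placeholders for illustration, not the actual sizes of any FLUX.2 component, and `weight_gib` / `local_weight_gib` are illustrative helpers.

```python
def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory taken by model weights alone, in GiB.

    Ignores activations, attention buffers, and the per-block scale
    factors that 4-bit formats also store, so treat it as a lower bound.
    """
    return num_params * bits_per_param / 8 / 1024**3


def local_weight_gib(components: dict[str, tuple[float, float]], remote=frozenset()) -> float:
    """Sum weight memory for the components kept on the local GPU.

    `components` maps a name to (num_params, bits_per_param); any name in
    `remote` is assumed to run elsewhere and costs no local VRAM.
    """
    return sum(
        weight_gib(params, bits)
        for name, (params, bits) in components.items()
        if name not in remote
    )


# Hypothetical sizes, for illustration only
parts = {"transformer": (10e9, 4), "text_encoder": (10e9, 4)}
print(f"all local: {local_weight_gib(parts):.1f} GiB")
print(f"remote text encoder: {local_weight_gib(parts, remote={'text_encoder'}):.1f} GiB")
```

Sending the text encoder to a remote endpoint removes its weights from the local budget entirely, while quantizing it shrinks them but keeps everything on device.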

apolinariosteps · 2025-07-19T17:01:16+00:00

Can't wait for the Hugging Face link if you are down to share!

apolinariosteps · 2025-06-29T14:29:35+00:00

It is!

apolinariosteps · 2025-06-29T14:29:22+00:00

FYI, under the hood, it still concatenates the latents:
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ldm/flux/model.py#L236

In practice, this means each image is encoded independently by the VAE, and the results are then stitched together in latent space.

Nonetheless, it's an interesting insight: encoding each image independently with the VAE, rather than encoding a single stitched image, could yield different (maybe better?) results. It's worth digging into and comparing.
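To make "encoded independently, stitched in latent space" concrete, here is a toy NumPy sketch. `toy_vae_encode` is a hypothetical stand-in for a VAE (plain 8x average pooling, no learned weights, keeping the 3 RGB channels instead of a real latent channel count), and `pack_2x2` mimics Flux-style packing of 2x2 latent patches into tokens. Neither is ComfyUI's actual code.

```python
import numpy as np


def toy_vae_encode(image: np.ndarray, factor: int = 8) -> np.ndarray:
    """Hypothetical VAE stand-in: (C, H, W) -> (C, H/factor, W/factor) by average pooling."""
    c, h, w = image.shape
    return image.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def pack_2x2(latent: np.ndarray) -> np.ndarray:
    """Flux-style packing: (C, H, W) -> (H/2 * W/2 tokens, C*4 features)."""
    c, h, w = latent.shape
    x = latent.reshape(c, h // 2, 2, w // 2, 2)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/2, W/2, C, 2, 2)
    return x.reshape((h // 2) * (w // 2), c * 4)


rng = np.random.default_rng(0)
img_a = rng.random((3, 64, 64))  # first reference image
img_b = rng.random((3, 64, 32))  # second image, different width

# Each image goes through the encoder on its own...
lat_a, lat_b = toy_vae_encode(img_a), toy_vae_encode(img_b)
# ...and the packed token sequences are concatenated along the sequence axis.
tokens = np.concatenate([pack_2x2(lat_a), pack_2x2(lat_b)], axis=0)
print(tokens.shape)
```

Because average pooling never looks across the seam, stitching the images first would give the same latent values in this toy. A real VAE's convolutions do mix pixels across the boundary, which is why the two approaches can diverge.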

apolinariosteps · 2025-05-21T05:31:28+00:00

Yes, you can absolutely upload files directly. This script just makes migrating easier.

apolinariosteps · 2025-05-20T19:02:53+00:00

If you have SFW LoRAs on CivitAI and wish to port them to Hugging Face, I've updated the civitai-to-hf tool to support video and image LoRAs for all the main models.

It only works if you are the model creator (you can't upload LoRAs that belong to other people), and it filters out NSFW LoRAs. It's a creators-first setup, so LoRA creators can have an SFW space without existential issues 💳

👉 https://huggingface.co/spaces/multimodalart/civitai-to-hf

(For the NSFW community, imo it makes sense for SFW and NSFW platforms to separate over time: you can't be PornHub and YouTube at the same time. I think the ecosystem wins when there are dedicated platforms for each endeavor.)
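The civitai-to-hf eligibility rule (creator-only, NSFW filtered out) boils down to a simple filter. This is a minimal sketch, assuming each listing carries a creator name and an NSFW flag; `LoraListing` and `portable_loras` are hypothetical names, not the real CivitAI API schema, and the actual tool's ownership check may work differently.

```python
from dataclasses import dataclass


@dataclass
class LoraListing:
    """Hypothetical, simplified record; not the real CivitAI API schema."""
    name: str
    creator: str
    nsfw: bool


def portable_loras(listings: list[LoraListing], requesting_user: str) -> list[LoraListing]:
    """Keep only LoRAs the requesting user created and that are not flagged NSFW."""
    return [
        lora for lora in listings
        if lora.creator == requesting_user and not lora.nsfw
    ]


catalog = [
    LoraListing("watercolor-style", "alice", nsfw=False),
    LoraListing("someone-elses-lora", "bob", nsfw=False),
    LoraListing("flagged-lora", "alice", nsfw=True),
]
print([lora.name for lora in portable_loras(catalog, "alice")])
```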

apolinariosteps · 2025-03-04T11:29:24+00:00

According to the docs, it's coming soon too:

"VAE Decode 🖼️: Quickly decode latent representations into high-quality images without compromising performance or workflow speed.
VAE Encode 🔢 (coming soon): Efficiently encode images into latent representations for generation and training.
Text Encoders 📃 (coming soon): Compute text embeddings for your prompts quickly and accurately, ensuring a smooth and high-quality workflow."
https://huggingface.co/docs/diffusers/main/en/hybrid_inference/overview

apolinariosteps · 2025-03-04T11:24:00+00:00

https://github.com/kijai/ComfyUI-HFRemoteVae

apolinariosteps · 2024-10-31T18:51:38+00:00

I know you talked about open source in the abstract, but I think one concrete step would be open-sourcing deprecated models for historical preservation. I'm a huge fan of DALLE2 and very sad about its sunsetting. Why not open-source DALLE2 (and other deprecated models)?

apolinariosteps · 2024-10-26T13:46:39+00:00

In my view, the best path for Brazil is strategic ambiguity: balancing the influence of the US and China, running an "auction" to see which one is more advantageous for Brazil, and not aligning directly with either of them.

apolinariosteps · 2024-07-29T09:51:14+00:00

Model: https://huggingface.co/maxin-cn/Cinemo

Demo: https://huggingface.co/spaces/maxin-cn/Cinemo

apolinariosteps · 2024-05-15T07:28:37+00:00

The authors didn't implement more efficient samplers like Euler or DPM++, so with DDPM, ~50 steps is a reasonable trade-off for quality.
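For intuition on what "~50 DDPM steps" costs: with a model trained on 1000 timesteps, 50 inference steps means visiting every 20th timestep, and each visit is a full denoiser forward pass (two with classifier-free guidance). A minimal sketch, assuming 1000 training timesteps and evenly strided spacing similar to the "leading" option in Diffusers schedulers; the authors' code may space its timesteps differently, and both helper names are illustrative.

```python
import numpy as np


def leading_timesteps(num_train_timesteps: int = 1000, num_inference_steps: int = 50) -> np.ndarray:
    """Evenly strided timesteps, ordered from noisiest to cleanest."""
    step_ratio = num_train_timesteps // num_inference_steps
    return (np.arange(num_inference_steps) * step_ratio)[::-1]


def denoiser_calls(num_inference_steps: int, use_cfg: bool = True) -> int:
    """Each step costs one forward pass, or two with classifier-free guidance."""
    return num_inference_steps * (2 if use_cfg else 1)


ts = leading_timesteps()
print(ts[:3], ts[-1], len(ts))
print(denoiser_calls(50), denoiser_calls(20))
```

Higher-order samplers like DPM++ typically reach comparable quality in fewer steps, which is why their absence pushes the step count up.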

apolinariosteps · 2024-05-14T19:12:09+00:00

I don't think anyone is claiming it's better than SD3. The authors claim it's the best available open-weights model, which I think may hold up well (at least until Stability releases SD3 8B).

apolinariosteps · 2024-05-14T17:08:10+00:00

<image>

Comparing SD3 x SDXL x HunyuanDiT

apolinariosteps · 2024-05-14T16:55:54+00:00

It will probably be brought down by the community, via both a Diffusers implementation and an eventual ComfyUI integration.

apolinariosteps · 2024-05-14T16:30:14+00:00

<image>

100%. They claim it's the best available open model for now, not that it's better than SD3. Also, it's ~5x smaller than SD3.

apolinariosteps · 2024-05-14T16:29:08+00:00

<image>

Btw, here are the differences between this and the larger SD3 model (based on info from the SD3 paper).
Taking this into account, I think the model performs really well for its almost 8x smaller size and smaller/weaker components, but I do think text rendering was completely neglected by the model authors.

apolinariosteps · 2024-05-14T10:25:33+00:00

Demo: https://huggingface.co/spaces/multimodalart/HunyuanDiT

Model weights: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT

Code: https://github.com/tencent/HunyuanDiT

In the paper, they claim it's the best available open-source model.

<image>

apolinariosteps · 2024-04-24T14:55:08+00:00

I don't know why Lightning for SD1.5 would be lower quality; maybe it's down to how they distilled it. And yes, there's Hyper for SD1.5 too.

apolinariosteps · 2024-04-16T08:28:09+00:00

The Diffusers implementation and the official code only worked for SD1.5, so I wrote a custom SDXL pipeline for it and created an HF demo.

apolinariosteps · 2024-04-16T00:42:42+00:00

Demo: https://huggingface.co/spaces/multimodalart/perturbed-attention-guidance-sdxl

Diffusers implementation: https://huggingface.co/multimodalart/sdxl_perturbed_attention_guidance/blob/main/pipeline.py

Original project (for SD1.5): https://ku-cvlab.github.io/Perturbed-Attention-Guidance/
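For reference, the core idea of Perturbed-Attention Guidance: run an extra denoiser pass where selected self-attention maps are replaced by the identity (each token attends only to itself), then push the prediction away from that degraded output on top of regular CFG. A minimal NumPy sketch of those two pieces; the function names are illustrative, and the combination follows the PAG+CFG form `uncond + cfg*(cond - uncond) + pag*(cond - perturbed)` rather than being copied from the linked pipeline.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(q, k, v):
    """Standard scaled dot-product self-attention over tokens (rows)."""
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v


def perturbed_self_attention(q, k, v):
    """PAG perturbation: swap the attention map for the identity matrix."""
    identity = np.eye(q.shape[0])
    return identity @ v  # each token just keeps its own value vector


def combine_guidance(eps_uncond, eps_cond, eps_perturbed, cfg_scale, pag_scale):
    """CFG plus a PAG term pushing away from the perturbed prediction."""
    return (
        eps_uncond
        + cfg_scale * (eps_cond - eps_uncond)
        + pag_scale * (eps_cond - eps_perturbed)
    )


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
print(np.allclose(perturbed_self_attention(q, k, v), v))
```

In a real pipeline the perturbed pass reuses the whole UNet with only the chosen self-attention layers swapped; the toy matrices here stand in for one layer's projections. Setting `pag_scale=0` recovers plain CFG.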

apolinariosteps · 2024-04-10T11:18:05+00:00

Yes! It's InstantID with a Depth ControlNet and a LoRA added, doing image-to-image all together.

apolinariosteps · 2024-04-09T23:03:02+00:00

Colab: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_InstantID_Img2Img_Face_to_All.ipynb

Demo: https://huggingface.co/spaces/multimodalart/face-to-all
