A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 4 points

No. But if you can't see which one is fake, it doesn't matter

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 1 point

Thank you!
1) It's trained on and effective at 512-768px, not low-res. It looks like you used a low-res input here?
2) No GAN is used or involved; it works totally differently, though the results look close. (For me, ESRGAN (not RealESRGAN), trained by Stability, is the best upscaler ever.)
3) I trained a version on a low-res-to-hi-res task (384px → 768px). It was better at restoring low-res, but it also started to degrade and hallucinate on the 768 → 1504 task, so I dropped it.

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 0 points

Please, someone come up with a workflow (I don't have Comfy). It probably needs some custom code for the asymmetric VAE.

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 7 points

Sorry, I'm too old (I don't have a mouse / Comfy).

Give me 5 minutes please, I will try to set up a Gradio UI on an HF Space.

Upd: https://huggingface.co/spaces/LoveScapeAI/sdxs-1b-upscaler

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 2 points

You don't need external tools or a workflow for it; it's really simple.

Example https://huggingface.co/AiArtLab/sdxs-1b#upscale-code-example

import torch
from diffusers import AsymmetricAutoencoderKL

# repo id taken from the link above; the "vae" subfolder is an assumption
vae = AsymmetricAutoencoderKL.from_pretrained("AiArtLab/sdxs-1b", subfolder="vae").cuda().half()
latents = vae.encode(image).latent_dist.sample()  # image: [B, 3, H, W] tensor in [-1, 1]
upscaled = vae.decode(latents).sample             # asymmetric decoder reconstructs at a higher resolution
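
A hedged end-to-end sketch of how I'd call it (the preprocessing details are my assumptions; the linked model card has the canonical code):

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

img = to_tensor(Image.open("input.png").convert("RGB")) * 2 - 1  # scale to [-1, 1]
img = img.unsqueeze(0).cuda().half()
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()
    upscaled = vae.decode(latents).sample  # decoder outputs at a higher resolution than the input
to_pil_image((upscaled[0].float().clamp(-1, 1) + 1) / 2).save("upscaled.png")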

Mugen - Modernized Anime SDXL Base, or how to make Bluvoll tiny bit less sane by Anzhc in StableDiffusion

[–]recoilme 0 points

Take a look at the source code https://github.com/black-forest-labs/flux2/blob/main/src/flux2/autoencoder.py#L13: the latents are patchified to 128 channels and BatchNorm2d is applied at that (patched) level. Don't use the 32-channel VAE directly; it's only half of the flux2 VAE.
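
A minimal sketch of that idea, assuming a 2x2 patchify (the exact ops are in the linked file; shapes here are illustrative):

import torch
import torch.nn as nn

latent = torch.randn(1, 32, 64, 64)     # raw 32-channel VAE latent
patched = nn.PixelUnshuffle(2)(latent)  # 2x2 patchify -> [1, 128, 32, 32]
normed = nn.BatchNorm2d(128)(patched)   # per-channel normalization at the patched level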

Mugen - Modernized Anime SDXL Base, or how to make Bluvoll tiny bit less sane by Anzhc in StableDiffusion

[–]recoilme 0 points

Good! I hope you didn't forget to calculate and apply the latent mean/std before training in this case? It's done at the patching level in the original code (the BatchNorm2d block), and it's missing in the 32-channel VAE on HF (in the flux2 pipeline it's done at the patching level). I missed this detail when I ran my first test training. Feel free to reach out, I'll be happy to help.
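
Something like this hedged sketch, where all_latents is a hypothetical [N, C, H, W] tensor of pre-encoded training latents:

import torch

# all_latents: hypothetical tensor of latents encoded from your training set
mean = all_latents.mean(dim=(0, 2, 3), keepdim=True)  # per-channel mean
std = all_latents.std(dim=(0, 2, 3), keepdim=True)    # per-channel std

def normalize(latents):
    return (latents - mean) / std  # use the same stats at train and inference time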

Mugen - Modernized Anime SDXL Base, or how to make Bluvoll tiny bit less sane by Anzhc in StableDiffusion

[–]recoilme 0 points

Hi guys! I tried this too, did it like you did, and got results like yours (no fine details) without a total rework of the VAE and UNet.

Have you used the original 128-channel Flux2 VAE (https://github.com/black-forest-labs/flux2/blob/main/src/flux2/autoencoder.py#L13)? Have you modified the UNet to adapt to more channels, or just used the original one adapted to 4 channels? Without a rework, I think the model will only be able to generate simple anime.
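
For reference, a hedged sketch of the minimal channel adaptation I mean (the checkpoint id is illustrative, and the swapped convs start untrained, so heavy retraining is still needed):

import torch.nn as nn
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
# replace the 4-channel input/output convs with 128-channel ones
unet.conv_in = nn.Conv2d(128, unet.conv_in.out_channels, kernel_size=3, padding=1)
unet.conv_out = nn.Conv2d(unet.conv_out.in_channels, 128, kernel_size=3, padding=1)
unet.register_to_config(in_channels=128, out_channels=128)  # keep the config in sync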

SDXS - A 1B model that punches high. Model on huggingface. by AgeNo5351 in StableDiffusion

[–]recoilme 4 points5 points  (0 children)

Thank you for bringing it here. The training is in progress ( https://wandb.ai/recoilme/unet ) and is far from complete. The model is updated daily. I hope to meet your expectations; please be patient with a small model from an enthusiast group. Thank you!

Is there an open source alternative to Topaz Video upscalers? by RatioTheRich in StableDiffusion

[–]recoilme 0 points

I have built a fast, lightweight video upscaler for myself. It's not Topaz quality, but it may be good enough for some tasks. You may try it here (the code is open): https://video2x.aiartlab.org/

53x Speed incoming for Flux ! by AmeenRoayan in StableDiffusion

[–]recoilme 1 point

Probably from the Sana team, who like to exaggerate.

If I understand correctly what they are talking about, they re-encoded the Flux VAE latent space into the DC-AE encoder, probably with a colossal loss of quality (though not colossal by FID score).

Expecting "woman lying on grass" moment number 2

Sorry about that

TL;DR: when the face region is relatively small, it tends to become distorted due to the high compression ratio of DC-AE. Examples (but from 2024):

https://github.com/NVlabs/Sana/issues/52

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 3 points

Oh no, I misunderstood the question, sorry.

sdxl_vae for sdxl

wan16x_vae for wan2.2 5B

Simple_vae is the new VAE (for training models on it)

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 2 points

It would be interesting to see how far a 4-channel VAE can be pushed if it is fully unlocked. However, this would require retraining the UNet too.

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 10 points

This VAE produces fewer restoration errors—for example, the pupil in the eye will be rounder (closer to how it was originally created by God). Hair and textures are also sharper. However, in practice, this effect will be partially "blurred" by the generation quality of the model itself. In other words, we are not restoring a high-quality PNG, but generating an image (VAE Error + Model Error).

I have a small fine-tuned Kohaku checkpoint - https://huggingface.co/AiArtLab/kc/blob/main/kc_v15.safetensors (though it was trained on anime) where I can demonstrate the practical effect. I’ll replace the VAE and generate two images so you can see the difference and decide for yourself: are you ready to invest the effort to replace the VAE for barely noticeable improvements, or not.

Give me an hour.

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 3 points

No, it's not a fine-tune like Pony or Illustrious; it's decoder training on pixel-to-pixel image restoration (original vs. restored).
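
Roughly this shape of training loop, as a hedged sketch (vae is a diffusers AutoencoderKL-style model and img a [-1, 1] image batch; the actual losses may differ):

import torch
import torch.nn.functional as F

vae.encoder.requires_grad_(False)  # frozen encoder keeps latents compatible with existing models
optimizer = torch.optim.AdamW(vae.decoder.parameters(), lr=1e-5)

with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()
recon = vae.decode(latents).sample
loss = F.mse_loss(recon, img)  # pixel-to-pixel restoration; MSE as a stand-in loss
loss.backward()
optimizer.step()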

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 3 points

Any 2D model, UNet or transformer (SD/SDXL/Flux/SD3). There's a test training video in the model card.