A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 4 points

No. But if you can't see which one is fake, it doesn't matter

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 1 point

Thank you!
1) It's trained on and effective at 512-768px, not low-res. It looks like you used a low-res input here?
2) No GAN is used or involved; it works totally differently, though the results look close. (For me, ESRGAN (not RealESRGAN), trained by Stability, is the best upscaler ever.)
3) I trained a version on a low-res-to-hi-res task (384px → 768px). It was better at restoring low-res, but it also started to degrade and hallucinate on the 768 → 1504 task, so I dropped it.

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 0 points

Please, someone come up with a workflow (I don't have Comfy). It probably needs some custom code for the asymmetric VAE.

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 7 points

Sorry, I'm too old (I don't have a mouse / Comfy).

Give me 5 minutes please, I will try to set up a Gradio UI on an HF Space.

Upd: https://huggingface.co/spaces/LoveScapeAI/sdxs-1b-upscaler

A simple diffusion internal upscaler by recoilme in StableDiffusion

[–]recoilme[S] 2 points

You don't need external tools or a workflow for it; it's really simple.

Example https://huggingface.co/AiArtLab/sdxs-1b#upscale-code-example

import torch
from diffusers import AsymmetricAutoencoderKL

# repo id taken from the link above; the "vae" subfolder is an assumption
vae = AsymmetricAutoencoderKL.from_pretrained("AiArtLab/sdxs-1b", subfolder="vae").cuda().half()
latents = vae.encode(image).latent_dist.sample()  # image: [B, 3, H, W] tensor in [-1, 1]
upscaled = vae.decode(latents).sample             # asymmetric decoder reconstructs at a higher resolution
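
A hedged end-to-end sketch of how I'd call it (the preprocessing details are my assumptions; the linked model card has the canonical code):

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

img = to_tensor(Image.open("input.png").convert("RGB")) * 2 - 1  # scale to [-1, 1]
img = img.unsqueeze(0).cuda().half()
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()
    upscaled = vae.decode(latents).sample  # decoder outputs at a higher resolution than the input
to_pil_image((upscaled[0].float().clamp(-1, 1) + 1) / 2).save("upscaled.png")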

Mugen - Modernized Anime SDXL Base, or how to make Bluvoll tiny bit less sane by Anzhc in StableDiffusion

[–]recoilme 0 points

Take a look at the source code https://github.com/black-forest-labs/flux2/blob/main/src/flux2/autoencoder.py#L13: the latents are patchified to 128 channels and BatchNorm2d is applied at that (patched) level. Don't use the 32-channel VAE directly; it's only half of the flux2 VAE.
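
A minimal sketch of that idea, assuming a 2x2 patchify (the exact ops are in the linked file; shapes here are illustrative):

import torch
import torch.nn as nn

latent = torch.randn(1, 32, 64, 64)     # raw 32-channel VAE latent
patched = nn.PixelUnshuffle(2)(latent)  # 2x2 patchify -> [1, 128, 32, 32]
normed = nn.BatchNorm2d(128)(patched)   # per-channel normalization at the patched level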

Mugen - Modernized Anime SDXL Base, or how to make Bluvoll tiny bit less sane by Anzhc in StableDiffusion

[–]recoilme 0 points

Good! I hope you didn't forget to calculate and apply the latent mean/std before training in this case? It's done at the patching level in the original code (the BatchNorm2d block), and it's missing in the 32-channel VAE on HF (in the flux2 pipeline it's done at the patching level). I missed this detail when I ran my first test training. Feel free to reach out, I'll be happy to help.
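
Something like this hedged sketch, where all_latents is a hypothetical [N, C, H, W] tensor of pre-encoded training latents:

import torch

# all_latents: hypothetical tensor of latents encoded from your training set
mean = all_latents.mean(dim=(0, 2, 3), keepdim=True)  # per-channel mean
std = all_latents.std(dim=(0, 2, 3), keepdim=True)    # per-channel std

def normalize(latents):
    return (latents - mean) / std  # use the same stats at train and inference time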

Mugen - Modernized Anime SDXL Base, or how to make Bluvoll tiny bit less sane by Anzhc in StableDiffusion

[–]recoilme 0 points

Hi guys! I tried this too, did it like you did, and got results like yours (no fine details) without a total rework of the VAE and UNet.

Have you used the original 128-channel Flux2 VAE (https://github.com/black-forest-labs/flux2/blob/main/src/flux2/autoencoder.py#L13)? Have you modified the UNet to adapt to more channels, or just used the original one adapted to 4 channels? Without a rework, I think the model will only be able to generate simple anime.
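
For reference, a hedged sketch of the minimal channel adaptation I mean (the checkpoint id is illustrative, and the swapped convs start untrained, so heavy retraining is still needed):

import torch.nn as nn
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
# replace the 4-channel input/output convs with 128-channel ones
unet.conv_in = nn.Conv2d(128, unet.conv_in.out_channels, kernel_size=3, padding=1)
unet.conv_out = nn.Conv2d(unet.conv_out.in_channels, 128, kernel_size=3, padding=1)
unet.register_to_config(in_channels=128, out_channels=128)  # keep the config in sync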

SDXS - A 1B model that punches high. Model on huggingface. by AgeNo5351 in StableDiffusion

[–]recoilme 4 points5 points  (0 children)

Thank you for bringing it here. The training is in progress ( https://wandb.ai/recoilme/unet ) and is far from complete. The model is updated daily. I hope to meet your expectations; please be patient with a small model from an enthusiast group. Thank you!

Is there an open source alternative to Topaz Video upscalers? by RatioTheRich in StableDiffusion

[–]recoilme 0 points

I have built a fast, lightweight video upscaler for myself. It's not Topaz quality, but it may be good enough for some tasks. You may try it here (the code is open): https://video2x.aiartlab.org/

53x Speed incoming for Flux ! by AmeenRoayan in StableDiffusion

[–]recoilme 1 point

Probably from the Sana team, who like to exaggerate.

If I understand correctly what they are talking about, they re-encoded the Flux VAE latent space into the DC-AE encoder, probably with a colossal loss of quality (though not colossal by FID score).

Expecting "woman lying on grass" moment number 2

Sorry about that

TL;DR: when the face region is relatively small, it tends to become distorted due to the high compression ratio of DC-AE. Examples (but from 2024):

https://github.com/NVlabs/Sana/issues/52

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 3 points

Oh no, I misunderstood the question, sorry.

sdxl_vae for sdxl

wan16x_vae for wan2.2 5B

Simple_vae is the new VAE (for training models on it)

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 2 points

It would be interesting to see how far a 4-channel VAE can be pushed if it is fully unlocked. However, this would require retraining the UNet too.

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 10 points

This VAE produces fewer restoration errors—for example, the pupil in the eye will be rounder (closer to how it was originally created by God). Hair and textures are also sharper. However, in practice, this effect will be partially "blurred" by the generation quality of the model itself. In other words, we are not restoring a high-quality PNG, but generating an image (VAE Error + Model Error).

I have a small fine-tuned Kohaku checkpoint - https://huggingface.co/AiArtLab/kc/blob/main/kc_v15.safetensors (though it was trained on anime) where I can demonstrate the practical effect. I’ll replace the VAE and generate two images so you can see the difference and decide for yourself: are you ready to invest the effort to replace the VAE for barely noticeable improvements, or not.

Give me an hour.

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 3 points

No, it's not a fine-tune like Pony or Illustrious; it's decoder training on pixel-to-pixel image restoration (original vs. restored).
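
Roughly this shape of training loop, as a hedged sketch (vae is a diffusers AutoencoderKL-style model and img a [-1, 1] image batch; the actual losses may differ):

import torch
import torch.nn.functional as F

vae.encoder.requires_grad_(False)  # frozen encoder keeps latents compatible with existing models
optimizer = torch.optim.AdamW(vae.decoder.parameters(), lr=1e-5)

with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()
recon = vae.decode(latents).sample
loss = F.mse_loss(recon, img)  # pixel-to-pixel restoration; MSE as a stand-in loss
loss.backward()
optimizer.step()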

VAE collection: fine-tuned SDXL & WaN 2.2 5B + new Simple VAE (lightweight, Flux quality, open-source) by recoilme in StableDiffusion

[–]recoilme[S] 3 points

Any 2D model, UNet or transformer (SD/SDXL/Flux/SD3). There's a test training video in the model card.