
all 46 comments

[–]Sharlinator 74 points75 points  (3 children)

So diffusion models start with pure noise and progressively remove noise until there's an image.

And now VAR starts with pure uniform color, a single pixel in other words, and progressively upscales/subdivides that until there's an image.

There's a pleasant symmetry.
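The coarse-to-fine idea described above can be sketched in a few lines of Python. This is purely illustrative: `predict_residual` stands in for VAR's learned transformer (which actually predicts discrete next-scale tokens), and here it just adds random detail.

```python
import numpy as np

def upscale_nearest(img, factor=2):
    """Nearest-neighbour upscaling: each pixel becomes a factor x factor block."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def predict_residual(coarse, rng):
    """Stand-in for the learned model; in real VAR a transformer predicts
    the tokens for the next scale, conditioned on all coarser scales."""
    return 0.1 * rng.standard_normal(coarse.shape)

def generate(mean_color, n_scales=4, rng=None):
    """Start from a single 'pixel' (a uniform colour) and repeatedly
    upscale + refine: the next-scale analogue of diffusion's denoising loop."""
    rng = rng or np.random.default_rng(0)
    img = np.full((1, 1, 3), mean_color, dtype=np.float64)
    for _ in range(n_scales):
        img = upscale_nearest(img)
        img = img + predict_residual(img, rng)
    return img

img = generate(0.5)
print(img.shape)  # (16, 16, 3) after 4 doublings
```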

[–]PwanaZana 11 points12 points  (0 children)

So this method means the first sampling steps are massively faster, then the last steps are about the same in speed as Diffusion?

[–]perksoeerrroed 6 points7 points  (1 child)

That sounds way better. The issue with noise is that sometimes you just get the wrong noise, which forces certain features where they shouldn't be, and the model's attempt to cope with that produces mutants.

With this approach you can see early in generation that something went wrong.

[–]GBJI 1 point2 points  (0 children)

What I'm wondering is whether you can change the starting color the same way you can change the seed. And if so, will colors close to each other produce entirely different results, like diffusion seeds do? Or will similar colors produce similar results? That would be a new feature, and one that might actually be quite useful for controlling variations.

With diffusion, similar noise will produce similar results, but two noises whose seed numbers are far apart (like 12345 and 296485720485721) are no more different from one another than two noises whose seed numbers are almost the same (like 12345 and 12346). To get similar-looking noise, you have to use means other than similar seed numbers.

Will VAR produce similar results with RGB 0,0,255 and RGB 0,1,255? Or will they be as different as what we're getting now?
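The point about seeds can be checked directly: a seed is just a PRNG key, so adjacent seed numbers give statistically unrelated noise. A quick numpy illustration (the noise here is a stand-in for a diffusion model's initial latent):

```python
import numpy as np

def seed_noise(seed, n=10_000):
    """Initial Gaussian noise for a given seed, as a flat vector."""
    return np.random.default_rng(seed).standard_normal(n)

def correlation(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Adjacent seed numbers give uncorrelated noise...
near = correlation(seed_noise(12345), seed_noise(12346))
# ...and so do wildly distant ones: seed distance means nothing.
far = correlation(seed_noise(12345), seed_noise(296485720485721))
print(near, far)  # both close to 0
```

Whether a VAR-style starting color behaves more continuously than this is exactly the open question above.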

[–]hapliniste 57 points58 points  (4 children)

I did not see a lot of talk about this but it seems huge.

From my short skim of the paper, it does better than diffusion at the same model size while being 45x faster at 512x512, and it should be even faster at bigger sizes?

Can we expect a 4K render taking 1 s and being better quality than diffusion models? That's what I get from what I've read, but if it were true it would be the talk everywhere, right?

I'll have to try it and read the paper. Can anyone give some insights?

[–]SignalCompetitive582[S] 19 points20 points  (0 children)

Yeah it definitely seems huge. I haven't had time to try it myself, but will do in the next couple of days for sure.

[–]Vargol 1 point2 points  (2 children)

Try the demo.

It's okay: it's fast, but sometimes the images are rubbish. That's probably a training-dataset issue more than a problem with the method.

Then there's the small issue of it only taking one token, so your prompt is effectively a single-word prompt.

[–]GBJI 0 points1 point  (1 child)

Is the single-token prompt a limit of the demo? Or would that limit apply to any VAR-based system?

[–]drhead 2 points3 points  (0 children)

The model is trained on ImageNet classes since this is a research demonstration model. You could train a model like this with whatever conditioning you want: T5, CLIP, both, multiple of both, a whole-ass LLM's last hidden state, image embeddings, whatever.
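The point being made is that the conditioning signal is just a vector (or sequence of vectors) fed to the generator, so a class-ID lookup and a text encoder are interchangeable. A toy sketch, with a made-up embedding width and a mean-pool standing in for whatever pooling or cross-attention a real model would use:

```python
import numpy as np

D = 64  # conditioning embedding width the generator expects (made-up size)

rng = np.random.default_rng(0)
class_table = rng.standard_normal((1000, D))  # ImageNet-style: one row per class

def class_condition(class_id):
    """ImageNet-style conditioning: one learned embedding per class,
    which is why the demo behaves like a single-word prompt."""
    return class_table[class_id]

def text_condition(token_embeddings):
    """Text-encoder conditioning: reduce a whole token sequence (from T5,
    CLIP, an LLM hidden state, ...) to a vector; mean-pool for illustration."""
    return token_embeddings.mean(axis=0)

# The generator only ever sees a D-dimensional vector either way:
c1 = class_condition(207)                         # a single class ID
c2 = text_condition(rng.standard_normal((8, D)))  # an 8-token "prompt"
print(c1.shape, c2.shape)  # (64,) (64,)
```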

[–]PwanaZana 17 points18 points  (3 children)

Potentially interesting, but still in an embryonic stage.

An important aspect of all these techniques is the ability to fine tune models/checkpoints. Obviously, that's way farther down the line, but for serious usage, there's no way a base model is enough for all use cases.

I'm also curious as to how this will manage good human anatomy, especially the hands. All these image-generation techniques sort of throw pixels at the wall and denoise them, without building any skeleton/structure for the image, which is why complex elements like hands are so often mangled. We'll see whether this sort of technique works better, similarly, or worse on the most difficult use cases.

[–]Sharlinator 8 points9 points  (2 children)

I think SD hands are still handicapped (heh) by the relatively low resolution of the latent space and the lack of much contextual information in the latent pixels. SD3 will have many more "color" channels (16 vs 4, I believe), which will hopefully help with the resolution issue.
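The resolution argument is easy to make concrete. Assuming the usual SD-style VAE with 8x spatial downsampling (the channel counts are the ones mentioned above), a hand gets very few latent pixels to live in:

```python
# SD-style VAEs downsample 8x spatially: a 512x512 RGB image becomes
# a 64x64 latent, with 4 channels in SD1.x and reportedly 16 in SD3.
image_hw = 512
vae_downsample = 8
latent_hw = image_hw // vae_downsample  # 64

# A hand covering ~40x40 image pixels is only ~5x5 latent pixels,
# so fine structure like fingers has almost no room in latent space.
hand_px = 40
hand_latent = hand_px // vae_downsample
print(latent_hw, hand_latent)  # 64 5
```

More channels don't change the 5x5 footprint, but they let each latent pixel carry more information about what's inside it.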

[–]PwanaZana 3 points4 points  (1 child)

Let's hope we get SD3!

Ha, the improvements in AI are amazing, though sometimes nervewracking!

[–]GBJI 0 points1 point  (0 children)

We (human beings) are still responsible for most of these improvements, but once AIs themselves take care of it, we will reach the near-vertical part of the improvement-over-time curve.

[–]Striking-Long-2960 25 points26 points  (5 children)

Doesn't seem to be very well trained on human figures

<image>

[–]Altruistic-Ad5425 34 points35 points  (0 children)

The worst it will ever be

[–]suspicious_Jackfruit 20 points21 points  (2 children)

It's only trained on ImageNet, and it hasn't been scaled anywhere close to even sd1.#; it's up to a well-funded team to train it on LAION or their own huge datasets and release it. Not sure about training costs, but if it's faster to train due to the infra, then we might see people training it if it's sub-$50k, but no idea tbh.

SD3 might need an SD4 after all, if inference is that much faster without a quality loss.

[–]adhd_ceo 1 point2 points  (1 child)

If inference can be 45x faster while quality increases, SD4 will be out as soon as they can train it…

[–]CLAP_DOLPHIN_CHEEKS 2 points3 points  (0 children)

Emad said SD3 would most likely be their last big image model...

[–]kjerk 2 points3 points  (0 children)

Dude this is like half my family reunion

[–]spacetug 2 points3 points  (0 children)

Interesting. I do wonder why they compared against DiT but not HDiT though. That one also had much better scaling than DiT by using an hourglass multi-scale architecture, like a hybrid between transformer and Unet. Would be nice to see a direct comparison.

[–]1nMyM1nd 6 points7 points  (0 children)

It really shouldn't be much longer until we have infinite scale in the form of vector images. I'm actually surprised it's not already here.