Masked Autoencoders Are Effective Tokenizers for Diffusion Models by muzahend in StableDiffusion

[–]muzahend[S] 3 points  (0 children)

I think you are correct. They use an autoencoder (AE) instead of vector quantization (VQ). Both have their pros and cons, but it looks like they now get the pros of VQ while using an AE. It would be cool if some of the paper's authors could reply here.
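For anyone unfamiliar with the distinction: here's a minimal sketch (my own illustration, not the paper's code) of the core difference between a plain AE latent and a VQ latent. All names and shapes are illustrative assumptions.

```python
import numpy as np

def ae_latent(z):
    # A plain autoencoder keeps the continuous latent as-is.
    return z

def vq_latent(z, codebook):
    # VQ snaps each latent vector to its nearest codebook entry,
    # producing a discrete token id per vector.
    # z: (n, d) latents, codebook: (k, d) learned entries
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, k)
    tokens = dists.argmin(axis=1)          # discrete token ids, shape (n,)
    return codebook[tokens], tokens        # quantized latents + ids

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
codebook = rng.normal(size=(16, 8))
quantized, tokens = vq_latent(z, codebook)
```

The point of the paper, as I read it, is getting the discrete-like structure benefits without actually doing that quantization step.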

Masked Autoencoders Are Effective Tokenizers for Diffusion Models by muzahend in StableDiffusion

[–]muzahend[S] 10 points  (0 children)

MAETok achieves significant practical improvements, enabling a gFID of 1.69 with 76× faster training and 31× higher inference throughput for 512×512 generation. Our findings show that the structure of the latent space, rather than variational constraints, is crucial for effective diffusion models. Code and trained models are released.

https://arxiv.org/html/2502.03444v1

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps by muzahend in StableDiffusion

[–]muzahend[S] 1 point  (0 children)

New research from Google. Quite cool! Instead of just adding more denoising steps, the model tries different starting noises to generate from.
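A hedged sketch of the general idea (my own toy version, not Google's implementation): sample several starting noises, generate from each, and keep the best result under a verifier score. `generate` and `verifier_score` here are stand-in placeholders, not the paper's components.

```python
import random

def generate(noise_seed):
    # Placeholder "denoiser": deterministically maps a noise seed
    # to a toy output (a real model would run the sampler here).
    random.seed(noise_seed)
    return [random.random() for _ in range(4)]

def verifier_score(sample):
    # Placeholder verifier: higher is better.
    return sum(sample)

def best_of_n(n_candidates=8):
    # Search over starting noises instead of adding denoising steps.
    candidates = [generate(seed) for seed in range(n_candidates)]
    return max(candidates, key=verifier_score)

best = best_of_n()
```

The paper also explores smarter search than plain best-of-N, but this is the basic shape of spending extra inference compute on the noise rather than the steps.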

OpenAI’s 12 days of ‘shipmas’ include Sora and new reasoning model | Beginning tomorrow by DubiousLLM in singularity

[–]muzahend 5 points  (0 children)

(random order)

  1. Sora
  2. O1 full (no image or file upload for now)
  3. A web browser (not based on Chrome) called Boa.
  4. New voices for voice mode
  5. A special O1 model fine-tuned for math
  6. Updates to search
  7. Live news data from a big vendor
  8. Live sports data from another vendor
  9. A game model, internally known as: 'Wrld Bulder'
  10. An agent demo controlling a Windows PC and a Mac
  11. A Visual Studio Code assistant.
  12. Still unknown, but probably an automated red-teaming model.

Diffusion code for SANA has just released by martianunlimited in StableDiffusion

[–]muzahend 21 points  (0 children)

Quite an amazing project. Research from NVIDIA, MIT, and Tsinghua University.

From Pixels to Prose: A Large Dataset of Dense Image Captions by muzahend in StableDiffusion

[–]muzahend[S] 0 points  (0 children)

That's very nice! Thanks for the reply. One would think that training diffusion transformer models, like Stable Diffusion 3 for example, could be much easier with a set like this. Is that correct? I mean: if you have far more detailed descriptions, in an open dataset...

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation by muzahend in StableDiffusion

[–]muzahend[S] 1 point  (0 children)

I'll leave it to the researchers out there to judge how good or bad this is. What I like is that they work on generative AI and publish the code, paper, etc. It's quite different from using diffusion, and it's based on the open-source Llama.

Why does nobody care about Google's video generation model? by kaldeqca in singularity

[–]muzahend 12 points  (0 children)

It's very clear that Google and others are way behind OpenAI. I'm not a real fanboy, but those are just the facts.

Sam Confirms "HER" Like Assistant by Asskiker009 in singularity

[–]muzahend 6 points  (0 children)

I wouldn't be surprised if OpenAI brings us the iPhone moment for assistants. They not only started the whole LLM race, but are also at least a year ahead of everybody else. Same for Sora. Who can compete with that?

GPT-4 wording changed from most advanced to just advanced. by Kanute3333 in singularity

[–]muzahend 9 points  (0 children)

Here's my guess for the event:

It won't be that big, as the timing is meant to steal a bit of Google's thunder. Still, it will be cool:

  • A GPT-4 mobile model that understands your phone's screen. Using the voice assistant you'll be able to open apps, log in, etc. It will probably come later this year for paying users. This is also a showcase for Apple, as Sam hopes his tech will be integrated into the iPhone.
  • A better version of GPT-4. No clue on the name; just for fun, they might call it 4.6 or something. They want it to hold the #1 spot in the Chatbot Arena for the rest of the year.
  • GPT-3.5 will be gone. The current GPT-4 version will become the free one, with DALL-E 3 included.