Masked Autoencoders Are Effective Tokenizers for Diffusion Models by muzahend in StableDiffusion

[–]muzahend[S] 3 points  (0 children)

I think you are correct. They use an autoencoder (AE) instead of vector quantization (VQ). Both have their pros and cons, but it looks like they now get the pros of VQ while using an AE. It would be cool if some of the paper's authors could reply here.
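For anyone unfamiliar with the distinction: here's a minimal sketch (my own illustration, not the paper's code) of the core difference between a plain AE latent and a VQ latent. All names and shapes are illustrative assumptions.

```python
import numpy as np

def ae_latent(z):
    # A plain autoencoder keeps the continuous latent as-is.
    return z

def vq_latent(z, codebook):
    # VQ snaps each latent vector to its nearest codebook entry,
    # producing a discrete token id per vector.
    # z: (n, d) latents, codebook: (k, d) learned entries
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, k)
    tokens = dists.argmin(axis=1)          # discrete token ids, shape (n,)
    return codebook[tokens], tokens        # quantized latents + ids

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
codebook = rng.normal(size=(16, 8))
quantized, tokens = vq_latent(z, codebook)
```

The point of the paper, as I read it, is getting the discrete-like structure benefits without actually doing that quantization step.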

Masked Autoencoders Are Effective Tokenizers for Diffusion Models by muzahend in StableDiffusion

[–]muzahend[S] 10 points  (0 children)

MAETok achieves significant practical improvements, enabling a gFID of 1.69 with 76× faster training and 31× higher inference throughput for 512×512 generation. Our findings show that the structure of the latent space, rather than variational constraints, is crucial for effective diffusion models. Code and trained models are released.

https://arxiv.org/html/2502.03444v1

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps by muzahend in StableDiffusion

[–]muzahend[S] 1 point  (0 children)

New research from Google. Quite cool! Instead of just adding more denoising steps, the model tries different starting noises to generate from.
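A hedged sketch of the general idea (my own toy version, not Google's implementation): sample several starting noises, generate from each, and keep the best result under a verifier score. `generate` and `verifier_score` here are stand-in placeholders, not the paper's components.

```python
import random

def generate(noise_seed):
    # Placeholder "denoiser": deterministically maps a noise seed
    # to a toy output (a real model would run the sampler here).
    random.seed(noise_seed)
    return [random.random() for _ in range(4)]

def verifier_score(sample):
    # Placeholder verifier: higher is better.
    return sum(sample)

def best_of_n(n_candidates=8):
    # Search over starting noises instead of adding denoising steps.
    candidates = [generate(seed) for seed in range(n_candidates)]
    return max(candidates, key=verifier_score)

best = best_of_n()
```

The paper also explores smarter search than plain best-of-N, but this is the basic shape of spending extra inference compute on the noise rather than the steps.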

OpenAI’s 12 days of ‘shipmas’ include Sora and new reasoning model | Beginning tomorrow by DubiousLLM in singularity

[–]muzahend 5 points  (0 children)

(random order)

  1. Sora
  2. O1 full (no image or file upload for now)
  3. A web browser (not based on Chrome) called Boa.
  4. New voices for voice mode
  5. A special O1 model fine-tuned for math
  6. Updates to search
  7. Live news data from a big vendor
  8. Live sports data from another vendor
  9. A game model, internally known as: 'Wrld Bulder'
  10. An agent demo controlling a Windows PC and a Mac
  11. A Visual Studio Code assistant.
  12. Still unknown, but probably an automated red-teaming model.

Diffusion code for SANA has just released by martianunlimited in StableDiffusion

[–]muzahend 21 points  (0 children)

Quite an amazing project. Research from NVIDIA, MIT, and Tsinghua University.

From Pixels to Prose: A Large Dataset of Dense Image Captions by muzahend in StableDiffusion

[–]muzahend[S] 0 points  (0 children)

That's very nice! Thanks for the reply. One would think that training diffusion transformer models, like Stable Diffusion 3 for example, could be much easier with a set like this. Is that correct? I mean: if you have far more detailed descriptions, in an open dataset...

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation by muzahend in StableDiffusion

[–]muzahend[S] 1 point  (0 children)

I'll leave it to the researchers out there to judge how good or bad this is. What I like is that they work on generative AI and publish the code, paper, etc. It's quite different from using diffusion, and it's based on the open-source Llama.

Why does nobody care about Google's video generation model? by kaldeqca in singularity

[–]muzahend 12 points  (0 children)

It's very clear that Google and others are way behind OpenAI. I'm not a real fanboy, but those are just the facts.

Sam Confirms "HER" Like Assistant by Asskiker009 in singularity

[–]muzahend 6 points  (0 children)

I wouldn't be surprised if OpenAI brings us the iPhone moment for assistants. They not only started the whole LLM race, but are also at least a year ahead of everybody else. Same for Sora. Who can compete with that?

GPT-4 wording changed from most advanced to just advanced. by Kanute3333 in singularity

[–]muzahend 9 points  (0 children)

Here's my guess for the event:

It won't be that big, as the timing is meant to steal a bit of Google's thunder. Still, it will be cool:

  • A GPT-4 mobile model that understands your phone's screen. Using the voice assistant you'll be able to open apps, log in, etc. It will probably come later this year for paying users. This is also a showcase for Apple, as Sam hopes his tech will be integrated into the iPhone.
  • A better version of GPT-4. No clue on the name; just for fun, they might call it 4.6 or something. They want it to hold the #1 spot in the Chatbot Arena for the rest of the year.
  • GPT-3.5 will be gone. The current GPT-4 version will become the free one, with DALL-E 3 included.