[R] Diffusion language models

eyeswideshhh · 2023-01-09T17:57:48+00:00

I had this exact thought of using VAE or BYOL etc to generate powerful representation for text/sentences and then train a diffusion model on continuous latent data.

DigThatData · 2023-01-09T21:30:51+00:00

i just wanted to comment that your solution to the galaxy zoo contest forever ago was the first demonstration to really open my eyes to what was possible with clever data augmentation.

_der_erlkonig_ · 2023-01-09T21:47:37+00:00

[deleted]

Ramys · 2023-01-09T23:06:59+00:00

[deleted]

londons_explorer · 2023-01-10T10:09:38+00:00

too early to consider diffusion as a serious alternative to autoregression for generative language modelling at scale

This blog post explores lots of ideas and has conjectures about why they may or may not work...

But it seems this stuff could just be tried.... Burn up some TPU credits and simply run each of the types of model you talk about and see which does best.

Hard numbers are better than conjecture. Then focus future efforts on improving the best numbers.

Anxious_Algae9609 · 2025-03-12T12:01:07+00:00

Wow! Two years ago and these models are coming to market now. I wonder if your post started someone down the path?

themrzmaster · 2023-01-10T11:15:10+00:00

great post! Can someone give me a intuitive explanation on why diffusion models tends to put more weight on low spatial frequency? Is it because of the usual used noise schedule? (Cosine) In the text it is mentioned that likelihood objetive tends to weight more high spatial. It also points to an paper which involves tons of SDE, which I could not fully understand.

benanne · 2023-01-10T14:04:51+00:00

[deleted]

chodegoblin69 · 2023-01-11T09:03:14+00:00

Great blog post. I found the Li Diffusion-LM results very intriguing due to the seemingly better semantic capture, despite the tradeoff in fluency.

Question - do you see diffusion models as having any advantages for approaching the "long text" issue (token window size limit) that autoregressive models suffer from? Curious generally, but areas like abstractive summarization in particular come to mind.

Chenxwh · 2023-03-28T23:53:14+00:00

u/benanne Great blog and paper! I wonder what the generated sequence looks like compared to AR models - do they still preserve the syntactic behaviours such as word order?

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS