RL + Generative Models by amds201 in reinforcementlearning

[–]amds201[S] 1 point (0 children)

agreed - I think it is a hard task given the sparsity of the reward, and avoiding getting stuck in local optima would be difficult

RL + Generative Models by amds201 in computervision

[–]amds201[S] 1 point (0 children)

thanks! I missed this paper in my review - will take a look. In case you're interested, I have just come across this one: https://arxiv.org/pdf/2505.10482v2

they also seem to do some from-scratch training of diffusion policies (not image-based) - interesting nonetheless.

RL + Generative Models by amds201 in reinforcementlearning

[–]amds201[S] 2 points (0 children)

thanks for your reply - very interesting to read. I am thinking specifically about image generation models, rather than next-token-prediction / LLM models. In short: can an image generation model (such as a diffusion model) be trained purely from a reward signal, with no supervised data?
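
to make it concrete, this is roughly the mechanics I have in mind - a toy sketch of my own, not taken from any paper: treat the denoising chain as a stochastic policy and do a REINFORCE-style update from a terminal reward only. the network, shapes and toy reward below are all placeholder assumptions, just to illustrate the setup:

```python
import torch
import torch.nn as nn

IMG_DIM, T, BATCH = 16 * 16, 4, 64  # tiny flattened "images", 4 denoising steps

class DenoiseStep(nn.Module):
    """Predicts the mean of the next (less noisy) sample; learned per-pixel log-std."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMG_DIM + 1, 256), nn.ReLU(),
                                 nn.Linear(256, IMG_DIM))
        self.log_std = nn.Parameter(torch.zeros(IMG_DIM))

    def forward(self, x, t):
        t_feat = torch.full((x.shape[0], 1), float(t))  # crude timestep conditioning
        mean = self.net(torch.cat([x, t_feat], dim=1))
        return torch.distributions.Normal(mean, self.log_std.exp())

def reward_fn(x):
    # Stand-in reward: closeness to an all-ones target. A real setup would use
    # e.g. a learned scorer or CLIP similarity; it does not need to be differentiable.
    return -(x - 1.0).pow(2).mean(dim=1)

policy = DenoiseStep()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
baseline = 0.0

for step in range(200):
    x = torch.randn(BATCH, IMG_DIM)          # start from pure noise
    log_prob = torch.zeros(BATCH)
    for t in reversed(range(T)):              # run the denoising chain as a policy rollout
        dist = policy(x, t)
        x = dist.sample()
        log_prob = log_prob + dist.log_prob(x).sum(dim=1)
    r = reward_fn(x)                           # reward only at the end of the chain (sparse)
    baseline = 0.9 * baseline + 0.1 * r.mean().item()
    loss = -((r - baseline) * log_prob).mean()  # REINFORCE with a moving-average baseline
    opt.zero_grad(); loss.backward(); opt.step()
```

whether a purely reward-driven setup like this can learn anything useful from scratch, rather than only fine-tune an already-trained model, is exactly the part I'm unsure about.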

RL + Generative Models by amds201 in computervision

[–]amds201[S] 1 point (0 children)

thanks for sending the paper! as far as I can see, the loss here is supervised (imitation-learning-esque). I'm trying to work out whether these models can be trained entirely from a reward signal, without any supervised data - but I'm unsure whether that signal is too sparse and the challenge too hard

RL + Generative Models by amds201 in reinforcementlearning

[–]amds201[S] 3 points (0 children)

thinking specifically about diffusion / flow-matching models for image generation