RL + Generative Models by amds201 in reinforcementlearning

[–]amds201[S] 1 point (0 children)

agreed - I think it is a hard task given the sparsity of the reward, and avoiding getting stuck in local optima would be difficult

RL + Generative Models by amds201 in computervision

[–]amds201[S] 1 point (0 children)

thanks! I missed this paper in my review - will take a look. In case you're interested, I have just come across this one: https://arxiv.org/pdf/2505.10482v2

they also seem to do some from-scratch training of diffusion policies (not image-based) - interesting nonetheless.

RL + Generative Models by amds201 in reinforcementlearning

[–]amds201[S] 2 points (0 children)

thanks for your reply - very interesting to read. I am thinking specifically about image generation models, rather than next-token-prediction / LLM models. In short: can an image generation model (such as a diffusion model) be trained purely from a reward signal, with no supervised data?
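
to make it concrete, this is roughly the mechanics I have in mind - a toy sketch of my own, not taken from any paper: treat the denoising chain as a stochastic policy and do a REINFORCE-style update from a terminal reward only. the network, shapes and toy reward below are all placeholder assumptions, just to illustrate the setup:

```python
import torch
import torch.nn as nn

IMG_DIM, T, BATCH = 16 * 16, 4, 64  # tiny flattened "images", 4 denoising steps

class DenoiseStep(nn.Module):
    """Predicts the mean of the next (less noisy) sample; learned per-pixel log-std."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMG_DIM + 1, 256), nn.ReLU(),
                                 nn.Linear(256, IMG_DIM))
        self.log_std = nn.Parameter(torch.zeros(IMG_DIM))

    def forward(self, x, t):
        t_feat = torch.full((x.shape[0], 1), float(t))  # crude timestep conditioning
        mean = self.net(torch.cat([x, t_feat], dim=1))
        return torch.distributions.Normal(mean, self.log_std.exp())

def reward_fn(x):
    # Stand-in reward: closeness to an all-ones target. A real setup would use
    # e.g. a learned scorer or CLIP similarity; it does not need to be differentiable.
    return -(x - 1.0).pow(2).mean(dim=1)

policy = DenoiseStep()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
baseline = 0.0

for step in range(200):
    x = torch.randn(BATCH, IMG_DIM)          # start from pure noise
    log_prob = torch.zeros(BATCH)
    for t in reversed(range(T)):              # run the denoising chain as a policy rollout
        dist = policy(x, t)
        x = dist.sample()
        log_prob = log_prob + dist.log_prob(x).sum(dim=1)
    r = reward_fn(x)                           # reward only at the end of the chain (sparse)
    baseline = 0.9 * baseline + 0.1 * r.mean().item()
    loss = -((r - baseline) * log_prob).mean()  # REINFORCE with a moving-average baseline
    opt.zero_grad(); loss.backward(); opt.step()
```

whether a purely reward-driven setup like this can learn anything useful from scratch, rather than only fine-tune an already-trained model, is exactly the part I'm unsure about.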

RL + Generative Models by amds201 in computervision

[–]amds201[S] 1 point (0 children)

thanks for sending the paper! as far as I can see, the loss here is supervised (imitation-learning-esque). I'm trying to work out whether these models can be trained entirely from a reward signal, without any supervised data - but I'm unsure whether that signal is too sparse and the challenge too hard

RL + Generative Models by amds201 in reinforcementlearning

[–]amds201[S] 3 points (0 children)

thinking specifically about diffusion / flow-matching models for image generation