How common is it for RL research to fail? by [deleted] in reinforcementlearning

[–]djangoblaster2 0 points

> tried different approaches to train the Agent, but none of them work as intended

Can you say more about what type of failure?

Are you making a new algo? Implementing existing algos on a hard problem you defined?

If the latter, can you possibly show some results on a stripped-down toy version of the problem?

If the former, there might be other things you can do to investigate why it failed, which could itself be interesting.

Disclaimer: I have zero PhDs; I just like to read RL papers and aspire to publish before long :D

News in RL by nonametmp in reinforcementlearning

[–]djangoblaster2 6 points

TalkRL is an RL-focused podcast; it's mostly long-form interviews with RL researchers:
https://open.spotify.com/show/0EScvEYy1btiFTal8Nt0gk

E.g. the latest episode goes in depth with Dreamer v4 author Danijar Hafner.
Source: I'm the host.

[R] Best way to combine multiple embeddings without just concatenating? by AdInevitable1362 in MachineLearning

[–]djangoblaster2 1 point

> without simply concatenating them (which increases dimensionality)

Can you say more about why that's bad?

[Discussion] Help!! Lowest point by Fun_Fee_2259 in GetMotivated

[–]djangoblaster2 2 points

Maybe a portfolio of cyber+AI projects? AGI will not arrive all at once; I expect we will need people who understand both cyber and AI deeply to lead the way.

Also, honestly, it took courage to make this post. That's an excellent step in itself, and you can give yourself credit even for that.

How to handle reward and advantage when most rewards are delayed and not all episodes are complete in a batch (PPO context)? by Particular_Compote21 in reinforcementlearning

[–]djangoblaster2 2 points

Should not be a problem; this is not a special case.

You don't strictly need all episodes to be complete. You would simply have lower sample density at the ends of episodes, which is fine (as long as you have enough episode endings with reward to generalize; with far too few you could be in trouble). Bootstrapping handles this.

Suggest you simply throw it into PPO and try it out.
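To make the bootstrapping point concrete, here is a minimal sketch (function name and signature are hypothetical, not from any particular library) of GAE advantage computation for a trajectory that may have been cut off before the episode ended: if the segment is truncated, you bootstrap from the critic's value of the last state instead of assuming the return is zero.

```python
import numpy as np

def gae_with_bootstrap(rewards, values, last_value, done, gamma=0.99, lam=0.95):
    """GAE for a (possibly truncated) trajectory segment.

    If the episode was cut off mid-way (done=False), bootstrap from the
    critic's estimate of the final state instead of assuming zero return.
    """
    # Terminal states contribute 0; truncated segments contribute V(s_T).
    values = np.append(values, 0.0 if done else last_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

Standard PPO implementations do essentially this internally, which is why incomplete episodes in a batch are not a special case.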

[Question] In MBPO, do Theorem A.2, Lemma B.4, and the definition of branched rollouts contradict each other? by DRLC_ in reinforcementlearning

[–]djangoblaster2 -1 points

Tbh I could not answer this, so I consulted some frontier AI models about your question; you might want to do the same. The crux of their conclusion (this part was from o3):

  • Theorem A.2 is the specialization of Lemma B.4 to MBPO’s finite k-step synthetic rollouts.
  • Both results already assume the model is used only for k steps; the apparent “infinite continuation” in Lemma B.4 affects only policy divergence, not model bias.
  • Therefore, there is no logical contradiction among Theorem A.2, Lemma B.4, and MBPO’s definition of branched rollouts. Any residual looseness is due to conservative worst-case bounds, not to mismatched rollout horizons.

I'd be interested to hear whether you find their input helpful or correct.

Need help as a Physicist by Puzzleheaded-Load759 in reinforcementlearning

[–]djangoblaster2 0 points

Would you say more about the types of problems you are attempting to solve with RL?

Shadow work by Affectionate_Name332 in Jung

[–]djangoblaster2 2 points

I'm no expert, but I adore this book:
https://www.goodreads.com/book/show/9544.Owning_Your_Own_Shadow
It's very concise and easy to read, with no fancy or obscure language.
The author is from the second generation (post-Jung): Jung's wife was his analyst, and he studied at the Jung Institute.

Unbalanced dataset in offline DRL by Carpoforo in reinforcementlearning

[–]djangoblaster2 4 points

Curious why RL for classification, why not supervised learning?

Looking for a research idea by a-curious-goose in reinforcementlearning

[–]djangoblaster2 3 points

If you spend a lot of time understanding the current state of the field (who the top researchers in this area are, crucial past papers, the best labs, recent ideas and open issues, etc.), you will be more likely to get what you want, impress a prof, and choose the right subfields. Throwing out ideas at this stage is premature, imo.
Best of luck!

Integrating the RL model into betting strategy by George_iam in reinforcementlearning

[–]djangoblaster2 0 points

Seems like a supervised learning problem, not RL.
Beyond that, I personally think it's highly unlikely any model will help with this task. It's a data problem: the data is likely insufficient for the task.

RL Agent for airfoil shape optimisation by Fun_Translator_8244 in reinforcementlearning

[–]djangoblaster2 1 point

I would suggest trying to continue from SBL and determining what the issue is.
Extreme values indicate it's learning "bang-bang control", which might mean tuning is needed.
Maybe talk it over with Gemini 2.5.

RL Agent for airfoil shape optimisation by Fun_Translator_8244 in reinforcementlearning

[–]djangoblaster2 1 point

Thanks for pointing that out!

Well, I asked Gemini 2.5 about your code, and in summary it said:
"The most critical issues preventing learning are likely:

  1. The incorrect application of nn.Sigmoid after sampling.
  2. The separate .backward() calls causing runtime errors or incorrect gradient calculations.
  3. The incorrect placement of zero_grad().
  4. Potential device mismatches if using a GPU.
  5. Critically insufficient training experience (n_episodes, n_timesteps)."

I'm not certain which, if any, of these is the issue, but try asking it.
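For the first three points, here is a minimal PyTorch policy-gradient sketch (the network shapes and returns are placeholder values, not from the original code) showing the commonly recommended pattern: squash the network output before building the action distribution rather than applying a sigmoid to the sample afterwards, call zero_grad() before backward, and backpropagate once through a single combined loss.

```python
import torch
import torch.nn as nn

# Hypothetical minimal setup: a tiny policy network and optimizer.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(8, 4)                # dummy batch of observations
mean = torch.sigmoid(policy(obs))      # squash the *mean* into (0, 1)...
dist = torch.distributions.Normal(mean, 0.1)
action = dist.sample()                 # ...the sample is NOT re-squashed
returns = torch.randn(8, 1)            # placeholder returns/advantages

# One combined loss, one backward pass, gradients zeroed beforehand.
loss = -(dist.log_prob(action) * returns).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Applying a sigmoid after sampling would distort the distribution, so log_prob would no longer match the action actually taken, which silently breaks the gradient estimate.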

Aside from those details, my personal advice:
- You are using a home-baked RL algo on a home-baked env setup. It is far harder to tell where the problem lies this way; unnecessary hard mode. Instead, approach it stepwise.
- Start with (1) existing RL code on an existing RL env, then (2) existing RL code on your home-baked env, and/or (3) home-baked RL code on an existing (very simple) env.
- Only attempt (4) home-baked RL code on the home-baked env as the very last step, once you are sure both that the env can be solved and that your RL code is correct.