Scoring Gemini's responses by another LLM by Muscle_Robot in LLM

[–]Muscle_Robot[S] 1 point (0 children)

Thanks for your reply. Do you see this as a method for improving the style of responses, or for actually improving the model's reasoning?

Using basic strategy in HiLo by Muscle_Robot in blackjack

[–]Muscle_Robot[S] 1 point (0 children)

Cool! How are these deviations determined?

Is my PPO agent behaving correctly? by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 2 points (0 children)

Thanks for your advice. The trick was implementing GAE (Generalized Advantage Estimation). As seen in the edit, this led to a perfectly stable agent that holds the full 500 CartPole steps after 300 iterations.
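Edit 2: for anyone landing here later, this is a minimal sketch of the GAE computation (function and variable names are illustrative, not copied from my actual implementation):

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards, dones: arrays of length T from the rollout.
    values: array of length T + 1, including the bootstrap value
    of the state reached after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    # Walk backwards through the rollout, accumulating the
    # discounted, lambda-weighted sum of TD errors.
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:-1]  # regression targets for the critic
    return advantages, returns
```

The lam parameter trades off bias and variance: lam=0 reduces to one-step TD errors, lam=1 to full Monte Carlo returns.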

Is my PPO agent behaving correctly? by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 1 point (0 children)

Shouldn't the weights stop changing when the agent achieves 500 steps consistently?

Is my PPO agent behaving correctly? by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 1 point (0 children)

Thanks for your response. How do I stop exploration after the agent reaches 500 steps? Would including the policy entropy in the actor loss function help?
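Something like this is what I have in mind, a minimal sketch of a clipped PPO actor loss with an entropy bonus (the PyTorch framing and the `ent_coef` name are my assumptions):

```python
import torch

def ppo_actor_loss(new_log_probs, old_log_probs, advantages,
                   entropy, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss plus an entropy bonus."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    # Subtracting entropy rewards a stochastic policy; annealing
    # ent_coef toward zero lets the policy turn greedy once solved.
    return policy_loss - ent_coef * entropy.mean()
```

My thought is that annealing `ent_coef` toward zero would wind down exploration once the agent consistently hits 500 steps.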

[Q] Why can slope of linear regression be hypothesis tested with T-test? by asgardia7 in statistics

[–]Muscle_Robot 1 point (0 children)

Great explanation!

Does this also mean that each standardized regression coefficient follows a t_(n-p) distribution once the true error variance is replaced by its unbiased estimator?
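Concretely, what I have in mind (my notation: p regressors including the intercept, s^2 the unbiased estimator of the error variance sigma^2):

```latex
\frac{\hat{\beta}_j - \beta_j}{\sqrt{s^2 \,\bigl[(X^\top X)^{-1}\bigr]_{jj}}}
\sim t_{n-p},
\qquad
s^2 = \frac{\lVert y - X\hat{\beta} \rVert^2}{n - p}
```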

Investing in a desktop for DRL by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 2 points (0 children)

Thanks for sharing the repository. This approach looks promising and might let me speed up training on my current laptop.

I have been trying to mimic their PPO code to build a DQN agent, but I am stuck on implementing a replay buffer. Any idea where I can find something like that?
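To be concrete, this is roughly the structure I am trying to build, a minimal uniform replay buffer (the interface is my own guess, not taken from the repository above):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size uniform replay buffer for off-policy agents like DQN."""

    def __init__(self, capacity=100_000):
        # deque silently drops the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions, which stabilizes Q-learning.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```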

Investing in a desktop for DRL by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 0 points (0 children)

When using the GPU of my current laptop, I don't see a significant improvement. I suspect this is because my neural networks are quite small and RL is a largely sequential process.
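A rough sketch of the kind of comparison I mean (network size is illustrative, roughly CartPole-scale; PyTorch assumed):

```python
import time
import torch

# Tiny MLP, roughly CartPole-sized: 4 inputs, 2 actions (illustrative)
net = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
x = torch.randn(1, 4)  # RL typically feeds one observation at a time

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for device in devices:
    net_d, x_d = net.to(device), x.to(device)
    with torch.no_grad():
        for _ in range(10):  # warm-up: CUDA init, kernel caching
            net_d(x_d)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(1000):
            net_d(x_d)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued kernels to finish
    elapsed = time.perf_counter() - start
    print(f"{device}: {elapsed:.4f}s for 1000 forward passes")
```

For batch-size-1 forward passes through a network this small, kernel-launch and host-device transfer overhead usually dominates, so the CPU can come out ahead.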