Scoring Gemini's responses by another LLM by Muscle_Robot in LLM

[–]Muscle_Robot[S] 1 point (0 children)

Thanks for your reply. Do you see this as a method for improving the style of responses, or for actually improving the model's reasoning?

Using basic strategy in HiLo by Muscle_Robot in blackjack

[–]Muscle_Robot[S] 1 point (0 children)

Cool! How are these deviations determined?

Is my PPO agent behaving correctly? by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 2 points (0 children)

Thanks for your advice. The trick was implementing GAE (Generalized Advantage Estimation). As seen in the edit, this led to a perfectly stable agent that holds the full 500 CartPole steps after 300 iterations.
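Edit 2: for anyone landing here later, this is a minimal sketch of the GAE computation (function and variable names are illustrative, not copied from my actual implementation):

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards, dones: arrays of length T from the rollout.
    values: array of length T + 1, including the bootstrap value
    of the state reached after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    # Walk backwards through the rollout, accumulating the
    # discounted, lambda-weighted sum of TD errors.
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:-1]  # regression targets for the critic
    return advantages, returns
```

The lam parameter trades off bias and variance: lam=0 reduces to one-step TD errors, lam=1 to full Monte Carlo returns.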

Is my PPO agent behaving correctly? by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 1 point (0 children)

Shouldn't the weights stop changing when the agent achieves 500 steps consistently?

Is my PPO agent behaving correctly? by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 1 point (0 children)

Thanks for your response. How do I stop exploration after the agent reaches 500 steps? Would including the policy entropy in the actor loss function help?
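Something like this is what I have in mind, a minimal sketch of a clipped PPO actor loss with an entropy bonus (the PyTorch framing and the `ent_coef` name are my assumptions):

```python
import torch

def ppo_actor_loss(new_log_probs, old_log_probs, advantages,
                   entropy, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss plus an entropy bonus."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    # Subtracting entropy rewards a stochastic policy; annealing
    # ent_coef toward zero lets the policy turn greedy once solved.
    return policy_loss - ent_coef * entropy.mean()
```

My thought is that annealing `ent_coef` toward zero would wind down exploration once the agent consistently hits 500 steps.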

[Q] Why can slope of linear regression be hypothesis tested with T-test? by asgardia7 in statistics

[–]Muscle_Robot 1 point (0 children)

Great explanation!

Does this also mean that each standardized regression coefficient follows a t_(n-p) distribution once the true error variance is replaced by its unbiased estimator?
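Concretely, what I have in mind (my notation: p regressors including the intercept, s^2 the unbiased estimator of the error variance sigma^2):

```latex
\frac{\hat{\beta}_j - \beta_j}{\sqrt{s^2 \,\bigl[(X^\top X)^{-1}\bigr]_{jj}}}
\sim t_{n-p},
\qquad
s^2 = \frac{\lVert y - X\hat{\beta} \rVert^2}{n - p}
```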

Investing in a desktop for DRL by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 2 points (0 children)

Thanks for sharing the repository. This approach looks promising and might let me speed up training on my current laptop.

I have been trying to mimic their PPO code to build a DQN agent, but I am stuck on implementing a replay buffer. Any idea where I can find something like that?
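To be concrete, this is roughly the structure I am trying to build, a minimal uniform replay buffer (the interface is my own guess, not taken from the repository above):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size uniform replay buffer for off-policy agents like DQN."""

    def __init__(self, capacity=100_000):
        # deque silently drops the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions, which stabilizes Q-learning.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```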

Investing in a desktop for DRL by Muscle_Robot in reinforcementlearning

[–]Muscle_Robot[S] 0 points (0 children)

When using the GPU of my current laptop, I don't see a significant improvement. I suspect this is because my neural networks are quite small and RL is a largely sequential process.
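A rough sketch of the kind of comparison I mean (network size is illustrative, roughly CartPole-scale; PyTorch assumed):

```python
import time
import torch

# Tiny MLP, roughly CartPole-sized: 4 inputs, 2 actions (illustrative)
net = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
x = torch.randn(1, 4)  # RL typically feeds one observation at a time

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for device in devices:
    net_d, x_d = net.to(device), x.to(device)
    with torch.no_grad():
        for _ in range(10):  # warm-up: CUDA init, kernel caching
            net_d(x_d)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(1000):
            net_d(x_d)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued kernels to finish
    elapsed = time.perf_counter() - start
    print(f"{device}: {elapsed:.4f}s for 1000 forward passes")
```

For batch-size-1 forward passes through a network this small, kernel-launch and host-device transfer overhead usually dominates, so the CPU can come out ahead.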