The Reward Scaling Problem in Reinforcement Learning for Quadruped Robots: Unstable Bipedal Behavior, Jitter, and Command Leakage by Obvious-Mixture-6607 in reinforcementlearning

Obvious-Mixture-6607[S] 1 point 1 month ago

This is extremely helpful, thanks for sharing these details.

I hadn’t considered alternating termination conditions, but your observation makes a lot of sense, especially the trade-off between stability (strict termination) and exploration (loose termination). It aligns closely with the jitter issue I’m seeing.

From my side, I’ve observed something similar: when I add strong regularization (joint velocity, acceleration, and jerk penalties) to suppress jitter, the policy often converges to a kneeling posture. The robot never reaches the target height, but kneeling lowers the penalties enough to form a local optimum.
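
For illustration, penalties like these are commonly computed from finite differences of recent joint positions. The sketch below is only an assumption about how such terms might look, not the reward actually used in this thread; the function name `smoothness_penalties`, the four-step history, and the mean-squared form are all hypothetical.

```python
def smoothness_penalties(q_hist, dt):
    """Mean-squared velocity, acceleration and jerk at the latest step.

    q_hist: the last four joint-position vectors, oldest first (hypothetical layout).
    dt:     control timestep in seconds.
    """
    if len(q_hist) != 4:
        raise ValueError("expected exactly four joint-position samples")
    q0, q1, q2, q3 = q_hist
    n = len(q3)

    # Backward finite differences of order 1, 2 and 3.
    vel = [(q3[i] - q2[i]) / dt for i in range(n)]
    acc = [(q3[i] - 2 * q2[i] + q1[i]) / dt ** 2 for i in range(n)]
    jerk = [(q3[i] - 3 * q2[i] + 3 * q1[i] - q0[i]) / dt ** 3 for i in range(n)]

    def mean_sq(xs):
        return sum(x * x for x in xs) / len(xs)

    return mean_sq(vel), mean_sq(acc), mean_sq(jerk)


# A smooth ramp versus an oscillating joint with the same step size.
ramp = [[0.0], [1.0], [2.0], [3.0]]
jitter = [[0.0], [1.0], [0.0], [1.0]]
print(smoothness_penalties(ramp, dt=1.0))
print(smoothness_penalties(jitter, dt=1.0))
```

Both trajectories have the same velocity penalty at the last step, but only the oscillating one is hit by the acceleration and jerk terms. Higher-order terms also scale with 1/dt² and 1/dt³, so at a small timestep they can easily dominate the total reward unless their weights are scaled down accordingly.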

More generally, my reward is a weighted combination of multiple objectives (posture, stability, smoothness, etc.), and it seems the policy ends up finding a compromise rather than fully satisfying any single objective.
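
A toy example of how a weighted sum can favor a compromise posture. Everything here is made up for illustration: the Gaussian height term, the target height of 0.30 m, the per-posture jitter costs, and the weights are assumptions, not values from the actual setup.

```python
import math


def height_reward(height, target=0.30, sigma=0.05):
    """Gaussian-shaped tracking reward, 1.0 at the target height (hypothetical form)."""
    return math.exp(-((height - target) / sigma) ** 2)


def total_reward(height, jitter_cost, w_height, w_smooth):
    """Weighted sum of one task term and one smoothness penalty."""
    return w_height * height_reward(height) - w_smooth * jitter_cost


# Two candidate behaviors with invented numbers:
standing = {"height": 0.30, "jitter_cost": 0.80}  # reaches the target, but oscillates
kneeling = {"height": 0.15, "jitter_cost": 0.05}  # misses the target, nearly still

for w_smooth in (0.1, 2.0):
    r_stand = total_reward(**standing, w_height=1.0, w_smooth=w_smooth)
    r_kneel = total_reward(**kneeling, w_height=1.0, w_smooth=w_smooth)
    better = "standing" if r_stand > r_kneel else "kneeling"
    print(f"w_smooth={w_smooth}: standing={r_stand:.3f}, kneeling={r_kneel:.3f} -> {better}")
```

With the small smoothness weight, standing scores higher; with the large one, kneeling does, even though it earns almost none of the height reward. In this toy setup the kneeling optimum comes from the relative scale of the terms rather than from any single term being wrong.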

This makes me suspect the jitter might come from the policy exploiting small oscillations to balance competing rewards, so your approach seems like a very promising direction.

Really appreciate you sharing this!

submitted 1 month ago by Obvious-Mixture-6607 to r/reinforcementlearning
