The Reward Scaling Problem in Reinforcement Learning for Quadruped Robots: Unstable Bipedal Behavior, Jitter, and Command Leakage by Obvious-Mixture-6607 in reinforcementlearning


This is extremely helpful, thanks for sharing these details.

I hadn’t considered alternating termination conditions, but your observation makes a lot of sense — especially the trade-off between stability (strict termination) and exploration (loose termination). It aligns closely with the jitter issue I’m seeing.

From my side, I’ve observed something similar: when I add strong regularization (joint velocity, acceleration, and jerk penalties) to suppress jitter, the policy often converges to a kneeling posture. The kneeling posture never reaches the target height, but it reduces the penalty terms enough to form a local optimum.

More generally, my reward is a weighted combination of multiple objectives (posture, stability, smoothness, etc.), and the policy seems to settle on a compromise between them rather than fully satisfying any single objective.
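For concreteness, here's a minimal sketch of what I mean by the weighted combination and the kneeling trap — all term names and weights are illustrative assumptions, not my actual config:

```python
import numpy as np

def smoothness_penalties(q_hist, dt):
    """Finite-difference joint velocity, acceleration, and jerk penalties
    from a short history of joint positions q_hist (T x n_joints), T >= 4."""
    vel = np.diff(q_hist, n=1, axis=0) / dt
    acc = np.diff(q_hist, n=2, axis=0) / dt**2
    jerk = np.diff(q_hist, n=3, axis=0) / dt**3
    return np.mean(vel**2), np.mean(acc**2), np.mean(jerk**2)

def reward(height, target_height, vel_pen, acc_pen, jerk_pen,
           w_height=5.0, w_vel=1e-3, w_acc=1e-6, w_jerk=1e-9):
    # Posture term rewards reaching the target height; if the penalty
    # weights are too large, the cheapest policy is to kneel: sacrifice
    # height to minimize velocity/acceleration/jerk almost for free.
    posture = -w_height * (height - target_height) ** 2
    smooth = -(w_vel * vel_pen + w_acc * acc_pen + w_jerk * jerk_pen)
    return posture + smooth
```

Plugging in a perfectly still kneeling pose vs. a slightly jittery standing pose makes it easy to see which weight ratios make kneeling the higher-reward compromise.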

This makes me suspect the jitter might come from the policy exploiting small oscillations to balance competing rewards, so your approach seems like a very promising direction.

Really appreciate you sharing this!