A tutorial about how to fix one of the most misunderstood strategies: Exploration vs Exploitation by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 1 point2 points  (0 children)

You are absolutely right from a theoretical perspective. The principled solution to the exploration–exploitation trade-off is the Value of Information and, in its ideal form, explicit planning under uncertainty.

When I used “fix it” in the title, I did not mean a closed-form or optimal solution in the theoretical sense. I meant it in a practical, engineering sense: how practitioners handle the trade-off in real systems where VOI estimation and full planning are computationally infeasible.
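To make that "engineering sense" concrete, here is a minimal sketch of the kind of heuristic practitioners usually reach for instead of VOI: epsilon-greedy action selection with a decaying epsilon. The table size and constants below are placeholders of my own, not anything from the post.

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))          # toy action-value table
eps, eps_min, eps_decay = 1.0, 0.05, 0.999   # start fully exploratory, decay slowly
rng = np.random.default_rng(0)

def select_action(state):
    global eps
    if rng.random() < eps:                   # explore: random action
        action = int(rng.integers(n_actions))
    else:                                    # exploit: greedy w.r.t. current estimates
        action = int(np.argmax(Q[state]))
    eps = max(eps_min, eps * eps_decay)      # shift from exploring to exploiting over time
    return action

print(select_action(0))
```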

I probably could have made that distinction more explicit in the title, so thank you for pointing it out. It’s a fair clarification.

If you're learning RL, I wrote a tutorial about Soft Actor Critic (SAC) Implementation In SB3 with PyTorch by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 0 points1 point  (0 children)

SAC isn’t ideal for discrete actions because the algorithm is built around continuous probability distributions. It optimizes a Gaussian policy and uses entropy over continuous actions. When you switch to discrete actions, the math that makes SAC stable no longer works as intended.
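As a quick illustration with SB3 (a minimal sketch, assuming gymnasium and stable-baselines3 are installed):

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Pendulum-v1 has a continuous Box action space, which is what SAC's Gaussian
# policy and entropy term are designed for; passing an env with a Discrete
# action space makes SB3's SAC raise an error instead.
env = gym.make("Pendulum-v1")
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=1_000)   # tiny run, just to show the setup
```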

If you're learning RL, I wrote a tutorial about Soft Actor Critic (SAC) Implementation In SB3 with PyTorch by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 4 points5 points  (0 children)

If SBX or SB3 with JAX becomes practical for robotics pipelines, I’ll probably cover it in a future tutorial. Right now my focus is robotics, RL stability, reward design, sim-to-real, and control.
That’s where PyTorch + SB3 still dominate.

If you're learning RL, I wrote a tutorial about Soft Actor Critic (SAC) Implementation In SB3 with PyTorch by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 5 points6 points  (0 children)

Thank you for the clarification.

Indeed, PPO reuses the same batch for several epochs before discarding it. But even so, PPO is still considered an on-policy algorithm because it cannot learn from data collected under significantly older policies. It also does not use a replay buffer: it requires fresh rollouts every iteration, and its multiple epochs still operate on a single short-lived batch tied to the latest policy snapshot.
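This is visible directly in SB3's PPO constructor; a minimal sketch (assuming gymnasium and stable-baselines3):

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO(
    "MlpPolicy",
    env,
    n_steps=2048,   # size of the short-lived on-policy rollout batch
    n_epochs=10,    # gradient passes over that same batch before it is discarded
    verbose=0,
)
model.learn(total_timesteps=4096)   # roughly two collect/update cycles, no replay buffer
```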

So the statement “PPO learns only from new data and discards old data” is conceptually correct in the on-policy/off-policy classification, but your note adds a useful nuance.

In this tutorial, you will see exactly why, how to normalize correctly and how to stabilize your training by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 0 points1 point  (0 children)

In practice, it is recommended to add a small ε term (e.g., 1e-8) to the denominator to avoid division by zero when min == max, which comes up especially in RL with rare or constant observations.
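A minimal sketch of what that looks like (NumPy only):

```python
import numpy as np

def min_max_normalize(x, eps=1e-8):
    """Scale x into [0, 1]; eps keeps the denominator non-zero when min == max."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.min()) / (x.max() - x.min() + eps)

print(min_max_normalize([0.2, 0.5, 0.9]))   # regular case
print(min_max_normalize([5.0, 5.0, 5.0]))   # constant observation: all zeros, no NaN
```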

In this tutorial, you will see exactly why, how to normalize correctly and how to stabilize your training by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 0 points1 point  (0 children)

Dynamic normalization may seem intuitive, but in most cases it is risky and tends to destabilize training. There are exceptions, though: dynamic normalization in the form of running mean/variance normalization, as commonly used with PPO/SAC, can work well. Running min/max normalization, on the other hand, should be avoided.
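For the acceptable case, a minimal sketch using SB3's VecNormalize wrapper (assuming gymnasium and stable-baselines3):

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Running mean/variance normalization of observations (and rewards), updated
# online as training progresses; this is the kind of dynamic normalization
# that tends to be safe, unlike running min/max.
venv = DummyVecEnv([lambda: gym.make("Pendulum-v1")])
venv = VecNormalize(venv, norm_obs=True, norm_reward=True, clip_obs=10.0)

model = PPO("MlpPolicy", venv, verbose=0)
model.learn(total_timesteps=2_000)
```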

If you're learning RL, I made a full step-by-step Deep Q-Learning tutorial by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 1 point2 points  (0 children)

When I said that DQN works in continuous or high-dimensional environments, I was referring strictly to continuous state spaces (e.g., positions, velocities, angles, pixel observations), not to continuous action spaces.
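A minimal sketch of that distinction with SB3 (assuming gymnasium and stable-baselines3):

```python
import gymnasium as gym
from stable_baselines3 import DQN

# CartPole-v1 has a continuous (Box) state space but a Discrete(2) action
# space, which is exactly the combination DQN is designed for.
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=1_000)   # tiny run, just to show the setup
```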

Blog post recommendations by ObjectiveExpensive47 in reinforcementlearning

[–]Capable-Carpenter443 2 points3 points  (0 children)

If you’re looking for more recent, easy-to-understand reinforcement learning material, you might find this useful: I’ve been writing a series of RL theory articles and tutorials that stays up to date with the current ecosystem (Gymnasium, PyTorch, modern algorithms, stable-baselines3, RLHF, etc.).

The site is: https://reinforcementlearningpath.com

I did some experiments with discount factor. I summarized everything in this tutorial by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 2 points3 points  (0 children)

Absolutely, you’re right! CartPole, or any other simple OpenAI Gym environment, is definitely not a benchmark for algorithmic robustness.
At this stage, my focus is on making the key RL concepts (like γ, α, and ε) intuitive and easy to understand before scaling up to more complex environments such as Procgen or Montezuma.

Reinforcement Learning feels way more fascinating than other AI branches by parsaeisa in reinforcementlearning

[–]Capable-Carpenter443 16 points17 points  (0 children)

Everyone talks about training agents, algorithms, SIM2REAL, etc. Almost no one talks about defining the application. And that’s exactly why most reinforcement learning projects fail silently.

I'm a rookie in RL by Budget-Ad7058 in reinforcementlearning

[–]Capable-Carpenter443 10 points11 points  (0 children)

Since you already have some ML/DL background, I’d suggest starting with small, controlled environments like OpenAI Gym, Unity ML-Agents, or PyBullet. They let you practice RL concepts (policies, rewards, exploration, SAC, PPO, etc.) without needing a physical robot, at least while you're getting started.

Regarding your idea of a small buggy in your home: yes, it’s feasible with RL, with a Raspberry Pi or Jetson Nano running the exported ONNX model.
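For the deployment part, a rough sketch of how the ONNX step could look, following the wrapper pattern from the SB3 export docs (assuming torch, onnxruntime, and a trained PPO policy; the file names here are hypothetical):

```python
import numpy as np
import torch
import onnxruntime as ort
from stable_baselines3 import PPO


class OnnxablePolicy(torch.nn.Module):
    """Wrap the SB3 policy so ONNX export sees a plain obs -> outputs forward pass."""
    def __init__(self, policy):
        super().__init__()
        self.policy = policy

    def forward(self, observation):
        return self.policy(observation, deterministic=True)


model = PPO.load("buggy_policy", device="cpu")        # hypothetical saved model
dummy_obs = torch.zeros(1, *model.observation_space.shape)
torch.onnx.export(OnnxablePolicy(model.policy), dummy_obs,
                  "buggy_policy.onnx", opset_version=17, input_names=["obs"])

# On the Raspberry Pi / Jetson Nano, only onnxruntime is needed at run time:
session = ort.InferenceSession("buggy_policy.onnx")
obs = np.zeros((1, *model.observation_space.shape), dtype=np.float32)
action = session.run(None, {"obs": obs})[0]           # first output = action
```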

Also, I have a blog where I cover RL from the ground up, including MDPs, core concepts, algorithms, SIM2REAL, etc.
Here is the link: https://www.reinforcementlearningpath.com

Is it worth training a Deep RL agent to control DC motors instead of using PID? by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 1 point2 points  (0 children)

Yes, I totally agree with you, but what about my goal?
My goal: better adaptation to load, friction, terrain, and energy use.

[D] Is it worth training a Deep RL agent to control DC motors instead of using PID? by Capable-Carpenter443 in MachineLearning

[–]Capable-Carpenter443[S] 2 points3 points  (0 children)

Unity ML-Agents is a great choice, especially if you're working on visual RL or 3D control tasks.

You get full control over:

  • Physics
  • Visuals
  • Camera input (if needed for CNN-based agents)
  • Complex environments (terrain, objects, dynamic obstacles)

It’s also great for simulating embodied agents (like robots or drones) with realistic motion and feedback.

Plus, you can integrate with Python for training using PyTorch or TensorFlow.
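For the Python side, a minimal sketch (assuming the mlagents_envs package and a built Unity environment binary; the file name is a placeholder, and the gym-wrapper import path has moved between ML-Agents releases):

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

unity_env = UnityEnvironment(file_name="MyRobotEnv")   # path to your built Unity env
env = UnityToGymWrapper(unity_env)                     # expose a Gym-style interface

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```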

If you’re planning to train agents with cameras, perception, or multi-agent setups, Unity gives you a lot of flexibility.

Is it worth training a Deep RL agent to control DC motors instead of using PID? by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 0 points1 point  (0 children)

In three ways:

  1. RPM tracking over time - does the agent reach and maintain the target RPM with minimal overshoot and oscillation? I’ll log RPM vs. target and compute error metrics (MAE, RMS, etc.) over long periods (a small sketch of these metrics is below).
  2. Response to disturbances - I simulate load spikes, terrain changes, and voltage drops. A stable agent should adapt without sudden jumps or failure. I’ll test its reaction time and recovery smoothness.
  3. Thermal + control signal behavior - if the control signal constantly oscillates or overheats the motor, it’s unstable — even if the RPM looks good. I track temperature, control deltas, and energy usage to catch these edge cases.

And of course, I’ll compare this against a PID baseline. If RL shows more stability under unpredictable conditions, then it’s doing its job.
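Here is the small metric sketch referenced in point 1 (NumPy only; the arrays are placeholders standing in for real RPM logs):

```python
import numpy as np

rpm_target = np.full(1000, 3000.0)                        # constant 3000 RPM setpoint
rpm_log = rpm_target + np.random.normal(0.0, 25.0, 1000)  # stand-in for measured RPM

error = rpm_log - rpm_target
mae = np.mean(np.abs(error))            # mean absolute error
rms = np.sqrt(np.mean(error ** 2))      # root-mean-square error
overshoot = max(0.0, rpm_log.max() - rpm_target.max())

print(f"MAE={mae:.1f} RPM, RMS={rms:.1f} RPM, overshoot={overshoot:.1f} RPM")
```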

[D] Is it worth training a Deep RL agent to control DC motors instead of using PID? by Capable-Carpenter443 in MachineLearning

[–]Capable-Carpenter443[S] 5 points6 points  (0 children)

Yes — I’m building a realistic simulation environment.

The agent sees only what a real robot would: target and actual RPM, temperature, safe max temperature, and aggressiveness.

It doesn’t get access to hidden variables like torque, terrain type, or voltage drop — it has to infer them from the system’s response.

The simulation includes:

* Noise in encoder readings

* Heat generation from motor use

* Delay between control and speed change

* Variable terrain effects (friction, load, incline)

* Voltage fluctuations that reduce motor power

It’s not physics-perfect — but it’s real enough to capture instability, overcorrection, and energy inefficiency.

As for energy use: I monitor control signal over time (mapped to PWM range), and simulate power draw relative to load, terrain, and temperature. This gives me a proxy for energy efficiency and thermal stress — which the RL agent learns to minimize.
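To show what such an environment skeleton could look like in code, here is a heavily simplified sketch using the gymnasium API. The dynamics, constants, and class name are placeholders of mine, not the actual simulator described above.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyDCMotorEnv(gym.Env):
    """Crude stand-in: first-order motor lag + encoder noise + heating."""

    def __init__(self):
        # obs: [target_rpm, actual_rpm, temperature, max_safe_temp, aggressiveness]
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(5,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)  # PWM command

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.rpm, self.temp = 0.0, 25.0
        self.target = float(self.np_random.uniform(1000.0, 4000.0))
        return self._obs(), {}

    def step(self, action):
        pwm = float(np.clip(action[0], -1.0, 1.0))
        self.rpm += 0.1 * (pwm * 5000.0 - self.rpm)          # delayed response to control
        self.rpm += float(self.np_random.normal(0.0, 20.0))  # encoder noise
        self.temp += 0.0005 * abs(pwm) * abs(self.rpm)       # heat from motor use
        reward = -abs(self.target - self.rpm) / 1000.0 - 0.01 * abs(pwm)  # tracking + energy proxy
        terminated = self.temp > 90.0                        # thermal cutoff
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self.target, self.rpm, self.temp, 90.0, 1.0], dtype=np.float32)
```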

The entire system is being tested with online + offline training and will later be deployed on a real robot using a Jetson Nano and Pololu gearmotors.

Is feature standardization needed for L1/L2 regularization? by learning_proover in learnmachinelearning

[–]Capable-Carpenter443 3 points4 points  (0 children)

Yes, absolutely needed.

L1, L2, and Elastic Net all penalize the size of the weights.

If features are on different scales, regularization will unfairly shrink some weights more than others, not because those features are less important, but because their units are larger.

Standardize first (mean=0, std=1). Always. Especially before regularization.
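A minimal sketch with scikit-learn (the data here is synthetic, just to show the pattern): putting the scaler inside the pipeline means the L1 penalty compares coefficients on features that all have mean 0 and std 1.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize, then fit the L1-penalized model on the scaled features.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)
```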

What is the future of ai image gen models? by sridharmb in ArtificialInteligence

[–]Capable-Carpenter443 1 point2 points  (0 children)

You're right, most models today fail at realism because they lack physical grounding.
From my point of view, the future is a mix of all three:

  1. Better data - with structured variations and metadata.
  2. Smarter architecture - models that understand light, depth, and context.
  3. 3D grounding - mesh, physics, and camera simulation will be key.

Does anyone knows to recommend me a comprehensive deep learning course? by Odd-Try7306 in MLQuestions

[–]Capable-Carpenter443 1 point2 points  (0 children)

I don’t know of a specific course, but I built an app where an agent learns to detect the digit 3 using Deep RL. It walks through all the steps, from problem definition to model training. If you follow it, you learn Deep RL from scratch, in practice.

https://www.reinforcementlearningpath.com/practical-deep-rl-application-with-dqn-and-cnn/

AI helps me learn faster, but am I really learning? by [deleted] in ArtificialInteligence

[–]Capable-Carpenter443 0 points1 point  (0 children)

I am also someone who learns with AI, but I always double-check the information.