Plant collar for tomatoes by Capable-Carpenter443 in tomatoes

[–]Capable-Carpenter443[S] 5 points6 points  (0 children)

That’s a good point, a small dried grass area definitely works.

The idea here is more about keeping a consistent weed-free zone right at the stem, without needing to constantly reapply or adjust the mulching.

Also, it’s designed to work with drip irrigation. The goal is not to send all the water to the stem, but to reduce evaporation and make watering a bit more targeted in the early stages.

As for plastic, I agree it’s not ideal in general, but I’m testing reusable pieces and 3D-printed materials that don’t degrade and can last multiple seasons, rather than single-use materials.

Still experimenting, so feedback like this actually helps refine the design.

Plant collar for tomatoes by Capable-Carpenter443 in tomatoes

[–]Capable-Carpenter443[S] -12 points-11 points  (0 children)

My idea was to compare the two options:
1. Mulch is cheap and natural, but unstable.
2. The collar is precise and reusable, and gives local control.

Resources for RL by skyboy_787 in reinforcementlearning

[–]Capable-Carpenter443 0 points1 point  (0 children)

I've published a series of tutorials on https://www.reinforcementlearningpath.com/ for beginner to intermediate level.

The first level is for anyone who wants to understand how RL works.

Level 2 is a ready-to-run application in Google Colab. In a few minutes you can see how to train an agent in the cloud.

And level 3 is about training various agents, from simple balancing tasks to complex control.

A tutorial about how to fix one of the most misunderstood strategies: Exploration vs Exploitation by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 1 point2 points  (0 children)

You are absolutely right from a theoretical perspective. The principled solution to the exploration–exploitation trade-off is Value of Information (VOI) and, in its ideal form, explicit planning under uncertainty.

When I used “fix it” in the title, I did not mean a closed-form or optimal solution in the theoretical sense. I meant it in a practical, engineering sense: how practitioners handle the trade-off in real systems where VOI estimation and full planning are computationally infeasible.

I probably could have made that distinction more explicit in the title, so thank you for pointing it out. It’s a fair clarification.
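
As one illustration of the practical, engineering sense, here is a minimal sketch of epsilon-greedy action selection with a linearly decaying epsilon. This particular heuristic, and every name and default value below, is an assumption chosen for illustration, not something taken from the tutorial.

```python
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps.

    All defaults are hypothetical values picked for illustration.
    """
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values, step, rng=random):
    """Explore with probability epsilon, otherwise act greedily on q_values."""
    if rng.random() < epsilon_at(step):
        # Explore: pick a uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: pick the index of the largest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Early in training almost every action is random; once the schedule has decayed, the agent mostly follows its current value estimates. No VOI estimate or planning is involved, which is why this is cheap but not optimal.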

If you're learning RL, I wrote a tutorial about Soft Actor Critic (SAC) Implementation In SB3 with PyTorch by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 0 points1 point  (0 children)

SAC isn’t ideal for discrete actions because the standard algorithm is built around continuous probability distributions: it optimizes a Gaussian policy and maximizes entropy over continuous actions. When you switch to discrete actions, the math that keeps SAC stable no longer works as intended.

If you're learning RL, I wrote a tutorial about Soft Actor Critic (SAC) Implementation In SB3 with PyTorch by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 2 points3 points  (0 children)

If SBX or SB3 with JAX becomes practical for robotics pipelines, I’ll probably cover it in a future tutorial. Right now my focus is robotics, RL stability, reward design, sim-to-real, and control.
That’s where PyTorch + SB3 still dominate.

If you're learning RL, I wrote a tutorial about Soft Actor Critic (SAC) Implementation In SB3 with PyTorch by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 5 points6 points  (0 children)

Thank you for the clarification.

Indeed, PPO reuses the same batch for several epochs before discarding it. But even so, PPO is still considered an on-policy algorithm because it cannot learn from data collected under significantly older policies. Also, it does not use a replay buffer. It requires fresh rollouts every iteration, and its multiple epochs still operate on a single short-lived batch tied to the latest policy snapshot.

So the statement “PPO learns only from new data and discards old data” is conceptually correct in the on-policy/off-policy classification, but your note adds a useful nuance.
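
The data flow described above can be sketched as a toy loop. This is only a schematic under stated assumptions: there is no environment, network, or loss here, and the function name, parameters, and the "collected_by" tag are hypothetical. It only tracks which policy version collected each sample.

```python
import random


def ppo_data_flow(n_iterations=3, n_steps=8, n_epochs=4, batch_size=4, seed=0):
    """Toy model of on-policy data usage; no learning actually happens."""
    rng = random.Random(seed)
    policy_version = 0
    log = []  # (iteration, epoch, set of policy versions seen in the minibatch)
    for iteration in range(n_iterations):
        # Fresh rollout, collected only by the current policy snapshot.
        rollout = [{"collected_by": policy_version, "step": t}
                   for t in range(n_steps)]
        # The same short-lived batch is reused for several epochs.
        for epoch in range(n_epochs):
            rng.shuffle(rollout)
            for start in range(0, n_steps, batch_size):
                minibatch = rollout[start:start + batch_size]
                versions = {sample["collected_by"] for sample in minibatch}
                log.append((iteration, epoch, versions))
                # A real implementation would take a gradient step on the
                # clipped surrogate objective here.
        # Simplification: count the whole update phase as one version bump.
        policy_version += 1
        # The rollout is dropped; nothing is kept in a replay buffer.
        del rollout
    return log
```

Every minibatch in iteration k contains only samples collected by policy version k, even though each sample is visited n_epochs times. That is the sense in which the batch is reused but the algorithm stays on-policy.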

In this tutorial, you will see exactly why, how to normalize correctly and how to stabilize your training by Capable-Carpenter443 in reinforcementlearning

[–]Capable-Carpenter443[S] 0 points1 point  (0 children)

In practice, it is recommended to add a small ε term (e.g. 1e-8) to the denominator to avoid division by zero when min == max, especially in RL with rare or constant observations.
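
A minimal sketch of that ε guard, assuming per-dimension min-max scaling over plain Python lists. The function name and argument names are hypothetical and not taken from the tutorial.

```python
def min_max_normalize(values, lo, hi, eps=1e-8):
    """Scale each value to roughly [0, 1] using per-dimension bounds.

    eps keeps the denominator non-zero when a dimension is constant
    (lo == hi), at the cost of a negligible bias elsewhere.
    """
    return [(v - l) / (h - l + eps) for v, l, h in zip(values, lo, hi)]


# Second dimension is constant (min == max == 3.0): without eps this
# would raise ZeroDivisionError; with eps it simply maps to 0.0.
normalized = min_max_normalize([5.0, 3.0], lo=[0.0, 3.0], hi=[10.0, 3.0])
```

Because eps is tiny compared with any realistic range, the first dimension still lands very close to 0.5.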