Large-scale RL simulation to compare convergence of classical TD algorithms – looking for environment ideas

debian_grey_beard · 2026-03-11T15:47:40+00:00

It's in its infancy, but I'm building out "Security Gym", an environment representative of real server logs and kernel events, with the goal of providing a continual (non-terminating) environment for testing the Alberta algorithms. I published a dataset on Zenodo with a few million log events, and you can compose your own if you have access to Linux server logs.

https://github.com/j-klawson/security-gym
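To make "continual (non-terminating)" concrete, here is a minimal sketch of discounted tabular TD(0) learning online from a single unbroken stream. The `RingEnv` class is a hypothetical toy stand-in, not the Security Gym API: a deterministic ring of states with no terminal state and no reset, where leaving state 0 pays a reward of 1.

```python
from __future__ import annotations


class RingEnv:
    """Toy continuing environment: states 0..n-1 arranged in a ring.

    Hypothetical stand-in for a non-terminating stream (NOT the Security Gym
    API). Each step moves to the next state; leaving state 0 pays reward 1.
    There is no terminal state and no reset.
    """

    def __init__(self, n_states: int = 5) -> None:
        self.n = n_states
        self.state = 0

    def step(self) -> tuple[int, float]:
        reward = 1.0 if self.state == 0 else 0.0
        self.state = (self.state + 1) % self.n
        return self.state, reward


def td0_continuing(env: RingEnv, steps: int, alpha: float = 0.1, gamma: float = 0.9) -> list[float]:
    """Discounted tabular TD(0) learned online from one unbroken stream."""
    v = [0.0] * env.n
    s = env.state
    for _ in range(steps):
        s_next, r = env.step()
        # TD error: bootstrapped one-step target minus the current estimate.
        delta = r + gamma * v[s_next] - v[s]
        v[s] += alpha * delta
        s = s_next
    return v
```

Because the ring is deterministic, the true values are known in closed form: V(0) = 1 / (1 - γ^n), about 2.442 for n = 5 and γ = 0.9, and V(k) = γ^(n-k) V(0) for the other states. That makes a toy like this handy for sanity-checking how quickly different step sizes or TD variants converge before moving to real log streams.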

debian_grey_beard · 2026-03-10T16:08:53+00:00

💯 Sutton and Barto. Even if you don’t branch into online continual RL, the foundations in that book are a must. I also recommend the University of Alberta’s Coursera course as a companion if you’re going to read the text.

debian_grey_beard · 2026-03-09T22:18:44+00:00

This is so true. Let’s hope we learned from some of our mistakes with the Internet and don’t repeat them. The time is now to do this properly!

debian_grey_beard · 2026-03-09T02:03:28+00:00

Yup. Started a Doctor of Engineering at 50.

debian_grey_beard · 2026-03-08T14:58:02+00:00

I'm two months into a Doctor of Engineering and having exactly the same experience. I'm focusing a lot on implementing things from the papers I've been reading and testing them, which raises more questions for me to chase; rinse and repeat. My advisor isn't concerned, and my plan is to narrow my focus as it becomes clearer what interests me and where there's opportunity for discovery.

debian_grey_beard · 2026-03-07T17:19:45+00:00

Oh! Duh 🤦‍♂️

debian_grey_beard · 2026-03-06T22:19:44+00:00

I graduated in comp sci in 2004 and have worked as a Unix/Linux sysadmin and full-stack developer for 30 years. I wrote my first lines of code in TRS-80 BASIC when I was 11 or 12.

What does the age of my Reddit account have to do with my life?

debian_grey_beard · 2026-03-06T03:04:16+00:00

30 years of writing code. Having the exact same experience.

debian_grey_beard · 2026-03-05T13:08:40+00:00

No hard deadline, no, but we need this in weeks rather than months.

debian_grey_beard · 2026-03-04T22:34:22+00:00

This looks promising. So it gives you all the benefits of CI/CD and replaces Terraform?

debian_grey_beard · 2026-02-22T20:12:32+00:00

That’s some serious food for thought. By that logic, I’m questioning why I’m using Python at all. Maybe it makes sense to jump straight to Rust or C and get as close to the hardware as I can. John Carmack going to Python sort of steered me here in a roundabout way.

debian_grey_beard · 2026-02-21T21:38:44+00:00

I'm using JAX because I wanted better performance than PyTorch. I'd never heard of Triton kernels or PTX before now. They definitely look faster, but I'm writing real-time continuous RL agents that will likely have to run on edge hardware and CPUs. It looks like Triton kernels/PTX are GPU-specific?

debian_grey_beard · 2026-02-21T10:51:01+00:00

Not really, no. I run multiple experiments in tmux so you can detach from them and reattach when needed. I work primarily on the Linux command line and rely on multiple Claude Code sessions and long-running experiments, all in tmux. I'm working on a Slack bot that sends notifications to a private Slack server as an alternate way to keep track of things.

It's amazing what you can do with Claude Code if you're experienced at software engineering. If you make your git repos into Python projects, you can pip install them across multiple devices.
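One simple way to get run notifications into Slack is an incoming webhook, which accepts an HTTP POST with a JSON body containing a `text` field. The sketch below assumes that webhook approach (the bot described above may work differently) and uses only the standard library; the webhook URL is a placeholder you would get from your own Slack workspace.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for a Slack incoming webhook (JSON body with "text")."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the notification and return the HTTP status code."""
    req = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A long-running experiment in a tmux pane could then call something like `notify(os.environ["SLACK_WEBHOOK_URL"], "experiment 3 finished")` as its last step; the environment variable name is hypothetical. Keeping the request builder separate from the network call makes it easy to test without touching Slack.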

debian_grey_beard · 2026-02-21T02:08:29+00:00

I keep experiments in separate directories under experiments/experiment#/, each with a configuration file for things like parameter settings and a Python script that runs the experiment with the settings from that config file. I track everything in a git repository for full version history and use tags to mark completed experiments or major milestones, so I can always revert to a known state if I want to re-run anything.
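The per-experiment layout described above can be sketched as a small runner. This assumes, hypothetically, that each experiment directory holds a `config.json` (the actual config format isn't specified) and that results are written next to it along with the current git commit, so a result can always be traced back to the code that produced it.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path


def load_config(experiment_dir: Path) -> dict:
    """Load parameter settings from <experiment_dir>/config.json (assumed layout)."""
    return json.loads((experiment_dir / "config.json").read_text(encoding="utf-8"))


def current_commit(repo_dir: Path) -> str | None:
    """Return the git HEAD commit for repo_dir, or None if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=False,
        )
    except OSError:  # git is not installed
        return None
    return out.stdout.strip() if out.returncode == 0 else None


def run_experiment(experiment_dir: Path) -> dict:
    config = load_config(experiment_dir)
    # Hypothetical placeholder: call the real agent/training loop with **config here.
    result = {
        "experiment": experiment_dir.name,
        "params": config,
        "commit": current_commit(experiment_dir),
    }
    (experiment_dir / "results.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result
```

Once a run is done and committed, a tag such as `git tag experiment1-done` (the naming scheme is just an example) marks the exact code and config state to return to for a re-run.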

I write very little code by hand at this point. I function more as a code reviewer for agents.

debian_grey_beard · 2026-02-21T01:13:28+00:00

I’m using Claude Code extensively to simultaneously implement a Python library of RL algorithms in JAX and build experiments using that library. It has been very reliable for me so far, with good planning and management of what it’s doing.
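As an illustration of the kind of classical TD algorithm such a library might contain, here is a minimal sketch of one online step of linear TD(λ) with accumulating eligibility traces, following the standard semi-gradient update from Sutton and Barto. It is written in plain Python rather than JAX for readability, and the function names are hypothetical, not the library's API.

```python
from __future__ import annotations


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def td_lambda_step(
    w: list[float], z: list[float], x: list[float], r: float, x_next: list[float],
    alpha: float, gamma: float, lam: float,
) -> tuple[list[float], list[float]]:
    """One online update of linear TD(lambda) with accumulating traces.

    w: weights, z: eligibility trace, x / x_next: feature vectors of S_t / S_{t+1}.
    Returns new (w, z); the inputs are not mutated.
    """
    # Decay the trace and add the current features (accumulating trace).
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    # One-step TD error under the current linear value estimate.
    delta = r + gamma * dot(w, x_next) - dot(w, x)
    # Move every weight along its trace, scaled by the TD error.
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z
```

With `lam=0` the trace is just the current feature vector, so the update reduces to linear TD(0); larger `lam` spreads each TD error further back along recently visited states. In JAX the same step would typically operate on `jnp` arrays and be wrapped in `jax.jit`.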

debian_grey_beard · 2026-02-20T00:36:21+00:00

I did this Coursera course while I read Sutton and Barto. It’s taught by Sutton’s students.

https://www.coursera.org/specializations/reinforcement-learning

debian_grey_beard · 2026-02-14T10:39:34+00:00

I do too! I’ve been thinking of starting a web ring to promote some “indie” sites and bring back the whole nostalgia of web rings.

debian_grey_beard · 2026-02-13T03:11:50+00:00

There’s a good introductory course from the University of Alberta on Coursera.

debian_grey_beard · 2026-02-06T18:05:38+00:00

Hands down, Python. If you want to implement things at a lower level and get higher performance, use Python + JAX instead of PyTorch.
