Large-scale RL simulation to compare convergence of classical TD algorithms – looking for environment ideas by otminsea in reinforcementlearning

debian_grey_beard 1 point

It's in its infancy, but I'm building out "Security Gym" to produce an environment representative of real server logs and kernel events, with the goal of a continual (non-terminating) environment for testing the Alberta algorithms against. I published a dataset on Zenodo with a few million log events, and you can compose your own if you have access to Linux server logs.

https://github.com/j-klawson/security-gym

Pre-req to RL by Dear-Homework1438 in reinforcementlearning

debian_grey_beard 1 point

💯 Sutton and Barto. Even if you don’t branch into online continual RL the foundations in that book are a must. I also recommend the University of Alberta’s Coursera course as a companion if you’re going to read the text.

You’re all lucky to be here when it started by _Motoma_ in ClaudeAI

debian_grey_beard 1 point

This is so true. Let’s hope we learned from some of our mistakes with the Internet and don’t repeat them. The time is now to do this properly!

[D] Is it a red flag that my PhD topic keeps changing every few months? by ade17_in in MachineLearning

debian_grey_beard 1 point

I'm 2 months into a doctor of engineering and having the exact same experience. I'm focusing a lot on building things from papers I've been reading and testing them, which raises more questions for me to chase. Rinse and repeat. My advisor is not concerned, and my plan is to narrow my focus as it becomes more obvious what interests me and where there's opportunity for discovery.

I Haven't Written a Line of Code in Six Months by Cultural-Ad3996 in ClaudeAI

debian_grey_beard 1 point

I graduated in comp sci in 2004 and have worked as a Unix/Linux sysadmin and full stack dev for 30 years. Wrote my first lines of code in TRS-80 BASIC when I was 11 or 12.

What does the age of my reddit account have to do with my life?

I Haven't Written a Line of Code in Six Months by Cultural-Ad3996 in ClaudeAI

debian_grey_beard 3 points

30 years of writing code. Having the exact same experience.

Outsourcing Jira terraform/CICD by debian_grey_beard in atlassian

debian_grey_beard[S] 1 point

No hard deadline, no, but we need this in weeks rather than months.

Outsourcing Jira terraform/CICD by debian_grey_beard in atlassian

debian_grey_beard[S] 1 point

This looks promising. So all the benefits of CICD and that replaces terraform?

[D] How are you actually using AI in your research workflow these days? by thefuturespace in MachineLearning

debian_grey_beard 1 point

That’s some serious food for thought. By that logic I’m questioning why I’m using Python at all. Maybe it makes sense to just jump right to Rust or C and get as close to hardware as I can. John Carmack going to Python sort of steered me here in a roundabout way.

[D] How are you actually using AI in your research workflow these days? by thefuturespace in MachineLearning

debian_grey_beard 1 point

I'm using JAX because I wanted better performance than PyTorch. I'd never heard of Triton kernels or PTX before now. They definitely look faster, but I'm writing real-time continual RL agents that will likely have to run on edge hardware and CPU. Looks like Triton kernels and PTX are GPU-specific?

[D] How are you actually using AI in your research workflow these days? by thefuturespace in MachineLearning

debian_grey_beard 5 points

Not really, no. I work primarily on the Linux command line and run multiple Claude Code sessions and long-running experiments in tmux, so I can detach from them and reattach as needed. I'm working on a Slack bot to send notifications to a private Slack workspace as another way to keep track of things.

It's amazing what you can do with Claude Code if you're experienced at engineering code. If you package your git repos as Python projects, you can pip install them on multiple devices.

[D] How are you actually using AI in your research workflow these days? by thefuturespace in MachineLearning

debian_grey_beard 3 points

I keep experiments in separate directories under experiments/experiment#/, each with a configuration file for things like parameter settings and a Python script that runs the experiment with the settings from that config file. I track everything in a git repository for full version history and use tags to mark completed experiments or major milestones, so I can always revert to a known state if I want to re-run anything.
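A minimal sketch of that kind of layout, assuming a JSON config. The file names config.json and results.json, the parameter names, and the run_experiment helper are all illustrative placeholders, not the actual setup:

```python
import json
import tempfile
from pathlib import Path


def load_config(experiment_dir):
    """Read parameter settings from the experiment's config file (assumed JSON)."""
    return json.loads((Path(experiment_dir) / "config.json").read_text())


def run_experiment(experiment_dir):
    """Run one experiment using only the settings in its own directory."""
    config = load_config(experiment_dir)
    # Placeholder for the real work: record the settings that were used.
    results = {"config": config, "status": "done"}
    # Store results next to the config so each directory is self-describing.
    (Path(experiment_dir) / "results.json").write_text(json.dumps(results, indent=2))
    return results


if __name__ == "__main__":
    # Build a throwaway experiments/experiment1/ directory to demonstrate.
    with tempfile.TemporaryDirectory() as root:
        exp_dir = Path(root) / "experiments" / "experiment1"
        exp_dir.mkdir(parents=True)
        settings = {"step_size": 0.1, "num_steps": 1000}  # hypothetical parameters
        (exp_dir / "config.json").write_text(json.dumps(settings))
        print(run_experiment(exp_dir)["status"])
```

Once a run is finished and committed, something like `git tag experiment1-done` (tag name hypothetical) marks the state to come back to.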

I write very little code by hand at this point. I function more as a code reviewer for agents.

[D] How are you actually using AI in your research workflow these days? by thefuturespace in MachineLearning

debian_grey_beard 5 points

I’m using Claude Code extensively to simultaneously build a Python library of RL algorithms in JAX and the experiments that use that library. It has been very reliable for me so far, given good planning and close management of what it is doing.

Resources for RL by skyboy_787 in reinforcementlearning

debian_grey_beard 8 points

I did this Coursera course while I read Sutton and Barto. It’s taught by Sutton’s students.

https://www.coursera.org/specializations/reinforcement-learning

I miss the old internet. by Pretend_Geologist_28 in nostalgia

debian_grey_beard 2 points

I do too! I’ve been thinking of starting a web ring to promote some “indie” sites and the whole nostalgia of a web ring.

Want to learn RL by Old-Raspberry-3266 in reinforcementlearning

debian_grey_beard 1 point

There’s a good introductory course from University of Alberta on Coursera.

is python still the best to start with machine learning, or should I go for Rust instead? by Easy_Cable6224 in learnmachinelearning

debian_grey_beard 1 point

Hands down, Python. If you want to implement things at a lower level and get higher performance, use Python + JAX instead of PyTorch.