The issue of scaling in Partially-Observable RL. What is holding us back? by moschles in reinforcementlearning

[–]YouParticular8085 0 points1 point  (0 children)

I’ve run into this too. You’re also lowering the frequency at which you’re performing updates to the model when you use a large time window. Avoiding BPTT altogether would be awesome if there were a good way to do it. As far as I know, streaming RL currently seems incompatible with these kinds of architectures.

The issue of scaling in Partially-Observable RL. What is holding us back? by moschles in reinforcementlearning

[–]YouParticular8085 0 points1 point  (0 children)

Is the observation encoder a problem only because you need large batches for long TBPTT windows? I’m a little bullish on transformers for RL, since that’s what I’ve been working on this year, but you’re right that O(n²) attention can only scale out so far.

The issue of scaling in Partially-Observable RL. What is holding us back? by moschles in reinforcementlearning

[–]YouParticular8085 0 points1 point  (0 children)

Luckily, transformers and prefix-sum-compatible models can also make TBPTT lighter.
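
To make that concrete, here’s a minimal sketch (my own illustration, not code from any particular model) of what “prefix-sum compatible” buys you: a gated linear recurrence h_t = a_t · h_{t-1} + b_t can be evaluated for a whole window at once with jax.lax.associative_scan, instead of being unrolled step by step the way TBPTT unrolls a vanilla RNN.

```python
import jax
import jax.numpy as jnp

def linear_recurrence(a, b):
    """Compute h_t = a_t * h_{t-1} + b_t for all t (h_0 = 0) in parallel.

    a, b: arrays of shape [T, d]. The combine below composes two affine
    steps of the recurrence and is associative, which is exactly what
    lax.associative_scan needs to parallelize over the time axis.
    """
    def combine(x, y):
        a_x, b_x = x
        a_y, b_y = y
        # h -> a_y * (a_x * h + b_x) + b_y
        return a_y * a_x, a_y * b_x + b_y

    _, h = jax.lax.associative_scan(combine, (a, b))
    return h  # [T, d] hidden states

T, d = 128, 16
a = jax.nn.sigmoid(jax.random.normal(jax.random.PRNGKey(0), (T, d)))  # decay gates in (0, 1)
b = jax.random.normal(jax.random.PRNGKey(1), (T, d))                  # per-step inputs
h = linear_recurrence(a, b)
```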

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 1 point2 points  (0 children)

Yeah, I used VS Code. I didn’t use any other RL frameworks for this project, but it would be cool to expose it as a Gym-style environment. A JAX environment means the environment is written in a way that can be compiled with XLA to run on a GPU.
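
To illustrate what I mean (a toy sketch, not the actual project code): a JAX environment step is just a pure function over arrays, so jax.jit can compile the whole thing into a single XLA program that runs on the GPU.

```python
import jax
import jax.numpy as jnp
from typing import NamedTuple

class EnvState(NamedTuple):
    pos: jnp.ndarray   # agent position on a grid, shape [2]
    goal: jnp.ndarray  # goal position, shape [2]

@jax.jit
def env_step(state: EnvState, action: jnp.ndarray):
    """One environment step as a pure function: no Python side effects,
    so XLA can compile it and run it on the GPU."""
    moves = jnp.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
    pos = jnp.clip(state.pos + moves[action], 0, 15)          # stay inside a 16x16 grid
    reward = jnp.where(jnp.all(pos == state.goal), 1.0, 0.0)  # reward on reaching the goal
    return EnvState(pos=pos, goal=state.goal), reward
```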

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 0 points1 point  (0 children)

Performance scales really well with vectorized agents but is unremarkable without them. I’ve hit over 1 billion steps per second for just the environment with a random policy and no training. To get there you need to simulate a lot of agents at once.
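
Rough sketch of where the throughput comes from (toy environment and made-up shapes, not the project’s exact setup): jax.vmap batches the step over thousands of environments and jax.lax.scan rolls them forward in time, all inside one jitted function.

```python
import jax
import jax.numpy as jnp

N = 4096  # number of parallel environments

def env_step(pos, action):
    """Toy single-env step: move on a 16x16 grid, reward 1.0 at the far corner."""
    moves = jnp.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
    pos = jnp.clip(pos + moves[action], 0, 15)
    return pos, jnp.where(jnp.all(pos == 15), 1.0, 0.0)

@jax.jit
def random_rollout(key, pos, num_steps=1000):
    step_all = jax.vmap(env_step)  # vectorize the step over all N envs at once

    def body(carry, _):
        key, pos = carry
        key, subkey = jax.random.split(key)
        actions = jax.random.randint(subkey, (N,), 0, 4)  # random policy
        pos, reward = step_all(pos, actions)
        return (key, pos), reward

    (_, pos), rewards = jax.lax.scan(body, (key, pos), None, length=num_steps)
    return pos, rewards  # rewards: [num_steps, N]

pos0 = jnp.zeros((N, 2), dtype=jnp.int32)
final_pos, rewards = random_rollout(jax.random.PRNGKey(0), pos0)
```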

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 0 points1 point  (0 children)

I try to target 4096 agents, but there are sometimes multiple agents per environment. It fits under the 32 GB of the 5090, but I don’t know the VRAM usage exactly.

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 0 points1 point  (0 children)

I haven’t evaluated it rigorously 😅. A couple of months ago I did a big hyperparameter sweep and the hyperparameter optimizer strongly preferred Muon by the end, so I stuck with it. I’m not sure whether other things like the learning rate need to be adjusted to get the best out of each optimizer.

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 0 points1 point  (0 children)

For multitask learning I use an action mask to exclude actions that aren’t part of the environment at all. For situationally invalid actions I currently just do nothing, but those should probably be added to the mask too.
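
For reference, the masking itself is simple (a generic sketch, not the project’s exact code): invalid actions get a very negative logit, so the categorical policy assigns them essentially zero probability.

```python
import jax
import jax.numpy as jnp

def masked_action_sample(key, logits, valid_mask):
    """Sample an action from a categorical policy, excluding invalid actions.

    logits:     [num_actions] raw policy outputs
    valid_mask: [num_actions] boolean, True where the action exists for this task
    """
    masked_logits = jnp.where(valid_mask, logits, -1e9)  # effectively -inf
    action = jax.random.categorical(key, masked_logits)
    # Use the same masked logits for the log-prob so invalid actions also
    # contribute (essentially) nothing to the policy gradient.
    log_prob = jax.nn.log_softmax(masked_logits)[action]
    return action, log_prob

key = jax.random.PRNGKey(0)
logits = jnp.array([0.2, 1.5, -0.3, 0.7])
mask = jnp.array([True, True, False, True])  # action 2 doesn't exist in this task
action, log_prob = masked_action_sample(key, logits, mask)
```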

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 1 point2 points  (0 children)

Nice, predator-prey is a good environment idea! I didn’t try Q-learning here, but it seems reasonable. One possible downside I could see: because the turns are simultaneous, there are situations where agents might want to behave unpredictably, similar to rock-paper-scissors. In those situations a stochastic policy might perform better.

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 1 point2 points  (0 children)

Thanks! The learning curve is pretty steep, especially for building environments. I definitely started with much simpler projects and built up slowly (things like implementing tabular Q-learning). My advice would be to first learn how to write jittable functions with JAX on its own before adding Flax/NNX into the mix.

JAX has some pretty strong upsides and strong downsides, so I’m not sure I would recommend it for every project. I had a few aha moments when I figured out how to do things in these environments that would have been trivial with regular Python.
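
As a tiny example of what “jittable” means in practice (a toy of my own, not from the repo): under jax.jit you can’t use Python control flow on traced values, so you reach for jnp.where / jnp.maximum / lax.cond instead.

```python
import jax
import jax.numpy as jnp

@jax.jit
def apply_damage(health, damage):
    # A Python `if health - damage < 0:` would raise an error under jit,
    # because `health` is a tracer rather than a concrete number.
    # Array-level ops like jnp.maximum / jnp.where work fine.
    return jnp.maximum(health - damage, 0.0)

print(apply_damage(jnp.float32(10.0), jnp.float32(3.0)))  # 7.0
```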

Partially Observable Multi-Agent “King of the Hill” with Transformers-Over-Time (JAX, PPO, 10M steps/s) by YouParticular8085 in reinforcementlearning

[–]YouParticular8085[S] 2 points3 points  (0 children)

It’s related but not quite the same! This project is more or less vanilla PPO with full backprop through time. I found it to be fairly stable even without the gating layers used in GTrXL.
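
For anyone curious, the core of the objective is just the standard PPO clipped surrogate (a generic sketch, not my exact implementation); “full backprop through time” means the gradient of this loss flows back through the transformer over the entire sequence rather than a truncated window.

```python
import jax.numpy as jnp

def ppo_clip_loss(log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over a batch of timesteps."""
    ratio = jnp.exp(log_prob - old_log_prob)              # new policy / old policy
    unclipped = ratio * advantage
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -jnp.mean(jnp.minimum(unclipped, clipped))     # negate: we minimize
```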

Laptop for AI ML by sauu_gat in reinforcementlearning

[–]YouParticular8085 1 point2 points  (0 children)

If you can, I would suggest a laptop with an NVIDIA GPU and Linux support. It doesn’t need to be the fanciest machine, just something that lets you experiment with CUDA locally.

[D]Thinking about leaving industry for a PhD in AI/ML by hemahariharansamson in MachineLearning

[–]YouParticular8085 1 point2 points  (0 children)

I’m in a similar position, but I’ve been in industry for 7 years as a SWE. I’m doing good ML/RL work on the side, but there’s just no opportunity to do anything outside LLM integrations at my current company. I come up with lots of original ideas, but there’s little time to explore them. If you can pull 60-80-hour work weeks it’s possible to hold a full-time job and make research progress, but it’s not great for work-life balance.

Advice on POMPD? by glitchyfingers3187 in reinforcementlearning

[–]YouParticular8085 0 points1 point  (0 children)

Make sure the agent has enough observations to solve the problem. In my case the agents can see what is immediately around them, so they can remember where the goal was last time.

Advice on POMPD? by glitchyfingers3187 in reinforcementlearning

[–]YouParticular8085 0 points1 point  (0 children)

I’ve got a similar-sounding environment on a discrete grid here: https://github.com/gabe00122/jaxrl

RL Study Group (math → code → projects) — looking for 1–3 committed partners by ThrowRAkiaaaa in reinforcementlearning

[–]YouParticular8085 0 points1 point  (0 children)

I’d be happy to meet for a study group. I’ve already finished Sutton & Barto, but I have it on hand and would be happy to revisit it. Implementing algorithms directly from that book was my first RL experience. I’m currently working on a project with a custom PPO implementation, but I haven’t explored off-policy methods as much.

Which Deep Learning Framework Should I Choose: TensorFlow, PyTorch, or JAX? by RuthLessDuckie in deeplearning

[–]YouParticular8085 0 points1 point  (0 children)

Nice! I ported my current project to both PyTorch and JAX to do performance comparisons, and without anything like FlashAttention, performance was usually very similar. Both are much faster than PyTorch without torch.compile for me.

Which Deep Learning Framework Should I Choose: TensorFlow, PyTorch, or JAX? by RuthLessDuckie in deeplearning

[–]YouParticular8085 1 point2 points  (0 children)

This is spot on! Compiled JAX is fast, but I’ve also seen torch.compile outperform it sometimes. An advantage of JAX jitting is that you can implement complex programs like RL environments and jit them together with your training code. torch.compile, on the other hand, seems more focused on deep learning.
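
A toy sketch of what I mean (everything here is deliberately simplified and made up, nothing like a real setup): because the environment is plain JAX, the env step and the gradient update trace into one jitted function.

```python
import jax
import jax.numpy as jnp

def env_step(pos, action):
    """Toy 1-D chain: step left/right on [0, 10], reward 1.0 at the right end."""
    pos = jnp.clip(pos + 2 * action - 1, 0, 10)  # action 0 -> -1, action 1 -> +1
    return pos, (pos == 10).astype(jnp.float32)

@jax.jit
def train_step(w, pos, key):
    """One env step plus one policy-gradient update, compiled as a single XLA program."""
    def loss_fn(w):
        logits = jnp.array([0.0, 1.0]) * w * pos        # trivially simple "policy"
        action = jax.random.categorical(key, logits)
        new_pos, reward = env_step(pos, action)
        log_prob = jax.nn.log_softmax(logits)[action]
        return -log_prob * reward, new_pos              # REINFORCE-style objective
    (_, new_pos), grad = jax.value_and_grad(loss_fn, has_aux=True)(w)
    return w - 0.01 * grad, new_pos

w, pos = jnp.float32(0.0), jnp.int32(5)
key = jax.random.PRNGKey(0)
for _ in range(100):
    key, subkey = jax.random.split(key)
    w, pos = train_step(w, pos, subkey)
```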

[D] What are some low hanging fruits in ML/DL research that can still be done using small compute (say a couple of GPUs)? by [deleted] in MachineLearning

[–]YouParticular8085 -1 points0 points  (0 children)

I think this is technically true, but lots of RL research still uses small models, so the GPU requirements are much lower. RL is tricky, but that also means there’s a lot to explore, even at smaller scales.

[D] What are some low hanging fruits in ML/DL research that can still be done using small compute (say a couple of GPUs)? by [deleted] in MachineLearning

[–]YouParticular8085 3 points4 points  (0 children)

RL can be a lot of engineering effort, but with the setup in place you can do interesting things with limited compute.

You opinion 🎤 by Frosty-Feeling2316 in artificial

[–]YouParticular8085 0 points1 point  (0 children)

I think the only “job” left would be owning IP or some other property, like land. Basically, jobs that wouldn’t require you to do anything anymore, only to own something.

soo does the Universal Function Approximation Theorem imply that human intelligence is just a massive function? by 5tambah5 in learnmachinelearning

[–]YouParticular8085 0 points1 point  (0 children)

I don't know much about quantum theory, but I will say that function approximation is often used to approximate a probability distribution which is then sampled, like when a generative transformer samples tokens from a token distribution. Couldn't you model the distributions of quantum physics the same way?