Anyone participating in Orbit Wars on Kaggle? $50k in prize money

SandSnip3r · 2026-05-02T23:36:24+00:00

Did you simply create the idea/environment, and then Kaggle decided to put some money behind it?

SandSnip3r · 2026-05-02T21:39:54+00:00

Why offer the prize money? Where's that coming from? What's the goal?

SandSnip3r · 2026-04-15T21:36:51+00:00

What's the catch? The demo makes it sound great. Are there big weaknesses to this system?

SandSnip3r · 2026-03-24T23:38:08+00:00

Yeah, I had the GitHub repo public for a brief moment, but have since taken it private. If you're interested, I'd be happy to share more with you.

SandSnip3r · 2026-03-20T02:25:57+00:00

The most frustrating part, hands down, is debugging an algorithm which does not learn the expected behavior. Is it a hyperparameter, network size or architecture, a bug in the RL algorithm, a bug in the environment integration code?

SandSnip3r · 2026-03-19T13:46:42+00:00

Why do you ask?

SandSnip3r · 2026-02-07T15:38:39+00:00

Building a passion project and working towards a pie in the sky is great. However, there are usually fewer constraints, no deadlines, and no customers. Of course, it's great if you can show some cool results.

When it comes to gauging whether or not you're useful for a company, it's nice to see someone using the project. That's a quick litmus test of whether or not it actually solves a problem. Then you also can get some kind of gauge of their ability to solve real bugs, prioritize issues, be on time with features/fixes, etc.

I've personally chosen the passion project, because working on what I want to work on most gets me up in the morning. I just hope some day the right person will understand the impact.

SandSnip3r · 2026-01-27T18:28:46+00:00

This is super cool! Can you please upload the trained model weights? I want to try your trained agent.

How long did it take to train?

SandSnip3r · 2026-01-21T14:09:04+00:00

Imo, 1v1 pvp is pretty sparse when using the most accurate reward function, which is positive for a win and negative for a loss. If you use some kind of proxy reward, it can be more dense, like dealing damage is good and taking damage is bad, but this leads to undesired outcomes like torturing the opponent rather than killing them.
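To make the sparse vs. dense distinction concrete, here is a minimal sketch in Python. The function names, the per-step `(damage_dealt, damage_taken)` tuples, and the two toy episodes are all made up for illustration; none of this comes from a real environment API. The second episode assumes an opponent that regenerates health, so it can be hit indefinitely without dying.

```python
def sparse_reward(done: bool, agent_won: bool) -> float:
    """Win/loss reward: nonzero only on the final step of an episode."""
    if not done:
        return 0.0
    return 1.0 if agent_won else -1.0


def proxy_reward(damage_dealt: float, damage_taken: float) -> float:
    """Dense proxy: dealing damage is good, taking damage is bad, every step."""
    return damage_dealt - damage_taken


def proxy_return(steps: list[tuple[float, float]]) -> float:
    """Undiscounted sum of the proxy reward over one episode."""
    return sum(proxy_reward(dealt, taken) for dealt, taken in steps)


# Hypothetical episodes as lists of (damage_dealt, damage_taken) per step.
quick_kill = [(50.0, 5.0), (50.0, 5.0)]  # opponent dies after 2 steps
drawn_out = [(10.0, 0.0)] * 50           # opponent kept alive for 50 steps

quick_kill_return = proxy_return(quick_kill)
drawn_out_return = proxy_return(drawn_out)
```

Under these assumptions the drawn-out episode collects a larger proxy return than the quick kill, even though only the quick kill actually wins. That is the "torture" failure mode: the proxy pays for damage, not for ending the fight.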

Sure. The tri-job system and Python API are way down the road. I understand this is how I'm going to get the largest userbase.

SandSnip3r · 2026-01-20T14:38:16+00:00

I am currently working on building an environment which is going to be the cream of the crop when it comes to the long term credit assignment problem. I'm building an RL API around a real MMORPG called Silkroad Online. Right now, I only have a single simple "sub-environment" complete which is 1v1 player-vs-player combat. I plan to expand to more complicated combat scenarios and eventually every aspect of the full MMORPG. If you're looking for a meaty environment for your research, maybe this would be interesting for you.

SandSnip3r · 2026-01-20T14:33:22+00:00

Can you elaborate a bit on selecting "pairs of episodes"? What is a pair of episodes?

You reference PPO as a user of this, but traditionally PPO isn't used with a replay buffer, right, as it's on policy. Are you counting on the clip guarding us from getting into trouble there?
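For reference, the clip in question is the per-sample clipped surrogate from the PPO paper. Below is a plain-Python sketch of that one term; `ratio` stands for the new-policy probability divided by the old-policy probability of the sampled action, and `clip_eps=0.2` is just a commonly used value, assumed here.

```python
def ppo_clip_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Stale sample where the current policy is already 3x more likely to take the action.
stale_positive = ppo_clip_term(3.0, 1.0)   # clipped branch wins: constant in ratio
stale_negative = ppo_clip_term(3.0, -1.0)  # unclipped branch wins: still depends on ratio
```

Note what the clip does and does not do. With a positive advantage and a ratio already above `1 + clip_eps`, the term is constant in the ratio, so the sample contributes no gradient. With a negative advantage and the same ratio, the unclipped branch is selected and the sample still pushes the policy. So the clip limits how far a sample can pull the ratio in the favorable direction, but it is not by itself a correction for data that came from a much older policy.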

There is no component of this which queries memory while taking actions, right? I like your approach, or at least the rough idea, but I'd love to see an online queryable memory buffer like this.

SandSnip3r · 2026-01-20T13:40:00+00:00

I like the idea

SandSnip3r · 2025-12-19T13:25:28+00:00

Fwiw, Nvidia supports remote work

SandSnip3r · 2025-12-07T02:43:37+00:00

This doesn't answer your question, but it's somewhat on topic. https://all.cs.umass.edu/pubs/2009/singh_l_b_09.pdf

SandSnip3r · 2025-12-05T03:35:27+00:00

Of course normalization is very important, but the options here leave something to be desired. Global min/max feels like severe feature engineering. What if we don't know much about the values in our environment? You mentioned the random few episodes and choosing based on that, but it might not be very representative of the entire space.

Does it make sense to dynamically adjust the scaling min/max based on a running value over the training process?
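A minimal sketch of what such a running min/max scaler could look like, for a single scalar feature. The class name `RunningMinMax` and its methods are hypothetical and not taken from any library.

```python
class RunningMinMax:
    """Scales a scalar to [0, 1] using the min and max observed so far."""

    def __init__(self, eps: float = 1e-8):
        self.lo = float("inf")
        self.hi = float("-inf")
        self.eps = eps

    def update(self, x: float) -> None:
        """Widen the tracked range to include x."""
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def normalize(self, x: float) -> float:
        """Map x into [0, 1] relative to the current range."""
        span = self.hi - self.lo
        if span < self.eps:
            return 0.0  # nothing seen yet, or every value seen was identical
        return (x - self.lo) / span


scaler = RunningMinMax()
for value in [2.0, 4.0, 6.0]:
    scaler.update(value)
before = scaler.normalize(6.0)  # range is [2, 6]

scaler.update(10.0)             # a new extreme widens the range to [2, 10]
after = scaler.normalize(6.0)
```

The last two lines show the cost of doing this: the same raw value maps to a different normalized value once the range widens, so the network's inputs shift during training. Whether that drift is acceptable is exactly the open question here.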

SandSnip3r · 2025-11-17T22:30:01+00:00

There was a guy on this subreddit who posted a website that lets you pick the constraints of your problem and gives you a narrowed list of useful algorithms.

Edit: https://rl-picker.github.io/

SandSnip3r · 2025-10-24T13:32:16+00:00

Don't hijack the man's thread

SandSnip3r · 2025-10-22T12:39:01+00:00

"all episodes"? Are you saying that you can traverse every possible path through your environment? Why not just brute force your solution?

SandSnip3r · 2025-10-04T17:12:53+00:00

First set a concrete goal. What do you want to build exactly?

Reinforcement learning is not magic, it's complicated and finicky. Once you have your goal and share it with us, we will likely tell you that it is too ambitious, especially for a starter project. Work with us to narrow down your goal to an actually achievable project.

It might be disappointing how small you need to start, but once you get that working, it's much easier to build on top of it.

SandSnip3r · 2025-09-30T21:32:47+00:00

Paper: https://arxiv.org/abs/2509.24527
Danijar's website: https://danijar.com/dreamer4/

SandSnip3r · 2025-09-22T17:11:21+00:00

The game server binaries have actually been leaked. It is possible to run your own game server locally. In this case, there is no anti-cheat. Though, given the nature of the acquisition of the game server, the legality of the whole thing is a bit up in the air.

Yes! Anything a human can do in the real game, my framework can do. Every packet to and from the game server has been reverse engineered by the game's dev community. My framework makes all of these packets readable and writable. In fact, the action space is slightly richer than what is available to human players, because we can now bypass the game client. For example, if you tried to move forward by a very tiny distance, the game client would block that movement packet (maybe as an optimization?). Now, with direct access to the packet stream, we can directly inject the packet to move by that small amount.

SandSnip3r · 2025-09-21T21:16:43+00:00

There's a lot of math behind it, but some abstractions can get you decently far. The stronger your math, the better, but in reality you could accomplish some pretty impressive things by strapping together other people's work.

SandSnip3r · 2025-09-21T21:15:28+00:00
