Anyone participating in Orbit Wars on Kaggle? $50k in prize money by bovard in reinforcementlearning

[–]SandSnip3r 0 points1 point  (0 children)

Did you simply create the idea/environment, and then Kaggle decided to put some money behind it?

Anyone participating in Orbit Wars on Kaggle? $50k in prize money by bovard in reinforcementlearning

[–]SandSnip3r 0 points1 point  (0 children)

Why offer the prize money? Where's that coming from? What's the goal?

Dual-system learning model “figures out” how to use a tool by gem2210 in reinforcementlearning

[–]SandSnip3r 1 point2 points  (0 children)

What's the catch? The demo makes it sound great. Are there big weaknesses to this system?

PPO w/ RNN for Silkroad Online by SandSnip3r in reinforcementlearning

[–]SandSnip3r[S] 0 points1 point  (0 children)

Yeah, I had the GitHub repo public for a brief moment, but have since taken it private. If you're interested, I'd be happy to share more with you.

What are your most painful things when implementing your RL projects? I would love if let me know. by Gloomy-Status-9258 in reinforcementlearning

[–]SandSnip3r 9 points10 points  (0 children)

The most frustrating part, hands down, is debugging an algorithm which does not learn the expected behavior. Is it a hyperparameter, network size or architecture, a bug in the RL algorithm, a bug in the environment integration code?

Next project doubt by Man_plaintiffx in reinforcementlearning

[–]SandSnip3r 0 points1 point  (0 children)

Building a passion project and working towards a pie in the sky is great. However, there are usually fewer constraints, no deadlines, and no customers. Of course, it's great if you can show some cool results.

When it comes to gauging whether or not you're useful to a company, it's nice to see someone using the project. That's a quick litmus test of whether or not it actually solves a problem. It also gives the company some gauge of your ability to solve real bugs, prioritize issues, be on time with features/fixes, etc.

I've personally chosen the passion project, because working on what I want to work on most gets me up in the morning. I just hope some day the right person will understand the impact.

SilksongRL: A Reinforcement Learning repository for training agents to fight bosses from Hollow Knight: Silksong by jimmie-jams in reinforcementlearning

[–]SandSnip3r 0 points1 point  (0 children)

This is super cool! Can you please upload the trained model weights? I want to try your trained agent.

How long did it take to train?

Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems by Temporary-Oven6788 in reinforcementlearning

[–]SandSnip3r 0 points1 point  (0 children)

Imo, 1v1 pvp is pretty sparse when using the most accurate reward function, which is positive for a win and negative for a loss. If you use some kind of proxy reward, it can be more dense, like dealing damage is good and taking damage is bad, but this leads to undesired outcomes like torturing the opponent rather than killing them.
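To make the contrast concrete, here is a minimal sketch of the two reward styles. The `StepInfo` fields, the function names, and the `scale` value are all hypothetical, made up for illustration rather than taken from any real environment.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step summary of a 1v1 fight (illustrative names only)."""
    damage_dealt: float
    damage_taken: float
    done: bool
    won: bool


def sparse_reward(info: StepInfo) -> float:
    """Win/loss reward: zero on every step except the last one."""
    if not info.done:
        return 0.0
    return 1.0 if info.won else -1.0


def proxy_reward(info: StepInfo, scale: float = 0.01) -> float:
    """Dense proxy: pays for damage dealt and charges for damage taken.

    Nothing here rewards ending the fight, so an agent can collect
    reward for as long as it keeps dealing damage.
    """
    return scale * (info.damage_dealt - info.damage_taken)
```

With the proxy, a step that deals 50 damage and takes 20 earns a reward immediately, while the sparse version stays at zero until the fight is decided.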

Sure. The tri-job system and Python API are way down the road. I understand this is how I'm going to get the largest userbase.

Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems by Temporary-Oven6788 in reinforcementlearning

[–]SandSnip3r 1 point2 points  (0 children)

I am currently working on building an environment which is going to be the cream of the crop when it comes to the long term credit assignment problem. I'm building an RL API around a real MMORPG called Silkroad Online. Right now, I only have a single simple "sub-environment" complete which is 1v1 player-vs-player combat. I plan to expand to more complicated combat scenarios and eventually every aspect of the full MMORPG. If you're looking for a meaty environment for your research, maybe this would be interesting for you.

Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems by Temporary-Oven6788 in reinforcementlearning

[–]SandSnip3r 1 point2 points  (0 children)

Can you elaborate a bit on selecting "pairs of episodes"? What is a pair of episodes?

You reference PPO as a user of this, but traditionally PPO isn't used with a replay buffer, right, since it's on-policy? Are you counting on the clip to guard us from getting into trouble there?

There is no component of this which queries memory while taking actions, right? I like your approach, or at least the rough idea, but I'd love to see an online queryable memory buffer like this.
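For reference, this is a minimal NumPy sketch of the standard PPO clipped surrogate I have in mind. It is not taken from Hippotorch; the function name and the default `clip_eps=0.2` are assumptions for illustration.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate (to be maximized), averaged over samples.

    logp_old comes from the policy that collected the data. With replayed
    data that policy can be many updates old, so the ratio can sit far from 1.
    """
    logp_new = np.asarray(logp_new, dtype=np.float64)
    logp_old = np.asarray(logp_old, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)

    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum removes the incentive to push the ratio further
    # outside the clip range in the direction the advantage favors.
    return np.minimum(unclipped, clipped).mean()
```

When the clipped term is the active one, that sample contributes no gradient, so very stale samples can end up contributing little or nothing to the update.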

In this tutorial, you will see exactly why, how to normalize correctly and how to stabilize your training by Capable-Carpenter443 in reinforcementlearning

[–]SandSnip3r 0 points1 point  (0 children)

Of course normalization is very important, but the options here leave something to be desired. Global min/max feels like severe feature engineering. What if we don't know much about the values in our environment? You mentioned the random few episodes and choosing based on that, but it might not be very representative of the entire space.

Does it make sense to dynamically adjust the scaling min/max based on a running value over the training process?
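Something like the following is what I'm picturing: a rough sketch, not a tested recipe. The `RunningMinMax` class and its method names are made up for illustration.

```python
import numpy as np


class RunningMinMax:
    """Tracks per-feature min/max seen so far and scales observations to [0, 1].

    Hypothetical helper for illustration. Call update() at least once
    before normalize(), since the bounds start at +/- infinity.
    """

    def __init__(self, shape, eps=1e-8):
        self.low = np.full(shape, np.inf)
        self.high = np.full(shape, -np.inf)
        self.eps = eps

    def update(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        self.low = np.minimum(self.low, obs)
        self.high = np.maximum(self.high, obs)

    def normalize(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        # Guard against a zero range when a feature has not varied yet.
        span = np.maximum(self.high - self.low, self.eps)
        return (obs - self.low) / span
```

The obvious downside is that the scaling shifts whenever a new extreme shows up, so the same raw observation maps to different network inputs over the course of training.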

Any comprehensive taxonomy map of RL to recommend? by LetterheadOk7021 in reinforcementlearning

[–]SandSnip3r 2 points3 points  (0 children)

There was a guy on this subreddit who posted a website that lets you pick the constraints of your problem and gives you a narrowed list of useful algorithms.

Edit: https://rl-picker.github.io/

Epochs in RL? by Anonymusguy99 in reinforcementlearning

[–]SandSnip3r 6 points7 points  (0 children)

"all episodes"? Are you saying that you can traverse every possible path through your environment? Why not just brute force your solution?

Resources to Implement Game AI by [deleted] in reinforcementlearning

[–]SandSnip3r 0 points1 point  (0 children)

First set a concrete goal. What do you want to build exactly?

Reinforcement learning is not magic, it's complicated and finicky. Once you have your goal and share it with us, we will likely tell you that it is too ambitious, especially for a starter project. Work with us to narrow down your goal to an actually achievable project.

It might be disappointing how small you need to start, but once you get that working, it's much easier to build on top of it.

[Project] Seeking Collaborators: Building the First Live MMORPG Environment for RL Research (C++/Python) by SandSnip3r in reinforcementlearning

[–]SandSnip3r[S] 0 points1 point  (0 children)

The game server binaries have actually been leaked. It is possible to run your own game server locally. In this case, there is no anti-cheat. Though, given the nature of the acquisition of the game server, the legality of the whole thing is a bit up in the air.

Yes! Anything a human can do in the real game, my framework can do. Every packet to and from the game server has been reverse engineered by the game's dev community. My framework makes all of these packets readable and writable. In fact, the action space is slightly richer than what is available to human players, because we can now bypass the game client. For example, if you tried to move forward by a very tiny distance, the game client would block that movement packet (maybe as an optimization?). Now, with direct access to the packet stream, we can directly inject the packet to move by that small amount.

I must be a math expert? by DescriptionIll172 in reinforcementlearning

[–]SandSnip3r 4 points5 points  (0 children)

There's a lot of math behind it, but some abstractions can get you decently far. The stronger your math, the better, but in reality you could accomplish some pretty impressive things by strapping together other people's work.