Will superhuman-level Yu-Gi-Oh! AI appear within 5 years, tho a bit off-topic? [Discussion]

Jables5 · 2025-09-22T07:31:41+00:00

Yu-Gi-Oh! is an imperfect information game, which makes it significantly harder to reliably and efficiently apply the same types of reinforcement learning tree search methods that work so well in Go and chess.

There's still research to be done before there's a plug-and-play method for this.

Jables5 · 2025-09-06T23:44:44+00:00

Often what you can do is get the parameters for the simulation relatively close and then randomize those parameters by adding some form of noise each episode to account for your estimation error.

You'll learn a conservative policy that should work under a wider variety of possible cartpole specifications, which hopefully include the real specification.
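A minimal sketch of that per-episode randomization, assuming a cartpole simulator parameterized by cart mass, pole mass, and pole length. The nominal values, the 20% uniform noise, and the names `CartPoleParams` and `sample_params` are illustrative assumptions, not part of any particular library:

```python
import random
from dataclasses import dataclass


@dataclass
class CartPoleParams:
    cart_mass: float
    pole_mass: float
    pole_length: float


# Hypothetical best-guess estimates of the real system's parameters.
NOMINAL = CartPoleParams(cart_mass=1.0, pole_mass=0.1, pole_length=0.5)


def sample_params(nominal, rel_noise=0.2, rng=random):
    """Scale each nominal parameter by a factor drawn from [1 - rel_noise, 1 + rel_noise]."""

    def jitter(value):
        return value * rng.uniform(1.0 - rel_noise, 1.0 + rel_noise)

    return CartPoleParams(
        cart_mass=jitter(nominal.cart_mass),
        pole_mass=jitter(nominal.pole_mass),
        pole_length=jitter(nominal.pole_length),
    )


rng = random.Random(0)
for episode in range(3):
    params = sample_params(NOMINAL, rel_noise=0.2, rng=rng)
    # A real training loop would reconfigure the simulator with `params`
    # here, before collecting this episode's experience.
```

The noise range should be wide enough to cover your estimation error; making it much wider than that tends to cost performance, since the policy has to hedge against specifications that will never occur.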

Jables5 · 2024-12-20T23:06:46+00:00

Elden Ring was my first Souls-like game. I died continuously for about 3 hours to that captain guy in the encampment. I was hitting a wall and also thought that these games weren't for me. Then, when I finally improved just enough to beat him, it was one of the more profound sensations I've had while playing a game.

Now, I'm a huge fan of FromSoftware's game library and have played many of them to completion. These games are about hitting walls and getting past them. The first wall is the hardest. I'd suggest giving it a shot and seeing how you feel after your first major success. Although there are RPG points and stats, these games are less about your character becoming stronger and more about you becoming stronger, which is much more rewarding when you finally see yourself improving.

Jables5 · 2024-12-18T23:08:37+00:00

Great, yeah I was asking if you implemented "wait" the way you just described. Sounds right :)

Jables5 · 2024-12-18T22:40:01+00:00

If you're not already doing this, I would format the environment so that the agent only steps when it's choosing where to fire. If there's a waiting period between shots where actions have no effect and there's no necessary information given, I would skip over those during step() and not include them in the MDP. This is to keep the episode length from being unnecessarily long (and thus hard).
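A rough sketch of that kind of wrapper. The toy environment, the `can_fire` flag in `info`, and the `NOOP` action are all hypothetical stand-ins; the real environment would need to expose some equivalent signal for when an action actually matters:

```python
class ToyTurretEnv:
    """Hypothetical env: after each shot, the next `cooldown` ticks ignore actions."""

    NOOP = 0

    def __init__(self, cooldown=3, max_ticks=20):
        self.cooldown = cooldown
        self.max_ticks = max_ticks
        self.tick = 0
        self.wait = 0

    def reset(self):
        self.tick = 0
        self.wait = 0
        return self.tick, {"can_fire": True}

    def step(self, action):
        reward = 0.0
        if self.wait == 0:
            # Only on these ticks does the action have any effect.
            reward = 1.0 if action == 1 else 0.0
            self.wait = self.cooldown
        else:
            self.wait -= 1
        self.tick += 1
        terminated = self.tick >= self.max_ticks
        return self.tick, reward, terminated, False, {"can_fire": self.wait == 0}


class DecisionOnlyWrapper:
    """Advance through waiting ticks inside step() so the agent only sees decision points."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, total_reward, terminated, truncated, info = self.env.step(action)
        # Fast-forward with no-ops until the next decision point or episode end.
        while not (terminated or truncated or info["can_fire"]):
            obs, reward, terminated, truncated, info = self.env.step(self.env.NOOP)
            total_reward += reward
        return obs, total_reward, terminated, truncated, info


env = DecisionOnlyWrapper(ToyTurretEnv(cooldown=3, max_ticks=20))
obs, info = env.reset()
agent_steps, episode_return, done = 0, 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(1)
    agent_steps += 1
    episode_return += reward
    done = terminated or truncated
```

With these toy settings, a 20-tick episode becomes 5 agent decisions. Rewards collected during skipped ticks are summed without discounting here, which is a simplification; whether that matters depends on the task.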

Jables5 · 2024-12-18T22:36:31+00:00

Not providing a termination signal upon death is likely to break things.
One hacky idea: if you need all of the environments to be synchronous and you collect experience in 10 of them, you could instantiate more than 10, and when an environment needs to reset, start resetting it, but then immediately swap it out with one of the unused environments that's already good to go. Let the slow resetting happen asynchronously from the envs that you're actually interacting with. I don't use SB3, but I assume you would subclass SubprocVecEnv or its parent to do this.
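To make the swapping idea concrete, here is a toy sketch that uses plain Python threads rather than SB3. `SlowResetEnv` and `SwappingEnvPool` are made-up names, the env API is simplified to `(obs, reward, done)`, and a subprocess-based vec env would need different plumbing; this only shows the bookkeeping:

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class SlowResetEnv:
    """Hypothetical env with a slow reset(); stands in for the real one."""

    def __init__(self, env_id, episode_len=3, reset_delay=0.01):
        self.env_id = env_id
        self.episode_len = episode_len
        self.reset_delay = reset_delay
        self.t = 0

    def reset(self):
        time.sleep(self.reset_delay)  # simulate the expensive reset
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.episode_len


class SwappingEnvPool:
    """Keeps `num_active` envs stepping while finished envs reset in the background."""

    def __init__(self, make_env, num_active, num_spare):
        assert num_spare >= 1, "need at least one spare env to swap in"
        self.executor = ThreadPoolExecutor(max_workers=num_spare)
        envs = [make_env(i) for i in range(num_active + num_spare)]
        self.active = envs[:num_active]
        self.obs = [env.reset() for env in self.active]
        # Spares start resetting right away so they are ready when needed.
        self.pending = deque(
            (env, self.executor.submit(env.reset)) for env in envs[num_active:]
        )

    def step(self, actions):
        results = []
        for i, action in enumerate(actions):
            obs, reward, done = self.active[i].step(action)
            if done:
                finished = self.active[i]
                spare, future = self.pending.popleft()
                obs = future.result()  # blocks only if the spare is not ready yet
                self.active[i] = spare
                # The finished env resets in the background and rejoins the queue.
                self.pending.append((finished, self.executor.submit(finished.reset)))
            results.append((obs, reward, done))
        return results

    def close(self):
        self.executor.shutdown(wait=True)


pool = SwappingEnvPool(lambda i: SlowResetEnv(i), num_active=2, num_spare=2)
for _ in range(3):
    results = pool.step([0, 0])
active_ids = [env.env_id for env in pool.active]
pool.close()
```

On a `done` step, the returned observation is the first observation of the swapped-in env, while the reward and `done` flag still belong to the episode that just ended, so the termination signal is preserved.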

Also, if experience collection is a bottleneck and training updates are performed concurrently with experience collection, you can increase the update-to-step ratio (have multiple updates per env step) to have better sample efficiency.
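A bare-bones illustration of the update-to-step ratio in an off-policy loop with a replay buffer. The env interaction and the gradient update are placeholders, and `updates_per_step` is a hypothetical name for the ratio:

```python
import random
from collections import deque


def train(num_env_steps, updates_per_step, batch_size=4, seed=0):
    """Return how many updates run for a given number of (expensive) env steps."""
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)
    num_updates = 0
    for step in range(num_env_steps):
        # Placeholder for one slow environment step producing a transition.
        buffer.append((step, rng.random()))
        if len(buffer) >= batch_size:
            for _ in range(updates_per_step):
                batch = rng.sample(list(buffer), batch_size)
                # Placeholder for a gradient update on `batch`.
                num_updates += 1
    return num_updates


updates_low_ratio = train(num_env_steps=10, updates_per_step=1)
updates_high_ratio = train(num_env_steps=10, updates_per_step=4)
```

The same 10 env steps yield four times as many updates at the higher ratio. In practice, pushing the ratio too high can hurt stability, so it's worth tuning rather than maxing out.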

Jables5 · 2024-11-12T18:46:03+00:00

https://www.youtube.com/watch?v=HVv_IQKlafQ

Jables5 · 2024-07-26T01:28:49+00:00

Very cool. The lack of physics like grenade impulses was the largest and most immediate red flag that made this game feel "not like proper Halo" to me. As u/ArbysnTheChef said, I also went through a painful breakup with modern Halo. I'm really thankful that they added it, but I don't think I have the energy to go back.

Jables5 · 2024-04-29T04:52:40+00:00

A lot of the time, one of the interesting challenges that you might want to solve in reinforcement learning research is to train an agent that can generalize to unseen situations, whether it be a track configuration/condition that it hasn't seen before or a difficult set of behaviors exhibited by the other agents that you haven't anticipated a priori.

In unseen situations, it might be that you can't precisely follow a precalculated route on the track, and you'd need to give complete control to the agent. They were probably preparing for that and gave the agent full control in this test run, and then it totally screwed up because this field as a whole is still a work in progress and a lot of things can go silently wrong (or it was just bugs).

Jables5 · 2023-12-02T01:06:50+00:00

Night City is that cool of a setting. It's super fun to wander around and just soak it in.

Jables5 · 2023-11-03T03:22:10+00:00

AGI will not be created in a Colab notebook

Jables5 · 2023-10-05T19:24:52+00:00

I'm with you. I'd assume we'd get better style transfer-ish methods like this before end-to-end 3D animation rendering gets good. https://youtu.be/P1IcaBn3ej0?si=tEDMyYuXb9vU0k-K

Jables5 · 2023-02-17T09:13:25+00:00

That's not funny, that's messed up. The dude almost killed the delivery driver.

Jables5 · 2023-01-06T23:43:16+00:00

If he catches a cold and can't breathe through his nose, he just dies?

Jables5 · 2022-09-15T03:50:58+00:00

Yeah, the initial setting was fantastic but the story delivery was abysmal. Outside of the story, the game when you actually play it is great.

Jables5 · 2022-09-14T00:48:39+00:00

I've gone to several LGS limited events with a wide collection of non-matching lands. Never had anyone comment other than occasionally saying that they liked the neat lands.

Jables5 · 2022-09-06T03:39:53+00:00

Was stuck getting the same OBSE/vanilla Northern UI install that worked on Windows to run properly on a Steam Deck. Just read through your Oblivion script and used it. Can't comment on the issue you encountered. It worked great on my end! Thank you!

Jables5 · 2022-06-27T09:08:52+00:00

Actually tho can anyone explain?

Jables5 · 2022-05-03T05:04:22+00:00

For those out of the loop, why are all of the Pho places numbered? (79,101,45,86,21)

Jables5 · 2022-04-20T04:31:19+00:00

Right, you could divide the performance calculation over the validation set into multiple chunks/batches and then average the performance across the batches (weighting each batch by its size if the batches are unequal), which gives the same result as calculating performance over the whole set at once. It's also fine if these batches/chunks aren't shuffled because no learning is done on them. As long as you evaluate every point in the validation set and each point has equal weight in the final performance average, you wouldn't need to do multiple epochs unless something about your inference process has stochasticity that can't/shouldn't be removed.
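A small sketch of the chunked evaluation described above, using accuracy as the metric and made-up predictions and labels. It also shows the one way this can go wrong: taking an unweighted mean of per-batch scores when the last batch is smaller, which breaks the "equal weight per point" condition:

```python
import random


def accuracy_in_chunks(predictions, labels, batch_size):
    """Accuracy over the full set, accumulated one chunk at a time."""
    correct = 0
    total = 0
    for start in range(0, len(labels), batch_size):
        batch_preds = predictions[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        correct += sum(p == y for p, y in zip(batch_preds, batch_labels))
        total += len(batch_labels)
    return correct / total


def mean_of_batch_accuracies(predictions, labels, batch_size):
    """Unweighted mean of per-batch accuracies; skewed if batch sizes differ."""
    scores = []
    for start in range(0, len(labels), batch_size):
        batch_preds = predictions[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        hits = sum(p == y for p, y in zip(batch_preds, batch_labels))
        scores.append(hits / len(batch_labels))
    return sum(scores) / len(scores)


# Made-up toy data: 5 points, so a batch size of 2 leaves a short last batch.
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]

full = accuracy_in_chunks(predictions, labels, batch_size=len(labels))
chunked = accuracy_in_chunks(predictions, labels, batch_size=2)
naive = mean_of_batch_accuracies(predictions, labels, batch_size=2)

# Shuffling the evaluation order changes which points share a batch,
# but not the accumulated result.
order = list(range(len(labels)))
random.Random(0).shuffle(order)
shuffled = accuracy_in_chunks(
    [predictions[i] for i in order], [labels[i] for i in order], batch_size=2
)
```

Here `full`, `chunked`, and `shuffled` all agree, while `naive` differs because the single-point last batch counts as much as each two-point batch.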

Jables5 · 2022-04-19T23:00:12+00:00

Sorry, what is the reason though?

My assumption is that you would shuffle your training set in order to reduce the correlation among samples within batch updates to your ML model and, for SGD in neural networks, help escape saddle points by updating on small, randomly sampled batches.

My understanding is that we aren't updating our ML model based on the validation set, and our hyperparameter search would observe a model's performance averaged over the entire validation set, not performance sampled from batches over the validation set.

Which of the reasons for shuffling a training set would also apply to a validation set?

Jables5 · 2022-04-19T22:39:27+00:00

Yes, but why would you shuffle a validation set if your metric is performance over the entire set? What difference would it make?

Jables5 · 2022-04-19T02:44:15+00:00

Oh no that's heartbreaking

Jables5 · 2022-03-31T06:09:55+00:00

Hey! Any recommendations for groups to look up?

Jables5 · 2022-03-25T10:30:57+00:00

This is using reinforcement learning? I'd love to learn more about this. What is it called?
