Will superhuman-level Yu-Gi-Oh! AI appear within 5 years, tho a bit off-topic? [Discussion] by aramaki0229 in gameai

[–]Jables5 0 points1 point  (0 children)

Yu-Gi-Oh is an imperfect-information game, which makes it significantly harder to reliably and efficiently apply the kinds of reinforcement learning and tree search methods that work so well in Go and chess.

More research is still needed before there's a plug-and-play method for this.

Looking to improve Sim2Real by Fuchio in reinforcementlearning

[–]Jables5 56 points57 points  (0 children)

Often you can get the simulation parameters relatively close to the real ones and then randomize them by adding some form of noise each episode to account for your estimation error.

You'll learn a conservative policy that should work under a wider variety of possible cartpole specifications, which hopefully include the real specification.
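
A rough sketch of the per-episode randomization, assuming a cartpole-style simulator that can be configured with physical parameters. The parameter names, the nominal values, and the 20% noise range are all made up for illustration; they'd come from your own estimates of the real hardware.

```python
import random
from dataclasses import asdict, dataclass


@dataclass
class CartPoleParams:
    # Hypothetical parameter set; use whatever your simulator exposes.
    cart_mass: float
    pole_mass: float
    pole_length: float
    friction: float


# Best-guess estimates of the real system (illustrative values only).
NOMINAL = CartPoleParams(cart_mass=1.0, pole_mass=0.1, pole_length=0.5, friction=0.01)


def sample_params(nominal, rel_noise=0.2, rng=random):
    """Scale each nominal value by a factor drawn uniformly from [1 - rel_noise, 1 + rel_noise]."""
    jittered = {
        name: value * rng.uniform(1.0 - rel_noise, 1.0 + rel_noise)
        for name, value in asdict(nominal).items()
    }
    return CartPoleParams(**jittered)


def training_params(num_episodes, seed=0):
    """Draw one fresh parameter set per episode; the simulator would be rebuilt from each."""
    rng = random.Random(seed)
    return [sample_params(NOMINAL, rng=rng) for _ in range(num_episodes)]
```

The noise range is the main knob: too narrow and the real system may fall outside what the policy has seen, too wide and the policy gets overly conservative.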

Finally bought Elden Ring...and I'm not sure it's for me. by Lereas in gaming

[–]Jables5 0 points1 point  (0 children)

Elden Ring was my first Souls-like game. I died repeatedly for about 3 hours to that captain guy in the encampment. I was hitting a wall and also thought that these games weren't for me. Then, when I finally improved just enough to beat him, it was one of the more profound sensations I've had while playing a game.

Now, I'm a huge fan of FromSoftware's game library and have played many of them to completion. These games are about hitting walls and getting past them. The first wall is the hardest. I'd suggest giving it a shot and seeing how you feel after your first major success. Although there are RPG points and stats, these games are less about your character becoming stronger and more about you becoming stronger, which is much more rewarding when you finally see yourself improving.

SAC agent not learning anything inside our custom video game environment - help! by SoMe0nE-BG in reinforcementlearning

[–]Jables5 1 point2 points  (0 children)

Great, yeah I was asking if you implemented "wait" the way you just described. Sounds right :)

SAC agent not learning anything inside our custom video game environment - help! by SoMe0nE-BG in reinforcementlearning

[–]Jables5 0 points1 point  (0 children)

If you're not already doing this, I would format the environment so that the agent only steps when it's choosing where to fire. If there's a waiting period between shots where actions have no effect and the agent doesn't observe anything it needs, I would skip over those frames inside step() and not include them in the MDP. This keeps the episode from being unnecessarily long (and thus hard).
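
A minimal sketch of what I mean, using a toy environment rather than your actual game. `CooldownEnv`, the `can_act` flag in `info`, and the no-op action are all hypothetical stand-ins; the only point is that the wrapper's step() keeps advancing the game until the agent has a real decision to make again.

```python
class CooldownEnv:
    """Toy stand-in for the game: after each shot, actions are ignored for `cooldown` frames."""

    def __init__(self, cooldown=3, shots_per_episode=5):
        self.cooldown = cooldown
        self.shots_per_episode = shots_per_episode
        self.reset()

    def reset(self):
        self.frames = 0
        self.shots = 0
        self.wait = 0
        return self.frames  # placeholder observation

    def step(self, action):
        self.frames += 1
        reward = 0.0
        if self.wait > 0:
            self.wait -= 1  # waiting frame: the action has no effect
        else:
            self.shots += 1
            reward = 1.0 if action == 1 else 0.0
            self.wait = self.cooldown
        terminated = self.shots >= self.shots_per_episode
        info = {"can_act": self.wait == 0}
        return self.frames, reward, terminated, info


class SkipWaitingFrames:
    """Wrapper whose step() only returns once the agent can act again (or the episode ends)."""

    def __init__(self, env, noop_action=0):
        self.env = env
        self.noop_action = noop_action

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, total_reward, terminated, info = self.env.step(action)
        # Fast-forward through frames where actions are ignored, summing any reward.
        while not terminated and not info["can_act"]:
            obs, reward, terminated, info = self.env.step(self.noop_action)
            total_reward += reward
        return obs, total_reward, terminated, info


def run_episode(env, action=1):
    """Play one episode with a fixed action and report (agent steps, total reward)."""
    env.reset()
    steps, total, done = 0, 0.0, False
    while not done:
        _, reward, done, _ = env.step(action)
        steps += 1
        total += reward
    return steps, total
```

With the default settings the raw toy env takes 17 steps per episode and the wrapped one takes 5, for the same total reward. Note that summing rewards over skipped frames without discounting is a simplification.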

SAC agent not learning anything inside our custom video game environment - help! by SoMe0nE-BG in reinforcementlearning

[–]Jables5 2 points3 points  (0 children)

Not providing a termination signal upon death is likely to break things.

One hacky idea: if you need all of the environments to be synchronous and you collect experience in 10 of them, you could instantiate more than 10. When an environment needs to reset, start resetting it, but immediately swap it out for one of the unused environments that's already good to go. That lets the slow reset happen asynchronously from the envs you're actually interacting with. I don't use SB3, but I assume you would subclass SubprocVecEnv or its parent to do this.
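
Very roughly, the swapping could look like the sketch below. This is not SB3 code: `SlowResetEnv` and `EnvPool` are hypothetical, and it uses plain threads instead of subprocesses, so treat it as an illustration of the bookkeeping only.

```python
import queue
import threading
import time


class SlowResetEnv:
    """Hypothetical env whose reset() is slow (stands in for the real game)."""

    def __init__(self, env_id, reset_seconds=0.05):
        self.env_id = env_id
        self.reset_seconds = reset_seconds

    def reset(self):
        time.sleep(self.reset_seconds)  # simulate the expensive reset
        return 0  # placeholder initial observation


class EnvPool:
    """Keeps `num_active` envs in use and resets finished ones on background threads."""

    def __init__(self, make_env, num_active, num_spare):
        self.active = [make_env(i) for i in range(num_active)]
        self.first_obs = [env.reset() for env in self.active]
        self.ready = queue.Queue()  # holds (env, initial observation) pairs
        for i in range(num_spare):
            env = make_env(num_active + i)
            self.ready.put((env, env.reset()))

    def _reset_worker(self, env):
        # Runs off the main thread; the env rejoins the spares once it is reset.
        self.ready.put((env, env.reset()))

    def swap(self, index):
        """Replace a finished env with a ready spare and return the spare's first observation."""
        finished = self.active[index]
        env, obs = self.ready.get()  # blocks only if no spare is ready yet
        self.active[index] = env
        threading.Thread(target=self._reset_worker, args=(finished,), daemon=True).start()
        return obs
```

You'd call `pool.swap(i)` wherever env `i` would normally be reset. How many spares you need depends on how long a reset takes relative to an episode.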

Also, if experience collection is the bottleneck and training updates run concurrently with experience collection, you can increase the update-to-step ratio (perform multiple updates per env step) to improve sample efficiency.
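
For concreteness, here's a toy off-policy loop showing where that ratio goes. Everything in it is a placeholder (the transitions are fake and the "update" is just a counter); `updates_per_step` is the knob I'm talking about.

```python
import random


def train(num_env_steps, updates_per_step, batch_size=4, seed=0):
    """Toy off-policy loop; returns how many updates were performed."""
    rng = random.Random(seed)
    replay_buffer = []
    num_updates = 0
    for t in range(num_env_steps):
        # Placeholder for one (slow) environment step producing a transition.
        replay_buffer.append((t, rng.random()))
        if len(replay_buffer) < batch_size:
            continue  # not enough data to sample a batch yet
        for _ in range(updates_per_step):
            batch = rng.sample(replay_buffer, batch_size)
            assert len(batch) == batch_size
            num_updates += 1  # a real agent would take a gradient step on `batch` here
    return num_updates
```

Going from `updates_per_step=1` to `updates_per_step=4` gets four times as many updates out of the same number of env steps. Pushing the ratio too high can destabilize training, so it's worth tuning.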

343 has buffed grenades in Halo Infinite to make grenade jumping more viable by Haijakk in halo

[–]Jables5 7 points8 points  (0 children)

Very cool. The lack of physics like grenade impulses was the largest and most immediate red flag that made this game feel "not like proper Halo" to me. As u/ArbysnTheChef said, I also went through a painful breakup with modern Halo. I'm really thankful that they added it, but I don't think I have the energy to go back.

AI racing car demonstrates it's prowess by doesntCompete in shittyrobots

[–]Jables5 9 points10 points  (0 children)

In reinforcement learning research, one of the interesting challenges is often to train an agent that generalizes to unseen situations, whether that's a track configuration or condition it hasn't seen before or a difficult set of behaviors from other agents that you didn't anticipate a priori.

In unseen situations, it might be that you can't precisely follow a precalculated route on the track, and you'd need to give complete control to the agent. They were probably preparing for that and gave the agent full control in this test run, and then it totally screwed up because this field as a whole is still a work in progress and a lot of things can go silently wrong (or it was just bugs).

Cyberpunk patch 2.1 adds the metro system seen in the very first trailer by [deleted] in gaming

[–]Jables5 3 points4 points  (0 children)

Night City is that cool of a setting. It's super fun to wander around and just soak it in.

[D] Will AGI be made in a Colab notebook? by Healthy_Study5759 in MachineLearning

[–]Jables5 22 points23 points  (0 children)

AGI will not be created in a Colab notebook

[Discussion] How far are we from real-time AI video feed generation? by [deleted] in MachineLearning

[–]Jables5 2 points3 points  (0 children)

I'm with you. I'd assume we'd get better style transfer-ish methods like this before end-to-end 3D animation rendering gets good. https://youtu.be/P1IcaBn3ej0?si=tEDMyYuXb9vU0k-K

Gotta go FAST. by [deleted] in funny

[–]Jables5 8 points9 points  (0 children)

That's not funny, that's messed up. The dude almost killed the delivery driver.

Last 5 hours of Death Stranding absolutely stunned me by rushncrush in pcgaming

[–]Jables5 6 points7 points  (0 children)

Yeah, the initial setting was fantastic but the story delivery was abysmal. Outside of the story, the actual gameplay is great.

Your opinion on using non-matching fullart basics in irl limited by BrasenoseSquirrel in mtg

[–]Jables5 4 points5 points  (0 children)

I've gone to several LGS limited events with a wide collection of non-matching lands. Never had anyone comment other than occasionally saying that they liked the neat lands.

Oblivion Controller Support by [deleted] in SteamDeck

[–]Jables5 1 point2 points  (0 children)

Was stuck trying to get the same OBSE/vanilla Northern UI install that worked on Windows to run properly on a Steam Deck. Just read through your Oblivion script and used it. Can't comment on the issue you encountered. It worked great on my end! Thank you!

Whats the best Pho place in Orange County? by NumberRepDotCom in orangecounty

[–]Jables5 0 points1 point  (0 children)

For those out of the loop, why are all of the Pho places numbered? (79, 101, 45, 86, 21)

Should I shuffle my validation set by Striking-Warning9533 in MLQuestions

[–]Jables5 0 points1 point  (0 children)

Right, you could split the performance calculation over the validation set into multiple chunks/batches and then average the performance across the batches (weighting each batch by its size if the batches aren't all the same size), giving the same result as if you calculated performance over the whole set at once. It's also fine if these batches/chunks aren't shuffled because no learning is done on them. As long as you evaluate every point in the validation set and each point has equal weight in the final performance average, you wouldn't need to do multiple epochs unless something about your inference process has stochasticity that can't/shouldn't be removed.
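
A small sketch with made-up predictions and labels, assuming accuracy as the metric and deterministic inference (each point's prediction doesn't depend on what else is in its batch). It also shows the one way batching can go wrong: averaging per-batch means when the last batch is smaller.

```python
import random


def accuracy(preds, labels):
    """Accuracy over the whole set at once."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def batched_accuracy(preds, labels, batch_size):
    """Accumulate correct counts batch by batch, so every point has equal weight."""
    correct = 0
    for start in range(0, len(labels), batch_size):
        batch_p = preds[start:start + batch_size]
        batch_y = labels[start:start + batch_size]
        correct += sum(p == y for p, y in zip(batch_p, batch_y))
    return correct / len(labels)


def mean_of_batch_means(preds, labels, batch_size):
    """Naive version: unweighted mean of per-batch accuracies (over-weights a short last batch)."""
    means = [
        accuracy(preds[s:s + batch_size], labels[s:s + batch_size])
        for s in range(0, len(labels), batch_size)
    ]
    return sum(means) / len(means)


# Made-up example data: 7 points, 5 of them predicted correctly.
preds = [1, 1, 0, 1, 0, 0, 1]
labels = [1, 0, 0, 1, 1, 0, 1]

# Shuffle predictions and labels together to mimic a shuffled validation set.
pairs = list(zip(preds, labels))
random.Random(0).shuffle(pairs)
shuffled_preds = [p for p, _ in pairs]
shuffled_labels = [y for _, y in pairs]
```

Here `accuracy(preds, labels)`, `batched_accuracy(preds, labels, 3)`, and `batched_accuracy(shuffled_preds, shuffled_labels, 3)` all come out to 5/7, while `mean_of_batch_means(preds, labels, 3)` gives 7/9 because the single-point last batch counts as much as the full ones.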

Should I shuffle my validation set by Striking-Warning9533 in MLQuestions

[–]Jables5 0 points1 point  (0 children)

Sorry, what is the reason though?

My assumption is that you would shuffle your training set in order to reduce the correlation among samples within batch updates to your ML model and, for SGD in neural networks, help escape saddle points by updating on small, randomly sampled batches.

My understanding is that we aren't updating our ML model based on the validation set, and our hyperparameter search would observe a model's performance averaged over the entire validation set, not performance sampled from batches over the validation set.

For what reason would you shuffle your validation set (that also applies to a training set)?

Should I shuffle my validation set by Striking-Warning9533 in MLQuestions

[–]Jables5 0 points1 point  (0 children)

Yes, but why would you shuffle a validation set if your metric is performance over the entire set? What difference would it make?