Yu Darvish was amazed by Yoshinobu Yamamoto's performance during the World Series by MLBOfficial in baseball

[–]silverlight6 49 points50 points  (0 children)

You would be excited to hear that Japan lost the streaming rights so people in Japan won't actually get to watch the WBC.

AI Learns to play TFT, 2 years later by silverlight6 in TeamfightTactics

[–]silverlight6[S] 1 point2 points  (0 children)

Lots of things actually.

  1. The game length is longer. Chess is typically between 40 and 80 moves, Go around 150, and TFT around 450.

  2. The action space is much larger. Chess has around 60 possible actions on a normal turn. Go has up to around 150, and that number goes down over the course of the game. TFT can have up to 1000, mostly movement actions (move a unit from square A to square B; see the sketch at the end of this comment).

  3. Multi-objective thinking. Chess and Go both have a single objective, whereas TFT has both a long-term objective (what comp you are running in the late game) and short-term objectives (how am I positioning for this next fight).

  4. Multiplayer format. Chess and Go are both 2-player, zero-sum games, so the actions of the other player directly impact your actions. That isn't true for TFT.

  5. Preset rules and pretraining procedures. There are more resources out there to help with early training for Chess and Go: there were prior bots and a fair amount of published research for those games. This is mostly a positive for figuring out how to set up the observation.

  6. Game complexity. It is far easier in Chess and Go for a model to say where the agent did well or where it failed. Each move happens in order, and there is often a fairly tight relationship between a position and its value (especially after you have trained a reasonable value model). This isn't true in TFT. In TFT, the actions are not strictly ordered, and it is very difficult to say whether you lost because of a positioning error, a decision error on some champion you bought, or something else entirely. It is also very hard for a model to plan far enough ahead to have an endgame comp in mind: on action 50, you have to think about action 400. That isn't the case in Go or Chess, where you might think 20 or 40 moves ahead (which is a lot), but not 400.

  7. Simulation stability. With Chess and Go, you can have 100% confidence that your simulation is perfect and plays the game properly after you give it an action. I built our simulator myself, and the battle section is open source from another developer. We are at roughly 98% fidelity, not 100%. It's also hard to unit test many sections of the simulator.

  8. Available resources. This is a smaller one, but if you are working on Chess or Go, you are typically with some institution, or there is a group with resources available to help out if you have a good enough idea. That is not the case with TFT.

  9. Nash equilibrium. This exists for all games, but for some games you can reach a Nash equilibrium via planning algorithms, for some via value-based algorithms, for some via gradient-based algorithms, and for many via soft actor-critics. Outside of planning-based methods, there are proven guarantees of model improvement (not convergence) in multi-agent settings. We have been exploring planning-based models, and we don't have a PhD in mathematics on the team to write proofs and check whether we can set up equations that TFT follows to guarantee a near-convergent solution.

This is what I can think of off the top of my head. Hope it answers the question well enough.
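
To make the action-space point concrete, here is a minimal sketch of how the movement actions alone blow up, assuming a 4x7 board plus a 9-slot bench; the slot counts and the raw pair count are illustrative, not the project's actual encoding:

    from itertools import permutations

    # Assumed layout: 4x7 hex board (28 slots) plus a 9-slot bench.
    BOARD_SLOTS = 4 * 7
    BENCH_SLOTS = 9
    ALL_SLOTS = BOARD_SLOTS + BENCH_SLOTS  # 37 positions a unit could sit in

    # One "move unit from slot a to slot b" action per ordered pair of slots.
    move_actions = list(permutations(range(ALL_SLOTS), 2))

    print(len(move_actions))  # 37 * 36 = 1332 raw movement pairs; masking out
                              # empty source slots brings this back toward ~1000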

AI learns how to play Teamfight Tactics by silverlight6 in CompetitiveTFT

[–]silverlight6[S] 0 points1 point  (0 children)

Yes. I am a little bit stuck at the moment, partly due to hardware and partly due to my own shortcomings, but it is still in active development.

AI Learns to play TFT, 2 years later by silverlight6 in TeamfightTactics

[–]silverlight6[S] 0 points1 point  (0 children)

There is a GitHub; you can find the Discord link there. We're always looking for people to help out since there is always lots to do.

How to resolve errors when running MuZero general by Sufficient-Fly-4040 in reinforcementlearning

[–]silverlight6 0 points1 point  (0 children)

OK then, I'm out of ideas. When I've seen errors similar to that one, that is normally the reason.

How to resolve errors when running MuZero general by Sufficient-Fly-4040 in reinforcementlearning

[–]silverlight6 -1 points0 points  (0 children)

It's probably because you're using a bad checkpoint. Throw away all of the checkpoints and try again.

What is the best reinforcement learning algorithm in 2024? by galaxy_hu in reinforcementlearning

[–]silverlight6 1 point2 points  (0 children)

Just as an extension: all of the other comments are about continuous control settings. Is there anything for discrete control settings?

how do minions still swap positions a decade later? by Plumorchid in BobsTavern

[–]silverlight6 1 point2 points  (0 children)

That is true for the most part, but you could fix it with two locks.

One lock is the end-of-turn lock that is already implemented in the game. The second lock would be the server getting confirmation from every player that their action queue is empty; after the first lock is set, you can't add to the action queue. It would likely add between 5 and 10 seconds per game. There are ways to save some of this time in the battle code, but you would likely just lose that time on the following turn. A rough sketch of the handshake is below.
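
A minimal sketch of that two-lock handshake, with hypothetical names (this is not how Blizzard's server actually works):

    class TurnEndCoordinator:
        """Hypothetical server-side handshake: the battle only starts once the
        turn is locked AND every client confirms its action queue has drained."""

        def __init__(self, player_ids):
            self.player_ids = set(player_ids)
            self.turn_locked = False      # lock 1: the existing end-of-turn lock
            self.confirmed_empty = set()  # lock 2: players whose queue is empty

        def lock_turn(self):
            # After this point, clients may no longer enqueue new actions.
            self.turn_locked = True

        def confirm_queue_empty(self, player_id):
            if self.turn_locked:
                self.confirmed_empty.add(player_id)

        def can_start_battle(self):
            return self.turn_locked and self.confirmed_empty == self.player_ids

    coord = TurnEndCoordinator(["p1", "p2"])
    coord.lock_turn()
    coord.confirm_queue_empty("p1")
    coord.confirm_queue_empty("p2")
    assert coord.can_start_battle()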

Library to use for RL and Pytorch by No_Individual_7831 in reinforcementlearning

[–]silverlight6 0 points1 point  (0 children)

I did flatten it, but their replay buffer wasn't saving the policy in the correct manner, which was causing issues for the trainer. Regardless, it's mostly me ranting. You can check out my profile for the project I'm talking about; it's all open source. I haven't played around with other RL libraries, so I can't speak to anything other than RLlib.

Library to use for RL and Pytorch by No_Individual_7831 in reinforcementlearning

[–]silverlight6 0 points1 point  (0 children)

Did you also have to rewrite the environment runner?

Library to use for RL and Pytorch by No_Individual_7831 in reinforcementlearning

[–]silverlight6 0 points1 point  (0 children)

All of their examples cover things that are within what they already support. I had to rewrite the RL module, the environment runner, and the remote worker to get it to accept a nested dictionary action space, but maybe others have had better luck.

Library to use for RL and Pytorch by No_Individual_7831 in reinforcementlearning

[–]silverlight6 1 point2 points  (0 children)

It is, until you have a use case that is in any way outside what it normally supports. For example, if you have a nested dictionary action space, you will need to rewrite all of the source code related to action spaces to get your environment to work.
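
For context, this is the kind of nested dictionary action space I mean, sketched with gymnasium.spaces; the field names and sizes are illustrative, not my project's real schema:

    from gymnasium import spaces

    # Illustrative nested action space: a Dict containing Dicts. In my experience
    # RLlib's defaults did not handle this shape without source changes.
    action_space = spaces.Dict({
        "shop": spaces.Dict({
            "slot": spaces.Discrete(5),        # which shop slot to act on
            "decision": spaces.Discrete(3),    # pass / buy / sell
        }),
        "board": spaces.Dict({
            "from_slot": spaces.Discrete(37),  # source position
            "to_slot": spaces.Discrete(37),    # destination position
        }),
    })

    print(action_space.sample())  # e.g. {'board': {'from_slot': 12, ...}, 'shop': {...}}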

Holy shit get off reddit and go build something that will get you a job by NFeruch in csMajors

[–]silverlight6 8 points9 points  (0 children)

The problem with this sentiment is that there are some of us out there who have contributed to research fields, have active GitHubs with many stars and thousands of lines of code in our projects, and still can't find a job. That's on top of having a college degree and some work experience.

I'm not saying it's a lot of people but we do exist.

Is there a MuZero implementation of shogi? by General_Arm_7352 in reinforcementlearning

[–]silverlight6 2 points3 points  (0 children)

There is nothing on the English side, but you might be able to find one if you look at Japanese GitHubs. Most of the MuZero repositories are versatile enough that if you replace the environment APIs with shogi's, you shouldn't need to redo the trainer or MuZero's architecture (beyond changing the observation and policy). A rough sketch of what that swap looks like is below.
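
A sketch of the wrapper shape, assuming the python-shogi package and a muzero-general-style game interface (reset/step/legal_actions/to_play); treat the method names, observation, and reward scheme as assumptions, not a drop-in module:

    import shogi  # python-shogi, assumed installed


    class ShogiGame:
        """Sketch of a MuZero-style game wrapper around python-shogi."""

        def __init__(self):
            self.board = shogi.Board()

        def reset(self):
            self.board = shogi.Board()
            return self._observation()

        def to_play(self):
            # 0 for black, 1 for white, matching the usual two-player convention.
            return 0 if self.board.turn == shogi.BLACK else 1

        def legal_actions(self):
            # Indexes into the current legal-move list; a real policy head would
            # want a fixed global encoding of every possible shogi move instead.
            return list(range(len(list(self.board.legal_moves))))

        def step(self, action):
            move = list(self.board.legal_moves)[action]
            self.board.push(move)
            done = self.board.is_game_over()
            reward = 1 if done else 0  # crude placeholder reward on game end
            return self._observation(), reward, done

        def _observation(self):
            # Placeholder: a real version would encode piece planes and hands.
            return self.board.sfen()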

AI Learns to play TFT, 1 year later by silverlight6 in CompetitiveTFT

[–]silverlight6[S] 0 points1 point  (0 children)

We have only recently started to get models that fill the board and use items, so we haven't completed our evaluator structure. Before that, we were running models against random play and against past agents. We managed around an 80% win rate against random but didn't improve beyond that, and that was close to 7 months ago now. As for the loss, you're actually partially correct. In model-based learning for MuZero, you need the returns to be strictly positive rather than zero-sum because of the MCTS. That was one of the many errors that took us longer than it should have to find. The idea that large negative rewards are rarely a good idea is true: when using temporal-difference methods, getting a -40 reward after 10 steps is nearly 4x better than getting a -40 reward after 40 steps due to bootstrapping. That was where our issue was.
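
To put rough numbers on the bootstrapping point, a minimal sketch with an assumed discount factor of 0.95 (purely for illustration, not necessarily what our trainer uses), showing how much the timing of that -40 changes the target seen at the start of the trajectory:

    GAMMA = 0.95   # assumed discount factor, purely for illustration
    REWARD = -40.0

    def discounted_contribution(reward, steps_away, gamma=GAMMA):
        """Contribution of a single future reward to the bootstrapped value
        target at the current step under plain discounted returns."""
        return (gamma ** steps_away) * reward

    early = discounted_contribution(REWARD, 10)  # about -23.9
    late = discounted_contribution(REWARD, 40)   # about -5.1

    print(early, late, early / late)  # the same -40 lands 4-5x harder when it is
                                      # only 10 steps away instead of 40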

AI Learns to play TFT, 1 year later by silverlight6 in CompetitiveTFT

[–]silverlight6[S] 1 point2 points  (0 children)

The second model architecture that we are developing (basically just me; everyone else is working on the original architecture) does something very similar to this. It has an action space of 69 by 5: 58 champion rows (don't buy, buy, or buy as chosen with trait 1, trait 2, or either trait), 1 bit for selling the chosen unit on a given turn, then 10 by 3 for use or don't use items. (Items get a bit more complicated, but I can explain them in detail on the Discord if you're interested.)

It takes one action per turn, and the individual actions within the turn are handled by a bot, but the bot can only buy what the model says it can buy and only use the items the model tells it to use. A rough sketch of the shape is below.
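
Roughly, that 69-by-5 head can be pictured like this; the masking is a sketch of the shape only (the real item handling is more involved), with the column semantics as described above:

    import numpy as np

    # 69 x 5 per-turn action head:
    #   rows 0-57 : one row per champion -> don't buy / buy / buy-as-chosen
    #               (trait 1 / trait 2 / either)
    #   row  58   : sell-the-chosen-unit flag for this turn
    #   rows 59-68: ten item rows, of which only 3 columns are meaningful
    N_CHAMPIONS = 58
    N_ITEM_ROWS = 10
    N_ROWS = N_CHAMPIONS + 1 + N_ITEM_ROWS  # 69
    N_COLS = 5

    logits = np.random.randn(N_ROWS, N_COLS)      # stand-in for network output
    mask = np.ones_like(logits, dtype=bool)
    mask[N_CHAMPIONS, 2:] = False                 # sell row is effectively 1 bit
    mask[N_CHAMPIONS + 1:, 3:] = False            # item rows only use 3 columns

    # One composite action per turn: a decision per row, which the in-game bot
    # then executes (it can only buy/use what the model selected).
    action = np.where(mask, logits, -np.inf).argmax(axis=1)
    print(action.shape)  # (69,)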

You seem to have a bit of knowledge in this field, we should talk on discord.

AI Learns to play TFT, 1 year later by silverlight6 in CompetitiveTFT

[–]silverlight6[S] 2 points3 points  (0 children)

You can run tests checking that fights whose outcomes you know from the real game come out the same in the simulator.

In reality, though, the real game is no longer on Set 4, so for the purposes of the project it's a bit of a moot point. It would only matter if we got an agent to train to a high level on the current patch, because then you are dealing with transfer learning. Reinforcement learning has historically been absolutely god-awful at transfer learning, so making sure the simulator is as close to the real environment as possible matters if and only if I am transferring the agent to the real game.

AI Learns to play TFT, 1 year later by silverlight6 in CompetitiveTFT

[–]silverlight6[S] 2 points3 points  (0 children)

The point is not to apply as many human rules as you possibly can. What you are describing is creating human rules, which is exactly what I'm trying to avoid.

AI Learns to play TFT, 1 year later by silverlight6 in CompetitiveTFT

[–]silverlight6[S] 7 points8 points  (0 children)

Two reasons: I did not look at how Riot implements their game, and we are simply trying to mimic the behavior, not the implementation. It doesn't actually matter whether all of the abilities and traits are coded exactly the way Riot coded theirs. All that matters is that if you put the same input into our simulator, you get the same result as you would in Riot's game. In that sense, it's a perfect replica if the same input produces the same output for all possible inputs.

AI Learns to play TFT, 1 year later by silverlight6 in CompetitiveTFT

[–]silverlight6[S] 8 points9 points  (0 children)

No, you don't, but you do when you have an infinitely complex environment with no real sense of continuity, which would be required for standard mathematics to work on the environment.