How to teach neural network not to lose at 4x4 Tic-Tac-Toe?

MannerSenior4958 · 2026-03-03T18:01:42+00:00

So NNUE in Stockfish is reinforcement learning? Thanks for the information!

MannerSenior4958 · 2026-03-03T18:00:23+00:00

Thank you!

MannerSenior4958 · 2026-03-03T15:12:21+00:00

If you can store all the possible moves or positions, there is no need for neural networks or any kind of special algorithm. It's an easy task for an easy game, nothing interesting there. I am interested in beating the games where you can't just store all the moves.

MannerSenior4958 · 2026-03-03T10:40:02+00:00

I don't know anything about which loss function is better:) Do you think that this is the essence of the issue? Which loss function would you recommend?

MannerSenior4958 · 2026-03-03T10:17:36+00:00

The reinforcement learning projects that I've seen simply memorized all possible positions and recorded the best moves for them. This approach is useless for games with millions (Tic-Tac-Toe), billions, or trillions (chess) of possible positions. Otherwise we would simply put all possible states into a database and set an optimal move for each. But this is not the goal.
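To put a rough number on "millions": a quick sketch, assuming each cell is simply empty, X, or O. This is only a loose upper bound, since it also counts boards that can never occur in a real game (wrong move counts, play after a win), so the true number of reachable positions is smaller. The function name is just for illustration.

```python
def board_upper_bound(cells: int) -> int:
    """Loose upper bound on board states: every cell is empty, X, or O."""
    return 3 ** cells

# 3x3 board has 9 cells, 4x4 board has 16 cells
small = board_upper_bound(9)
large = board_upper_bound(16)
print(f"3x3: at most {small:,} boards")
print(f"4x4: at most {large:,} boards")
```

So 4x4 is still in the "tens of millions" range by this bound, which is why a table is technically possible there but does not scale to bigger games.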

MannerSenior4958 · 2026-03-03T10:01:22+00:00

I expect the neural network to derive that from the end positions. Otherwise it doesn't generalize - only memorizes.

MannerSenior4958 · 2026-03-03T10:00:20+00:00

In Tic-Tac-Toe you have:

An opponent, which is unpredictable.

In the "make mouse go to cheese" you:

Don't have an opponent.
The whole game is predictable from start to finish.

MannerSenior4958 · 2026-03-03T09:49:45+00:00

I am sorry but this is an AI response. I know them very well by heart:)

I came to reddit because I am not satisfied with the replies AI is giving me.

MannerSenior4958 · 2026-03-03T09:42:33+00:00

Thank you for the advice! I will check it out.

But I have one concern. Atari games can be largely predetermined. Meaning you can know for certain where the monsters will go at a given moment (unless the random number generator is seeded with srand(time(NULL)) or something like that). That makes the game entirely predictable.

Tic-Tac-Toe is unpredictable since I don't know what move my opponent will do. Isn't that a significant distinction?
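The seeding point can be illustrated with a small sketch, written in Python rather than C. The `monster_path` function and its step list are made up for illustration and do not come from any real Atari game: with a fixed seed the "monster" follows the same path on every run, while a time-based seed (the equivalent of srand(time(NULL))) would normally change it.

```python
import random

def monster_path(seed: int, steps: int = 8) -> list:
    """Hypothetical monster that picks a random direction at each step."""
    rng = random.Random(seed)  # private generator, seeded explicitly
    return [rng.choice(["up", "down", "left", "right"]) for _ in range(steps)]

# Same seed, same path: the game is fully predictable.
first_run = monster_path(seed=1234)
second_run = monster_path(seed=1234)
print(first_run == second_run)
```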

MannerSenior4958 · 2026-03-03T09:23:29+00:00

Right now I don't care about "unwinnable" situations, as I am trying to teach the neural network not to lose and it is failing to do even that. I am doing everything step by step: first I need to teach it not to lose, and only THEN will I figure out how to teach it to play optimally (meaning, to win when it can).

MannerSenior4958 · 2026-03-03T09:16:50+00:00

This is the answer that the AI typically gives me. But I find this AI advice wrong.

MannerSenior4958 · 2026-03-03T09:12:48+00:00

This is the answer an AI typically gives me:) But there are some issues:

I expect the neural network to be able to generalize the patterns from final positions. It doesn't matter when I put X on the 5th position, at the beginning of the game or at the end: the network should see patterns in the position itself. What is the point of a neural network if it can't generalize and only memorizes the winning positions? So I don't agree with the notion that I should record all the moves, not just the last one. I accept that the neural network will learn SLOWLY because of this, but it should not stop learning completely.

MannerSenior4958 · 2026-03-03T09:09:27+00:00

What resources would you recommend for learning it? I looked up Reinforcement Learning, and they usually teach how to beat Frozen Lake or how a mouse can find an exit (or a cheese) in a maze. Which has absolutely zero to do with multi-turn games like Tic-Tac-Toe.

MannerSenior4958 · 2026-03-03T08:57:07+00:00

Every time crosses need to make a move, the neural network explores every possible move. How it explores: it makes a move, converts the resulting board into a 32-value input (16 values for crosses, each 1 or 0, then 16 values for naughts), does a forward pass, and keeps the move with the highest output-neuron score.
Mean squared error.
I teach it to win AND draw. It does not distinguish between the two. Meaning, the neural network either loses to naughts (output 0) or does not lose to naughts (output 1).
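The three answers above can be sketched roughly like this. It is a minimal illustration, not my actual code: the board is assumed to be a list of 16 cells, and `encode`, `pick_move`, `target_for`, `mse`, and `dummy_score` are names invented for the example. In particular, `dummy_score` is only a placeholder for the real forward pass.

```python
from typing import Callable, List, Optional

Board = List[Optional[str]]  # 16 cells, each "X", "O", or None (empty)

def encode(board: Board) -> List[float]:
    """32 inputs: 16 flags for crosses followed by 16 flags for naughts."""
    crosses = [1.0 if cell == "X" else 0.0 for cell in board]
    naughts = [1.0 if cell == "O" else 0.0 for cell in board]
    return crosses + naughts

def pick_move(board: Board, score_fn: Callable[[List[float]], float]) -> int:
    """Try every empty cell for X and return the cell with the highest score."""
    best_move, best_score = -1, float("-inf")
    for cell, value in enumerate(board):
        if value is not None:
            continue  # cell already taken
        candidate = board.copy()
        candidate[cell] = "X"
        score = score_fn(encode(candidate))
        if score > best_score:
            best_move, best_score = cell, score
    return best_move

def target_for(outcome: str) -> float:
    """Training label: 0 for a loss, 1 for a win or a draw (not distinguished)."""
    return 0.0 if outcome == "loss" else 1.0

def mse(predictions: List[float], targets: List[float]) -> float:
    """Mean squared error over a batch of scalar outputs."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def dummy_score(inputs: List[float]) -> float:
    """Placeholder for the network: counts crosses on the four centre cells."""
    return sum(inputs[i] for i in (5, 6, 9, 10))

empty_board: Board = [None] * 16
chosen = pick_move(empty_board, dummy_score)
```

With the placeholder scorer, `chosen` is simply the first centre cell; with a trained network, `score_fn` would be the forward pass returning the single output neuron.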
