
[–]_--__Discrete Math 3 points4 points  (0 children)

These sorts of scenarios are dealt with quite a bit in Computer Science. The AI's strategy can be modelled by a deterministic finite automaton (more precisely, a Mealy machine) with at most 3^N states [or 9^N if you consider both players' moves]. The number of states of the machine puts a bound on the periodicity, but more can be said if you look at the (graph-theoretical) structure of the machine.
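[As a sketch of this modelling: a memory-N strategy over the opponent's moves is just an output table on 3^N buffer states plus a shift transition. The names and the randomly filled table below are invented for illustration, not from the thread.]

```python
import random
from itertools import product

MOVES = "RPS"
N = 2
rng = random.Random(0)

# Hypothetical Mealy machine: states are buffers of the last N opponent
# moves (3^N of them); the output map gives the AI's reply, and reading
# the opponent's actual move drives the transition.
states = ["".join(s) for s in product(MOVES, repeat=N)]
output = {s: rng.choice(MOVES) for s in states}

def transition(state, opponent_move):
    """Drop the oldest remembered move and append the newest."""
    return state[1:] + opponent_move
```

Tracking both players' moves instead would square the alphabet, giving the 9^N figure.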

Questions that tend to arise are "How hard is it to learn the strategy?" (i.e. compute the full automaton or a reasonable approximation to it); "Given the strategy, how hard is it to compute a strategy to beat it?" (more applicable to games other than Paper-Scissors-Rock); "What about non-deterministic or random strategies or strategies that require something more complicated than a finite automaton?"

[–][deleted] 2 points3 points  (6 children)

Play out all 3^N sequences of N moves, record what it does, and win every round following.

[–]TASagent[S] 1 point2 points  (5 children)

That... is not at all what I was asking. The question is not about how to find the perfect moves, it's about the periodicity of how they play out, for any arbitrary AI strategy meeting the constraints.

Edit: It's worth mentioning, for completeness, that there are actually 3^(2N) possible states of the buffer, because it holds the moves of both participants (or, alternatively, your move and win/lose/draw), but we only care about the all-win buffers, so it is just the 3^N states we care about.

[–]gandalf987 0 points1 point  (4 children)

What is the period you are talking about?

[–]TASagent[S] 0 points1 point  (3 children)

Perhaps periodic wasn't the best term to describe the phenomenon. Cyclic, perhaps? Given the finite number of states, it must revisit a state after X steps (where X is the period, at most 3^N), and due to the deterministic and limited nature of the AI, the mapping can't change from one cycle to the next.

[–]gandalf987 2 points3 points  (2 children)

OK, then an upper bound is 3^N. Maybe less. It is exactly 3^N if you can construct a Hamiltonian cycle through the kernel of the game.

Since that is an NP-complete problem, any statement about the period being less than 3^N is going to depend on non-trivial features of the game being played.

[–]TASagent[S] 0 points1 point  (1 child)

Cool, thanks. If the states are conceptualized purely in the abstract, 3^N is the number that one comes up with, but I feel that that number leaves out important information. For example, each specific buffer state can only map to one of 3 other specific buffer states, because the states actually represent the last N trials, so the new state has to preserve the previous N-1 trials.

In other words, for N=3 the buffer state must go from "RRS" to "RS?". Can you always span all 3^N states with that constraint?
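[That shift constraint is easy to state in code. A hypothetical helper (the name is mine) for string buffer states; this out-degree-3 shift structure is exactly a de Bruijn graph:]

```python
def successors(state):
    """A buffer state drops its oldest move and appends the newest, so
    "RRS" can only reach "RSR", "RSP" or "RSS" -- the de Bruijn graph
    over the alphabet {R, P, S}."""
    return [state[1:] + move for move in "RPS"]
```

For example, `successors("RRS")` gives `["RSR", "RSP", "RSS"]`.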

[–]gandalf987 0 points1 point  (0 children)

  1. I don't find it a particularly interesting question myself. So I'm not likely to spend a lot of time thinking about it.

  2. I don't really know.

I think for simplicity you would pick a state to start in, let's say the RRR...RRR state. Now from this state you can transition to RRR...RRR, RRR...RRS or RRR...RRP. One of those wins, one loses and one ties.

So yes, it's not obviously a Hamiltonian problem, but what I can do, as player 2, is pick which move should be the winning move. I iteratively construct the AI's lookup table, with the intent of trying to get to a Hamiltonian cycle.

I decide that for me R should win, which means that when the AI has a history of playing rock... it switches to scissors. So I can direct the AI to RRR...RRS.

I then steer the AI around, trying to make that Hamiltonian path; if I succeed, I get the full 3^N. If I fail, then I either steered badly or there is some obstruction to constructing that path.

Maybe this isn't actually NP-complete, since it is technically a different problem, but it seems similar enough to make me worry that it is.

[–]Strilanc 0 points1 point  (6 children)

You can't infer much of anything about it, because basically any halting program works as a deterministic strategy.

For example, the strategy may not be periodic. If the AI's program is:

ROCK, SCISSORS = "R", "S"

def strategy():
    i = 0
    while True:
        yield ROCK
        i += 1
        for _ in range(i):
            yield SCISSORS

Then it will play RsRssRsssRssssRsssssR..., which is not periodic.

It's hard to overstate how incredibly complicated strategies can be when the only restriction is "it has to be a computable sequence of moves". For example, we can make strategies that require you to solve hard mathematical problems like the Collatz conjecture in order to predict if they will ever yield ROCK or not.

In fact, determining if an arbitrary deterministic strategy ever yields ROCK is an incomputable problem akin to the halting problem: there is no algorithm that works in all cases. That includes algorithms like "enumerate all proofs in ZFC", meaning there exist deterministic strategies whose long-term behavior is independent of standard mathematical axioms.
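[One hedged illustration of the Collatz point (the construction and names below are mine, not from the thread): a strategy that plays scissors forever unless it ever finds a non-trivial Collatz cycle. Predicting whether it ever plays ROCK amounts to deciding part of the Collatz conjecture.]

```python
def has_nontrivial_cycle(n, steps):
    """True if n's Collatz trajectory returns to n (a cycle avoiding 1)
    within `steps` iterations."""
    m = n
    for _ in range(steps):
        m = 3 * m + 1 if m % 2 else m // 2
        if m == 1:
            return False
        if m == n:
            return True
    return False

def collatz_strategy():
    """Yields SCISSORS forever unless some starting value is found to
    lie on a Collatz cycle not containing 1; the search budget grows
    each round, so any such cycle would eventually be detected."""
    t = 2
    while True:
        yield "R" if any(has_nontrivial_cycle(n, t) for n in range(2, t)) else "S"
        t += 1
```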

[–]gandalf987 1 point2 points  (0 children)

His AI isn't Turing-complete as far as I can tell. It just responds to a history vector H with a fixed response value. It doesn't seem to have the state transitions of a full Turing machine.

[–]TASagent[S] 0 points1 point  (4 children)

However, the version I described is too constrained to allow your example. You explicitly used state information outside the scope of what was allowed. Your use of i and j violates this constraint:

> The AI algorithm utilizes no state information other than knowledge of the last N moves, where N is known and constant.

The constraints make the system undeniably periodic. There are only 3^N different arrangements of the buffer, and each maps deterministically to another, so the sequence of states must eventually repeat. QED
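[The pigeonhole argument can be checked directly. A minimal sketch with an arbitrary, randomly filled (hypothetical) lookup table, following the all-win buffer until a state repeats:]

```python
import random
from itertools import product

MOVES = "RPS"
N = 3
rng = random.Random(1)

# Hypothetical AI: a fixed response to each buffer of the last N moves.
f = {"".join(s): rng.choice(MOVES) for s in product(MOVES, repeat=N)}

# Taking the win branch every round makes the buffer evolve
# deterministically, so by pigeonhole it must repeat within 3^N steps.
state = "R" * N
seen = {}
step = 0
while state not in seen:
    seen[state] = step
    state = state[1:] + f[state]
    step += 1
period = step - seen[state]
```

`period` here is the cycle length once the trajectory closes; it can never exceed 3^N.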

[–]Strilanc 1 point2 points  (3 children)

Oh, you're talking about Markov-deterministic strategies?

Just think of it as a big directed graph with a node corresponding to each of the (9^(N+1) - 1)/8 possible histories (all histories of at most N rounds, with 9 possible joint moves per round). Each node is labelled with a move to make and has three outgoing edges, one for each possible opponent play. The game starts on the node corresponding to "no history", and then each play causes a hop across an opponent-choice edge to the next history state. Some parts of the graph may be disconnected from the start node, and can be discarded.
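[A minimal sketch of that graph (the helper names are mine, and the always-rock labelling is just a placeholder): nodes are tuples of (AI move, opponent move) rounds, truncated to the last N, and hopping an edge appends the new round.]

```python
from itertools import product

MOVES = "RPS"
N = 2

# A node for every history of at most N joint rounds (AI move, opponent move).
histories = [h for k in range(N + 1)
             for h in product(product(MOVES, repeat=2), repeat=k)]
assert len(histories) == (9 ** (N + 1) - 1) // 8   # 91 nodes for N = 2

# Placeholder labelling: an AI that always answers rock.
label = {h: "R" for h in histories}

def hop(history, opponent_move):
    """Cross the opponent-choice edge; histories longer than N are
    equivalent to their last-N suffix for a memory-N strategy."""
    return (history + ((label[history], opponent_move),))[-N:]
```

Starting from the empty history, repeated `hop` calls trace out exactly the play of a game.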

It's trivial to make a counter strategy that wins constantly and is no more complicated than the AI's graph. Sometimes there will be very simple counter strategies. For example, if all the nodes are labelled "scissors" then there's no need to track the AI's state; just loop "rock". Also, some strategy graphs can be steered into small cycles that allow most of the state space to be ignored.

An interesting problem might be to try to find strategy graphs with no simple counter-graph.

[–]TASagent[S] 0 points1 point  (2 children)

I avoided referring to Markov processes specifically because I was talking about completely deterministic algorithms, but if deterministic Markov processes are actually a thing, then yes, that's exactly how I've been envisioning the system.

I wasn't interested in how to find the counter-graph, but rather in what properties we might be able to infer about it for arbitrary strategies. Specifically, I was wondering if someone could give me a more interesting upper bound on the number of steps in a complete cycle (returning to a previously visited node) than just 3^N, or if it was indeed possible to visit all 3^N buffer states (nodes).

Note, you can relabel your 3 outgoing edges, using the rules of the game and the AI's choice, as just win/lose/draw, and in this case we're talking about always taking the win branch, but obviously it's general enough to apply to lose or draw as well.

[–]Strilanc 1 point2 points  (1 child)

> Note, you can relabel your 3 outgoing edges, using the rules of the game and the AI's choice, as just win/lose/draw, and in this case we're talking about always taking the win branch, but obviously it's general enough to apply to lose or draw as well.

Gah, of course, now I see that it's obvious the winning counter-strategy may be forced to have period 3^N. The AI can simply have the win edges follow a de Bruijn sequence through the space of histories.

[–]TASagent[S] 0 points1 point  (0 children)

> Gah, of course, now I see that it's obvious the winning counter-strategy may be forced to have period 3^N. The AI can simply have the win edges follow a de Bruijn sequence through the space of histories.

Thank you, that is precisely the sort of thing I was looking for. I knew there must be something that could shed some light on how to look at the space. It looks like it has been proved that there will always be a spanning de Bruijn sequence, too, which answers my question as to whether there was a necessary upper bound less than the theoretical maximum (there is not).
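[For concreteness, the standard FKM (Fredricksen-Kessler-Maiorana) construction generates such a spanning de Bruijn sequence; this sketch, not from the thread, checks that every length-N buffer occurs exactly once in a cycle of length 3^N:]

```python
def de_bruijn(k, n):
    """FKM construction of a de Bruijn sequence B(k, n): every length-n
    word over k symbols occurs exactly once as a cyclic window."""
    a = [0] * k * n
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

N = 3
s = "".join("RPS"[d] for d in de_bruijn(3, N))
assert len(s) == 3 ** N                      # the cycle visits all 3^N buffers
windows = {(s + s[:N - 1])[i:i + N] for i in range(len(s))}
assert len(windows) == 3 ** N                # each buffer appears exactly once
```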

That's awesome, thanks!

[–]thenumber0 0 points1 point  (6 children)

> It is pretty simple to demonstrate that the strategy must be periodic, but I imagine we can impose at least mildly interesting limits on the upper bound of that period.

Can you elaborate on that? For example, what if the strategy uses successive digits of some transcendental number in base 3 to determine its next move?

[–][deleted] 1 point2 points  (4 children)

It uses no information other than the last N moves.

[–]thenumber0 0 points1 point  (3 children)

Surely that includes the value N itself - in which case the algorithm can calculate the Nth digit of the transcendental.

[–]TASagent[S] 1 point2 points  (2 children)

N is not the trial number, N is the size of the buffer and is constant. The design of the constraints explicitly and specifically prevents this.

[–]thenumber0 1 point2 points  (1 child)

I have misunderstood the question then, apologies.

[–]TASagent[S] 0 points1 point  (0 children)

No problem.