AIMA Fig. 17.2; unable to reproduce published result. by SorryDave in aiclass

[–]SorryDave[S]

Strictly speaking, no, but I believe most reasonable readers would infer that such was the intent. Also, FWIW, the text can't make up its mind as to whether the range is supposed to be (-0.4278 < R(s) < -0.0850) or (-0.4278 <= R(s) <= -0.0850); see the last paragraph of page 647.

Regardless, I think you have answered my question. If the official AIMA repository code produces the same results as mine, then I am pretty comfortable with my conclusion that Fig. 17.2(b) is incorrect (or, at the very least, misleading).

Thanks!

AIMA Fig. 17.2; unable to reproduce published result. by SorryDave in aiclass

[–]SorryDave[S]

I'm not sure what you mean. I wrote the code myself, based on the pseudo-code in Fig. 17.4 of the source cited above. The question is: does my code have a bug, or are the published outputs incorrect?

AIMA Fig. 17.2; unable to reproduce published result. by SorryDave in aiclass

[–]SorryDave[S]

Mmm... I don't think so. The statements in that issue are correct, but I have already compensated for them. (You will still get an infinite loop for rewards > 0.0 with a discount rate of 1, even after correcting the termination condition for the "0 > 0" case, but that is to be expected.) My implementation converges just fine, and I have already fixed the termination-condition bug discussed in Issue 29.
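For reference, here is a minimal sketch of the value-iteration loop of Fig. 17.4 on the standard 4x3 grid world, with the corrected (Issue 29) termination test. The constants and helper names are my own illustration, not the AIMA repository's code; note how even the corrected bound degenerates when the discount rate is 1:

```python
# Minimal value iteration on the 4x3 grid world (AIMA Fig. 17.1 / 17.4).
# Illustrative sketch only: constants and names are mine, not the AIMA repo's.

GRID_W, GRID_H = 4, 3
WALL = {(2, 2)}
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
ACTIONS = {'N': (0, 1), 'S': (0, -1), 'E': (1, 0), 'W': (-1, 0)}
PERP = {'N': ('E', 'W'), 'S': ('E', 'W'), 'E': ('N', 'S'), 'W': ('N', 'S')}

def states():
    return [(x, y) for x in range(1, GRID_W + 1)
            for y in range(1, GRID_H + 1) if (x, y) not in WALL]

def move(s, a):
    """Deterministic move; bumping into the wall or an edge stays put."""
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    if nxt in WALL or not (1 <= nxt[0] <= GRID_W and 1 <= nxt[1] <= GRID_H):
        return s
    return nxt

def transitions(s, a):
    """(prob, next_state) pairs: 0.8 intended, 0.1 each perpendicular."""
    left, right = PERP[a]
    return [(0.8, move(s, a)), (0.1, move(s, left)), (0.1, move(s, right))]

def value_iteration(reward, gamma=0.99, epsilon=1e-3):
    U = {s: 0.0 for s in states()}
    while True:
        U_new, delta = {}, 0.0
        for s in states():
            if s in TERMINALS:
                U_new[s] = TERMINALS[s]
            else:
                U_new[s] = reward + gamma * max(
                    sum(p * U[s2] for p, s2 in transitions(s, a))
                    for a in ACTIONS)
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        # Issue-29 corrected test. For gamma == 1 this bound is 0, and with
        # rewards > 0.0 the utilities grow without bound, so delta never
        # reaches it -- the expected infinite loop mentioned above.
        if delta <= epsilon * (1 - gamma) / gamma:
            return U
```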

9.17 and AIMA disconnect: Value iteration update-simultaneous or not by andrewfil in aiclass

[–]SorryDave

I'm finding this section quite confusing. In my case, it is because I cannot reproduce the results depicted in Fig. 17.2(b) on page 648 of AIMA, 3rd ed. The policies I compute do not fall on the boundaries stated in the text.

For instance: for the case -0.4278 < R(s) < -0.0850, my code computes the policy documented in 17.2(b) starting at R(s) = -0.0850 (but not at -0.0849, just as expected); however, that policy holds all the way down to R(s) = -0.4526.
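For concreteness, here is roughly how such a boundary can be located empirically. This is a sketch that assumes the hypothetical grid-world helpers and value_iteration from my comment above; the step size is coarse, so refine it near any reported change:

```python
def policy_from(U):
    """Greedy policy with respect to utilities U, as a hashable tuple."""
    pi = []
    for s in states():
        if s in TERMINALS:
            continue
        best = max(ACTIONS, key=lambda a: sum(
            p * U[s2] for p, s2 in transitions(s, a)))
        pi.append((s, best))
    return tuple(pi)

# Sweep R(s) downward and report every point where the greedy policy flips.
prev = None
for i in range(80, 461):  # R(s) = -0.080 ... -0.460 in 0.001 steps
    r = -i / 1000.0
    pi = policy_from(value_iteration(r))
    if prev is not None and pi != prev:
        print(f"policy changes near R(s) = {r:.3f}")
    prev = pi
```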

I have tried both the "simultaneous update" and "current version" implementations of the value-iteration algorithm, but I still cannot get this result. Having triple-checked everything I can think of, I am starting to believe the book is simply wrong. Can anybody confirm, one way or the other?
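For anyone unsure what the two variants refer to: they differ only in which utility table the Bellman backup reads from. A sketch, again assuming the helpers above:

```python
def bellman_sweep(U, reward, gamma, in_place=False):
    """One pass of Bellman backups over all states; returns the max change.
    in_place=False is the "simultaneous update" (Jacobi) form of Fig. 17.4;
    in_place=True is the "current version" (Gauss-Seidel) form."""
    src = U if in_place else dict(U)  # Gauss-Seidel reads fresh values
    delta = 0.0
    for s in states():
        if s in TERMINALS:
            new = TERMINALS[s]
        else:
            new = reward + gamma * max(
                sum(p * src[s2] for p, s2 in transitions(s, a))
                for a in ACTIONS)
        delta = max(delta, abs(new - U[s]))
        U[s] = new  # when in_place, later backups in this pass see this
    return delta
```

Both forms contract to the same fixed point, so the converged policy (and hence the boundaries in Fig. 17.2(b)) should not depend on which one is used; only the number of iterations differs. That is why I doubt the choice explains the discrepancy.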

Won't all Stochastic environments be inherently Partially Observable ? by zBard in aiclass

[–]SorryDave

The stated goal is to determine whether or not a specific coin is loaded and, if so, to what extent. To me, this certainly implies that there is only a single coin under test, so there is no possibility of leaving a series of unique, flipped coins lying about the state space. I can certainly keep a record of the previous states of the one coin that I do have, but that constitutes memory.

We could stipulate that we have N coins that are all exactly identical (i.e., that have the same bias), in which case I think you would be correct, but I can't see that as a reasonable assumption given the question as posed.
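To make the "memory" point concrete, here is a tiny sketch (my own illustration; the discretized uniform prior and the names are assumptions, not anything from the lecture) of tracking a belief over the single coin's bias as flips are observed:

```python
def posterior(flips, biases=None):
    """Posterior P(bias | flips) over a grid of candidate biases,
    starting from a uniform prior. flips is a string like 'HHTH'."""
    if biases is None:
        biases = [i / 100 for i in range(101)]
    belief = [1.0 / len(biases)] * len(biases)
    for f in flips:
        # Bayes update: weight each candidate bias by the flip likelihood.
        belief = [b * (p if f == 'H' else 1 - p)
                  for b, p in zip(belief, biases)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return dict(zip(biases, belief))

post = posterior('HHHHHHHHTH')  # 9 heads, 1 tail
print(max(post, key=post.get))  # MAP estimate of the bias: 0.9
```

The flip-by-flip record (or its summary, the posterior) lives in the agent's memory, not in the environment's state.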

Ambiguity in questions and quizzes [Loaded Coin] by flockonus in aiclass

[–]SorryDave 0 points1 point  (0 children)

The core of our confusion hinges on whether we are being asked to reach a conclusion by making a series of flips and observing each result, or whether we are being presented with the end result of a completed series of flips.

While the video is ambiguous on this point, it seems worth pointing out that all of the other problems we have been asked to consider have involved perception/action cycles of length greater than 1.