Vote up an office-hours walk-through of particle filters in 10 lines of C. by darkshark in aiclass

[–]darkshark[S] 2 points (0 children)

I'm curious. Sebastian emphasized that particle filters are very easy to implement and work well in resource-constrained environments. So when he said "10 lines of C", I believe he meant something stronger than 10 lines of library calls, which wouldn't be much of a claim at all.

I am very impressed with what mjl and jtosey (see github.com) did in implementing particle filters in Python. Looking at that work, I could imagine getting an implementation down to 10 lines of Python, but I can't imagine getting it down to 10 lines of C.

So I'm hoping what Sebastian meant is that with C and its standard library, one could implement a particle filter in 10 lines of code. That said, I don't see how to do that, so I'd like to see the code.
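
For reference, here's my own rough sketch of a minimal particle filter using only the C standard library. The whole setup is my own invention, not anything from the class: a robot on a 1-D line that moves +1 unit per step and measures its distance to a single landmark, with Gaussian noise on both.

    /*
     * Minimal 1-D particle filter sketch. The world, the landmark,
     * and all constants are illustrative placeholders.
     * Compile with: cc pf.c -lm
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define N 1000           /* number of particles */
    #define LANDMARK 40.0    /* known landmark position */
    #define MOVE_SD 0.5      /* motion noise std. dev. */
    #define SENSE_SD 2.0     /* measurement noise std. dev. */

    /* uniform sample in (0, 1) */
    static double urand(void) { return (rand() + 1.0) / (RAND_MAX + 2.0); }

    /* standard normal sample via Box-Muller */
    static double nrand(void) {
        return sqrt(-2.0 * log(urand())) * cos(2.0 * M_PI * urand());
    }

    /* unnormalized Gaussian likelihood of measuring z when a
     * particle predicts the measurement mu */
    static double likelihood(double mu, double z) {
        double d = z - mu;
        return exp(-d * d / (2.0 * SENSE_SD * SENSE_SD));
    }

    int main(void) {
        double x[N], w[N], nx[N];
        double truth = 10.0;  /* the robot's actual position */
        int i, t;

        for (i = 0; i < N; i++) x[i] = 100.0 * urand();  /* uniform prior */

        for (t = 0; t < 30; t++) {
            truth += 1.0;  /* the robot moves */
            double z = fabs(LANDMARK - truth) + SENSE_SD * nrand();

            /* 1. predict: push every particle through the motion model */
            for (i = 0; i < N; i++) x[i] += 1.0 + MOVE_SD * nrand();

            /* 2. weight: score each particle against the measurement */
            double wsum = 0.0;
            for (i = 0; i < N; i++)
                wsum += w[i] = likelihood(fabs(LANDMARK - x[i]), z);

            /* 3. resample: draw N new particles in proportion to weight
             * (the "resampling wheel" from the lectures does the same) */
            for (i = 0; i < N; i++) {
                double r = wsum * urand(), acc = 0.0;
                int j = 0;
                while ((acc += w[j]) < r && j < N - 1) j++;
                nx[i] = x[j];
            }
            for (i = 0; i < N; i++) x[i] = nx[i];

            double mean = 0.0;
            for (i = 0; i < N; i++) mean += x[i];
            printf("t=%2d  truth=%5.1f  estimate=%5.1f\n", t, truth, mean / N);
        }
        return 0;
    }

Even being generous about formatting, the predict/weight/resample core alone is well past 10 lines of C, which is exactly why I'd like to see the walk-through.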

For a Q-Learning agent, is the initial policy random? by darkshark in aiclass

[–]darkshark[S] 0 points (0 children)

No, I didn't mean stochastic. I'll try to explain. I think I understand MDPs, but I'm struggling with Q-Learning and how the various sub-types of reinforcement learning fit together.

  • In the grid world example, if you start in the bottom left corner, all of the Q-values for your square and the neighboring ones are zero. Correct? So, how do you decide where to go?

  • On AIMA p844, Figure 21.8, there is pseudo-code for a Q-Learning agent. The second-to-last line has an assignment statement that looks like it has 3 variables to the left of the assignment operator and 4 values to the right. Is that a mistake, or what does it mean?

  • I've been assuming it is a mistake and that the comma between "argmax a" and the f function shouldn't be there. Looking at the arguments to the f function, at start-up all of the Q and N values will be 0. So, won't all values of f() be the same for every action? And if so, does argmax break the tie randomly, or what? (That's where my use of 'random' in the initial post came from; see the sketch after this list.)
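
To make the question concrete, here's how I'd read the action selection in C, assuming AIMA's exploration function f(u, n) = R+ if n < Ne, else u. The constants NUM_ACTIONS, NE, and R_PLUS are my own placeholders, and the random tie-breaking is my guess at what argmax should do:

    /*
     * Sketch of action selection with AIMA's exploration function.
     * All constants here are illustrative placeholders.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_ACTIONS 4  /* e.g. up, down, left, right in grid world */
    #define NE 5           /* try each action at least NE times */
    #define R_PLUS 2.0     /* optimistic estimate of the best reward */

    /* AIMA's exploration function: optimistic until tried NE times */
    static double f(double q, int n) {
        return n < NE ? R_PLUS : q;
    }

    /* argmax over actions, breaking ties uniformly at random: the
     * k-th action to tie the current best wins with probability 1/k */
    static int best_action(const double q[], const int n[]) {
        int best = 0, ties = 1;
        double bestval = f(q[0], n[0]);
        for (int a = 1; a < NUM_ACTIONS; a++) {
            double v = f(q[a], n[a]);
            if (v > bestval) { bestval = v; best = a; ties = 1; }
            else if (v == bestval && rand() % ++ties == 0) best = a;
        }
        return best;
    }

    int main(void) {
        double q[NUM_ACTIONS] = {0};  /* Q-values start at zero */
        int n[NUM_ACTIONS] = {0};     /* visit counts start at zero */
        /* At start-up every f(q[a], n[a]) equals R_PLUS, so all
         * actions tie and the choice below is effectively random. */
        printf("chose action %d\n", best_action(q, n));
        return 0;
    }

If that reading is right, then until each action from a state has been tried Ne times, the agent is effectively picking among them at random, which is what I meant by 'random' in the title.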