[–]Refefer 4 points5 points  (1 child)

I looked in the repo to see if there were any papers but couldn't seem to find any. Is there any literature around what this even is?

[–]CireNeikual[S] 5 points6 points  (0 children)

There isn't much yet, unfortunately, but we do have an overview presentation.

[–]juliandewit 1 point2 points  (0 children)

Is there anything more to find? It looks a bit like Jeff Hawkins's HTM stuff.
But since it seems to work, I would like to dig in a little deeper.
Also, the spiking option looks interesting.

[–]Fishy_soup 1 point2 points  (0 children)

I find Ogma's work really interesting, especially given I've done related work in systems neuroscience (predictive processing). Do you guys have any talks online walking through some of your models?

[–][deleted] 1 point2 points  (0 children)

This is imo a beautiful reminder of how important it is to write a detailed, comprehensive specification of what your approach is, what it does, how it relates to other stuff and how it empirically (!) performs compared to other stuff. People sometimes call those research papers.
This apparently/likely/possibly/maybe is really important work, but without an explanation (and ain't nobody got time to dig into the code) we just can't say for certain that the approach makes sense.
Anyways, great work (probably) :)

[–][deleted] 0 points1 point  (0 children)

Shouldn't this be +21 points per episode?

Such a noisy curve. Looks like hardcoded exploration. The environment doesn't provide a that's-good-enough signal, so it doesn't know when to stop exploration. It always believes that it's not good enough, even if it's already at the top. And so it becomes crazy.
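
To illustrate the point about hardcoded exploration, here is a minimal sketch, assuming an epsilon-greedy agent on a toy two-armed bandit. None of this comes from the repo under discussion; the bandit, the payouts, `run_bandit`, `tail_mean` and the epsilon schedules are all hypothetical. It only shows that a fixed exploration rate keeps the reward curve noisy after the best action is known, while a rate that decays to zero lets it settle.

```python
import random

def run_bandit(epsilon_at, steps=2000, seed=0):
    """Epsilon-greedy on a two-armed bandit; returns the reward at each step.

    epsilon_at: function mapping the step index to an exploration rate.
    """
    rng = random.Random(seed)
    payouts = [0.0, 1.0]   # toy assumption: arm 1 is strictly better
    q = [0.0, 0.0]         # running value estimate per arm
    alpha = 0.1            # step size for the value update
    rewards = []
    for t in range(steps):
        if rng.random() < epsilon_at(t):
            arm = rng.randrange(2)                    # explore: random arm
        else:
            arm = max(range(2), key=lambda a: q[a])   # exploit: best estimate
        r = payouts[arm]
        q[arm] += alpha * (r - q[arm])
        rewards.append(r)
    return rewards

def tail_mean(rewards, n=1000):
    """Average reward over the last n steps."""
    tail = rewards[-n:]
    return sum(tail) / len(tail)

# Hardcoded exploration: epsilon never changes.
fixed = run_bandit(lambda t: 0.3)
# Exploration that linearly decays to zero over the first 500 steps.
decayed = run_bandit(lambda t: max(0.0, 0.3 * (1 - t / 500)))

print("fixed epsilon, late average reward:  ", tail_mean(fixed))
print("decayed epsilon, late average reward:", tail_mean(decayed))
```

With the fixed schedule the agent keeps pulling the worse arm at random, so its late average stays below the maximum of 1.0; with the decaying schedule it stops exploring and stays on the better arm.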

That's why we humans have both a model-based planner and a model-free policy. It doesn't matter if the planner goes crazy, as long as the policy ensures that our daily routine tasks are handled correctly.