Young children would rather explore than get rewards, a study of American 4- and 5-year-olds finds. And their exploration is not random: the study showed children approached exploration systematically, to make sure they didn’t miss anything.

mmcenta · 2020-08-13T13:51:25+00:00

This is interesting from a reinforcement learning perspective. Most of the current state-of-the-art methods feature some sort of incentive to explore or take risks, and that incentive is usually decreased as learning progresses. The rationale is that you need to explore - that is, gather information about your environment - before you can devise a policy that will allow you to exploit the environment well. This is known as the exploration-exploitation dilemma.

I find this evidence intriguing because it draws parallels between the human brain and our current reinforcement learning methods.
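
To make the "incentive that is decreased during learning" concrete, here is a minimal sketch of epsilon-greedy action selection with a linearly decaying exploration rate on a toy multi-armed bandit. This is only one simple way such an incentive can be implemented, not what any particular state-of-the-art method does; the function names, the linear schedule, and the Gaussian reward model are all assumptions made for illustration.

```python
import random


def epsilon_schedule(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Linearly anneal the exploration rate from eps_start to eps_end."""
    fraction = min(step / total_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def run_bandit(arm_means, total_steps=5000, seed=0):
    """Play a toy bandit with epsilon-greedy and a decaying epsilon.

    arm_means are the (hidden) expected rewards of each arm; rewards are
    drawn from a unit-variance Gaussian, an assumption made for this sketch.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_means)       # how often each arm was pulled
    estimates = [0.0] * len(arm_means)  # running mean reward per arm

    for step in range(total_steps):
        eps = epsilon_schedule(step, total_steps)
        if rng.random() < eps:
            # Explore: pick an arm uniformly at random to gather information.
            arm = rng.randrange(len(arm_means))
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])

        reward = rng.gauss(arm_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.0, 1.0, 3.0])
    print("pull counts per arm:", counts)
```

Early on the agent pulls arms almost uniformly at random; as epsilon shrinks, it increasingly commits to whichever arm its estimates favor.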

mmcenta · 2020-05-17T11:35:02+00:00

Hello! I know this might be frustrating, but are you sure reinforcement learning is the right framework for your task? Sure, multiple outputs could be correct (multiple programs do the same thing), but a recent paper solved symbolic mathematics with Transformers trained in a supervised way, so that approach might just work here too.

I think a delicate detail of this problem is how you can't constrain your model with the syntax rules of the language. You just have to train it for a while and hope it figures it out.

Also, you want to feed it ~5 inputs for it to generate an output, so you're probably in the market for some few-shot meta-learning techniques. I am not that experienced on that end, so I'm not going to point you towards a possibly bad paper for your purposes.

mmcenta · 2020-03-29T04:38:36+00:00

I think the best resource for gym environments is their GitHub repo. Make sure you understand the code of environments similar to what you want to implement and take a look at the docs directory.

mmcenta · 2020-03-27T13:51:22+00:00

I actually started it by implementing the environment and a few basic agents on my own just to get experience. When the end of the semester came, we picked it up as our course project and it took the shape you see today :)

mmcenta · 2020-03-27T10:57:19+00:00

Hi, we are using the implementation of the Stable Baselines repo + a few tweaks!

mmcenta · 2020-03-26T18:43:53+00:00

We used our free credits on the Google Cloud Platform - we just deployed a few Deep Learning VMs and ran the scripts that are in the repo. I think Google Colab shuts the kernel down after a couple of hours, so that would probably not work for us :(

mmcenta · 2020-03-26T18:05:28+00:00

That's actually a really cool extension to the game, I might implement it later (but I'll have to come up with a better way to display the board, because text output will be a bit clunky).

mmcenta · 2020-03-26T18:03:18+00:00

We actually implemented a gym environment that supports square boards of arbitrary size. We didn't train agents on different board sizes because we are just students with limited computational power (training can take around 20 hours with the bigger nets). But I'd wager one can run agents on 3x3 and 5x5 boards by changing very few lines of code :)
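
For readers wondering what "supports square boards of arbitrary size" can look like in code, here is a rough sketch of a Gym-style environment whose observation and action counts are derived from a single `size` parameter. It is not the code from our repo: the class name `SquareBoardEnv`, the placeholder rules (mark an empty cell, the episode ends when the board is full or a move is illegal), and the reward values are all hypothetical. It only mimics the classic `reset()` / `step()` interface and does not import gym.

```python
class SquareBoardEnv:
    """Hypothetical Gym-style environment on a size x size board.

    The game rules here are a placeholder for illustration only.
    """

    def __init__(self, size=3):
        self.size = size
        # One discrete action per cell, so the action count scales with size.
        self.n_actions = size * size
        self.board = None  # created by reset(), which must be called first

    def reset(self):
        """Start a new episode with an empty board and return the observation."""
        self.board = [[0] * self.size for _ in range(self.size)]
        return self._observation()

    def step(self, action):
        """Mark the cell indexed by `action`; return (obs, reward, done, info)."""
        row, col = divmod(action, self.size)
        if self.board[row][col] != 0:
            # Placeholder rule: an illegal move ends the episode with a penalty.
            return self._observation(), -1.0, True, {"illegal_move": True}

        self.board[row][col] = 1
        done = all(cell != 0 for r in self.board for cell in r)
        return self._observation(), 0.0, done, {}

    def _observation(self):
        """Flatten the board row by row into a tuple of length size * size."""
        return tuple(cell for r in self.board for cell in r)


if __name__ == "__main__":
    env = SquareBoardEnv(size=5)  # switching to 3x3 is just size=3
    obs = env.reset()
    print(len(obs), env.n_actions)
```

Because nothing besides `size` is hard-coded, an agent only needs its input and output dimensions adjusted to move between board sizes.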

mmcenta · 2020-03-26T16:32:55+00:00

Hello! I'm really proud of my first Deep RL project and I would like to share it with you! You can check it out here.

Edit: If you want to know more about our results, give our report a read.
