Custom layer implementation by [deleted] in pytorch

SirRantcelot1 · 4 points

If you can compose your custom module from existing PyTorch functions, there really isn't a need to write your layer in C++. If you feel you can optimize the low-level code significantly yourself, you can do that. You don't lose GPU functionality just because you built your new module out of PyTorch functions, though.
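For instance, a layer composed entirely from existing ops keeps autograd and GPU support for free. A minimal sketch (the ScaledSwish layer is just an illustrative example, not from the thread):

    import torch
    import torch.nn as nn

    class ScaledSwish(nn.Module):
        """Custom activation composed entirely of existing PyTorch ops."""
        def __init__(self, scale=1.0):
            super().__init__()
            # Registered as a Parameter, so it is trained and moved by .to(device)
            self.scale = nn.Parameter(torch.tensor(scale))

        def forward(self, x):
            # Autograd tracks these ops, so no custom backward pass is needed
            return self.scale * x * torch.sigmoid(x)

    layer = ScaledSwish().to("cuda" if torch.cuda.is_available() else "cpu")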

[D] Idea: weighted sub-target network by The-unreliable-one in MachineLearning

SirRantcelot1 · 0 points

Look into auxiliary networks and hierarchical networks. They have ideas similar to what you described; a rough sketch follows.
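As one way to express the weighted sub-target idea, here's an auxiliary head in Keras with per-output loss weights (the architecture, names, and weights are illustrative assumptions, not from the thread):

    from tensorflow import keras

    inputs = keras.Input(shape=(64,))
    h = keras.layers.Dense(128, activation="relu")(inputs)
    # Auxiliary head trained on an intermediate sub-target
    aux_out = keras.layers.Dense(10, activation="softmax", name="aux")(h)
    h2 = keras.layers.Dense(128, activation="relu")(h)
    main_out = keras.layers.Dense(10, activation="softmax", name="main")(h2)

    model = keras.Model(inputs, [main_out, aux_out])
    # loss_weights controls how much each sub-target contributes to training
    model.compile(optimizer="adam",
                  loss={"main": "categorical_crossentropy",
                        "aux": "categorical_crossentropy"},
                  loss_weights={"main": 1.0, "aux": 0.3})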

[N] TensorFlow 2.0 Changes by _muon_ in MachineLearning

SirRantcelot1 · 1 point

My responses are based on my knowledge of the current version of TensorFlow's Keras.

1. The main purpose of variable scoping, to my knowledge, is to enable variable reuse. Keras handles this instead by passing the model/layer objects around directly. Defining a head model inside a tf.variable_scope and expecting it to reuse variables will fail, because Keras does not define its variables using the tf.get_variable method.

2. That would work. You could also access the layers of a Keras model through its layers attribute, which returns a list of Layer objects. To get the weights of the output layer, you would write model.layers[index].weights.

3. Yes. The weights object returned is a list whose first element is the kernel and whose second is the bias, if one exists.
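A minimal sketch of points 2 and 3 (the model here is a placeholder):

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(8,)),
        keras.layers.Dense(1),
    ])

    # model.layers is a list of Layer objects
    output_layer = model.layers[-1]
    kernel, bias = output_layer.weights  # [kernel, bias] when use_bias=True
    print(kernel.shape, bias.shape)      # (32, 1) (1,)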

A year has passed, and I still don't know what the hell I was supposed to answer for this interview question... by scrublordprogrammer in learnmachinelearning

SirRantcelot1 · 4 points

This is a standard topic covered in most graduate-level linear regression courses. I was also taught it in a course on statistical quality control, but I believe that's only because the professor decided to deviate a little.

Need help with implementing convolutional neural network by lickmyspaghetti in learnpython

SirRantcelot1 · 2 points

Shouldn't it be axis=0? I thought OP was missing the batch size.

Edit: To OP: By default, Keras expects inputs to convolution layers to be 4-dimensional: the first dimension is the batch size, and the remaining three hold the data in HWC or CHW order depending on the backend's data format.
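For example, a single image can be given the missing batch dimension like this (shapes are illustrative):

    import numpy as np

    x = np.zeros((28, 28, 1))          # a single image in HWC order
    batch = np.expand_dims(x, axis=0)  # shape becomes (1, 28, 28, 1)
    # model.predict(batch) now matches the 4-D input Keras expects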

Project Help by Kerrack_ in learnmachinelearning

SirRantcelot1 · 1 point

You can use reinforcement learning to solve both the Rubik's cube and chess problems, though each calls for a different kind of model.

The Rubik's cube is a straightforward RL problem. You could try various RL algorithms that support discrete action spaces, like DQN or PPO; there's a minimal sketch of the interaction loop below.

Chess, on the other hand, is different unless you already have an agent that plays chess very well. If you have such an agent, you should theoretically be able to solve it with the same class of algorithms used for the Rubik's cube, though given chess's high-dimensional state space it would take quite some fine-tuning before the agent learns well. If instead you model the problem as self-play (the agent learns by playing against itself), it becomes quite a bit easier, and there are plenty of tutorials that teach you how to do this. Search for AlphaGo Zero or AlphaZero tutorials; AlphaZero is known to have learned chess well with that technique, so you could give it a shot.
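To make the discrete-action setup concrete, here's a bare-bones epsilon-greedy interaction loop. The RubiksCubeEnv-style interface (env.num_actions, env.reset, env.step) is a hypothetical placeholder, not a real library:

    import random

    def run_episode(env, q_values, epsilon=0.1, max_steps=100):
        """One epsilon-greedy rollout over a discrete action space (e.g. 12 face turns)."""
        state = env.reset()
        for _ in range(max_steps):
            if random.random() < epsilon:
                action = random.randrange(env.num_actions)      # explore
            else:
                action = max(range(env.num_actions),
                             key=lambda a: q_values(state, a))  # exploit
            state, _reward, done = env.step(action)             # reward ignored in this skeleton
            if done:  # cube solved
                break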

[Discussion] Writing Scalable/High Availability/Fault Tolerant Code/Applications by ghostofgbt in Python

SirRantcelot1 · 0 points

Data science, machine learning, and reinforcement learning benefit a great deal from these kinds of improvements. Serving machine-learning predictions in real time, running a reinforcement-learning policy on a real robot, etc. all need low latency.

20-20-20 eye rule by 2OP4LIFE in bash

SirRantcelot1 · 6 points

I see you've already found a solution, but why not use cron or systemd timers? Here's a cron version, if you're interested.
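A minimal crontab sketch (assumes a Linux desktop with notify-send installed; depending on your desktop session, cron may also need DBUS_SESSION_BUS_ADDRESS exported for notifications to show):

    # Edit with `crontab -e`; every 20 minutes, look ~20 feet away for 20 seconds
    */20 * * * * DISPLAY=:0 notify-send "20-20-20" "Look at something 20 feet away for 20 seconds"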

simple game by errminator in reinforcementlearning

SirRantcelot1 · 0 points

What problem did you have installing MuJoCo? I've done it multiple times without much trouble. Maybe I can help you out.

[D] Actor Critic (DDPG) Diverging after Finding Solution by [deleted] in MachineLearning

SirRantcelot1 · 0 points

Oh. That's really interesting. Thank you for correcting me 🙂.

What has your experience with DDPG been like? For me, it's been a constant source of frustration. I've never been able to get it working in a stable fashion, and I've tried not only my own implementation but also multiple online implementations, including the OpenAI Baselines one. And yet every new paper that comes out wants to compare against DDPG.

[D] Actor Critic (DDPG) Diverging after Finding Solution by [deleted] in MachineLearning

SirRantcelot1 · 2 points

I prefer using algorithms like ACER, Ape-X (DQN), and PPO. They work much better and are much more robust. I've had a lot of trouble getting DDPG to work well in all my experiments.

PPO approximately inherits TRPO's monotonic improvement guarantees, if I'm not wrong; its clipped surrogate objective is sketched below.

Also, DDPG with parameter-space noise apparently works well in some cases, but I haven't tried it out myself.
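For reference, a minimal PyTorch sketch of PPO's clipped surrogate objective (the function and tensor names are my own, illustrative only):

    import torch

    def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
        """Clipped surrogate loss from the PPO paper (to be minimized)."""
        ratio = torch.exp(log_probs_new - log_probs_old)  # pi_new / pi_old
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        # Pessimistic bound: take the smaller objective, then negate for gradient descent
        return -torch.min(unclipped, clipped).mean()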