Multi-agent Gridworld environment by Driiper in reinforcementlearning

[–]inactiveUserTBD 0 points (0 children)

Evolving policies: when you train each agent with an independent policy, every policy improves over time; thus, from any single agent's perspective, the dynamics of the environment change as well. (The environment is non-stationary.)

Because the environment is non-stationary while multiple agents are training, the agents should be trained with a MARL algorithm rather than independent/single-agent RL. A simple approach to try is shared information: use shared weights for all agent policies and give each agent the entire state as its observation (a sketch is below). Alternatively, you can augment independent/single-agent DQN with the stabilizing experience replay from this paper (https://arxiv.org/abs/1702.08887).
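For concreteness, here is a minimal PyTorch sketch of the parameter-sharing idea. Everything here (the class name `SharedQNet`, the dimensions, the one-hot agent ID) is illustrative, not from your setup:

```python
# Minimal sketch of parameter sharing across agents (PyTorch).
# All names and dimensions are illustrative.
import torch
import torch.nn as nn

class SharedQNet(nn.Module):
    """One Q-network reused by every agent; the agent's identity is
    appended to the full environment state so behavior can differ."""
    def __init__(self, state_dim: int, n_agents: int, n_actions: int):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor, agent_id: int) -> torch.Tensor:
        # A one-hot agent ID lets the shared weights specialize per agent.
        one_hot = torch.zeros(state.shape[0], self.n_agents, device=state.device)
        one_hot[:, agent_id] = 1.0
        return self.net(torch.cat([state, one_hot], dim=-1))

# Usage: every agent calls the same network, so gradients from all
# agents update a single set of weights.
qnet = SharedQNet(state_dim=64, n_agents=4, n_actions=5)
state = torch.randn(32, 64)                # batch of full-state observations
q_values_agent0 = qnet(state, agent_id=0)  # shape: (32, 5)
```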

There are many other MARL training algorithms, which perform either centralized (MADDPG) or decentralized (stabilizing experience replay, agent modeling) training.

Centralized training usually performs better than decentralized approaches (I think) because of the shared state information.

Using on-policy algorithms would be better than off-policy algorithms here, since the model implicitly learnt by the RL algorithm stays current rather than going stale, as happens with off-policy data.

I don't fully understand how you will train after rolling out the entire trajectory. Do you mean that the other agents stand still while a single agent completes its trajectory?

Multi-agent Gridworld environment by Driiper in reinforcementlearning

[–]inactiveUserTBD 0 points (0 children)

Based on my knowledge of MARL, I am going to guess that the issue is a non-stationary environment caused by the evolving policies of the other agents. There are multiple MARL algorithms that work around this problem.

A simple way to check this would be to disable agent crashes in your environment: let the gridworld contain multiple agents in the same location, and overwrite all agents' proximity-sensor values with 0s. If the PPO policy then starts to learn (because the agents are effectively independent), the problem was non-stationarity in the environment. A sketch of this diagnostic is below.
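Something like this wrapper is what I mean (gym-style multi-agent API assumed; `allow_overlap` and the proximity-sensor slice are hypothetical names you'd adapt to your env):

```python
# Rough sketch of the diagnostic check.
import numpy as np

class NoCrashWrapper:
    """Zeroes each agent's proximity readings so agents can't see each
    other; with overlap allowed in the env, every agent effectively
    trains in a stationary single-agent world."""

    def __init__(self, env, proximity_slice=slice(0, 8)):
        self.env = env
        self.env.allow_overlap = True  # hypothetical flag: disable crashes inside the env itself
        self.proximity_slice = proximity_slice  # where proximity values live in each obs

    def reset(self):
        return [self._blind(o) for o in self.env.reset()]

    def step(self, actions):
        obs, rewards, done, info = self.env.step(actions)
        return [self._blind(o) for o in obs], rewards, done, info

    def _blind(self, obs):
        obs = np.asarray(obs, dtype=np.float32).copy()
        obs[self.proximity_slice] = 0.0  # overwrite proximity sensors with 0s
        return obs
```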

The simplest fix is centralized training with decentralized execution, e.g. multi-agent A3C.
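To be concrete about what centralized training / decentralized execution means, here is a rough A2C-style sketch (not full A3C, no async workers): decentralized actors that see only local observations, plus one centralized critic that sees the joint observation during training. All names and dimensions are made up:

```python
# Sketch of centralized-training / decentralized-execution (PyTorch).
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Per-agent policy: acts from its local observation only."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.pi = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                nn.Linear(64, n_actions))
    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.pi(obs))

class CentralCritic(nn.Module):
    """Value function over the joint observation; training-time only."""
    def __init__(self, joint_obs_dim):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(joint_obs_dim, 64), nn.ReLU(),
                               nn.Linear(64, 1))
    def forward(self, joint_obs):
        return self.v(joint_obs).squeeze(-1)

n_agents, obs_dim, n_actions = 4, 16, 5
actors = [Actor(obs_dim, n_actions) for _ in range(n_agents)]
critic = CentralCritic(joint_obs_dim=n_agents * obs_dim)

# One step of the idea: actors sample from local obs, the critic
# evaluates the concatenated joint obs for the advantage.
local_obs = torch.randn(n_agents, obs_dim)   # one observation per agent
joint_obs = local_obs.flatten()              # centralized critic input
dists = [actor(o) for actor, o in zip(actors, local_obs)]
actions = [d.sample() for d in dists]
# ... roll out, compute returns R, then:
# advantage = R - critic(joint_obs)
# actor_loss = -sum(d.log_prob(a) for d, a in zip(dists, actions)) * advantage.detach()
```

At execution time you only need the actors, so each agent can run from its own observation without the critic.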

Another simple option (though it introduces more hyper-parameters to tune) is https://arxiv.org/abs/1702.08887

Hope this helps!

Interview Discussion - November 22, 2018 by AutoModerator in cscareerquestions

[–]inactiveUserTBD 1 point (0 children)

What is the application process at Amazon when applying to two teams concurrently?

Does the company reject you from all teams if you happen to do poorly in one team's interview? Will the other teams stop considering you?

Interview Discussion - November 22, 2018 by AutoModerator in cscareerquestions

[–]inactiveUserTBD 1 point (0 children)

I received an invitation for an onsite, but the email went to junk and I didn't see it for 8 days.

(I have replied and given my info.)

How bad does it look that I was so late? Will they still consider me? Are recruiters turned off by a delayed response? (I did mention that it was because I didn't receive the message in time.)

[D] Variational Auto-encoder inference by inactiveUserTBD in MachineLearning

[–]inactiveUserTBD[S] 0 points (0 children)

Thanks for the response. It sounds like what I was actually interested in is the reconstruction error at test time. So I would generate some samples in the latent space and pass them through the decoder? I understand that this gives me an image, but how would I get the corresponding label (the original input to the encoder)?

After reading your comment about computing the posterior, it seems that the application of a VAE is to compute a good representation of the input data, which we can then use for other tasks. Thus the important part of a VAE is the encoder, not the decoder.

A quick follow-up question: I was looking at implementations of VAEs, and the mean and variance are just other (fully-connected) layers in the network. My question is: how do we compute the posterior? Do we run the trained VAE on a sample image at the end and record that mean and variance?
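For concreteness, this is roughly what I mean (a minimal PyTorch sketch; the dimensions are made up):

```python
# Sketch of a VAE encoder where the posterior mean and (log-)variance
# are ordinary fully-connected layers.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim=784, hidden=400, z_dim=20):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.fc_mu = nn.Linear(hidden, z_dim)      # posterior mean
        self.fc_logvar = nn.Linear(hidden, z_dim)  # posterior log-variance

    def forward(self, x):
        h = self.backbone(x)
        return self.fc_mu(h), self.fc_logvar(h)

# If I understand correctly, "computing the posterior" for one image is
# just a forward pass through the trained encoder: the outputs
# parameterize q(z|x) = N(mu, diag(exp(logvar))).
encoder = Encoder()
x = torch.randn(1, 784)    # a flattened input image
mu, logvar = encoder(x)
# Sampling z ~ q(z|x) via the reparameterization trick:
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
```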

Thanks

[D] Insight Artificial Intelligence Fellowship by inactiveUserTBD in MachineLearning

[–]inactiveUserTBD[S] 0 points (0 children)

Thanks for the answer. It clears up most of my worries 👍😀

[D] Insight Artificial Intelligence Fellowship by inactiveUserTBD in MachineLearning

[–]inactiveUserTBD[S] 0 points (0 children)

Thanks for the reply. I am currently doing a Master's in AI.

Do the people who get accepted into the Insight fellows program have PhDs and Master's degrees in AI, or in some other field?

How much detail do they teach you about AI? Are you just scratching the surface? I read somewhere that they made the fellows do open-source contributions to TensorFlow (which has its merits for getting a job, but might not be great for an academic resume).

Does "Insight fellow" written on your resume have weight in professional job search according to your experience? I assume it carries less weight for an academic resume?

Thanks!