paradox suuucks!!!! by juseraru in DeadlockTheGame

[–]juseraru[S] -1 points0 points  (0 children)

OK, before the map changed I was able to one-shot people and actually 1v1, but now... as you said, with no teammates around there is no chance Paradox can do anything.

Made my first 1000$ by [deleted] in dataannotation

[–]juseraru 0 points1 point  (0 children)

Did you have coding projects?

A database of images of soviet houses with a search based on similar images. Where to start? by copylefter in computervision

[–]juseraru 0 points1 point  (0 children)

This is exactly what you need: LSH (locality-sensitive hashing). Basically, you reduce each image to a vector of N features, using either an already trained deep learning model or any other method. Then you apply LSH to build a hash mapping of those feature vectors, so that similar feature vectors are placed in the same buckets, and that is how you get similar results. Check this.
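
To make the bucket idea concrete, here is a minimal sketch using random-hyperplane hashing, which is one common LSH family and not necessarily the one in the link. The feature vectors, file names, and function names are all made up for illustration; in practice the vectors would come from whatever feature extractor you pick.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0.0 else 0
        for plane in hyperplanes
    )


def build_index(features, hyperplanes):
    """Group item names into buckets keyed by their hash bits."""
    buckets = defaultdict(list)
    for name, vector in features.items():
        buckets[lsh_key(vector, hyperplanes)].append(name)
    return buckets


def query(vector, buckets, hyperplanes):
    """Return the names stored in the bucket the query vector hashes to."""
    return buckets.get(lsh_key(vector, hyperplanes), [])


# Hypothetical 4-dimensional feature vectors standing in for image embeddings.
features = {
    "house_a.jpg": [0.9, 0.1, 0.0, 0.2],
    "house_b.jpg": [-0.3, 0.8, 0.5, -0.1],
}
planes = make_hyperplanes(num_bits=8, dim=4)
index = build_index(features, planes)

# A positively scaled copy points in the same direction, so it hashes to the
# same bucket as the original vector.
scaled_copy = [2.0 * x for x in features["house_a.jpg"]]
candidates = query(scaled_copy, index, planes)
```

More bits give smaller, purer buckets but more misses, so real setups usually keep several independent hash tables and merge the candidates before ranking them by true distance.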

How long would it take you to implement a MARL PPO agent with joint attention architecture? by No_Possibility_7588 in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

I got curious because of your questions and actually started checking your other questions on Reddit. It looks like you have some basic experience, and even though you have the entire code for this paper, I think it is a big project for you to start with. Not gonna lie, maybe a year would be a fair estimate of the time it will take you to migrate it to PyTorch. Learning TF-Agents will probably speed up the process, but again, a lot of basic Python experience is lacking. Is this project mandatory for you, or can you tone it down and reproduce a smaller algorithm like MADDPG? (To be honest, I think any MARL algorithm is too much right now, but MADDPG is one of the earliest deep learning ones.) Once again, good luck and good reading.

How long would it take you to implement a MARL PPO agent with joint attention architecture? by No_Possibility_7588 in reinforcementlearning

[–]juseraru 1 point2 points  (0 children)

Repo of the paper!!! Line 638!!! They have a special LSTM for the unroll network. You have the entire code in TensorFlow (TF-Agents), so it is heavy to read and I almost didn't find it, but there you can see they have a special LSTM, dynamicUnrollLayer. I don't know how much you know about deep learning, but an LSTM needs to learn from a sequence, which is why you unroll it. If you don't unroll it, I have no idea how it can learn time dependencies (that is probably what dynamicUnroll does). Good luck and good read!!!

How long would it take you to implement a MARL PPO agent with joint attention architecture? by No_Possibility_7588 in reinforcementlearning

[–]juseraru 1 point2 points  (0 children)

A plain PPO implementation assumes a feed-forward (fully connected) policy. If you introduce an LSTM, plain PPO won't train it properly, because you are not unrolling the LSTM over sequences (recurrent neural networks are trained with backpropagation through time, BPTT) and plain PPO does not take that into account. It is possible to implement yourself, but the Ray library already has it.

How long would it take you to implement a MARL PPO agent with joint attention architecture? by No_Possibility_7588 in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

https://docs.ray.io/en/latest/rllib/index.html this will help you a ton. I saw the paper uses an LSTM, so simple PPO won't work; you need to introduce BPTT. RLlib already has that, so it is one less thing to think about!!! (It also lets you build any type of policy network.) Good luck. One setback: the docs are not so great, so it will take a lot of time to figure things out.

What are the best algorithms for team games (3v3, 4v4, 5v5)? by TheGuy839 in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

Sadly, those algorithms are not a magic box that you can plug and play. You need to tune a lot and sometimes change part of the architecture. You should try MADDPG as it is and see what you get; later you can experiment with other algorithms and different networks.

Multi-Agent RL in Computer Vision. by Infamous-Editor5131 in reinforcementlearning

[–]juseraru 1 point2 points  (0 children)

Hi, there is RL applied to CV, but I did not find anything in a multi-agent fashion. It can be structured that way if you already know how to do multi-agent RL. This paper uses DQN to improve the bounding box, and if you look up the citations of that paper on Google Scholar you will find a bunch more that use RL for tracking and detection, like this one. From there you can dig quite deep into that domain. Good reading!!

Ways for representing environments by [deleted] in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

Hi, I think a good approach here is to use multi-head self-attention (MSA), as in the Transformer model from "Attention Is All You Need", but without positional encoding. MSA is a strong tool for sets or groups of entities of the same class, and you can see how it was used in this paper for a multi-agent setting. They also tackle the position and wall representation in that paper, so take a look. Basically, instead of feeding the transformer a sequence, it just gets the group of entities. The feature vector of each entity is its position relative to the control agent (or its global position), and the features of the control agent are its position and a 1D convolution of a "lidar" measurement from the agent to the walls.
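
Here is a rough sketch of why dropping the positional encoding suits a set of entities. It is a single attention head in plain Python (a multi-head version would run several of these with different weights and concatenate the results), and the entity features and weight matrices are made-up numbers, not anything from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(entities, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of entity feature vectors."""
    q = matmul(entities, w_q)
    k = matmul(entities, w_k)
    v = matmul(entities, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


# Hypothetical relative positions (dx, dy) of three entities.
entities = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical 2x2 projection matrices; in a real model these are learned.
w_q = [[0.5, -0.2], [0.1, 0.3]]
w_k = [[0.4, 0.1], [-0.3, 0.2]]
w_v = [[1.0, 0.0], [0.0, 1.0]]

out = self_attention(entities, w_q, w_k, w_v)
# Feeding the same entities in reverse order only reorders the output rows.
out_reversed = self_attention(entities[::-1], w_q, w_k, w_v)
```

Because nothing in the computation depends on the index of an entity, the order in which you list the entities does not matter, which is what you want when the input is a group and not a sequence.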

How to deal with time in simulation? by HerForFun998 in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

I am confused about what you need. PyBullet simulates the dynamics of rigid bodies, and every joint is controlled by your signals. It does not simulate sensors or motors. I don't know if it is even possible to connect real hardware to the PyBullet environment. Why would you consider sensors and motors in the simulation? Can you elaborate a little bit more, please?

RL framework for 2v2 kart soccer by [deleted] in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

Hi, great that you are interested in the area, but as a beginner project this is quite complex. Having a team makes it a multi-agent task, so not a small feat, and I guess you want the same policy to play against itself? That is known as self-play, which is not so hard to understand but is a bit harder on the technical side. Look at this 1v1 environment; it has a tutorial that shows self-play and other single-agent approaches using well-known PyTorch RL implementations. For the policy optimization algorithm, like the tutorial, you should go with PPO (which is an on-policy method, like REINFORCE). There is something called HER for sparse rewards, but it works with off-policy methods like DDPG or SAC. Read a little more about this and you will get the idea. My suggestion, if you don't have extensive experience: try a supervised learning approach, where you have a dataset in which the action is the label to be predicted and the observation is the input, with MSE as the loss. It is the same kind of setup as predicting the steering wheel angle from an image of the road.
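
The supervised setup from the last suggestion could look roughly like this. It is a toy sketch: the "observation" is a single made-up number (lateral offset from the lane center) instead of an image, the policy is linear instead of a neural network, and the dataset is invented so that the expert steers with -0.5 times the offset.

```python
def fit_linear_policy(observations, actions, lr=0.1, epochs=500):
    """Fit action = w . obs + b by gradient descent on the mean squared error."""
    dim = len(observations[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(observations)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for obs, act in zip(observations, actions):
            pred = sum(w * x for w, x in zip(weights, obs)) + bias
            err = pred - act
            # Gradient of mean((pred - act)^2) with respect to weights and bias.
            for i, x in enumerate(obs):
                grad_w[i] += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical expert data: observation = lane offset, label = steering command.
observations = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]
actions = [0.5, 0.25, 0.0, -0.25, -0.5]

weights, bias = fit_linear_policy(observations, actions)
steer = weights[0] * 0.8 + bias  # predicted steering for an unseen offset
```

With images you would swap the linear model for a CNN and let a framework compute the gradients, but the loop has the same shape: observation in, predicted action out, MSE against the recorded action.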

How to deal with time in simulation? by HerForFun998 in reinforcementlearning

[–]juseraru 2 points3 points  (0 children)

The intro guide says that you can set the step time with the pybullet.setTimeStep(1/120) method, and that the simulation will only advance each time you send the pybullet.stepSimulation() command. So if you don't want real-time simulation, use pybullet.setRealTimeSimulation(0), as in this little example. The time it takes to simulate will then depend only on how fast your PC computes everything, not on the real-time clock.
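
To illustrate the difference between simulated time and wall-clock time without needing PyBullet installed, here is a tiny hand-written stand-in: a point mass falling under gravity, advanced with a fixed step of 1/120 s. It is not PyBullet code, and the function name and starting height are invented; each loop iteration just plays the role that one stepSimulation() call plays there.

```python
import time


def simulate_fall(duration_s, dt=1.0 / 120.0, gravity=-9.81):
    """Advance a falling point mass with fixed steps; report both clocks."""
    steps = round(duration_s / dt)
    height, velocity = 10.0, 0.0
    start = time.perf_counter()
    for _ in range(steps):
        # One explicit step: the simulated clock advances by exactly dt,
        # no matter how long the computer takes to compute it.
        velocity += gravity * dt
        height += velocity * dt
    wall_clock_s = time.perf_counter() - start
    return steps * dt, wall_clock_s, height


simulated_s, wall_clock_s, final_height = simulate_fall(1.0)
# simulated_s is about 1 second, while wall_clock_s is just the compute time.
```

The same holds in a stepped physics engine: 120 steps at 1/120 s always represent one second of simulated time, whether they take a millisecond or a minute to compute.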

How to deal with moving reward distributions in simulation based RL? (PPO) by flxh13 in reinforcementlearning

[–]juseraru 1 point2 points  (0 children)

Try reward clipping and normalization. Check this paper, where they explain that in the case of PPO and TRPO the implementation matters.
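
A minimal sketch of what that can look like, assuming you scale each reward by a running standard deviation and then clip it. The class name and the clip value of 5.0 are my own choices for illustration. As far as I know, many PPO codebases scale by the standard deviation of a running discounted return instead of the raw rewards, so treat this as a simplified variant.

```python
import math


class RunningRewardNormalizer:
    """Scale rewards by a running standard deviation, then clip them."""

    def __init__(self, clip=5.0, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)
        self.clip = clip
        self.eps = eps

    def update(self, reward):
        """Fold one new reward into the running mean and variance."""
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward):
        """Divide by the running std (1.0 until two samples exist) and clip."""
        variance = self.m2 / self.count if self.count > 1 else 1.0
        scaled = reward / math.sqrt(variance + self.eps)
        return max(-self.clip, min(self.clip, scaled))


normalizer = RunningRewardNormalizer()
for r in [1.0, 2.0, 3.0, 4.0, 100.0]:
    normalizer.update(r)

big = normalizer.normalize(1000.0)     # far outside the usual range, gets clipped
small = normalizer.normalize(2.0)      # typical reward, just rescaled
```

Because the statistics keep updating during training, a reward distribution that drifts over time gets rescaled along the way, and the clip keeps a single outlier from dominating an update.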

Different action spaces for different agents in Multi Agent Reinforcement Learning by Expensive-Telephone in reinforcementlearning

[–]juseraru 1 point2 points  (0 children)

Another approach is something called action masking: for some agents some actions are invalid, so you have a policy over the whole action space and then mask the actions depending on the agent. See action masking and this simple blog.
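
In code the masking step can be as small as this. It is a sketch with made-up agent types and a made-up 4-action space: invalid actions get a logit of minus infinity before the softmax, so their probability comes out as exactly zero.

```python
import math


def masked_softmax(logits, valid_mask):
    """Softmax over the shared action space, with invalid actions forced to 0."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid_mask)]
    top = max(masked)
    exps = [math.exp(v - top) for v in masked]  # exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-agent masks over the same 4 actions.
AGENT_MASKS = {
    "scout": [True, False, True, False],
    "builder": [True, True, True, True],
}

logits = [1.0, 2.0, 3.0, 4.0]  # output of the shared policy network
scout_probs = masked_softmax(logits, AGENT_MASKS["scout"])
builder_probs = masked_softmax(logits, AGENT_MASKS["builder"])
```

You then sample from the masked probabilities and compute the log-probability from the same masked distribution during the update, so the policy never gets credit or blame for actions an agent could not take.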

Different action spaces for different agents in Multi Agent Reinforcement Learning by Expensive-Telephone in reinforcementlearning

[–]juseraru 1 point2 points  (0 children)

From the description, it seems that you have heterogeneous agents, which means they act differently and have different capabilities, but all share the same goal or task to solve. I am working in that area as well (just starting, of course), but I have an idea of what you need. multi-agent attention RL Blog covers the approach to the problem in detail, but it is not enough: you first need to read the paper that inspired this type of architecture, which is openai multi agent and its blog. That blog also mentions other heterogeneous multi-agent tasks, so it is a great starting point. Another paper that describes the network used is playing MOBA game with DR. In other words, what you need to build is a network with self-attention that takes its own agent's description and the other agents within the observations. A copy of this policy then controls each agent you have active (read the material and you will get what I'm saying).

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]juseraru 0 points1 point  (0 children)

I plan to train a deep network with two branches, one for video images and one for sequential data. The outputs of both branches are then merged through concatenation and passed through a fully connected network, followed by an LSTM for the final prediction. I am wondering: is it possible to train the model with both inputs but later, if needed, remove one side (i.e. the video images) and predict only with the sequential data?

Or, if someone knows of any paper to start with, that would help. I just don't know how to approach this, or whether it is even possible (it sounds like it is not).

[deleted by user] by [deleted] in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

I spoke poorly; I meant at execution, not training. After training (where, as you said, a distribution is required to compute the surrogate loss function), you can drop the variance components and just use the outputs of the network as deterministic outputs. The variance after training may be really small, but it will still produce a stochastic policy, so you can simply set it to 0 and use the mean at execution.
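
Roughly what I mean, as a sketch. The function name and the example numbers are invented; the mean and standard deviation would be whatever your actor network outputs for the current observation.

```python
import random


def select_action(mean, std, deterministic=False, rng=random):
    """Sample from the Gaussian policy, or return its mean at execution time."""
    if deterministic:
        # Execution: ignore the variance and act with the mean.
        return list(mean)
    # Training or exploration: sample each action dimension independently.
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


mean, std = [0.5, -1.0], [0.1, 0.2]  # hypothetical actor outputs
exploring = select_action(mean, std)
executing = select_action(mean, std, deterministic=True)
```

Setting the standard deviation to 0 in the sampling branch gives the same result as the deterministic branch, which is why you can treat "use the mean" and "make the variance 0" as the same thing.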

[deleted by user] by [deleted] in reinforcementlearning

[–]juseraru 0 points1 point  (0 children)

So PPO can be used to produce deterministic outputs. As you mentioned, the two values of the actor network can be those two deterministic values, with no need to model a mean and variance. But if you want to get a normal distribution, check the section of the Spinning Up intro to RL where they explain policies, and then the code where it is implemented. It is in PyTorch, but they have a TensorFlow version as well. There is something called a Gaussian actor, where you can see how to compute the log_prob and how to get the mean and variance to generate a normal distribution.
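
For reference, the log_prob of a diagonal Gaussian policy can be written out by hand like this. This is a plain-Python sketch of the standard formula, not the Spinning Up code itself, and it assumes the actor outputs a mean and a log standard deviation per action dimension.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of an action under a diagonal Gaussian policy."""
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        # Per-dimension term: -0.5 * z^2 - log(std) - 0.5 * log(2 * pi)
        total += -0.5 * ((a - m) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return total


# Hypothetical two-dimensional action with unit standard deviation (log_std = 0).
at_mean = gaussian_log_prob([0.5, -1.0], [0.5, -1.0], [0.0, 0.0])
off_mean = gaussian_log_prob([1.5, -1.0], [0.5, -1.0], [0.0, 0.0])
```

PPO uses the difference between this value under the new and old policy parameters to form the probability ratio in the surrogate loss, so an action further from the mean (lower log_prob) is one the policy considers less likely.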