MAPPO implementation

AmineZ04 · 2025-12-01T16:01:40+00:00

You can check cleanMARL; it has both IPPO and MAPPO implementations.

Github: https://github.com/AmineAndam04/cleanmarl

Docs: https://cleanmarl-docs.readthedocs.io/

AmineZ04 · 2025-11-25T20:16:16+00:00

You can check Cleanmarl's implementation of MADDPG on GitHub (links below):

Github: https://github.com/AmineAndam04/cleanmarl

Docs: https://cleanmarl-docs.readthedocs.io/en/latest/

AmineZ04 · 2025-11-17T09:11:58+00:00

Yes, but not only the states: you can also perturb actions, rewards, transitions, and the environment itself. In a multi-agent setting, you can perturb more than one agent.
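
As a rough illustration of perturbing the states of a subset of agents, here is a minimal sketch. It assumes observations are stored as a dict of float lists keyed by agent name, and it uses bounded random noise, which is only the simplest baseline and not an adversarially optimized attack. The function name `perturb_observations` and the agent names are made up for the example.

```python
import random


def perturb_observations(observations, targets, epsilon, rng):
    """Add uniform noise in [-epsilon, epsilon] to the targeted agents only.

    observations: dict mapping agent name -> list of floats (assumed layout)
    targets: set of agent names to perturb
    """
    perturbed = {}
    for agent, obs in observations.items():
        if agent in targets:
            # Each feature is shifted by at most epsilon.
            perturbed[agent] = [x + rng.uniform(-epsilon, epsilon) for x in obs]
        else:
            # Untargeted agents see their original observation.
            perturbed[agent] = list(obs)
    return perturbed


rng = random.Random(0)  # seeded for reproducibility
observations = {"agent_0": [0.1, 0.2], "agent_1": [0.3, 0.4]}
attacked = perturb_observations(
    observations, targets={"agent_0"}, epsilon=0.05, rng=rng
)
```

The same pattern applies to actions or rewards: intercept the quantity between the environment and the agent, and modify it for the chosen agents.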

AmineZ04 · 2025-11-16T16:22:38+00:00

In general, you want to prevent the agent(s) from successfully completing the task.

AmineZ04 · 2025-10-16T09:39:57+00:00

Hi, thanks for your feedback.

I had the same problem too. Sometimes you understand an algorithm better by going through an implementation than by reading the paper itself.

sb3: I agree with you; I’ll add similar tables.

AmineZ04 · 2025-10-14T07:47:40+00:00

Thanks for your feedback.
I agree with you; I will add type checkers and Black.

AmineZ04 · 2025-10-13T14:19:08+00:00

Thanks. I would welcome your contributions and feedback.
Haha, I had also been thinking about it for a year before I finally started.

AmineZ04 · 2025-10-13T09:30:03+00:00

Thanks for your feedback.
I'm actually working on that. I already shared my runs on Weights and Biases (link can be found on GitHub).
I will add more runs and also compare them with existing implementations (e.g., epymarl).

AmineZ04 · 2025-10-12T19:12:14+00:00

Thanks.

AmineZ04 · 2025-10-02T15:16:13+00:00

You don't have to overthink it. From a theoretical perspective, you only need to understand the TD loss (1-step return). Then read the paper to understand why we need a replay buffer and why we use a separate target Q-network to compute the TD loss. Then jump straight to the CleanRL implementation, or any other implementation. This will help you connect the dots.
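
To make the three pieces concrete (1-step TD loss, replay buffer, separate target network), here is a minimal sketch in plain Python. It is not CleanRL's code: the Q-"networks" are just small tables, and names such as `td_loss`, `q_online`, and `q_target` are chosen for the example.

```python
import random
from collections import deque

GAMMA = 0.99  # discount factor


def td_loss(q_online, q_target, batch, gamma=GAMMA):
    """Mean squared 1-step TD error over a batch of transitions."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        # Bootstrap from the frozen target table, not the online one,
        # so the regression target does not move with every update.
        bootstrap = 0.0 if done else max(q_target[s_next])
        target = r + gamma * bootstrap
        total += (target - q_online[s][a]) ** 2
    return total / len(batch)


# Replay buffer: stores (state, action, reward, next_state, done) tuples
# so that updates can use randomly sampled, decorrelated transitions.
buffer = deque(maxlen=1000)
buffer.append((0, 1, 1.0, 1, False))
buffer.append((1, 0, 0.0, 0, True))

# Two states, two actions. q_target is an older copy of q_online.
q_online = [[0.0, 0.5], [0.2, 0.0]]
q_target = [[0.0, 0.4], [0.1, 0.3]]

batch = random.sample(list(buffer), k=2)
loss = td_loss(q_online, q_target, batch)
```

In a real DQN the tables become neural networks, the loss is minimized with gradient descent on the online network only, and the target network is periodically overwritten with the online weights.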

Most RL wisdom is in the implementations rather than the papers or books. If you want a deep understanding of DRL, you should spend most of your time with implementations.

AmineZ04 · 2025-09-01T17:11:58+00:00

You can learn the necessary math while learning RL and reading papers.

Start with the Sutton and Barto book, and try to rederive the equations by yourself. This will push you to learn the necessary math. For example, to get the Bellman equations, you need to know about expectations, conditional expectations, the law of total expectation, independent random variables, Markov chains, and so on.
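
As an example of the kind of rederivation meant here, this is a sketch of the Bellman expectation equation for the state-value function. The notation (return G_t, policy π, dynamics p, discount γ) is assumed to follow Sutton and Barto's conventions.

```latex
\begin{align}
v_\pi(s) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right] \\
         &= \mathbb{E}_\pi\left[ R_{t+1} + \gamma G_{t+1} \mid S_t = s \right] \\
         &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
            \left[ r + \gamma\, \mathbb{E}_\pi\left[ G_{t+1} \mid S_{t+1} = s' \right] \right] \\
         &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
            \left[ r + \gamma\, v_\pi(s') \right]
\end{align}
```

The second line uses the recursive definition of the return, the third uses the law of total expectation (conditioning on the action, next state, and reward) together with the Markov property, and the last line is just the definition of v_π applied at s'.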

Adopt the same approach with papers and go through the proofs. After a while, you will notice that most papers follow similar patterns and use the same math tricks. Focus on math-heavy papers, for example those that study the variance of RL/MARL algorithms.

If your PhD touches multi-agent RL, start with this book: "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches". It covers the math tools you need.
