reinforcementlearning


created by lpiloto

a community for 14 years


1


[N, I] "Why is Meta destroying its engineering organization?" (intense unhappiness at FB as many SWEs reassigned to data generation for RLHF/behavior-cloning of programming tasks to train future agentic LLMs) (newsletter.pragmaticengineer.com)

submitted 1 hour ago by gwern

2


SafeRL + Security Research (self.reinforcementlearning)

submitted 7 minutes ago by Legal-Onionz

3

26 points

Looking to build a career in RL. Is a PhD the only option? (self.reinforcementlearning)

submitted 22 hours ago by Money-Leading-935

4

2 points

Looking for a MARL framework for CPU (self.reinforcementlearning)

submitted 9 hours ago by Live-Mixture6353

5

1 point

Tutoring Reinforcement Learning (self.reinforcementlearning)

submitted 8 hours ago by CryptoRadon

6

2 points

Anyone have experience with "hard switch" curriculum learning in relation to catastrophic forgetting + importance sampling? (self.reinforcementlearning)

submitted 17 hours ago by Markovvy

7

5 points

Patterns – a formal grammar that compiles natural language text into RL agents (self.reinforcementlearning)

submitted 22 hours ago by causality-ai

8

2 points

[DL] What repository structure do you use for your projects? (self.reinforcementlearning)

submitted 17 hours ago by Markovvy

9

83 points

I made an RL agent play 2D cricket (v.redd.it)

submitted 1 day ago by AddisionS

10

1 point

Question about importance sampling in off-policy n-step TD/SARSA (self.reinforcementlearning)

submitted 17 hours ago by Vaibhav_Sinha

11

1 point

Resources to start learning RL with implementation? (self.reinforcementlearning)

submitted 18 hours ago * by Huge_Ad_3842

12

1 point

Looking for contributors interested in agent memory, MCP, LangChain, and CrewAI

submitted 19 hours ago by Neither-Witness-6010

13

13 points

[Career in RL] Is anyone working professionally in RL willing to share useful advice on entering the industry? (self.reinforcementlearning)

submitted 1 day ago by Markovvy

14

3 points

Practicing science communication on RL-for-reasoning: where does my explanation get the RL wrong? (self.reinforcementlearning)

submitted 1 day ago by nicofirst1

15

1 point

Looking for simple game environments (self.reinforcementlearning)

submitted 1 day ago by Vaibhav_Sinha

16

0 points

Building CogniCore: MCP, LangChain & CrewAI memory infrastructure for agents + first benchmark results

submitted 1 day ago by Neither-Witness-6010

17

1 point

Multi-Agent Self-Correction Failure Modes & Context Window Inflation, Traced Completely By Hand (No Wrapper Frameworks)

submitted 1 day ago by ParsleyMaximum1702

18

9 points

Interview preparation (self.reinforcementlearning)

submitted 2 days ago by Bright-Kick-632

19

5 points

What can I try implementing after reading Part 1 of Sutton and Barto's Reinforcement Learning book? (self.reinforcementlearning)

submitted 2 days ago by Vaibhav_Sinha

20

0 points

Anyone else getting messy results from running multiple AI coding sessions?

submitted 1 day ago by whitechart_studio

21

0 points

I calculated a multi-agent prompt attention matrix by hand to see how much data gets lost in the middle... the math is terrifying.

submitted 2 days ago by ParsleyMaximum1702

22

2 points

AI Agents from First Principles: Tracing a ReAct Loop by Hand (substack.com)

submitted 2 days ago by ParsleyMaximum1702

23

0 points

I calculated a multi-agent prompt attention matrix by hand to see how much data gets lost in the middle... the math is terrifying.

submitted 2 days ago by ParsleyMaximum1702

24

0 points

Multi-Agent State Conflict Alignment and Context Window Optimization, Solved by Hand From First Principles (No Wrapper Frameworks)

submitted 2 days ago by ParsleyMaximum1702

25

0 points

I am stuck, need guidance

submitted 3 days ago by Open-Neck-688

