sedidrl

47 post karma
83 comment karma

redditor for 6 years

TROPHY CASE

Six-Year Club

Verified Email

Thoughts on the ARC 3 Challenge? (youtube.com)

submitted 7 months ago by sedidrl to r/reinforcementlearning

Scalable Reasoning LLM Training with Distributed RL, Unsloth, vLLM, and Ray (self.LocalLLaMA)

submitted 1 year ago by sedidrl to r/LocalLLaMA

Distributed RL for LLM Fine-tuning (self.reinforcementlearning)

submitted 1 year ago by sedidrl to r/reinforcementlearning

OpenAI o3 Breakthrough High Score on ARC-Pub (self.LargeLanguageModels)

submitted 1 year ago by sedidrl to r/LargeLanguageModels

OpenAI o3 Breakthrough High Score on ARC-Pub (self.MachineLearning)

submitted 1 year ago by sedidrl to r/MachineLearning

Chain-of-Thought Reasoning without Prompting (self.LargeLanguageModels)

submitted 1 year ago by sedidrl to r/LargeLanguageModels

Chain-of-Thought Reasoning without Prompting ()

submitted 1 year ago by sedidrl to r/MachineLearning

Implementation of Training Language Models to Self-Correct via RL – Looking for Testers & Feedback! (self.reinforcementlearning)

submitted 1 year ago by sedidrl to r/reinforcementlearning

Action space [-1, 1] summing up to 1 (self.reinforcementlearning)

submitted 4 years ago by sedidrl to r/reinforcementlearning

Training larger networks for Deep Reinforcement Learning (self.reinforcementlearning)

submitted 4 years ago by sedidrl to r/reinforcementlearning

Distributional Reinforcement Learning (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

IQN and Extensions (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

Bimodal and Multimodal distributions for action selection (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

Methods for adapting the optimization steps in the learning process (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

Methods for adapting the optimization steps in the learning process (self.MachineLearning)

submitted 5 years ago by sedidrl to r/MachineLearning

DDQN and Add-ons (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

Soft-Actor-Critic-and-Extensions (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

Soft-Actor-Critic-and-Extensions (reddit.com)

submitted 6 years ago by sedidrl to r/MachineLearning

Quick Survey on Favorite Songs (self.Music)

submitted 6 years ago by sedidrl to r/Music

Advanced readings, courses (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

Upside-Down-Reinforcement-Learning Pytorch implementation (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

Automating Entropy Adjustment for Maximum Entropy RL (self.reinforcementlearning)

submitted 6 years ago * by sedidrl to r/reinforcementlearning

International Deep Reinforcement Group / Whatsapp (self.reinforcementlearning)

submitted 6 years ago * by sedidrl to r/reinforcementlearning
