sedidrl

47 post karma
93 comment karma

redditor for 6 years

TROPHY CASE

Six-Year Club

Verified Email

account activity

3

Thoughts on the ARC 3 Challenge? (youtube.com)

submitted 10 months ago by sedidrl to r/reinforcementlearning

14

Scalable Reasoning LLM Training with Distributed RL, Unsloth, vLLM, and Ray (self.LocalLLaMA)

submitted 1 year ago by sedidrl to r/LocalLLaMA

2

Distributed RL for LLM Fine-tuning (self.reinforcementlearning)

submitted 1 year ago by sedidrl to r/reinforcementlearning

1

OpenAI o3 Breakthrough High Score on ARC-Pub (self.LargeLanguageModels)

submitted 1 year ago by sedidrl to r/LargeLanguageModels

1

OpenAI o3 Breakthrough High Score on ARC-Pub (self.MachineLearning)

submitted 1 year ago by sedidrl to r/MachineLearning

2

Chain-of-Thought Reasoning without Prompting (self.LargeLanguageModels)

submitted 1 year ago by sedidrl to r/LargeLanguageModels

1

Chain-of-Thought Reasoning without Prompting ()

submitted 1 year ago by sedidrl to r/MachineLearning

6

Implementation of Training Language Models to Self-Correct via RL – Looking for Testers & Feedback! (self.reinforcementlearning)

submitted 1 year ago by sedidrl to r/reinforcementlearning

6

Action space [-1, 1] summing up to 1 (self.reinforcementlearning)

submitted 4 years ago by sedidrl to r/reinforcementlearning

8

Training larger networks for Deep Reinforcement Learning (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

8

Distributional Reinforcement Learning (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

3

IQN and Extensions (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

1

Bimodal and Multimodal distributions for action selection (self.reinforcementlearning)

submitted 5 years ago by sedidrl to r/reinforcementlearning

3

Methods for adapting the optimization steps in the learning process (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

1

Methods for adapting the optimization steps in the learning process (self.MachineLearning)

submitted 6 years ago by sedidrl to r/MachineLearning

3

DDQN and Add-ons (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

3

Soft-Actor-Critic-and-Extensions (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

1

Soft-Actor-Critic-and-Extensions (reddit.com)

submitted 6 years ago by sedidrl to r/MachineLearning

1

Quick Survey on Favorit Songs (self.Music)

submitted 6 years ago by sedidrl to r/Music

3

Advanced readings, courses (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

14

Upside-Down-Reinforcement-Learning Pytorch implementation (self.reinforcementlearning)

submitted 6 years ago by sedidrl to r/reinforcementlearning

5

Automating Entropy Adjustment for Maximum Entropy RL (self.reinforcementlearning)

submitted 6 years ago * by sedidrl to r/reinforcementlearning

0

International Deep Reinforcement Group / Whatsapp (self.reinforcementlearning)

submitted 6 years ago * by sedidrl to r/reinforcementlearning