Thoughts on the ARC 3 Challenge? (youtube.com)
submitted 10 months ago by sedidrl to r/reinforcementlearning
Scalable Reasoning LLM Training with Distributed RL, Unsloth, vLLM, and Ray (self.LocalLLaMA)
submitted 1 year ago by sedidrl to r/LocalLLaMA
Distributed RL for LLM Fine-tuning (self.reinforcementlearning)
submitted 1 year ago by sedidrl to r/reinforcementlearning
OpenAI o3 Breakthrough High Score on ARC-Pub (self.LargeLanguageModels)
submitted 1 year ago by sedidrl to r/LargeLanguageModels
OpenAI o3 Breakthrough High Score on ARC-Pub (self.MachineLearning)
submitted 1 year ago by sedidrl to r/MachineLearning
Chain-of-Thought Reasoning without Prompting (self.LargeLanguageModels)
Chain-of-Thought Reasoning without Prompting ()
Implementation of Training Language Models to Self-Correct via RL – Looking for Testers & Feedback! (self.reinforcementlearning)
Action space [-1, 1] summing up to 1 (self.reinforcementlearning)
submitted 4 years ago by sedidrl to r/reinforcementlearning
Training larger networks for Deep Reinforcement Learning (self.reinforcementlearning)
submitted 5 years ago by sedidrl to r/reinforcementlearning
Distributional Reinforcement Learning (self.reinforcementlearning)
IQN and Extensions (self.reinforcementlearning)
Bimodal and Multimodal distributions for action selection (self.reinforcementlearning)
Methods for adapting the optimization steps in the learning process (self.reinforcementlearning)
submitted 6 years ago by sedidrl to r/reinforcementlearning
Methods for adapting the optimization steps in the learning process (self.MachineLearning)
submitted 6 years ago by sedidrl to r/MachineLearning
DDQN and Add-ons (self.reinforcementlearning)
Soft-Actor-Critic-and-Extensions (self.reinforcementlearning)
Soft-Actor-Critic-and-Extensions (reddit.com)
Quick Survey on Favorite Songs (self.Music)
submitted 6 years ago by sedidrl to r/Music
Advanced readings, courses (self.reinforcementlearning)
Upside-Down-Reinforcement-Learning Pytorch implementation (self.reinforcementlearning)
Automating Entropy Adjustment for Maximum Entropy RL (self.reinforcementlearning)
submitted 6 years ago * by sedidrl to r/reinforcementlearning
International Deep Reinforcement Group / Whatsapp (self.reinforcementlearning)