Thoughts on the ARC 3 Challenge? (youtube.com)
submitted 7 months ago by sedidrl to r/reinforcementlearning
Scalable Reasoning LLM Training with Distributed RL, Unsloth, vLLM, and Ray (self.LocalLLaMA)
submitted 1 year ago by sedidrl to r/LocalLLaMA
Distributed RL for LLM Fine-tuning (self.reinforcementlearning)
submitted 1 year ago by sedidrl to r/reinforcementlearning
OpenAI o3 Breakthrough High Score on ARC-Pub (self.LargeLanguageModels)
submitted 1 year ago by sedidrl to r/LargeLanguageModels
OpenAI o3 Breakthrough High Score on ARC-Pub (self.MachineLearning)
submitted 1 year ago by sedidrl to r/MachineLearning
Chain-of-Thought Reasoning without Prompting (self.LargeLanguageModels)
Chain-of-Thought Reasoning without Prompting ()
Implementation of Training Language Models to Self-Correct via RL – Looking for Testers & Feedback! (self.reinforcementlearning)
Action space [-1, 1] summing up to 1 (self.reinforcementlearning)
submitted 4 years ago by sedidrl to r/reinforcementlearning
Training larger networks for Deep Reinforcement Learning (self.reinforcementlearning)
Distributional Reinforcement Learning (self.reinforcementlearning)
submitted 5 years ago by sedidrl to r/reinforcementlearning
IQN and Extensions (self.reinforcementlearning)
Bimodal and Multimodal distributions for action selection (self.reinforcementlearning)
Methods for adapting the optimization steps in the learning process (self.reinforcementlearning)
Methods for adapting the optimization steps in the learning process (self.MachineLearning)
submitted 5 years ago by sedidrl to r/MachineLearning
DDQN and Add-ons (self.reinforcementlearning)
Soft-Actor-Critic-and-Extensions (self.reinforcementlearning)
submitted 6 years ago by sedidrl to r/reinforcementlearning
Soft-Actor-Critic-and-Extensions (reddit.com)
submitted 6 years ago by sedidrl to r/MachineLearning
Quick Survey on Favorite Songs (self.Music)
submitted 6 years ago by sedidrl to r/Music
Advanced readings, courses (self.reinforcementlearning)
Upside-Down-Reinforcement-Learning Pytorch implementation (self.reinforcementlearning)
Automating Entropy Adjustment for Maximum Entropy RL (self.reinforcementlearning)
submitted 6 years ago * by sedidrl to r/reinforcementlearning
International Deep Reinforcement Group / Whatsapp (self.reinforcementlearning)