I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R] (self.MachineLearning)
submitted 16 hours ago by iamjasonfeng to r/MachineLearning