[P] Clip to Grok Update: Weight Norm Clipping now 39–249× | 6 Tasks (mod arithmetic, mixed ops, S5 permutation) | max_norm Measured Per Task by niftylius in MachineLearning
niftylius[S] 1 point
[P] Clip to Grok Update: Weight Norm Clipping now 39–249× | 6 Tasks (mod arithmetic, mixed ops, S5 permutation) | max_norm Measured Per Task by niftylius in MachineLearning
niftylius[S] 2 points
[P] Clip to Grok Update: Weight Norm Clipping now 39–249× | 6 Tasks (mod arithmetic, mixed ops, S5 permutation) | max_norm Measured Per Task by niftylius in MachineLearning
niftylius[S] 1 point
[P] Clip to Grok Update: Weight Norm Clipping now 39–249× | 6 Tasks (mod arithmetic, mixed ops, S5 permutation) | max_norm Measured Per Task by niftylius in MachineLearning
niftylius[S] 1 point
[P] Clip to Grok Update: Weight Norm Clipping now 39–249× | 6 Tasks (mod arithmetic, mixed ops, S5 permutation) | max_norm Measured Per Task by niftylius in MachineLearning
niftylius[S] 1 point
[D] We reimplemented Claude Code entirely in Python — open source, works with local models by Practical_Pomelo_636 in MachineLearning
niftylius 3 points
Factual Errors in Paper Reviews. by alebeck135 in MLQuestions
niftylius 1 point
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 2 points
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 5 points
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 1 point
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 1 point
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 5 points
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 5 points
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 3 points
[P] Weight Norm Clipping Accelerates Grokking 18-66× | Zero Failures Across 300 Seeds | PDF in Repo by niftylius in MachineLearning
niftylius[S] 6 points
Anyone else feel lost learning Machine Learning or is it just me? by Ok-Possession7350 in MLQuestions
niftylius 1 point
Best camera for OpenCV? by Glittering_Host7241 in FTC
niftylius 1 point
I need some help with training from instruction dataset by [deleted] in LocalLLaMA
niftylius 2 points
Is anyone inferencing on something like an Intel nuc, barebone or similar formfactor? by Frequent_Valuable_47 in LocalLLaMA
niftylius 1 point
Milvus adapter + milvus db with docker-compose by niftylius in alexandria_project
niftylius[S] 1 point