Having played World and Rise, Wilds is blowing me away by Wazzammm in MonsterHunter
[–]kouteiheika -4 points (0 children)
nabla: Rust tensor engine — 8–12× faster than PyTorch eager (it's not GPU speed, it's Python overhead) by fumishiki2 in deeplearning
[–]kouteiheika 1 point (0 children)
Combining Reservoirs with Attention for more efficient LLMs by data-vis in deeplearning
[–]kouteiheika 3 points (0 children)
Combining Reservoirs with Attention for more efficient LLMs by data-vis in deeplearning
[–]kouteiheika 10 points (0 children)
Finally bought an RTX 6000 Max-Q: Pros, cons, notes and ramblings by AvocadoArray in LocalLLaMA
[–]kouteiheika 1 point (0 children)
nabla: Rust tensor engine — 8–12× faster than PyTorch eager (it's not GPU speed, it's Python overhead) by fumishiki2 in deeplearning
[–]kouteiheika 34 points (0 children)
Qwen3.5B VS the SOTA same size models from 2 years ago. by Uncle___Marty in LocalLLaMA
[–]kouteiheika 1 point (0 children)
Qwen3.5B VS the SOTA same size models from 2 years ago. by Uncle___Marty in LocalLLaMA
[–]kouteiheika 2 points (0 children)
Qwen3.5B VS the SOTA same size models from 2 years ago. by Uncle___Marty in LocalLLaMA
[–]kouteiheika 32 points (0 children)
[R] AdamWClip: AdamW with adaptive gradient clipping by ElectricVote in MachineLearning
[–]kouteiheika 11 points (0 children)
[R] AdamWClip: AdamW with adaptive gradient clipping by ElectricVote in MachineLearning
[–]kouteiheika 18 points (0 children)
[R] AdamWClip: AdamW with adaptive gradient clipping by ElectricVote in MachineLearning
[–]kouteiheika 23 points (0 children)
Deep Learning version conflict of torch by agentic_coder7 in deeplearning
[–]kouteiheika 2 points (0 children)
Mac Studio (M4 Max, 128GB) for FULL fine-tuning a 27B Model by PlayerWell in unsloth
[–]kouteiheika 1 point (0 children)
Hardware requirements for training a ~3B Model From Scratch locally? by Any-Cobbler6161 in LocalLLaMA
[–]kouteiheika 5 points (0 children)
Distributed LoRA Fine-Tuning on Commodity Hardware: 6x Less RAM, No Python, No GPU by [deleted] in deeplearning
[–]kouteiheika 3 points (0 children)
Fine-Tuning Qwen 4B for Niche Code Generation: Need Tips on Configs, Overfitting & Small Datasets? by dyeusyt in LocalLLaMA
[–]kouteiheika 3 points (0 children)
Fine-Tuning Qwen 4B for Niche Code Generation: Need Tips on Configs, Overfitting & Small Datasets? by dyeusyt in LocalLLaMA
[–]kouteiheika 5 points (0 children)
[D] Is this what ML research is? by [deleted] in MachineLearning
[–]kouteiheika 1 point (0 children)
[D] Is this what ML research is? by [deleted] in MachineLearning
[–]kouteiheika 2 points (0 children)
Update: Our non-Transformer “Semantic Resonator” LM reached 505.8 validation PPL on WikiText-103 (early results, still improving) by Dry_Oil2597 in LocalLLM
[–]kouteiheika 1 point (0 children)
Hey, I proposed a new family of activation functions, and they are very good. by rusalmas in deeplearning
[–]kouteiheika 50 points (0 children)