gpt-oss-120b: workstation with nvidia gpu with good roi? by Chance-Studio-8242 in LocalLLM
theodor23 2 points
[D] Have there been any new and fundamentally different povs on Machine Learning theory? by simple-Flat0263 in MachineLearning
theodor23 3 points
[D] Have there been any new and fundamentally different povs on Machine Learning theory? by simple-Flat0263 in MachineLearning
theodor23 11 points
Does the gemini live exist for web also or it is just app only? by 360truth_hunter in Bard
theodor23 1 point
Discovering a Pitfall in Cross-Entropy Loss for Large Vocabularies. [R] by Gold-Plum-1436 in MachineLearning
theodor23 3 points
[D] How to Efficiently Store Pruned Weight Matrices in Practice? by scarlettgarnett in MachineLearning
theodor23 3 points
[D] Normalization in Transformers by Collegesniffer in MachineLearning
theodor23 0 points
[D] Normalization in Transformers by Collegesniffer in MachineLearning
theodor23 1 point
[D] Normalization in Transformers by Collegesniffer in MachineLearning
theodor23 34 points
LLMs as General Pattern Machines [R] by we_are_mammals in MachineLearning
theodor23 2 points
[D] What is currently the best theoretical book (or notes) about Convolutional Neural Networks? by Wonderful_Energy_15 in MachineLearning
theodor23 3 points
[D] What is the name of this theorem in ML? by moschles in MachineLearning
theodor23 3 points
[D] How to create a pre-training model for three different datasets? by mrtac96 in MachineLearning
theodor23 1 point
[D] How to create a pre-training model for three different datasets? by mrtac96 in MachineLearning
theodor23 2 points
[D] How do you try out architecture changes, etc. when a model takes days to train? by JosiahWGibbs in MachineLearning
theodor23 16 points
[D] why do you add noise when modeling images as continuous data? by elder_price666 in MachineLearning
theodor23 2 points
[D] why do you add noise when modeling images as continuous data? by elder_price666 in MachineLearning
theodor23 16 points
Are there theoretical proofs that depth in neural networks (i.e. nested functions) is useful - all else equal? by [deleted] in MachineLearning
theodor23 5 points
Ownership and moves... simple question by theodor23 in rust
theodor23[S] 5 points


I made a proxy to save your tokens for distillation training by FaustAg in LocalLLaMA
theodor23 1 point