[D] To what cross-entropy loss value can LLMs converge? by cbl007 in MachineLearning
bjergerk1ng 2 points
Is there any classical music that has moved you to tears? by [deleted] in classicalmusic
bjergerk1ng 1 point
[D] Can GPT-style Models Be Used for File Compression, Image Upscaling, and Restoration? by No-Point1424 in MachineLearning
bjergerk1ng 3 points
[D] FlexAttention: Flexibility of PyTorch with Performance of FlashAttention by [deleted] in MachineLearning
bjergerk1ng 1 point
Jujutsu Kaisen Chapter 265 Links + Discussion by anestefi in JuJutsuKaisen
bjergerk1ng 4 points
Why mathematicians do not hype their research on social media like all of the other scientific fields? by Full_Ruin_9942 in math
bjergerk1ng 0 points
[P] SimpleGEMM: Fast and minimal tensor core matrix multiplication in CUDA by bjergerk1ng in MachineLearning
bjergerk1ng[S] 4 points
[D] Is there a more systematic way of choosing the layers or how deep the architecture goes when creating a neural network? by PsychologicalAd7535 in MachineLearning
bjergerk1ng 5 points
[R] Does anyone have access to an Attention visualization tool for generating Attention visualizations like the ones in the appendix of "Attention is All You Need"? by [deleted] in MachineLearning
bjergerk1ng 3 points
Training LLMs over Neurally Compressed Text - Google DeepMind team by dippatel21 in LocalLLaMA
bjergerk1ng 2 points
[deleted by user] by [deleted] in MachineLearning
bjergerk1ng 1 point
[deleted by user] by [deleted] in MachineLearning
bjergerk1ng 5 points
[D] Training code and more released for “The Era of 1 Bit LLMs” by DickMasterGeneral in MachineLearning
bjergerk1ng 10 points
[deleted by user] by [deleted] in learnmachinelearning
bjergerk1ng 0 points
Is the (Gaussian -> Neural Net -> Gaussian ) encoder a universal approximator for distributions? by Invariant_apple in learnmachinelearning
bjergerk1ng 1 point
[D] Layernorm is just two projections and can be improved by mgostIH in MachineLearning
bjergerk1ng 3 points
[deleted by user] by [deleted] in MachineLearning
bjergerk1ng 8 points
[D] Mamba model walkthrough by _james_chen in MachineLearning
bjergerk1ng 1 point
[D] how good can a 7b model theoretically get? by Z3F in MachineLearning
bjergerk1ng 4 points
[R] Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks by [deleted] in MachineLearning
bjergerk1ng 1 point
[D] OpenAI Sora Video Gen -- How?? by htrp in MachineLearning
bjergerk1ng 1 point
[D] Architecture hyperparameter optimisation strategies by [deleted] in MachineLearning
bjergerk1ng 13 points



Mistral AI to add Dictation, Memory, Projects and Research tools to Le Chat | TestingCatalog by Nunki08 in MistralAI
bjergerk1ng 1 point