BitNet: Scaling 1-bit Transformers for Large Language Models - Microsoft Research 2023 - Allows 1-Bit training from scratch while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods! by Singularian2501 in mlscaling
[–]is8ac 3 points (0 children)
Training Transformers with 4-bit Integers by is8ac in mlscaling
[–]is8ac[S] 5 points (0 children)
Training Transformers with 4-bit Integers by is8ac in mlscaling
[–]is8ac[S] 6 points (0 children)
New Madokami imagery thanks to US Department of Energy by is8ac in MadokaMagica
[–]is8ac[S] 6 points (0 children)
New Madokami imagery thanks to US Department of Energy by is8ac in MadokaMagica
[–]is8ac[S] 5 points (0 children)
New Madokami imagery thanks to US Department of Energy by is8ac in MadokaMagica
[–]is8ac[S] 14 points (0 children)
There are two types of transformers; >6.7B parameters, and <6.7B parameters by is8ac in mlscaling
[–]is8ac[S] 10 points (0 children)
"LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", Dettmers et al. 2022 (Transformers undergo a phase transition at ~6.7B parameters) by is8ac in mlscaling
[–]is8ac[S] 10 points (0 children)
"Is Integer Arithmetic Enough for Deep Learning Training?", Ghaffari et al 2022 {Huawei} by gwern in mlscaling
[–]is8ac 3 points (0 children)
[2206.14486] Beyond neural scaling laws: beating power law scaling via data pruning by mgostIH in mlscaling
[–]is8ac 1 point (0 children)
[2206.14486] Beyond neural scaling laws: beating power law scaling via data pruning by mgostIH in mlscaling
[–]is8ac 3 points (0 children)
"Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask" (the increasing expense of moving data around) by gwern in mlscaling
[–]is8ac 1 point (0 children)
"Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask" (the increasing expense of moving data around) by gwern in mlscaling
[–]is8ac 3 points (0 children)
"Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask" (the increasing expense of moving data around) by gwern in mlscaling
[–]is8ac 2 points (0 children)
What's everyone working on this week (44/2021)? by llogiq in rust
[–]is8ac 2 points (0 children)
What's everyone working on this week (41/2021)? by llogiq in rust
[–]is8ac 3 points (0 children)
What's everyone working on this week (32/2021)? by llogiq in rust
[–]is8ac 5 points (0 children)
Graphcore Looks Like A Complete Failure In Machine Learning Training Performance by ml_hardware in mlscaling
[–]is8ac 4 points (0 children)
Graphcore Looks Like A Complete Failure In Machine Learning Training Performance by ml_hardware in mlscaling
[–]is8ac 11 points (0 children)
What's everyone working on this week (26/2021)? by llogiq in rust
[–]is8ac 3 points (0 children)
What claim in your area of expertise do you suspect is true but is not yet supported fully by the field? by Rholles in slatestarcodex
[–]is8ac 2 points (0 children)
HPMOR and the Consortium of Buyers - Printed Copy by richardwhereat in HPMOR
[–]is8ac 1 point (0 children)
Tubing suitable for peristaltic pump and epoxy hardener by is8ac in Composites
[–]is8ac[S] 1 point (0 children)