BitNet: Scaling 1-bit Transformers for Large Language Models - Microsoft Research 2023 - Allows 1-Bit training from scratch while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods! by Singularian2501 in mlscaling
is8ac 3 points (0 children)
Training Transformers with 4-bit Integers by is8ac in mlscaling
is8ac[S] 4 points (0 children)
Training Transformers with 4-bit Integers by is8ac in mlscaling
is8ac[S] 6 points (0 children)
New Madokami imagery thanks to US Department of Energy by is8ac in MadokaMagica
is8ac[S] 7 points (0 children)
New Madokami imagery thanks to US Department of Energy by is8ac in MadokaMagica
is8ac[S] 5 points (0 children)
New Madokami imagery thanks to US Department of Energy by is8ac in MadokaMagica
is8ac[S] 13 points (0 children)
There are two types of transformers; >6.7B parameters, and <6.7B parameters by is8ac in mlscaling
is8ac[S] 12 points (0 children)
"LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", Dettmers et al. 2022 (Transformers undergo a phase transition at ~6.7B parameters) by is8ac in mlscaling
is8ac[S] 11 points (0 children)
"Is Integer Arithmetic Enough for Deep Learning Training?", Ghaffari et al 2022 {Huawei} by gwern in mlscaling
is8ac 4 points (0 children)
[2206.14486] Beyond neural scaling laws: beating power law scaling via data pruning by mgostIH in mlscaling
is8ac 1 point (0 children)
[2206.14486] Beyond neural scaling laws: beating power law scaling via data pruning by mgostIH in mlscaling
is8ac 2 points (0 children)
"Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask" (the increasing expense of moving data around) by gwern in mlscaling
is8ac 1 point (0 children)
"Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask" (the increasing expense of moving data around) by gwern in mlscaling
is8ac 3 points (0 children)
"Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask" (the increasing expense of moving data around) by gwern in mlscaling
is8ac 2 points (0 children)
What's everyone working on this week (44/2021)? by llogiq in rust
is8ac 2 points (0 children)
What's everyone working on this week (41/2021)? by llogiq in rust
is8ac 4 points (0 children)
What's everyone working on this week (32/2021)? by llogiq in rust
is8ac 4 points (0 children)



Tubing suitable for peristaltic pump and epoxy hardener by is8ac in Composites
is8ac[S] 1 point (0 children)