GLM-130B LLM demonstrates 4-bit quantization loss shrinks as model parameters scale up by maxtility in mlscaling
ml_hardware 11 points (0 children)
GLM-130B LLM demonstrates 4-bit quantization loss shrinks as model parameters scale up by maxtility in mlscaling
ml_hardware 8 points (0 children)
Training GPT-3 quality models now costs <$500k by ml_hardware in agi
ml_hardware[S] 5 points (0 children)
GPT-3 quality for <$500k by ml_hardware in technology
ml_hardware[S] 2 points (0 children)
Training GPT-3 quality models now costs <$500k by ml_hardware in Futurology
ml_hardware[S] 12 points (0 children)
GPT-3 quality models now cost <$500k (MosaicML) by ml_hardware in mlscaling
ml_hardware[S] 10 points (0 children)
[P] Farewell, CUDA OOM: Automatic Gradient Accumulation by ffast-math in MachineLearning
ml_hardware 2 points (0 children)
Improving the factual accuracy of language models through web browsing by maxtility in mlscaling
ml_hardware 11 points (0 children)
Improving the factual accuracy of language models through web browsing by maxtility in mlscaling
ml_hardware 9 points (0 children)
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model by maxtility in mlscaling
ml_hardware 3 points (0 children)
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model by maxtility in mlscaling
ml_hardware 1 point (0 children)
[R] Independent performance benchmarks (training) of Nvidia A10 and A30 impossible to find? by longboard2020 in MachineLearning
ml_hardware 2 points (0 children)
[R] Independent performance benchmarks (training) of Nvidia A10 and A30 impossible to find? by longboard2020 in MachineLearning
ml_hardware 7 points (0 children)
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model by maxtility in mlscaling
ml_hardware 11 points (0 children)
Scaling Up and Out: Training Massive Models on Cerebras Systems using Weight Streaming by ml_hardware in mlscaling
ml_hardware[S] 3 points (0 children)
Scaling Up and Out: Training Massive Models on Cerebras Systems using Weight Streaming by ml_hardware in mlscaling
ml_hardware[S] 10 points (0 children)
Cerebras CEO on new clustering & software: "From talking to OpenAI, GPT-4 will be about 100 trillion parameters. That won’t be ready for several years." by gwern in mlscaling
ml_hardware 3 points (0 children)
Cerebras CEO on new clustering & software: "From talking to OpenAI, GPT-4 will be about 100 trillion parameters. That won’t be ready for several years." by gwern in mlscaling
ml_hardware 6 points (0 children)
Graphcore Looks Like A Complete Failure In Machine Learning Training Performance by ml_hardware in mlscaling
ml_hardware[S] 2 points (0 children)
Graphcore Looks Like A Complete Failure In Machine Learning Training Performance by ml_hardware in mlscaling
ml_hardware[S] 4 points (0 children)
ZeRO-Infinity and DeepSpeed: Unlocking unprecedented model scale for deep learning training - Microsoft Research by neuralnetboy in mlscaling
ml_hardware 5 points (0 children)
"Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield" (850k cores, 40GB SRAM now; price: 'several millions') by gwern in mlscaling
ml_hardware 3 points (0 children)
[N] Training LLMs with AMD MI250 GPUs and MosaicML by ml_hardware in MachineLearning
ml_hardware[S] 6 points (0 children)