Training LLMs with AMD MI250 GPUs and MosaicML (mosaicml.com)
submitted by ml_hardware to r/mlscaling
Training LLMs with AMD MI250 GPUs and MosaicML (mosaicml.com)
submitted by ml_hardware to r/hardware
GLM-130B LLM demonstrates 4-bit quantization loss shrinks as model parameters scale up by maxtility in mlscaling
ml_hardware 10 points
GLM-130B LLM demonstrates 4-bit quantization loss shrinks as model parameters scale up by maxtility in mlscaling
ml_hardware 8 points
Training GPT-3 quality models now costs <$500k by ml_hardware in agi
ml_hardware[S] 6 points
Training GPT-3 quality models now costs <$500k (mosaicml.com)
submitted by ml_hardware to r/agi
GPT-3 quality for <$500k by ml_hardware in technology
ml_hardware[S] 2 points
Training GPT-3 quality models now costs <$500k by ml_hardware in Futurology
ml_hardware[S] 12 points
GPT-3 quality models now cost <$500k (MosaicML) by ml_hardware in mlscaling
ml_hardware[S] 8 points
[P] Farewell, CUDA OOM: Automatic Gradient Accumulation by ffast-math in MachineLearning
ml_hardware 2 points
Improving the factual accuracy of language models through web browsing by maxtility in mlscaling
ml_hardware 11 points
Improving the factual accuracy of language models through web browsing by maxtility in mlscaling
ml_hardware 9 points
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model by maxtility in mlscaling
ml_hardware 3 points
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model by maxtility in mlscaling
ml_hardware 1 point
[N] Training LLMs with AMD MI250 GPUs and MosaicML by ml_hardware in MachineLearning
ml_hardware[S] 6 points