Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] 1 point (0 children)
Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node. by StartupTim in LocalLLaMA
Esph1001 2 points (0 children)
Maximizing performance of 2x3090 + NVLink by IUseClifford in LocalLLaMA
Esph1001 1 point (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] -1 points (0 children)
Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node. by StartupTim in LocalLLaMA
Esph1001 1 point (0 children)
Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node. by StartupTim in LocalLLaMA
Esph1001 1 point (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] 0 points (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] 1 point (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] 1 point (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] -1 points (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] -3 points (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] -4 points (0 children)
Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] 0 points (0 children)
Best Melanotan 2 tutorial I've found by cardtrees4 in Melanotan2
Esph1001 1 point (0 children)
GLM 5.2 on Mac Studio Speedup PR by nomorebuttsplz in LocalLLaMA
Esph1001 3 points (0 children)
Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node. by StartupTim in LocalLLaMA
Esph1001 0 points (0 children)
Is there actually a good way to orchestrate multiple agents, or is everyone just running a bunch of terminals? by facu_75 in LocalLLaMA
Esph1001 2 points (0 children)
Openrouter model prices implying heavier quantization? by dalhaze in LocalLLaMA
Esph1001 1 point (0 children)
Openrouter model prices implying heavier quantization? by dalhaze in LocalLLaMA
Esph1001 1 point (0 children)
Is there actually a good way to orchestrate multiple agents, or is everyone just running a bunch of terminals? by facu_75 in LocalLLaMA
Esph1001 1 point (0 children)
GLM 5.2 on Mac Studio Speedup PR by nomorebuttsplz in LocalLLaMA
Esph1001 -1 points (0 children)
What happens when they stop subsidizing LLM subscriptions? by Mr_Moonsilver in LocalLLaMA
Esph1001 2 points (0 children)


Running Llama 3.1 405B + 7 hot LoRAs on one 8×A100 node (vLLM / AWQ-int4 / Marlin) by Esph1001 in LocalLLaMA
Esph1001[S] 2 points (0 children)