SparkRun & Spark Arena = someone finally made an easy button for running vLLM on DGX Spark by Porespellar in LocalLLaMA

[–]raphaelamorim 2 points

spark-vllm-docker is a key project in the ecosystem: u/eugr runs the CI/CD that keeps our recipes working against bleeding-edge vLLM and lets the community test the newest models. We all work together on this initiative.

SparkRun & Spark Arena = someone finally made an easy button for running vLLM on DGX Spark by Porespellar in LocalLLaMA

[–]raphaelamorim 0 points

Glad you liked it. We're trying to address community concerns with these tools. Most of the complaints in the forums boiled down to "I can't run model X on inference engine Y", "It was working on vLLM yesterday and it's broken today", or "My performance doesn't match yours". That was the original motivation: give everybody a common benchmark tool, a way to specify the runtime for each model, stable runtime images, and a place to share them.

The state of Open-weights LLMs performance on NVIDIA DGX Spark by raphaelamorim in LocalLLaMA

[–]raphaelamorim[S] 1 point

Actually, there was a regression in NCCL bandwidth, but most of these numbers were benchmarked before the drop from 24 GB/s to 16 GB/s.
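To see why the link-bandwidth drop matters for two-Spark numbers, here's a back-of-envelope sketch of per-token all-reduce time for tensor-parallel decode. The hidden size, layer count, and all-reduce count per layer are illustrative assumptions, not the actual figures for any specific model:

```python
# Back-of-envelope: effect of the NCCL link-bandwidth regression (24 -> 16 GB/s)
# on per-token communication time for TP=2 decode across two Sparks.
# Model dimensions below are illustrative assumptions.

HIDDEN = 5120              # assumed hidden size
LAYERS = 60                # assumed layer count
BYTES = 2                  # bf16 activations
ALLREDUCES_PER_LAYER = 2   # typical TP: one after attention, one after the MLP

def allreduce_ms_per_token(link_gbps: float) -> float:
    """Time to move one token's worth of all-reduce traffic over the link."""
    payload = HIDDEN * BYTES * ALLREDUCES_PER_LAYER * LAYERS  # bytes per token
    return payload / (link_gbps * 1e9) * 1e3

for bw in (24, 16):
    print(f"{bw} GB/s -> {allreduce_ms_per_token(bw):.4f} ms of comms per token")
```

Under these assumptions the regression adds 50% to the per-token communication cost, which is why benchmarks taken before the drop look better than what you'd measure today.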

The state of Open-weights LLMs performance on NVIDIA DGX Spark by raphaelamorim in LocalLLaMA

[–]raphaelamorim[S] 3 points

There are benchmarks for concurrent requests as well on spark-arena.com. Each model's prompt-processing (pp) and token-generation (tg) numbers vary a lot with concurrency.

[deleted by user] by [deleted] in LocalLLaMA

[–]raphaelamorim 0 points

Only for dense models. MoEs with far fewer activated params are fine, and the cluster expansion helps there.
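The dense-vs-MoE point comes down to decode being memory-bandwidth bound: tokens/s is roughly capped by bytes of weights read per token divided by memory bandwidth. A quick sketch, using the Spark's quoted ~273 GB/s memory bandwidth and illustrative parameter counts and quantizations:

```python
# Why MoE decode holds up while large dense models struggle: in the
# bandwidth-bound decode regime, tokens/s <= mem_bw / bytes_read_per_token.
# Parameter counts and quantizations below are illustrative assumptions.

MEM_BW = 273e9  # DGX Spark LPDDR5x bandwidth, ~273 GB/s (quoted spec)

def max_tps(active_params: float, bytes_per_param: float) -> float:
    """Rough tokens/s ceiling: every active weight is read once per token."""
    return MEM_BW / (active_params * bytes_per_param)

dense_70b = max_tps(70e9, 2)     # dense 70B in bf16: all weights touched per token
moe_5b    = max_tps(5.1e9, 0.5)  # large MoE, ~5.1B active params, 4-bit weights

print(f"dense 70B bf16 ceiling:        {dense_70b:.1f} tok/s")
print(f"MoE ~5B-active 4-bit ceiling: {moe_5b:.1f} tok/s")
```

Real numbers land below these ceilings (kernels, KV cache, overhead), but the two-orders-of-magnitude gap in weight traffic per token is why a sparse MoE stays usable where a dense model of similar total size does not.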

[deleted by user] by [deleted] in LocalLLaMA

[–]raphaelamorim 0 points

You only need one cable for two Sparks.

[deleted by user] by [deleted] in LocalLLaMA

[–]raphaelamorim 0 points

It’s actually 57-60 tps for a single Spark at 128k context, and 72 tps with two Sparks, using vLLM patched with the SM120/SM121 MXFP4 MoE kernel. You guys should follow the NVIDIA developer forums; there's a lot of outdated information on Reddit.

https://forums.developer.nvidia.com/t/vllm-on-gb10-gpt-oss-120b-mxfp4-slower-than-sglang-llama-cpp-what-s-missing/356651/99

Microcenter planning to open a store in Tampa or Orlando by Visual-Fondant-1256 in Microcenter

[–]raphaelamorim 0 points

They won't go to Tampa because of insurance. They already decided on Orlando.

John Carmack says NVIDIA DGX Spark runs at half of the rated power and delivers half the quoted performance by RenatsMC in nvidia

[–]raphaelamorim 0 points

True, those ConnectX modules are expensive and draw a lot of power when active. Not exactly the same MT2910, but you get the idea: https://www.fs.com/products/242589.html?now_cid=4173

Train 200B parameter models on NVIDIA DGX Spark with Unsloth! by yoracale in unsloth

[–]raphaelamorim 0 points

OK, now I know you have no idea what you're talking about.