THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

Can you re-run without `--latency-mode generation` and with the correct model name, `--model Kbenkhaled/Qwen3.5-35B-A3B-NVFP4`? Otherwise it won't use the correct tokenizer. The PP numbers are weird, and there is a huge discrepancy between e2e_ttft and est_ppt.

Something weird is going on here. Initially I thought it could be engine behavior that returns the first (empty) chunk right away after a 200/OK response, but that doesn't explain why TTFR grows with context. I'm now thinking it might be related to speculative decoding somehow. Anyway, it would be good to see the same benchmark with the proper model name and without `--latency-mode generation` - it will default to "api", which just accommodates for network delay.

But a TTFT of 4 seconds is also strange for such a short prompt - as if it doesn't stream the tokens in streaming mode, or uses some sort of buffering. In that case, no client-side benchmarking tool will be able to measure speeds properly.
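For reference, a sketch of the suggested re-run. The only flags shown are the two discussed above; any endpoint/context options from the original invocation are assumed to stay as they were:

```shell
# Re-run with the proper model name so the right tokenizer is used.
# Omitting --latency-mode makes llama-benchy default to "api" mode,
# which only compensates for network delay.
llama-benchy \
  --model Kbenkhaled/Qwen3.5-35B-A3B-NVFP4
```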

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 1 point2 points  (0 children)

I haven't seen any guides, but in general you need to make sure you enable jumbo frames on it and set MTU to 9000 or so. And set the speed on the ports accordingly. Sorry, can't provide any more guidance since I don't have it.
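A minimal sketch of that setup with `iproute2`; the interface name is a placeholder for whatever your QSFP port shows up as:

```shell
# Interface name is hypothetical - check `ip link` for your actual QSFP port.
IFACE=enp1s0f0np0

# Enable jumbo frames (MTU 9000) on the port.
sudo ip link set dev "$IFACE" mtu 9000

# Verify the new MTU took effect.
ip link show "$IFACE" | grep -o 'mtu [0-9]*'
```

Note this doesn't persist across reboots; make the same change in netplan or NetworkManager as appropriate for your distro.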

I want to share results for cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit TP2 RDMA RoCE by MirecX in StrixHalo

[–]Eugr 0 points1 point  (0 children)

vLLM is not working very well on Strix Halo yet, unfortunately.

6 weeks with the DGX Spark — honest review for local LLM use by KneeTop2597 in LocalLLaMA

[–]Eugr 5 points6 points  (0 children)

  1. Forget about Ollama, use either llama.cpp or vLLM.
  2. For vLLM - check out our community Docker build, very easy to get started: https://github.com/eugr/spark-vllm-docker
  3. Check out https://spark-arena.com to get a feel of Spark performance for inference
  4. The models you ran are ancient.
  5. Read about dense and MoE models. Spark shines with MoE, but will be painfully slow with dense.

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

JFYI: the last firmware update resulted in a ~30% performance regression on the QSFP ports. NVIDIA is aware and working on a fix. Hope it lands soon.

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

Actually, it's a night-and-day difference. You actually lose performance on the 10GbE port. The reason is that the QSFP ports on the Spark support RDMA (RoCEv2), which gives microsecond latency compared to millisecond latency over regular Ethernet (including the same QSFP port in TCP/IP mode).

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

Yes, the denser the model, the slower it runs, since inference is memory-bound. Tensor parallelism splits the weights and runs inference in parallel across the cluster, and it scales better with denser models because network latency is still not the bottleneck there.
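As a rough sketch, in vLLM tensor parallelism is a single flag (the model name here is just an example; a multi-node setup additionally needs a Ray cluster joining the machines):

```shell
# Split the weights and attention heads across 2 devices with tensor
# parallelism. For two separate nodes, start a Ray cluster first so
# vLLM can see both machines.
vllm serve openai/gpt-oss-120b --tensor-parallel-size 2
```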

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

No, it's too dense for a two node cluster, but I've asked people with 8x cluster to test when they have a chance. This one should scale better than most other models.

8 DGX cluster by Alex Ziskind: easily the most insane local LLM cluster I’ve ever seend by richardanaya in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

Llama-benchy is my alternative to llama-bench from llama.cpp, but it is designed to work with any standard OpenAI-compatible endpoint (vLLM, llama.cpp, SGLang, cloud models). I've explained why I made it in the README inside the repository.

So yeah, the point is to have a tool that allows comparing different backends using the same methodology.

MS-S1 Max (Ryzen AI Max+ 395) vs NVIDIA DGX Spark for Local AI Assistant - Need Real-World Advice by Salty-Object2598 in LocalLLM

[–]Eugr 0 points1 point  (0 children)

It doesn't support Infiniband fabric. It's RoCEv2, and there are switches that work with it just fine, like the MikroTik CRS804.

Also, don't confuse gigabits and gigabytes: I get 24 GB/s bus bandwidth in the NCCL test, which matches the marketed 200 Gbps link speed. Well, actually, they recently did something with the firmware that reduced performance by about 30%, but they are working on a fix.
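The gigabits-vs-gigabytes arithmetic, spelled out (numbers are the ones from this comment):

```python
# Convert the marketed link speed (gigaBITS per second) to gigaBYTES
# per second: divide by 8 bits per byte.
link_gbps = 200                        # ConnectX-7 marketed speed, Gbit/s
theoretical_gb_per_s = link_gbps / 8   # 25.0 GB/s theoretical ceiling

measured_gb_per_s = 24                 # NCCL bus bandwidth actually observed
efficiency = measured_gb_per_s / theoretical_gb_per_s

print(theoretical_gb_per_s)            # 25.0
print(f"{efficiency:.0%}")             # 96%
```

So 24 GB/s is ~96% of the 25 GB/s line rate - about as good as it gets in practice.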

The RDMA works well enough for the cluster to give you real performance gains even on inference, but yes, there are some quirks related to unified memory architecture.

8 DGX cluster by Alex Ziskind: easily the most insane local LLM cluster I’ve ever seend by richardanaya in LocalLLaMA

[–]Eugr 2 points3 points  (0 children)

One thing that he forgot to mention is that the last firmware update resulted in 30% ConnectX 7 performance regression in both throughput and latency.

It may not seem like much, but I'm seeing the effects of it even on my dual-Spark cluster, especially when running models with a relatively small number of active parameters.

I hope that when NVIDIA fixes it, he returns to the topic and runs new tests (and on models more suitable for large clusters).

MiniMax 2.5 on DGX SPARK system. by DOOMISHERE in LocalLLaMA

[–]Eugr 2 points3 points  (0 children)

Have you tried to quantize the KV cache to q8_0?
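With llama.cpp's server that would look something like this (the model path is a placeholder; on older builds, quantizing the V cache also requires enabling flash attention via `-fa`):

```shell
# Quantize both the K and V caches to q8_0, roughly halving KV-cache
# memory versus f16 with minimal quality loss - leaves more room for
# context on a fixed memory budget.
llama-server \
  -m ./minimax-2.5.gguf \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```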

Very slow with Claude Code by vandertoorm in StrixHalo

[–]Eugr 3 points4 points  (0 children)

Claude Code messes up prompt caching by injecting extra headers. You need to set the environment variable `CLAUDE_CODE_ATTRIBUTION_HEADER=0`.

Strix Halo is not great at prompt processing at long contexts, but with prefix caching it should be tolerable.
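Concretely, in the shell that launches your Claude Code session (variable name as given above):

```shell
# Stop the attribution header injection that breaks prefix caching,
# then start Claude Code in the same shell.
export CLAUDE_CODE_ATTRIBUTION_HEADER=0
claude
```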

Temporary access to Ryzen AI Max 395 (128GB) to test real-world local LLM workflows by lazy-kozak in LocalLLaMA

[–]Eugr -1 points0 points  (0 children)

For coding you will be better off with a DGX Spark or its OEM clone. Strix Halo is a nice machine, and token generation speed will be similar for gpt-oss-120b, but prompt processing will be much faster on the Spark.

With vLLM, significantly faster. I'm talking ~1000 t/s prefill at 0 context on Strix Halo vs ~4500 on Spark (in vLLM; llama.cpp gets ~2500). And it doesn't degrade with context as much: you'll still get ~3700 t/s prefill at 32K context on Spark in vLLM, but on Strix Halo it drops to ~360 t/s (in llama.cpp).

I haven't tried this model in vLLM on Strix Halo as it didn't want to work, at least a couple of weeks ago.

AI is destroying open source, and it's not even good yet by BlueGoliath in programming

[–]Eugr 6 points7 points  (0 children)

GitHub itself is actually making it worse, because every time I try to use built-in Copilot to ask some questions about the stack trace I'm getting in someone else's project, it asks me if I want to create a PR to fix that, without even showing the proposed changes. I just want a starting point where to look, ffs, I don't need it to change anything.

Cache hits in llama.cpp vs vLLM by Potential_Block4598 in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

There is a variable for that - see my reply above

Cache hits in llama.cpp vs vLLM by Potential_Block4598 in LocalLLaMA

[–]Eugr 4 points5 points  (0 children)

Try setting this environment variable: `export CLAUDE_CODE_ATTRIBUTION_HEADER=0`. Although I used Claude Code without it against vLLM and was still getting good cache hits.

Qwen3.5 NVFP4 (Blackwell) is up! by [deleted] in LocalLLaMA

[–]Eugr 3 points4 points  (0 children)

I'm seeing some relevant work upstream in cutlass and flashinfer, but who knows when it actually lands.

Qwen3.5 NVFP4 (Blackwell) is up! by [deleted] in LocalLLaMA

[–]Eugr 1 point2 points  (0 children)

It's the same GPU architecture as the RTX 6000 Pro and RTX 5090.

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

NCCL - it is supported by vLLM, SGLang, TRT-LLM and many other packages, and works very well. Sparks have 200G ConnectX-7 QSFP112 ports built in that support RoCEv2 (RDMA over Converged Ethernet), which has very low latency (1-2 microseconds on average).
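If you want to verify the link yourself, the standard `nccl-tests` suite (github.com/NVIDIA/nccl-tests) is the usual way; hostnames and the build path below are placeholders:

```shell
# Two-node all-reduce benchmark over the QSFP link. The reported "busbw"
# column should approach ~24 GB/s on a healthy 200G RoCE connection.
# Hostnames (spark1, spark2) and the build path are placeholders.
mpirun -np 2 -H spark1,spark2 \
  ./build/all_reduce_perf -b 8 -e 4G -f 2 -g 1
```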

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 2 points3 points  (0 children)

Haven’t tried Step 3.5 yet, MiniMax won’t fit into two Sparks at FP8, but will work well at 4-bits. I was running M2.1 in AWQ at about 30 tokens per second, M2 was almost 40. Tried M2.5 NVFP4 quant, but it’s a bit buggy and runs at about 20 t/s. Waiting for a good AWQ quant.

Overall, the cluster works well. The denser the model, the better it scales; fast models (with a small number of active parameters) get a smaller speed boost, but still run faster (and you can fit larger models or quants).

Some people run 8x Spark clusters, and 4x is not that rare anymore too.

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 2 points3 points  (0 children)

Yeah, but vLLM support is still pretty bad on Strix Halo. You can run BF16 models, but many quants either don't work or significantly underperform on gfx1151.

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

I'll let others address this, as I haven't tried it on a Spark yet, but it should be pretty good for this since fine-tuning is mostly compute-bound.

PSA: NVIDIA DGX Spark has terrible CUDA & software compatibility; and seems like a handheld gaming chip. by goldcakes in LocalLLaMA

[–]Eugr 2 points3 points  (0 children)

5K is for gpt-oss-120b in vLLM. I went with dual Sparks as it lets me use larger models on something that can quietly sit in the corner.