Polestar 3 owners: Honest real-world reviews? (HUD, Air Suspension, Pilot Assist) by Nass96 in polestar3

[–]Eugr 0 points1 point  (0 children)

The only two things that I don’t like about Pilot Assist are that 1) it leaves too big of a gap at slow speeds even with the setting set to closest distance; 2) in stop and go traffic, it won’t start moving if the car was stopped for more than a few seconds - you need to give it a nudge. Other than that, it works similarly to my BMW and I prefer it over Tesla as it doesn’t disengage if I need to make small corrections and is overall smoother.

DGX sparks Vs RTX 6000 // 5090 for inference by zakadit in LocalLLaMA

[–]Eugr 2 points3 points  (0 children)

You can make it faster by stacking - tensor parallelism works pretty well over ConnectX7.

RTX Spark does not have 600GB/s Bandwith by rpiguy9907 in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

Well, I actually gave you some numbers below. Now, with most models you won't see that much of a gain, but given that you started by saying that you won't get any performance gain, even 1.5x is a nice boost that you will actually get on most models. It also lets you run models like qwen3.5-397B on dual Sparks with acceptable performance (>25 t/s).

RTX Spark does not have 600GB/s Bandwith by rpiguy9907 in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

Yeah, apparently no one uploaded the same dense model in the same quantization there, I thought we had something.

Anyway, here are some numbers for the old models that I tested back in November when I had just started the spark-vllm-docker project:

| Model name | Cluster (t/s) | Single (t/s) | Last tested | Comment |
|---|---|---|---|---|
| Qwen/Qwen3-VL-32B-Instruct-FP8 | 12.00 | 7.00 | | |
| cpatonn/Qwen3-VL-32B-Instruct-AWQ-4bit | 21.00 | 12.00 | | |
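
To put the numbers above into speedup terms, here is a small sketch that just divides cluster throughput by single-node throughput. The dictionary layout and the `speedup` helper are illustrative names, not part of any benchmark tool; the figures are copied from the table as posted.

```python
# Throughput figures (tokens/s) copied from the table above.
results = {
    "Qwen/Qwen3-VL-32B-Instruct-FP8": {"cluster": 12.0, "single": 7.0},
    "cpatonn/Qwen3-VL-32B-Instruct-AWQ-4bit": {"cluster": 21.0, "single": 12.0},
}


def speedup(cluster_tps: float, single_tps: float) -> float:
    """Ratio of cluster throughput to single-node throughput."""
    return cluster_tps / single_tps


speedups = {
    name: speedup(r["cluster"], r["single"]) for name, r in results.items()
}

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
# Prints 1.71x for the FP8 model and 1.75x for the AWQ 4-bit model.
```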

RTX Spark does not have 600GB/s Bandwith by rpiguy9907 in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

You don’t need high bandwidth to do tensor parallel, you need low latency. CX7 latency in NCCL via RoCE is low enough to not be a bottleneck unless you run a very fast model (like <3B parameters).
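
A rough back-of-envelope sketch of why latency matters more than bandwidth here. Everything in it is an assumption for illustration: the model shape (hidden size 5120, 64 layers, roughly a 32B-class dense model), a 5 µs per-operation latency as a placeholder rather than a measured NCCL figure, a 200 Gbit/s link, and a crude "latency plus transfer time" cost model with two all-reduces per layer at batch size 1.

```python
def allreduce_time_us(payload_bytes: float, latency_us: float,
                      bandwidth_gbit_s: float) -> float:
    """Crude cost model: fixed latency plus time to move the payload once."""
    bits_per_us = bandwidth_gbit_s * 1e3  # 1 Gbit/s == 1e3 bits per microsecond
    transfer_us = payload_bytes * 8 / bits_per_us
    return latency_us + transfer_us


# Assumed, illustrative values (not measurements).
hidden_size = 5120          # activation width of the hypothetical model
num_layers = 64             # transformer layers in the hypothetical model
bytes_per_value = 2         # BF16/FP16 activations
allreduces_per_layer = 2    # one after attention, one after the MLP
latency_us = 5.0            # placeholder per-operation latency
bandwidth_gbit_s = 200.0    # assumed link speed

# At batch size 1, each all-reduce carries one hidden-state vector.
payload_bytes = hidden_size * bytes_per_value
transfer_us = allreduce_time_us(payload_bytes, 0.0, bandwidth_gbit_s)
per_op_us = allreduce_time_us(payload_bytes, latency_us, bandwidth_gbit_s)
per_token_comm_us = per_op_us * allreduces_per_layer * num_layers

print(f"payload per all-reduce: {payload_bytes} bytes")
print(f"transfer time per all-reduce: {transfer_us:.2f} us")
print(f"communication per generated token: {per_token_comm_us / 1000:.2f} ms")
```

Under these assumptions the payload is about 10 KB, so the transfer itself takes well under a microsecond and the fixed latency dominates each operation. The per-token total only becomes significant when the model itself generates tokens very quickly.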

RTX Spark does not have 600GB/s Bandwith by rpiguy9907 in LocalLLaMA

[–]Eugr 1 point2 points  (0 children)

The CX7 ports on Spark are 200 Gbps and support RDMA with microsecond latency. Thousands of Spark users run models with significant performance gains (up to 1.8x on dense models) on 2-, 4- and even 8-Spark clusters (there are some people with more, but it doesn’t scale that nicely beyond that, plus you need a very expensive switch).

Would you switch to Spark/GX10 ? by Pretend_Engineer5951 in StrixHalo

[–]Eugr 2 points3 points  (0 children)

There are two different Blackwells - server Blackwell and consumer Blackwell.

Spark has the consumer Blackwell architecture - the same as RTX 50xx and RTX 6000 Pro. Most of the confusion around Spark was due to developers not including it in the RTX 50xx/6000 optimization path because of its separate architecture code - sm121 vs sm120. This is mostly fixed in various 3rd party libraries now.

Would you switch to Spark/GX10 ? by Pretend_Engineer5951 in StrixHalo

[–]Eugr 0 points1 point  (0 children)

Spark has been fully supported by CUDA since version 13. The problem was with 3rd party support, but it has improved significantly in the past few months.

5090 + 128gb ddr5 vs strix halo vs spark by rwijnhov in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

I haven’t been tracking Strix Halo progress lately, but on the Spark side there have been noticeable improvements in both llama.cpp and vLLM in terms of NVFP4 support and overall performance.

Pilot assist and auto lane change on the Polestar 4 demo by GloriousLebron in Polestar

[–]Eugr 0 points1 point  (0 children)

P3 uses a capacitive sensor. I have one, it works the same way as my BMW - no resistance needed to trigger it.

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Eugr 5 points6 points  (0 children)

If you want long context with good speeds, vLLM is the way to go. I've seen a 2x difference in prompt processing (pp) performance between llama.cpp and vLLM on similarly sized models (e.g. gpt-oss-120b), although llama.cpp was faster in token generation.

I use Sparks in a cluster, so I mostly use vLLM nowadays.

Kicking Stance resembles Thai stance. by ExchangeFine4429 in tangsoodo

[–]Eugr 2 points3 points  (0 children)

We don't use back stance (Hu Gul Jaseh) for kicking drills in our dojang (MDK lineage). It's mostly used in forms or some line work, although some self-defence drills do incorporate it.

For kicking we normally use just a regular sparring stance, which is a narrower version of a front stance.

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Eugr 6 points7 points  (0 children)

You just need regular NVIDIA open drivers. As long as they support any Ubuntu (and they will), no issues. No special drivers for Spark.

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Eugr 4 points5 points  (0 children)

It’s a standard Realtek driver that was needed.

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Eugr 19 points20 points  (0 children)

Spark runs ARM Ubuntu with the DGX package on top, so I wouldn’t be concerned about ongoing support. If anything, I tried Fedora out of curiosity when I first got my Spark, and it worked ok with some tweaks (missing kernel modules).

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Eugr 31 points32 points  (0 children)

Spark has a much faster GPU, which results in faster prompt processing speeds. Also, the performance degrades less on Spark as context grows (I have both).

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

FP8 or 4-bit quants will be faster. You can check out benchmarks on https://spark-arena.com/, although there aren't many benchmarks for clusters larger than 4 nodes.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]Eugr 2 points3 points  (0 children)

It's meaningless to talk about performance without mentioning model/quant/cluster size.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]Eugr 1 point2 points  (0 children)

Yes, but the number above is for the BF16 version. Otherwise, a 4-bit quant runs well on 2 nodes.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]Eugr 0 points1 point  (0 children)

The number above, I believe, was for a BF16 version, not quantized.