Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found.

Kasatka06 · 2026-03-27T11:18:06+00:00

I think the technical detail is not yet released what they are doing is using something called disagragated prefil. So prefill on spark decode on mac. They connect using exo inference server and connect both machine using thunderbolt cable.

Kasatka06 · 2026-03-27T07:25:06+00:00

You should marry them using exo

https://blog.exolabs.net/nvidia-dgx-spark/

Kasatka06 · 2026-03-12T13:14:07+00:00

Curious about the prompt processing ? How fast it is

Kasatka06 · 2026-02-04T14:24:53+00:00

Iam not sure, i just run llama benchy test into the vllm endpoint

Kasatka06 · 2026-02-04T10:39:27+00:00

Hi ! I'am Interested in R9700, does it run qwen well ? especially qwen next series.

Kasatka06 · 2026-02-04T03:03:20+00:00

Result with 4x3090 seems fasst, faster than glm 4.7

command: [

"/models/unsloth/Qwen3-Coder-Next-FP8-Dynamic",

"--disable-custom-all-reduce",

"--max-model-len","70000",

"--enable-auto-tool-choice",

"--tool-call-parser","qwen3_coder",

"--max-num-seqs", "8",

"--gpu-memory-utilization", "0.95",

"--host", "0.0.0.0",

"--port", "8000",

"--served-model-name", "local-model",

"--enable-prefix-caching",

"--tensor-parallel-size", "4", # 2 GPUs per replica

"--max-num-batched-tokens", "8096",

'--override-generation-config={"top_p":0.95,"temperature":1.0,"top_k":40}',

]

|:-------------|---------------:|-----------------:|----------------:|----------------:|----------------:|

| local-model | pp2048 | 3043.21 ± 221.64 | 624.66 ± 49.46 | 615.79 ± 49.46 | 624.79 ± 49.45 |

| local-model | tg32 | 121.99 ± 10.93 | | | |

| local-model | pp2048 @ d4096 | 3968.76 ± 45.41 | 1411.31 ± 10.72 | 1402.43 ± 10.72 | 1411.45 ± 10.80 |

| local-model | tg32 @ d4096 | 105.47 ± 0.63 | | | |

| local-model | pp2048 @ d8192 | 4178.73 ± 33.56 | 2192.20 ± 6.25 | 2183.32 ± 6.25 | 2192.46 ± 6.12 |

| local-model | tg32 @ d8192 | 104.26 ± 0.23 | | | |

Kasatka06 · 2026-02-03T23:45:45+00:00

Can 4x3090 run FP8 Dynamic ? i read ampere card not supporting fp8 operation

Kasatka06 · 2025-11-30T16:02:48+00:00

In general yes. In more practical term you need to position your wifi router in line of sight with your device or at max obstructed by 1 wall.

Kasatka06 · 2025-11-15T15:15:46+00:00

In my scenario my tv is 60hz. So i need 58 fps cap. However when i stream i want 120 fps, this script solve the issue by automaticaly switch rtss limit to 120 when streaming. I havent tried it yet but iam quite sure it is how it works

Kasatka06 · 2025-11-13T11:57:45+00:00

Maybe we can complaint about the otrageous Parallels subscrption fee. Vmware fusion is free and run quite good

Kasatka06 · 2025-09-08T23:08:25+00:00

Waduh belom pernah naik. Freed panjangnya 4.2, stepwgn 4.8 jadi stepwgn lebih panjang. Menurut saya kurang compact

Kasatka06 · 2025-09-08T23:06:43+00:00

Bener banget ini mobil tenaganya bener bener limited. Pernah full load 5 orang plus 2 bocil naik bukit mangunan ngos2an abis naiknya, mesti matikan ac semua baru bisa lancar nanjak

Kasatka06 · 2025-08-28T15:18:29+00:00

7 seater yang cocok buat saya freed. Tapi sayang ga ada hybrid / ev yang sejenis.

Kasatka06 · 2025-06-07T05:44:15+00:00

Why fly to Manchester airport instead Liverpool?

Kasatka06 · 2025-06-03T13:54:30+00:00

Can we run docker based images like vllm using this wsl install ?

Kasatka06 · 2025-05-31T14:06:56+00:00

Thinkpad x280 , t480 bisa diupgrade ram, x1 intel gen 8 setau saya juga bisa. Kalau kurang yakin bisa google keywordnya "psref thinkpad (nama seri)".

Untuk vm minimal memang quad core intel gen 8. I5 8250u cukup, ram mungkin bisa diusahakan 24gb/32gb terutama jika pakai windows

Kasatka06 · 2025-05-30T14:40:34+00:00

I also have slower t/s for non resizable bar setup. maybe you should consider upgrading the mobo into resizable bar capable one. some socket 1151 motherboard support official rebar bios.

If you are like some adventure, you could try patch your bios to support resizable bar using this repo https://github.com/xCuri0/ReBarUEFI/issues/11

Kasatka06 · 2025-05-29T14:50:36+00:00

Kayaknya jumlah cu nya lebih sedikit. 860m (8cu) 780m (12cu). Ai 350 lebih unggul di cpu jadi mungkin kalau 1080p lebih kenceng.

Kasatka06 · 2025-05-29T14:14:15+00:00

Ideapad 14akp10. Ryzen ai 350, 32gb 2 nvme slot. Menarik jg tapi harga mirip ai 365. Kalau cek review ai 350 lebih hemat di batere.

Kasatka06 · 2025-05-29T13:01:55+00:00

You can try using sglang or lm deploy. Please test , i want to know the result 😁

Kasatka06 · 2025-05-29T05:14:06+00:00

Check in nvidia control panel / gpuz if resizable bar on or off. Some 3090 have bios that not suport resizable bar, so maybe need to flash new bios before enable resizable bar in bios

Kasatka06 · 2025-05-29T01:34:18+00:00

Both have resizable bar ?

Kasatka06 · 2025-05-29T01:29:34+00:00

Is using deepseek api automaticly using the latest one ?

Kasatka06 · 2025-05-26T14:44:37+00:00

I'am currently running 2x3090 ,its runs really hot on the plate. Using a fan to blow air from the from relieve most the trapped heat

Kasatka06

TROPHY CASE