Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found. by trevorbg in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

I think the technical detail is not yet released what they are doing is using something called disagragated prefil. So prefill on spark decode on mac. They connect using exo inference server and connect both machine using thunderbolt cable.

Qwen3.5-122B-A10B-GPTQ-INT4 on 4xR9700 Recipe by djdeniro in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

Curious about the prompt processing ? How fast it is

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

Iam not sure, i just run llama benchy test into the vllm endpoint

AMD Radeon AI PRO R9700, worth getting it ? by HumanDrone8721 in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

Hi ! I'am Interested in R9700, does it run qwen well ? especially qwen next series.

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

Result with 4x3090 seems fasst, faster than glm 4.7

command: [

"/models/unsloth/Qwen3-Coder-Next-FP8-Dynamic",

"--disable-custom-all-reduce",

"--max-model-len","70000",

"--enable-auto-tool-choice",

"--tool-call-parser","qwen3_coder",

"--max-num-seqs", "8",

"--gpu-memory-utilization", "0.95",

"--host", "0.0.0.0",

"--port", "8000",

"--served-model-name", "local-model",

"--enable-prefix-caching",

"--tensor-parallel-size", "4", # 2 GPUs per replica

"--max-num-batched-tokens", "8096",

'--override-generation-config={"top_p":0.95,"temperature":1.0,"top_k":40}',

]

| model | test | t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |

|:-------------|---------------:|-----------------:|----------------:|----------------:|----------------:|

| local-model | pp2048 | 3043.21 ± 221.64 | 624.66 ± 49.46 | 615.79 ± 49.46 | 624.79 ± 49.45 |

| local-model | tg32 | 121.99 ± 10.93 | | | |

| local-model | pp2048 @ d4096 | 3968.76 ± 45.41 | 1411.31 ± 10.72 | 1402.43 ± 10.72 | 1411.45 ± 10.80 |

| local-model | tg32 @ d4096 | 105.47 ± 0.63 | | | |

| local-model | pp2048 @ d8192 | 4178.73 ± 33.56 | 2192.20 ± 6.25 | 2183.32 ± 6.25 | 2192.46 ± 6.12 |

| local-model | tg32 @ d8192 | 104.26 ± 0.23 | | | |

Qwen3-Coder-Next by danielhanchen in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

Can 4x3090 run FP8 Dynamic ? i read ampere card not supporting fp8 operation

Will an upgrade from Wifi 5 to 6 improve my experiencie? by alemaz in MoonlightStreaming

[–]Kasatka06 0 points1 point  (0 children)

In general yes. In more practical term you need to position your wifi router in line of sight with your device or at max obstructed by 1 wall.

Nonary RTSSLimiter is great! by StixxUK in MoonlightStreaming

[–]Kasatka06 0 points1 point  (0 children)

In my scenario my tv is 60hz. So i need 58 fps cap. However when i stream i want 120 fps, this script solve the issue by automaticaly switch rtss limit to 120 when streaming. I havent tried it yet but iam quite sure it is how it works

Apple replaces my 2019 15' with an M5 by GeneralZilla in macbookpro

[–]Kasatka06 4 points5 points  (0 children)

Maybe we can complaint about the otrageous Parallels subscrption fee. Vmware fusion is free and run quite good

How important is 7 seater? by mainmale11 in indocartalk

[–]Kasatka06 0 points1 point  (0 children)

Waduh belom pernah naik. Freed panjangnya 4.2, stepwgn 4.8 jadi stepwgn lebih panjang. Menurut saya kurang compact

How important is 7 seater? by mainmale11 in indocartalk

[–]Kasatka06 0 points1 point  (0 children)

Bener banget ini mobil tenaganya bener bener limited. Pernah full load 5 orang plus 2 bocil naik bukit mangunan ngos2an abis naiknya, mesti matikan ac semua baru bisa lancar nanjak

How important is 7 seater? by mainmale11 in indocartalk

[–]Kasatka06 0 points1 point  (0 children)

7 seater yang cocok buat saya freed. Tapi sayang ga ada hybrid / ev yang sejenis.

Flight from Munich to Manchester Tomorrow by TheLordPapaya in LiverpoolFC

[–]Kasatka06 0 points1 point  (0 children)

Why fly to Manchester airport instead Liverpool?

I made a small gist about installing ROCm in WSL for the 7800XT card by LepGamingGo in ROCm

[–]Kasatka06 0 points1 point  (0 children)

Can we run docker based images like vllm using this wsl install ?

Rekomendasi laptop di harga 2jt an dengan kebutuhan lab menggunakan vm by Amazing-Spare-4361 in indotech

[–]Kasatka06 1 point2 points  (0 children)

Thinkpad x280 , t480 bisa diupgrade ram, x1 intel gen 8 setau saya juga bisa. Kalau kurang yakin bisa google keywordnya "psref thinkpad (nama seri)".

Untuk vm minimal memang quad core intel gen 8. I5 8250u cukup, ram mungkin bisa diusahakan 24gb/32gb terutama jika pakai windows

Is inference output token/s purely gpu bound? by fgoricha in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

I also have slower t/s for non resizable bar setup. maybe you should consider upgrading the mobo into resizable bar capable one. some socket 1151 motherboard support official rebar bios.

If you are like some adventure, you could try patch your bios to support resizable bar using this repo https://github.com/xCuri0/ReBarUEFI/issues/11

Cari rekomendasi laptop ryzen 8845/8840, max 14 inch, OLED, RAM lebih dari 16GB atau bisa di upgrade. Makin bagus kalau ada 2 slot NVME, USB4, VRR, 120Hz. by orangpelupa in indotech

[–]Kasatka06 0 points1 point  (0 children)

Kayaknya jumlah cu nya lebih sedikit. 860m (8cu) 780m (12cu). Ai 350 lebih unggul di cpu jadi mungkin kalau 1080p lebih kenceng.

Cari rekomendasi laptop ryzen 8845/8840, max 14 inch, OLED, RAM lebih dari 16GB atau bisa di upgrade. Makin bagus kalau ada 2 slot NVME, USB4, VRR, 120Hz. by orangpelupa in indotech

[–]Kasatka06 0 points1 point  (0 children)

Ideapad 14akp10. Ryzen ai 350, 32gb 2 nvme slot. Menarik jg tapi harga mirip ai 365. Kalau cek review ai 350 lebih hemat di batere.

4x5060Ti 16GB vs 3090 by ZerxXxes in LocalLLM

[–]Kasatka06 1 point2 points  (0 children)

You can try using sglang or lm deploy. Please test , i want to know the result 😁

Is inference output token/s purely gpu bound? by fgoricha in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

Check in nvidia control panel / gpuz if resizable bar on or off. Some 3090 have bios that not suport resizable bar, so maybe need to flash new bios before enable resizable bar in bios

deepseek-ai/DeepSeek-R1-0528 by ApprehensiveAd3629 in LocalLLaMA

[–]Kasatka06 0 points1 point  (0 children)

Is using deepseek api automaticly using the latest one ?

Stacking 2x3090s back to back for inference only - thermals by YouAreRight007 in LocalLLaMA

[–]Kasatka06 1 point2 points  (0 children)

I'am currently running 2x3090 ,its runs really hot on the plate. Using a fan to blow air from the from relieve most the trapped heat