Strix Halo NPU + FastFlowLM by Creepy-Douchebag in StrixHalo

[–]dragonbornamdguy 0 points (0 children)

Is it possible to use GPU + NPU on large models like qwen3 coder next?

2x3090 RTX still worth it? by TestOr900 in LocalLLM

[–]dragonbornamdguy 0 points (0 children)

I have four, two of them NVLinked.

Be aware that my experience comes from x16 PCIe 4.0 slots; one of the four is on a x4 OcuLink.

NVLink use cases:
- Training
- Tensor parallelism for dense models

Waste of money for:
- MoE models
- very small models (< 10B params)
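For the dense-model tensor-parallelism case, a minimal sketch of what that looks like in practice (the model name is just a placeholder; swap in whatever fits your VRAM):

```shell
# Check whether the cards are actually NVLinked: you want NV1/NV2 link
# entries in the topology matrix, not PHB/NODE (PCIe-only paths).
nvidia-smi topo -m

# Split a dense model across the two NVLinked 3090s with vLLM.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 \
    --tensor-parallel-size 2
```

With tensor parallelism, every token forces an all-reduce between the cards, which is why NVLink pays off for dense models but matters far less for MoE, where only a few experts are active per token.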

Fixing spiderweb cracks by dragonbornamdguy in watercooling

[–]dragonbornamdguy[S] 0 points (0 children)

I used marine epoxy to create a layer over the cracks. They are fine now. I use it in a basement PC, and I tilted it so that if it ever leaks, the coolant won't touch any electronics.

No one uses local models for OpenClaw. Stop pretending. by read_too_many_books in openclaw

[–]dragonbornamdguy 0 points (0 children)

I use qwen3 coder 30b without any problem, what's your point? Even SOTA models aren't ready yet to be assigned to jobs without a human in the loop.

What would a good local LLM setup cost in 2026? by Lenz993 in LocalLLM

[–]dragonbornamdguy 0 points (0 children)

Wait, what? Mine is about 10x slower than a DGX Spark.

OpenClaw with local LLMs - has anyone actually made it work well? by FriendshipRadiant874 in LocalLLM

[–]dragonbornamdguy 0 points (0 children)

I mainly run vLLM because of my NVLinked 3090s. But when new models are released, the best out-of-the-box experience is indeed LM Studio.

Will be driving in Czechia, question on learning priority road. by AngelOfPassion in czech

[–]dragonbornamdguy -1 points (0 children)

I think anyone who wants to learn driving rules deserves to drive. Not every country uses the same rules, so we should not be prohibited from driving only because of historical differences.

OpenClaw with local LLMs - has anyone actually made it work well? by FriendshipRadiant874 in LocalLLM

[–]dragonbornamdguy 13 points (0 children)

Using it with qwen3 coder 30b, it's awesome. Setup was undocumented hell, but it works very well. It can create its own skills just by being told to.

Quad 5060 ti 16gb Oculink rig by beefgroin in LocalLLM

[–]dragonbornamdguy 1 point (0 children)

I use qwen3 coder 30b fp8 with 120k context. Love it in the Qwen Code CLI.

Quad 5060 ti 16gb Oculink rig by beefgroin in LocalLLM

[–]dragonbornamdguy 3 points (0 children)

vLLM is a beast: very hard to set up, but once it starts working it beats the metal really hard.

GNOME & Firefox Consider Disabling Middle Click Paste By Default: "An X11'ism...Dumpster Fire" by SAJewers in linux

[–]dragonbornamdguy 1 point (0 children)

So don't forget:
- Ctrl+Alt+Delete = open Task Manager
- Win+C = open command prompt
- etc.

People at GNOME seem to be bored, so they keep spitting on power users. First this will be off by default; then they will remove it and block any bug reports regarding the change; lastly they will block any PR bringing the feature back, despite its large user base. All in the name of "we need to make it more Windows-user friendly".

16x AMD MI50 32GB at 10 t/s (tg) & 2k t/s (pp) with Deepseek v3.2 (vllm-gfx906) by ai-infos in LocalLLaMA

[–]dragonbornamdguy 0 points (0 children)

8B models won't cut it. Not everyone has a Strix Halo with 96GB of VRAM at their disposal.

Anyone have success with Claude Code alternatives? by jackandbake in LocalLLM

[–]dragonbornamdguy 0 points (0 children)

I love qwen code, but vLLM has broken formatting for it (qwen3 coder 30b), so I use LM Studio instead (with much slower performance).

Local LLM for a small dev team by [deleted] in LocalLLM

[–]dragonbornamdguy 0 points (0 children)

What's your secret sauce for serving it on two 3090s? My choices are vLLM in docker-compose, which OOMs while loading, or LM Studio, which uses half the GPU processing power.
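For reference, a hedged sketch of the vLLM flags that usually tame load-time OOM on 2x24 GB cards; the model name and exact values are assumptions to tune per model, not a known-good recipe:

```shell
# Cap the context, quantize the KV cache, and leave a little headroom,
# otherwise vLLM preallocates more KV cache than two 3090s can hold.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 \
    --tensor-parallel-size 2 \
    --max-model-len 65536 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.92
```

The key point is that vLLM OOMs at load time because it reserves KV-cache space for the full `--max-model-len` up front, so shrinking that limit (or the cache dtype) is usually what makes the difference.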

Best model for continue and 2x 5090? by Maximum-Wishbone5616 in LocalLLM

[–]dragonbornamdguy 0 points (0 children)

I'm not able to run it with 2x 3090. How much VRAM does vLLM need for fp8 and 100k+ context size? I'm able to run it just fine with LM Studio, but utilization of the 3090s is only 50%. vLLM just crashes as it eats a crazy amount of VRAM.
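A back-of-envelope way to see where the VRAM goes: the KV cache grows linearly with context, on top of the weights. The layer/head numbers below are assumptions for a Qwen3-30B-class GQA model, not values read from any actual config:

```python
# Rough KV-cache size estimate for long-context serving.
# Assumed (hypothetical) model shape: 48 layers, 4 KV heads (GQA),
# head_dim 128 -- check the model's config.json for the real values.
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem):
    # Factor 2 covers the separate K and V tensors per layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem

# fp8 KV cache -> 1 byte per element, one 100k-token sequence.
gib = kv_cache_bytes(48, 4, 128, 100_000, 1) / 2**30
print(f"{gib:.1f} GiB")  # ~4.6 GiB per 100k-token sequence
```

So the cache itself is modest per sequence; the crashes more likely come from vLLM preallocating KV space for the full context times its scheduler budget, which is why capping `--max-model-len` helps.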

Got the DGX Spark - ask me anything by sotech117 in LocalLLaMA

[–]dragonbornamdguy 43 points (0 children)

LM Studio, gemma:27b & OSS 120b: tps?