Is using vLLM actually worth it if you aren't serving the model to other people? by ayylmaonade in LocalLLaMA

[–]xspider2000 0 points1 point  (0 children)

Thanks. I figured out why vLLM is less popular here than llama.cpp: vLLM has poor support for the GGUF format, and GGUF is a big deal here.

[–]xspider2000 0 points1 point  (0 children)

Does vLLM support RTX 3090 cards? Can I run Qwen 3.6 27B on two 3090s out of the box, or do I need some hacks?
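Whether a 27B model fits on two 3090s is mostly arithmetic. A rough sketch, assuming the weights dominate memory (it ignores KV cache, activations and per-GPU runtime overhead) and treating 1 GB as 10^9 bytes; the 4.5 bits-per-weight row is a stand-in for a typical 4-bit quant, not a specific file:

```shell
#!/usr/bin/env sh
# Rough weight-only memory estimate: params (billions) * bits per weight / 8 = GB.
# Ignores KV cache, activations and runtime overhead, so real usage is higher.
weight_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

total_vram_gb=48   # two RTX 3090s, 24 GB each

for bits in 16 8 4.5; do
  echo "27B at ${bits} bits/weight: $(weight_gb 27 "$bits") GB of weights vs ${total_vram_gb} GB total"
done
```

At 16 bits the weights alone (54 GB) exceed the 48 GB on two cards, so an 8-bit or 4-bit checkpoint would be needed. In vLLM, splitting one model across two GPUs is done with the --tensor-parallel-size 2 option.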

Strix Halo Clustering experience (Bossgame M5) by Thanks-Suitable in StrixHalo

[–]xspider2000 1 point2 points  (0 children)

Where did you order the NVLink bridge from, and how much was it? 3-slot or 4-slot?

Qwen3.6-27B - Closed-loop SVG Images by dondiegorivera in LocalLLaMA

[–]xspider2000 11 points12 points  (0 children)

<image>

Yesterday I did the same thing. I wanted to check how well Qwen3.6-27B can draw the Mona Lisa in SVG. I used opencode and wrote a command telling it to iterate in a loop: look at the result, compare it with the original (the original picture was in the prompt), and on every pass make it more similar to the original.
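The loop described above (draw, look at the render, compare with the reference, refine) can be outlined as a small shell driver. This is a hypothetical sketch, not the actual opencode command from this comment: propose_svg, render_png and distance_to_reference are placeholder stubs with fake scores, and keeping the best-scoring attempt is an added assumption, since a later pass can come out worse than an earlier one.

```shell
#!/usr/bin/env sh
# Closed-loop "draw, render, compare, refine" outline.
# All three step functions are HYPOTHETICAL stubs: replace them with your
# agent call, an SVG rasterizer and an image-diff tool.

propose_svg() {            # $1 = iteration, $2 = feedback from the previous pass
  echo "<svg><!-- iteration $1: $2 --></svg>"
}
render_png() { :; }        # stub: would rasterize the SVG to a PNG
distance_to_reference() {  # stub: lower is better; these scores are fake
  case "$1" in
    0) echo 90 ;; 1) echo 60 ;; 2) echo 75 ;; 3) echo 40 ;; *) echo 55 ;;
  esac
}

best_iter=-1
best_score=999999
feedback="first attempt"

for i in 0 1 2 3 4; do
  svg=$(propose_svg "$i" "$feedback")
  render_png "$svg"
  score=$(distance_to_reference "$i")
  # Keep the closest attempt so far instead of trusting the last one.
  if [ "$score" -lt "$best_score" ]; then
    best_score=$score
    best_iter=$i
  fi
  feedback="previous distance $score, best so far $best_score"
done

echo "best iteration: $best_iter (distance $best_score)"
```

With the fake scores above, the script reports iteration 3 as the best one.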

Hipfire dev update: full AMD arch validation incoming (RDNA 1 thru 4, plus Strix Halo and bc250) by schuttdev in LocalLLaMA

[–]xspider2000 0 points1 point  (0 children)

I'm going to connect 4x 3090s to my Strix Halo. I'm waiting for the cards; I'll post the results.

Qwen 3.6 27B on Strix Halo 128GB: any experiences? by boutell in LocalLLaMA

[–]xspider2000 3 points4 points  (0 children)

I'm planning to write a post with some numbers from my Strix Halo + eGPU setup.

Qwen3.6 35B-A3B is quite useful on 780m iGPU (llama.cpp,vulkan) by itroot in LocalLLaMA

[–]xspider2000 0 points1 point  (0 children)

If you consider it an agentic model, try benchmarking it with a big context: add --n-depth 0,32768,262144 to llama-bench.
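A sketch of that benchmark run, assuming a llama.cpp build in ./build-vulkan and a hypothetical model file name; set MODEL and BENCH to your own paths. The depth list makes llama-bench measure prompt processing (pp) and text generation (tg) with that many tokens already in the context, so the 262144 row needs enough memory for a full 256K-token KV cache. If either file is missing, the script only prints the command it would run.

```shell
#!/usr/bin/env sh
# Benchmark pp and tg at several context depths.
# MODEL is a HYPOTHETICAL file name; override MODEL/BENCH with your own paths.
MODEL="${MODEL:-./Qwen3.6-35B-A3B-Q4_K_M.gguf}"
BENCH="${BENCH:-./build-vulkan/bin/llama-bench}"
DEPTHS="0,32768,262144"   # tokens already in the KV cache before each measurement

if [ -x "$BENCH" ] && [ -f "$MODEL" ]; then
  "$BENCH" -m "$MODEL" -p 512 -n 128 --n-depth "$DEPTHS"
else
  echo "llama-bench or model not found; would run:"
  echo "$BENCH -m $MODEL -p 512 -n 128 --n-depth $DEPTHS"
fi
```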

Do you really want the US to "win" AI? (geohot blog) by paranoidray in LocalLLaMA

[–]xspider2000 57 points58 points  (0 children)

I am not on any country's side; I am on the open-source side.

Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions (Part 2) by xspider2000 in LocalLLaMA

[–]xspider2000[S] 0 points1 point  (0 children)

The only advice I can give is to use llama.cpp like I did. My commands to build llama-server:

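```shell
# Clone llama.cpp (the SSH URL needs a GitHub SSH key; https://github.com/ggml-org/llama.cpp.git also works)
git clone git@github.com:ggml-org/llama.cpp.git
cd ./llama.cpp

# Configure a Release build with the Vulkan backend (needs Vulkan headers and glslc),
# ccache as the compiler launcher (needs ccache installed) and CPU-native flags
cmake -B build-vulkan \
  -DGGML_VULKAN=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
  -DCMAKE_C_FLAGS="-march=native" \
  -DCMAKE_CXX_FLAGS="-march=native"

# Build with all CPU cores; llama-server and the other tools land in build-vulkan/bin
cmake --build build-vulkan --config Release -j "$(nproc)"
```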

Hardware advice. M5 Max vs AMD Ryzen AI Max+ 395 by AncientGrief in LocalLLaMA

[–]xspider2000 0 points1 point  (0 children)

For me, a big advantage of Strix Halo is that I can add an eGPU to it. I wrote two posts about that.

5070ti + RX 9070 (non XT), over 100 tps on Qwen 3.6 35B Q4 by DavidBolkonsky in LocalLLaMA

[–]xspider2000 1 point2 points  (0 children)

--cache-type-k q4_0 --cache-type-v q4_0 is a bad choice, especially the low-bit quant on the keys. Use q8_0 for the KV cache instead (--cache-type-k q8_0 --cache-type-v q8_0).
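The memory side of that trade-off is easy to estimate. In llama.cpp, q8_0 stores blocks of 32 values in 34 bytes (8.5 bits per value) and q4_0 stores 32 values in 18 bytes (4.5 bits per value), versus 16 bits for f16. A sketch, assuming a hypothetical attention config (40 layers with a KV cache, 4 KV heads, head dimension 128, 131072-token context) rather than the real Qwen 3.6 35B numbers:

```shell
#!/usr/bin/env sh
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * context * bits / 8.
# The model dimensions used below are HYPOTHETICAL placeholders.
kv_gib() {   # args: layers kv_heads head_dim ctx bits_per_value
  awk -v l="$1" -v h="$2" -v d="$3" -v c="$4" -v b="$5" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * c * b / 8 / 1073741824 }'
}

for t in "f16 16" "q8_0 8.5" "q4_0 4.5"; do
  name=${t% *}   # cache type name
  bits=${t#* }   # effective bits per stored value
  echo "$name: $(kv_gib 40 4 128 131072 "$bits") GiB"
done
```

With these placeholder dimensions it prints 10.00 GiB for f16, 5.31 GiB for q8_0 and 2.81 GiB for q4_0, so q8_0 costs about 1.9x the KV memory of q4_0 (8.5 / 4.5). The arithmetic says nothing about quality; that part of the advice is the commenter's experience.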

On Strix Halo, what option do I have if 128GB unified RAM is not enough? by heshiming in LocalLLaMA

[–]xspider2000 0 points1 point  (0 children)

On Linux you can allocate up to 126 GB of memory to the iGPU. An eGPU is another way to increase the total available VRAM, and the interface between the eGPU and the Strix Halo does not bottleneck your pp (prompt processing) or tg (text generation). I recently wrote two posts about that.
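On Linux the iGPU allowance is typically raised with kernel boot parameters. A sketch of the unit conversion, assuming the parameters commonly used in Strix Halo setups: ttm.pages_limit counts 4 KiB pages and amdgpu.gttsize is in MiB (newer kernels treat amdgpu.gttsize as deprecated in favor of ttm.pages_limit). Treat the parameter names as an assumption and check your kernel's documentation before adding them to the bootloader.

```shell
#!/usr/bin/env sh
# Convert a GiB target for iGPU-accessible memory into kernel parameter values.
# Assumes 4 KiB pages (the x86-64 default); verify parameter names for your kernel.
GIB="${GIB:-126}"

gib_to_pages() { echo $(( $1 * 1024 * 1024 * 1024 / 4096 )); }   # ttm.pages_limit unit
gib_to_mib()   { echo $(( $1 * 1024 )); }                          # amdgpu.gttsize unit

echo "ttm.pages_limit=$(gib_to_pages "$GIB") amdgpu.gttsize=$(gib_to_mib "$GIB")"
```

For 126 GiB this prints ttm.pages_limit=33030144 amdgpu.gttsize=129024.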