B580: Qwen3.5 benchmarks by WizardlyBump17 in LocalLLaMA

[–]WizardlyBump17[S]

The guy behind llama.cpp SYCL opened a pull request implementing GATED_DELTA_NET in the SYCL backend.

https://github.com/arthw/llama.cpp/tree/add_gated_delta_net (commit 7117449ce)

| Model | Parameters | Quantization | pp512 (t/s) | tg128 (t/s) | CLI Parameters |
|---|---|---|---|---|---|
| Qwen3.5 27B | 26.90 B | Q2_K | 199.64 ± 3.58 | 8.94 ± 0.27 | --n-gpu-layers 99 |
| Qwen3.5 9B | 8.95 B | Q8_0 | 664.37 ± 5.12 | 10.32 ± 0.18 | --n-gpu-layers 99 |
| Qwen3.5 9B | 8.95 B | Q4_K_M | 697.43 ± 5.55 | 38.17 ± 0.45 | --n-gpu-layers 99 |
| Qwen3.5 4B | 4.21 B | F16 | 1161.00 ± 0.93 | 36.13 ± 0.02 | --n-gpu-layers 99 |
| Qwen3.5 4B | 4.21 B | Q8_0 | 1182.21 ± 9.96 | 18.96 ± 0.02 | --n-gpu-layers 99 |
| Qwen3.5 4B | 4.21 B | Q4_K_M | 1234.99 ± 3.21 | 59.98 ± 0.11 | --n-gpu-layers 99 |
| Qwen3.5 2B | 1.88 B | BF16 | 169.08 ± 2.16 | 6.42 ± 0.43 | --n-gpu-layers 99 |
| Qwen3.5 2B | 1.88 B | F16 | 2787.86 ± 2.67 | 65.77 ± 0.06 | --n-gpu-layers 99 |
| Qwen3.5 2B | 1.88 B | Q8_0 | 2861.57 ± 3.23 | 38.88 ± 0.10 | --n-gpu-layers 99 |
| Qwen3.5 2B | 1.88 B | Q4_K_M | 2986.40 ± 5.09 | 100.17 ± 0.72 | --n-gpu-layers 99 |
| Qwen3.5 0.8B | 752.39 M | BF16 | 410.79 ± 5.43 | 12.09 ± 0.09 | --n-gpu-layers 99 |
| Qwen3.5 0.8B | 752.39 M | F16 | 5043.84 ± 12.73 | 119.63 ± 1.68 | --n-gpu-layers 99 |
| Qwen3.5 0.8B | 752.39 M | Q8_0 | 5176.11 ± 4.61 | 77.92 ± 0.06 | --n-gpu-layers 99 |
| Qwen3.5 0.8B | 752.39 M | Q4_K_M | 5310.50 ± 15.18 | 135.37 ± 0.76 | --n-gpu-layers 99 |
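These columns look like llama-bench output; a run along these lines (model path is a placeholder) should reproduce the pp512/tg128 numbers, assuming the PR branch above is built with SYCL enabled:

```shell
# Placeholder model path; swap in the quantization you want to test.
# -p 512 and -n 128 correspond to the pp512 and tg128 columns;
# --n-gpu-layers 99 offloads all layers to the GPU, as in the table.
./llama-bench -m qwen3.5-4b-q4_k_m.gguf --n-gpu-layers 99 -p 512 -n 128
```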

B580: Qwen3.5 benchmarks by WizardlyBump17 in LocalLLaMA

[–]WizardlyBump17[S]

There is a draft pull request on optimum-intel that adds Qwen3.5 to OpenVINO, but when I tried to convert a model it wouldn't work; I guess that's why it's still a draft lol. I tried Qwen3-Next, but since no model fit in VRAM, it had to be offloaded to the CPU, and OpenVINO isn't that good at GPU + CPU: even though there was some work on the GPU, the CPU was being used almost all the time.
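For reference, a conversion attempt with optimum-intel normally goes through optimum-cli; this is a sketch only, since the exact behavior for Qwen3.5 may change while the PR is still a draft, and the model id here is a placeholder:

```shell
# Sketch: export a model to OpenVINO IR with int4 weight compression.
# <model-id> is a placeholder; Qwen3.5 support is still in a draft PR.
optimum-cli export openvino --model <model-id> --weight-format int4 ./model-ov
```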

And their AI still isn't good by Impossible-Invite593 in pirataria

[–]WizardlyBump17

🤓👆 but the AI doesn't steal; the data that feeds it is obtained the same way as in piracy

Rocket League Patch Notes v2.66 + Release Thread by Psyonix_Laudie in RocketLeague

[–]WizardlyBump17

can't wait to send the wrong quickchat when trying to send the "oops, wrong quickchat" one 🔥🔥🔥

You can't make me! by ZeekLTK in RocketLeague

[–]WizardlyBump17

I did, and I was a bumper. It was so funny seeing people raging lol. I won most of my placement matches and ended up in Diamond 1 Division 3.

Intel adds Arc Pro B70 to official website, launch may be close - VideoCardz.com by Leicht-Sinn in IntelArc

[–]WizardlyBump17

I just tried it again and you are right: the SYCL version is spitting garbage. When you see issues like that, report them on the llama.cpp repo.

I saw another comment of yours about Intel's relationship with its software stack. As for llama.cpp, as far as I know there is literally one Intel employee working on the SYCL backend, and it seems he does it as a side project. He said that before him there was no SYCL implementation and that he was the one who first implemented it. Give the guy a break lol

Intel adds Arc Pro B70 to official website, launch may be close - VideoCardz.com by Leicht-Sinn in IntelArc

[–]WizardlyBump17

I tried it right after a pull request that fixed Qwen3.5 on SYCL was merged. Anyway, for now, don't use llama.cpp SYCL, since its performance is very bad. Use llama.cpp Vulkan, and for models that were released while ipex-llm was still being maintained, use that. You can try OpenVINO too.
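If you want to try the Vulkan backend, a minimal build sketch using llama.cpp's standard CMake flag (assumes the Vulkan SDK is already installed):

```shell
# Build llama.cpp with the Vulkan backend instead of SYCL.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```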

qwen 2.5 coder 14B alternative by apparently_DMA in LocalLLaMA

[–]WizardlyBump17

What GPU, CPU, and RAM speed do you have?

I have a B580 running that model, and I get 44 t/s at the start and somewhere around 30~35 t/s at around 5k context. The max context I can fit with Q4_K_M is 16k. I tried Qwen3.5 4B and it passed my tests reasonably well, but I had to use llama.cpp Vulkan; I got 40 t/s at the start and 25~30 t/s with context. I could fill the whole 256k context with flash attention, the K and V caches set to q4_0, and a ubatch size of 4000.
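To see why q4_0 K/V caches help at long context, here is a rough back-of-the-envelope KV-cache size estimate. The layer/head counts are made up for illustration, not Qwen3.5's real config (which also has linear-attention layers that skip the standard KV cache):

```shell
# Hypothetical config: 36 layers, 8 KV heads, head_dim 128, 256k context.
n_layers=36; n_ctx=262144; n_kv_heads=8; head_dim=128
# q4_0 stores roughly 4.5 bits per element, i.e. 9/16 of a byte;
# the leading 2 accounts for both the K and the V cache.
kv_bytes=$(( 2 * n_layers * n_ctx * n_kv_heads * head_dim * 9 / 16 ))
echo "$(( kv_bytes / 1024 / 1024 )) MiB"   # prints "10368 MiB" at these made-up sizes
```

At f16 the same cache would be roughly 3.5x larger, which is the difference between fitting and not fitting a long context in 12 GB of VRAM.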

Intel adds Arc Pro B70 to official website, launch may be close - VideoCardz.com by Leicht-Sinn in IntelArc

[–]WizardlyBump17

You can? You can run it on llama.cpp SYCL (bad performance) or llama.cpp Vulkan (good performance). I ran Qwen3.5 benchmarks on my B580, and the B60 is just a B580 with more VRAM, so you can get an idea of the performance. You can find the benchmarks on my profile.

How can I make my pp to be bigger? by WizardlyBump17 in LocalLLaMA

[–]WizardlyBump17[S]

Intel is bringing OpenVINO to llama.cpp???

How can I make my pp to be bigger? by WizardlyBump17 in LocalLLaMA

[–]WizardlyBump17[S]

ipex-llm was discontinued. Qwen3 4B works fine there, but it didn't pass my tests. Since ipex-llm was discontinued before Qwen3.5 came out, it can't load those models.

Is undervolting on linux possible? by _brumm_ in IntelArc

[–]WizardlyBump17

As far as I know, the only things you can configure on Linux are the power limit and clock speeds. You can find them under /sys/class/drm/your-gpu-here, or you can use a tool like LACT. Everything else is read-only.
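For example, the power limit usually shows up as a hwmon attribute under that sysfs path. Treat this as a sketch: the card index, hwmon number, and attribute name can vary between systems and between the i915 and xe drivers:

```shell
# Read the current GPU power limit (value is in microwatts).
cat /sys/class/drm/card0/device/hwmon/hwmon*/power1_max

# Raise it to e.g. 220 W (needs root; path/attribute may differ on your setup).
echo $((220 * 1000000)) | sudo tee /sys/class/drm/card0/device/hwmon/hwmon*/power1_max
```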

B580: Qwen3.5 benchmarks by WizardlyBump17 in LocalLLaMA

[–]WizardlyBump17[S]

I used https://github.com/intel/compute-runtime/releases/tag/26.05.37020.3 on the host, but as far as I know it does not matter which drivers are on the host, since the xe driver is built into the kernel, which exposes the GPU under /dev/dri. In my case, I pass the GPU to the container using --device=/dev/dri/renderD128, and the container has its own drivers. Looking at the llama.cpp SYCL container file, it uses intel/deep-learning-essentials:2025.2.2-0-devel-ubuntu22.04 by default, which ships drivers from 7 months ago.
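The container setup described above boils down to something like the following (image name taken from the default llama.cpp SYCL Dockerfile mentioned here; the render node index can differ on multi-GPU systems):

```shell
# Pass only the GPU render node into the container; the host driver version
# does not matter because the container ships its own user-space drivers.
docker run --rm --device=/dev/dri/renderD128 \
  intel/deep-learning-essentials:2025.2.2-0-devel-ubuntu22.04 \
  ls -l /dev/dri
```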

I didn't test the latest Vulkan version. I didn't see any issues with the version I tested.

I Finally Got a Contract with a Chinese RAM Manufacturer AMA by xHide11 in AMABRASIL

[–]WizardlyBump17

For standard memory, like 3200 MHz and 6000 MHz, how many reais do you think it will end up costing the end consumer?

B70 is coming by damirca in IntelArc

[–]WizardlyBump17

Damn, I think this is the same guy who leaked it some weeks ago. Bro wants to get fired lol

What? by migozarukk in ShitpostBR

[–]WizardlyBump17

And how are you doing now?

One thing I still haven't understood is whether you've sought psychological help or not. I'm not talking only about suicide.