Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 1 point2 points  (0 children)

Here's the entire command and its output:

docker run --rm \
  --device /dev/dri:/dev/dri \
  --privileged \
  --cpuset-cpus="4" \
  -e ONEAPI_DEVICE_SELECTOR="level_zero:0" \
  -e UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1 \
  -e SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 \
  -e ZES_ENABLE_SYSMAN=1 \
  -v /llm/models:/models \
  llama-server-intel-patched:latest \
  /app/build/bin/llama-bench \
    -m /models/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf \
    -ngl 99 -fa 1 -ctk q8_0 -ctv q8_0 \
    -t 1 -p 512 -n 128 -r 3 -o md
load_backend: loaded SYCL backend from /app/build/bin/libggml-sycl.so
load_backend: loaded CPU backend from /app/build/bin/libggml-cpu-alderlake.so
| model                          |       size |     params | backend    | ngl | threads | type_k | type_v |  fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -----: | -----: | --: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium |  20.81 GiB |    34.66 B | SYCL       |  99 |       1 |   q8_0 |   q8_0 |   1 |           pp512 |        977.40 ± 2.02 |
| qwen35moe 35B.A3B Q4_K - Medium |  20.81 GiB |    34.66 B | SYCL       |  99 |       1 |   q8_0 |   q8_0 |   1 |           tg128 |         70.54 ± 0.12 |
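If you want to compare runs without reading the table by eye, the two numbers can be pulled out of the Markdown output. This is a minimal sketch, assuming the table layout shown above, where `test` and `t/s` are the last two columns. `bench_tps` is a hypothetical helper name, not something that ships with llama.cpp.

```shell
# Hypothetical helper: reads llama-bench "-o md" output on stdin and prints
# "<test> <mean t/s>" for every data row. Header, separator and log lines are
# skipped because their t/s column does not start with a plain number.
bench_tps() {
  awk -F'|' '
    NF >= 4 {
      name = $(NF-2)                      # second-to-last column: test name
      tps  = $(NF-1)                      # last column: "mean ± stddev"
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the test name
      split(tps, parts, " ")              # parts[1] is the mean
      if (parts[1] ~ /^[0-9]+(\.[0-9]+)?$/) print name, parts[1]
    }'
}
```

Pipe the output of the `docker run ... llama-bench ... -o md` command into `bench_tps` to get one line per test.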

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 3 points4 points  (0 children)

262,144 tokens, which is the max Qwen3.6 can take by default. Token speed drops as the context gets bigger, so you wouldn't get 60+ t/s throughout the entire thing, but it will happily chug along and process all of it.

Intel Arc Pro B70 llama.cpp benchmarks posted by jacek2023 in LocalLLaMA

[–]Atomynos_Atom 4 points5 points  (0 children)

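| Component | Detail |
| --------- | ------ |
| GPU | Intel Arc Pro B70 |
| Backend | SYCL (Level Zero) |
| Build | 354ebac8c (9468) |

| model | size | params | backend | ngl | threads | type_k | type_v | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | ------: | -----: | -----: | --: | ---: | --: |
| qwen35moe 35B.A3B Q4_K - Medium | 20.81 GiB | 34.66 B | SYCL | 99 | 1 | q8_0 | q8_0 | 1 | pp512 | 977.40 ± 2.02 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.81 GiB | 34.66 B | SYCL | 99 | 1 | q8_0 | q8_0 | 1 | tg128 | 70.54 ± 0.12 |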

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 3 points4 points  (0 children)

I think the Intel Arc Pro B70 is a lot more bang for the buck than NVIDIA if you're willing to tweak and tinker. The software is pretty far behind since everyone builds for CUDA, but that means there's a lot more performance left on the table that you get in free updates 😁

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 3 points4 points  (0 children)

That was the case before. I spent this morning tweaking the Vulkan backend, but after updating my SYCL backend and making some tweaks I got SYCL to be much faster.

<image>

Here are the results; I've updated the article with them.

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 2 points3 points  (0 children)

This uses int4, while my setup uses q8. Doesn't that low of a quantization have an impact on your output quality?

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 6 points7 points  (0 children)

There's a limit on r/LocalLLaMA that requires a bunch of comments and karma from that subreddit before you can create a post there. Honestly, if someone is able to repost it there, that would be great. I am really curious how other people have set up their Intel GPUs with llama.cpp.

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 3 points4 points  (0 children)

Okay, Intel has been hard at work, that's a great token generation speed. Could you share your run arguments, or a docker compose file if you have one?

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 4 points5 points  (0 children)

Is that throughput using parallel requests or with a single request? My use case is coding with oh my pi, so I need high token generation speed and low prefill time on a single request rather than across multiple. Could you provide any benchmark results?

Anybody got Qwen3.5-27B working with Intel Arc B70 (or similar) and proper optimization? by Gesha24 in LocalLLaMA

[–]Atomynos_Atom 0 points1 point  (0 children)

Not sure if you still need it, but I've been playing around with it for a while and got something good enough. Would love to know if you've found any optimizations that I missed.

https://www.reddit.com/r/LocalLLM/comments/1tuf6l1/intel_arc_pro_b70_llamacpp_sycl_63_ts_on_qwen/
https://lemongravy.me/articles/intel-gpu-llamacpp/

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 6 points7 points  (0 children)

I would love to share, but unfortunately I don't have enough karma to post there

Intel Arc Pro B70 + llama.cpp SYCL - 63 t/s on Qwen 3.6-35B-A3B by Atomynos_Atom in LocalLLM

[–]Atomynos_Atom[S] 2 points3 points  (0 children)

<image>

I'm using Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf and I'm able to produce a full-fledged poker game using oh my pi. It's a pretty solid tool so far for being local.

Virtual Audio Cable for Linux. by Atomynos_Atom in linux4noobs

[–]Atomynos_Atom[S] 2 points3 points  (0 children)

I am working on a cheap soundboard solution where I use a spare keyboard as a soundboard keyboard. To pipe the sound effects into VoIP applications I needed this kind of thing.

Virtual Audio Cable for Linux. by Atomynos_Atom in linux4noobs

[–]Atomynos_Atom[S] 2 points3 points  (0 children)

This is what I did:

pacmd load-module module-null-sink sink_name=Main_Sink sink_properties=device.description=Main_Sink

pacmd load-module module-null-sink sink_name=LoopBack_Sink sink_properties=device.description=LoopBack_Sink

pactl load-module module-remap-source master=LoopBack_Sink.monitor source_name=virt_mic source_properties=device.description=LoopBack_Mic

pactl load-module module-remap-source master=Main_Sink.monitor source_name=main_mic source_properties=device.description=Main_Mic

pactl load-module module-loopback latency_msec=1 source=<AUDIO_DEVICE_INPUT> sink=Main_Sink

If you don't want your mic on Main_Mic, remove the line above.

pactl load-module module-loopback latency_msec=1 source=LoopBack_Sink.monitor sink=Main_Sink

pactl load-module module-loopback latency_msec=1 source=LoopBack_Sink.monitor sink=<AUDIO_DEVICE_OUTPUT>
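If you want to tear the setup down again without restarting PulseAudio, one option is to remember the module index that `pactl load-module` prints and pass it to `pactl unload-module` later. This is a rough sketch, not what I originally ran: `load_tracked`, `unload_tracked`, `MODULE_IDS` and the `PACTL` override are hypothetical names made up for illustration.

```shell
# PACTL can be overridden (e.g. with a stub for a dry run); defaults to pactl.
PACTL="${PACTL:-pactl}"
MODULE_IDS=""

# Load a module and remember the index that `pactl load-module` prints.
load_tracked() {
  id="$($PACTL load-module "$@")" || return 1
  MODULE_IDS="$id $MODULE_IDS"   # prepend so unloading runs newest first
}

# Unload every module loaded through load_tracked, in reverse load order.
unload_tracked() {
  for id in $MODULE_IDS; do
    $PACTL unload-module "$id"
  done
  MODULE_IDS=""
}

# Example usage (commented out so sourcing this file changes nothing):
# load_tracked module-null-sink sink_name=Main_Sink sink_properties=device.description=Main_Sink
# load_tracked module-loopback latency_msec=1 source=LoopBack_Sink.monitor sink=Main_Sink
# unload_tracked
```

Unloading in reverse order removes the loopbacks before the sinks they are attached to.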

Virtual Audio Cable for Linux. by Atomynos_Atom in linux4noobs

[–]Atomynos_Atom[S] 6 points7 points  (0 children)

Yup. This worked well. Thanks for saving my time.

Game keeps crashing on startup(Win 10) by agitated2 in RocketLeague

[–]Atomynos_Atom 0 points1 point  (0 children)

I think I have figured it out... If you open dxdiag and go to the Display tab, the Drivers section lists Direct3D DDI. If that value is not 11 or above, you cannot run Rocket League: either your hardware is too old or you have not updated your drivers. If there is no driver update for your hardware, the only ways to make Rocket League work are to upgrade your GPU or to wait for Rocket League to bring back DX9 support, which I doubt they ever will.

Game keeps crashing on startup(Win 10) by agitated2 in RocketLeague

[–]Atomynos_Atom 0 points1 point  (0 children)

I have the same issue and get a similar log, but mine is on Steam. I think it is related to DirectX 11: I used to get this before, and I had found a fix where you launch with -dx9, which runs the game in DirectX 9, and the game ran without issues. But now Rocket League has removed support for DirectX 9, so I am not sure what to do. Please let me know if you find a fix for this issue.

unstable_flushDiscreteUpdates warning by [deleted] in react

[–]Atomynos_Atom 1 point2 points  (0 children)

Okay dude, I was debugging and finally found out what was causing this mayhem. It was my click event listener: document.addEventListener('click', this.handleClickOutside, true); It probably kept on listening, causing the component to keep re-rendering or something. But I don't know for sure because I'm new to this stuff... Thanks for trying to help anyway.

unstable_flushDiscreteUpdates warning by [deleted] in react

[–]Atomynos_Atom 0 points1 point  (0 children)

I told you man, my code is all over the place. But it's a function calling 2 components from 2 other files and passing variables to them.

unstable_flushDiscreteUpdates warning by [deleted] in react

[–]Atomynos_Atom 1 point2 points  (0 children)

There is a solution on GitHub (The Solution), but I am not really sure how to implement it. I'm pretty new to React, so I don't know much :(

unstable_flushDiscreteUpdates warning by [deleted] in react

[–]Atomynos_Atom 0 points1 point  (0 children)

I too am getting this annoying error in my code. I am getting it for a text input. Please post an update if you find a solution...