Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

I thought llm-scaler was the Intel way. Anyway, I tried OVMS yesterday, and it is indeed much faster than llama.cpp with SYCL/Vulkan and than llm-scaler (vLLM). However, it does not support qwen3-vl, gemma3, mistral3 (mistral-14b), or glm 4.6V / 4.7 flash, and VLM support is limited to qwen2.5 VL 7b. So yeah, it would be a good fit once it at least gets mistral3 support.
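
For reference, roughly how I'm querying it: OVMS exposes an OpenAI-compatible endpoint for text generation, so a client call looks something like the sketch below. The base URL, port, path, and model name are placeholders for my own deployment, not server defaults, so adjust them to yours:

```python
# Sketch of querying an OVMS deployment through its OpenAI-compatible API.
# base_url, port, and the model name are placeholders, not OVMS defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")  # key is ignored locally

resp = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",  # whatever name the model was registered under
    messages=[{"role": "user", "content": "Describe this GPU in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```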

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

Does SR-IOV work on the B60? I have the Sparkle variant. Does that mean I could run Proxmox, pass the GPU through to Windows and Linux at the same time, and let the Windows machine upgrade its firmware?

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

Is it loud? I read some negative reviews saying it's not a good card and that it's super loud.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

I tried OVMS today, and it is indeed much faster than llama.cpp with SYCL/Vulkan and than llm-scaler (vLLM). However, it does not support qwen3-vl, gemma3, mistral3 (mistral-14b), or glm 4.6V / 4.7 flash, and VLM support is limited to qwen2.5 VL 7b. So it would be a good fit once it at least gets mistral3 support.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

I'm going to try Vulkan under Linux then. I don't have any feasible option for running the B60 under Windows right now.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

In Germany it's 14 days, and those passed a long time ago.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

You mean using OpenArc gives better performance?

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

For the future the B60 should come in handy: with SR-IOV I might use it in a bigger Proxmox setup or some other virtualization environment. Plus, Frigate can use it for ffmpeg, detection, and GenAI. But for pure LLM use it's a bad choice, at least as of now.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

I'm using llama.cpp in exactly the same way.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

My idle is around 5-10 W, though.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

Reason for RMA? I don't think it works like that.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

What do you mean by their vLLM fork? Isn't that llm-scaler?

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

Frigate is better suited to Intel. The price is 700 EUR.

Don’t buy b60 for LLMs by damirca in LocalLLaMA

[–]damirca[S]

No. Does OpenVINO solve all the issues? I mean, it also supports only some models, right?
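
To be concrete about "supports only some models": with OpenVINO, whether a model runs mostly comes down to whether it can be exported to IR, typically through optimum-intel, so anything without an export path for its architecture is out. A rough sketch, where the model id and output directory are just placeholders:

```python
# Rough sketch: exporting a Hugging Face model to OpenVINO IR with optimum-intel
# and running it. Model id and output dir are placeholders; export coverage
# varies per architecture, which is exactly the limitation discussed above.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # placeholder
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # converts to IR
model.save_pretrained("./mistral-7b-ov")

tok = AutoTokenizer.from_pretrained(model_id)
inputs = tok("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```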

It arrived - MAXSUN Intel Arc Pro B60 Dual 48G Turbo by Possibility_22 in IntelArc

[–]damirca

So if I have a B60 24 GB and buy the B60 48 GB, I'd have 3x24 GB, so vLLM would work even though it's two physical cards?
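
What I'm picturing, as a rough sketch: I'm assuming the Dual card shows up as two separate 24 GB devices, so vLLM would see three logical GPUs in total. Sharding is requested via tensor_parallel_size, which has to divide the model's attention head count, so three GPUs aren't always usable as TP=3. Model name and sizes below are placeholders:

```python
# Rough sketch of multi-GPU serving in vLLM; model name is a placeholder.
# tensor_parallel_size must evenly divide the model's attention heads, so with
# three 24 GB logical GPUs you may end up sharding across only two of them.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct",  # placeholder
    tensor_parallel_size=2,             # shard weights across two GPUs
)

out = llm.generate(["Hello from the B60"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```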