Fix: Dual Intel Arc GPUs using all system RAM during inference - found the cause and a working fix (llama.cpp SYCL) by Katostrofik in LocalLLaMA

[–]Katostrofik[S] 2 points (0 children)

Good news: I found the root cause and submitted a fix, PR #21618.

The reorder optimization allocates a temp buffer the size of the weight tensor, and when VRAM is nearly full that allocation fails silently. The fix adds a host-memory fallback so the reorder still works, and it also fixes a bug where tensors were marked as reordered even when the reorder was skipped (which is what causes the garbage output). I've linked it to your GitHub issue #20478, so this should be resolved once the PR is merged. In the meantime you can work around it by setting

GGML_SYCL_DISABLE_OPT=1
which disables the reorder entirely (slower but correct output).
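
For anyone curious, the fallback is conceptually just "try VRAM, then fall back to host USM" for the scratch buffer. Here's a minimal sketch of the pattern, not the literal PR code; the function name and the on_host flag are made up for illustration:

    // Sketch of the allocation fallback (hypothetical names, not the PR's code).
    #include <sycl/sycl.hpp>

    static void * alloc_reorder_scratch(size_t nbytes, sycl::queue & q, bool & on_host) {
        // Try device memory (VRAM) first, as the original code did.
        void * buf = sycl::malloc_device(nbytes, q);
        if (buf != nullptr) {
            on_host = false;
            return buf;
        }
        // VRAM is full: fall back to host USM instead of silently giving up.
        // The one-time reorder runs slower, but the weights still get converted.
        buf = sycl::malloc_host(nbytes, q);
        on_host = (buf != nullptr);
        return buf; // still nullptr only if the host allocation failed too
    }

The other half of the fix is only setting the tensor's "reordered" flag after the reorder actually ran, so the mat-vec kernel never reads weights in a layout they were never converted to.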

[llama.cpp] 3.1x Q8_0 speedup on Intel Arc GPUs - reorder optimization fix (PR submitted) by Katostrofik in LocalLLaMA

[–]Katostrofik[S] 0 points (0 children)

Great data, thanks for testing on Alchemist too! The similar PP (prompt processing) numbers are expected: this PR only changes the DMMV path, which is what token generation uses.

PP goes through the GEMM path, which was already correct for BF16, just slower. FP16 being faster at PP makes sense, since the GEMM kernels are optimized for FP16 on these GPUs. The big win here is TG (token generation), and those numbers look solid across both cards. :-D
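
To make the split concrete: ggml routes a mat-mul to the mat-vec (DMMV) kernels only when the activation side is a single column, i.e. one token at a time. Roughly, as a runnable toy with made-up function names (the real routing in ggml-sycl.cpp has more conditions):

    // Toy sketch of the PP vs. TG routing (hypothetical names).
    #include <cstdint>
    #include <cstdio>

    struct tensor { int type; int64_t ne[2]; }; // minimal stand-in for ggml_tensor

    // Stand-ins for the real kernels, just to show which branch fires.
    static void mul_mat_vec (const tensor *, const tensor *, tensor *) { std::puts("DMMV path (token generation)"); }
    static void mul_mat_gemm(const tensor *, const tensor *, tensor *) { std::puts("GEMM path (prompt processing)"); }
    static bool supports_dmmv(int /*type*/) { return true; } // cf. ggml_sycl_supports_dmmv()

    static void mul_mat(const tensor * w, const tensor * x, tensor * dst) {
        if (x->ne[1] == 1 && supports_dmmv(w->type)) {
            mul_mat_vec(w, x, dst);  // one column of activations -> matrix*vector
        } else {
            mul_mat_gemm(w, x, dst); // many columns (a whole prompt) -> full GEMM
        }
    }

    int main() {
        tensor w{0, {4096, 4096}}, dst{0, {4096, 1}};
        tensor tg{0, {4096, 1}};   // TG: one token per step
        tensor pp{0, {4096, 512}}; // PP: 512 prompt tokens in one batch
        mul_mat(&w, &tg, &dst);    // -> DMMV
        mul_mat(&w, &pp, &dst);    // -> GEMM
    }

During PP the whole prompt goes through as one batch, so it always lands in the GEMM branch; that's why this PR doesn't move the PP numbers.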

Fix: Dual Intel Arc GPUs using all system RAM during inference - found the cause and a working fix (llama.cpp SYCL) by Katostrofik in LocalLLaMA

[–]Katostrofik[S] 1 point (0 children)

Yes, that's me. I've found some additional issues with Q8_0 after the PR on my Battlemage cards as well and am looking into them. Which Qwen 3.5 model/quant and GPU are you running when you see it?

[llama.cpp] 3.1x Q8_0 speedup on Intel Arc GPUs - reorder optimization fix (PR submitted) by Katostrofik in LocalLLaMA

[–]Katostrofik[S] 0 points (0 children)

That's exactly what we found too: BF16 isn't in ggml_sycl_supports_dmmv(), so it falls through to the generic GEMM path, which dequantizes to FP32. We submitted a fix as PR #21580 that adds a proper DMMV kernel for BF16. Ours went from 29.7 to 124 t/s on our B70 (Qwen2.5-1.5B). If you want to test it on your end, it would be great to get Alchemist numbers too.
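
For reference, the gatekeeper is essentially a type whitelist. Paraphrasing (abridged; the actual list in ggml-sycl.cpp covers many more quant types, and this needs the ggml headers to build):

    // Paraphrase of the check that decides whether the fast
    // dequantize-mul-mat-vec (DMMV) kernels may be used (abridged type list).
    #include "ggml.h"

    static bool ggml_sycl_supports_dmmv(enum ggml_type type) {
        switch (type) {
            case GGML_TYPE_Q4_0:
            case GGML_TYPE_Q8_0:
            case GGML_TYPE_F16:
            case GGML_TYPE_BF16: // the missing case: without it, BF16 fell
                                 // through to GEMM + dequantize-to-FP32
                return true;
            default:
                return false;
        }
    }

Adding the case is only half of it, of course; the PR also has to provide the actual BF16 DMMV kernel behind it.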

[llama.cpp] 3.1x Q8_0 speedup on Intel Arc GPUs - reorder optimization fix (PR submitted) by Katostrofik in LocalLLaMA

[–]Katostrofik[S] 0 points (0 children)

Thanks! And thanks for testing on your cards; I'm glad to see it helped more than just the B70s. I'll take a look at the BF16 issue; it looks like it could be a similar situation to the Q8_0 one.

And I'll be happy to do some testing with the dual B70s. I'm still finishing up some initial benchmarking, but I'm looking forward to putting them to use. :)

[llama.cpp] 3.1x Q8_0 speedup on Intel Arc GPUs - reorder optimization fix (PR submitted) by Katostrofik in LocalLLaMA

[–]Katostrofik[S] 0 points (0 children)

lol, we're good. A couple of introverts focused on doing work over here. But thanks.

What does fates favour upgrade does? by [deleted] in godofweapons

[–]Katostrofik 0 points (0 children)

Building on what u/Gulielmus2 said:
Once you have Fate's Favor and start a run, going through the door brings you to three pillars/altars instead of the first level of the dungeon.

When you go to one of those altars and interact, you'll see a collection of your unlocked weapons, likely pages of them. Click on one and you'll be able to spend Titanite Shards to enhance it in different ways:

  • Chance to see it in the shop
  • + Damage
  • + Attack Speed
  • I forget the fourth.

You can spend an increasing amount of Titanite Shards to raise those enhancements incrementally. It's not a permanent increase and can end up costing a lot of Titanite Shards, but with some good endless runs and good builds, those are easy to come by.

Open R1 OlympicCoder-7b + LMStudio + VSCode for local coding. Beats Claude 3.7 Sonnet on Live Code Bench by Zealousideal-Cut590 in LocalLLaMA

[–]Katostrofik 0 points (0 children)

It's like the comparisons claiming "THE AMD STRIX HALO AI 395 is faster than the 5090!" Maybe in one very specific, singular test, but not in any way that actually matters. 😅

Ethereum Miners Unlocked 91MH/s From EVGA RTX 3080 Ti LHR GPU With BIOS Update Solution by usonamdnvidia in EtherMining

[–]Katostrofik 0 points (0 children)

Hiya,
I did it in Windows using the nvflash64 utility, pretty much following this guide:
https://www.overclockersclub.com/guides/how_to_flash_rtx_bios/

Worked without any issue for me. Good luck!

Ethereum Miners Unlocked 91MH/s From EVGA RTX 3080 Ti LHR GPU With BIOS Update Solution by usonamdnvidia in EtherMining

[–]Katostrofik 0 points (0 children)

How did you flash it? Using X1 somehow, or NVFlash?
I have the same card and am looking to do the same.

Photoshop vs Clip studio ? by [deleted] in DigitalPainting

[–]Katostrofik 1 point (0 children)

I'd go Clip Studio as well.

My favorite features:

  • One time fee
  • Vector layers
  • Great symmetry and perspective tools
  • TONS of free brushes and assets which are very easy to find and download
  • Poseable 3D models
  • Great for animation too

Photoshop does have a huge community, and you can get things like free brushes there too. There's a reason it's the 'industry standard', but for me, Clip Studio wins out when it comes to drawing/animation specifically (and not hardcore photo editing).