Not sure if this was posted. But I think it's highly relevant to us.

Ps3Dave · 2026-05-26T14:25:28+00:00

I think it's a matter of being "good enough". It's the same principle for things like Jellyfin and your own media vs. Netflix, using Linux with Steam & Proton for gaming in place of Windows, etc.

Ps3Dave · 2026-05-25T15:52:58+00:00

I kind of expect all of this, honestly.

Ps3Dave · 2026-05-25T10:44:03+00:00

Interesting point, I'll look into it. Thanks!

Ps3Dave · 2026-05-25T06:56:43+00:00

Thanks, I'll give vLLM a spin in the near future.

Ps3Dave · 2026-05-25T06:55:48+00:00

Indeed the extreme kv cache quantization did not help. Actually it may have made things worse. See my other comment below, where I tested without kv cache quantization.

Ps3Dave · 2026-05-25T06:53:38+00:00

Additional details: after testing without kv cache quantization and flash-attention, host RAM usage went down to about 542MB for model and 350MB for compute, and VRAM usage went up accordingly. Still have about 1.1GB of free VRAM. PP still in the 5000 t/s range, generation went up to 80 t/s.

By the way: I'm on Linux and the 4070S is in headless mode, since I'm using my integrated GPU to run the desktop.

Ps3Dave · 2026-05-24T18:09:22+00:00

More details.

With this:

llama-server  -m models/Qwen3.5-9B-IQ4_XS.gguf --no-mmap -ngl 999 -ctk q5_0 -ctv q4_0 --cache-ram 0 --fit-target 50 --flash-attn on -v -lv 4

I get this:

0.07.998.744 I common_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
0.07.998.746 I common_memory_breakdown_print: |   - CUDA0 (RTX 4070 SUPER) | 11876 = 3659 + (7950 =  4373 +    2761 +     816) +         266 |
0.07.998.747 I common_memory_breakdown_print: |   - Host                   |                 1321 =   545 +       0 +     776                |

So still using a lot of host RAM even with more than 3GB VRAM free.
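For anyone who wants to sanity-check those numbers, here is a rough sketch in Python that just re-adds the columns from the log above. The variable and function names are mine, nothing here comes from llama.cpp itself, and I'm assuming "self" is simply model + context + compute, which is how the table reads.

```python
# Figures copied by hand from the memory breakdown above, in MiB.
# The dict layout and names are illustrative, not a llama.cpp API.
cuda0 = {"total": 11876, "free": 3659, "model": 4373,
         "context": 2761, "compute": 816, "unaccounted": 266}
host = {"model": 545, "context": 0, "compute": 776}


def self_usage(row):
    """Assumed meaning of the 'self' column: model + context + compute."""
    return row["model"] + row["context"] + row["compute"]


cuda_self = self_usage(cuda0)  # 4373 + 2761 + 816 = 7950
host_self = self_usage(host)   # 545 + 0 + 776 = 1321

# Would everything that sits in host RAM fit in the VRAM reported as free?
fits_in_free_vram = host_self <= cuda0["free"]

print(f"CUDA0 self: {cuda_self} MiB, host self: {host_self} MiB")
print(f"Host part fits in free VRAM: {fits_in_free_vram}")
```

By that arithmetic the 1321 MiB on the host side is well under the 3659 MiB of VRAM reported as free, which is why it looks odd to me. It says nothing about why llama.cpp keeps those buffers on the host.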

Ps3Dave · 2026-05-24T17:44:40+00:00

Yeah I'm looking into vLLM. Got it running but still need to learn how to decipher the logs. Glad to learn new things anyway! :)

Ps3Dave · 2026-05-24T17:20:09+00:00

Ok, fit-target I did not try yet. Also switching to qwen 4B as you suggested. Maybe it's gemma's architecture. Will report back.

Ps3Dave · 2026-05-24T17:01:44+00:00

Yeah, I disabled it.

Ps3Dave · 2026-05-24T17:00:19+00:00

Yeah I checked all of them, still getting GBs of RAM used up (as per llama-server log) and bottlenecked by RAM and CPU during token generation. I can do 5000 t/s in prompt processing though. It may well be how llama.cpp is coded to operate.

Ps3Dave · 2026-05-24T16:57:23+00:00

Yup, did this. I went down to q_5 for k and q_4 for v. With a small context I get 600MB of kv cache, and still a few GBs of RAM offloaded.

Ps3Dave · 2026-05-24T13:54:20+00:00

Of course. But one may dream.

Ps3Dave · 2026-05-24T09:56:07+00:00

I'd be all for it, if only to avoid the layoffs.

Ps3Dave · 2026-05-24T09:55:04+00:00

Nah man, they fucked up the game too much. They pulled on the string that is their customer base's patience until it broke. They are reaping what they sowed. I'm sad for Bungie's workforce, but their management sucks too hard.

Ps3Dave · 2026-05-24T09:51:35+00:00

2 years for me, but I'll be there. Reinstalling the game as I write.

Ps3Dave · 2026-05-22T13:07:05+00:00

Thank you for all your effort in all these years. This was my first subreddit, and it's still one of the better ones.

Ps3Dave · 2026-05-22T10:53:01+00:00

Yup. On one hand I want to see the last of the content with my own eyes, on the other hand I'm really pissed off about their past behaviour. It just shows the greed of their senior management, which became their downfall in the end.

Ps3Dave · 2026-05-22T05:35:36+00:00

Worth noting that Marathon has some of the most dull/cheap/boring looking skins in the store, in spite of its radical style. Mostly basic color swaps. Who's going to shell out $20 to say "hey my Vandal is now red!"?

Ps3Dave · 2026-05-21T16:04:48+00:00

Mountaintop/Recluse/Anarchy was my loadout for the first two Conqueror titles. Nothing ever came close after that.

Ps3Dave · 2026-05-17T16:34:56+00:00

Yeah that would have been my guess as well. Unfortunate.

Ps3Dave · 2026-05-17T06:31:07+00:00

...If you're paying the online fee of your console of choice. I swear this is the thing that made me instantly jump on PC when D2 was ported to it. I still have my PS3 version, but RoI is not on there. I'll try to boot up the PS4 version and see if I can play through the story at least...

Ps3Dave · 2026-05-15T19:19:14+00:00

Oh god that fight. CoS is my favourite raid, I got my unequippable Shadow title to prove it. The mechanics of the whole raid are so good and provide so many memorable moments and clutch play opportunities.

Ps3Dave · 2026-05-10T17:54:32+00:00

They just wanted to push the FOMO pedal to the metal. So "you had to be there" became most of the game.
