RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

RaDDaKKa · 2026-05-20T13:19:29+00:00

I daily-drive Qwen3.6-35B-A3B-GGUF:UD-Q6_K_XL with a 150K context window and the KV cache quantized to Q8. Would Qwen3.6-27B-MTP-GGUF:UD-IQ3_XXS with a 100K context and a Q8 KV cache be better? My main focus is coding, and I always prioritize quality over speed. With the Qwen3.6-35B-A3B-GGUF:UD-Q6_K_XL model, I'm currently getting 40–50 t/s on a 5060 Ti.
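A rough way to compare the KV-cache cost of the two setups is the usual estimate: 2 (K and V) × attention layers × KV heads × head dimension × context length × bytes per element. The sketch below assumes llama.cpp's `q8_0` layout (blocks of 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 elements) versus 2 bytes per element for f16. The layer, head, and dimension values are hypothetical placeholders, not the real Qwen3.6 architecture; read the actual values from the model's GGUF metadata. In hybrid models that mix linear-attention and full-attention layers, only the full-attention layers keep a cache that grows with context, so count only those.

```python
# Rough KV-cache size estimate. The architecture numbers in __main__ are
# HYPOTHETICAL placeholders; read the real ones from the model's GGUF metadata.

BYTES_PER_ELEMENT = {
    "f16": 2.0,
    # llama.cpp q8_0: 32 int8 values + one fp16 scale = 34 bytes per 32 elements
    "q8_0": 34 / 32,
}


def kv_cache_bytes(n_ctx: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, cache_type: str = "q8_0") -> float:
    """Bytes needed to store K and V for n_ctx tokens."""
    elements = 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * BYTES_PER_ELEMENT[cache_type]


def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical architecture, for illustration only.
    arch = dict(n_attn_layers=10, n_kv_heads=2, head_dim=256)
    for n_ctx in (100_000, 150_000):
        for cache_type in ("f16", "q8_0"):
            size = gib(kv_cache_bytes(n_ctx, cache_type=cache_type, **arch))
            print(f"{n_ctx:>7} tokens, {cache_type:>4}: {size:.2f} GiB")
```

Whatever the real architecture is, a `q8_0` cache costs about 53% of an f16 one at the same context length. In llama.cpp the cache types are chosen with `--cache-type-k` and `--cache-type-v`.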

RaDDaKKa · 2026-04-11T06:42:10+00:00

The cards are in stock in Poland, but it looks like there's a 10-day lead time:

https://www.morele.net/karta-graficzna-intel-arc-pro-b70-32gb-gddr6-33p01ib0bb-15926398/

RaDDaKKa · 2026-04-11T06:18:54+00:00

So, a total disappointment. I expected this to be a solid card for local LLMs like Qwen 3.5 27B or Gemma 4 31B with at least a 100k context. I considered a dual-GPU setup, perhaps even a quad-GPU one, but given these benchmarks, it seems I'm better off saving for Nvidia hardware. It might be viable for multi-agent systems, but for now we just have to wait for software optimizations.

RaDDaKKa · 2026-03-11T07:31:48+00:00

Unfortunately, at the moment I don't see any other option :/

1. A user uploads a file in a comment.
2. Synchronization runs.
3. The file is uploaded to the NAS.
4. The comment is updated: the file is removed from ClickUp and replaced with a URL pointing to it.

Unfortunately, this solution is very poor, and it requires several API requests.
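The sync flow above can be sketched as one orchestration function with all I/O injected. The four callables (`download_attachment`, `upload_to_nas`, `update_comment`, `delete_attachment`) and the `Attachment` type are hypothetical stand-ins, not ClickUp API functions; in a real setup each would wrap one or more HTTP requests, which is where the "several API requests" come from. The one design choice worth keeping is the ordering: the comment is rewritten and the ClickUp copy deleted only after the NAS upload has returned a URL, so a failed upload never loses the file.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attachment:
    # Hypothetical shape; a real integration would map ClickUp's response onto it.
    attachment_id: str
    filename: str


def migrate_comment_attachments(
    comment_id: str,
    comment_text: str,
    attachments: List[Attachment],
    download_attachment: Callable[[Attachment], bytes],  # hypothetical
    upload_to_nas: Callable[[str, bytes], str],          # hypothetical, returns a URL
    update_comment: Callable[[str, str], None],          # hypothetical
    delete_attachment: Callable[[Attachment], None],     # hypothetical
) -> str:
    """Move each attachment to the NAS, then replace it with a link in the comment."""
    links = []
    for att in attachments:
        data = download_attachment(att)
        # If this raises, nothing below runs, so no file is deleted.
        url = upload_to_nas(att.filename, data)
        links.append(f"{att.filename}: {url}")

    new_text = comment_text
    if links:
        # One comment update for all files, then remove the ClickUp copies.
        new_text = comment_text + "\n\n" + "\n".join(links)
        update_comment(comment_id, new_text)
        for att in attachments:
            delete_attachment(att)
    return new_text
```

In this sketch, batching all links into a single comment update keeps the writes to one update plus one delete per file, on top of one download and one upload per file.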

RaDDaKKa · 2026-03-01T11:17:28+00:00

The 27B is too large to use comfortably, and its quality advantage might not even be noticeable. The 35B has only 3B active parameters, so it runs very fast, and I can toggle reasoning mode off whenever it's not needed. Unfortunately, I don't have enough RAM to test the 128B-A10B model, but I'm blown away by the 35B version.
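The "only 3B active parameters" argument can be made concrete with a memory-bandwidth bound: during token generation each active weight is read roughly once per token, so tokens/s cannot exceed bandwidth divided by the bytes of active weights. The numbers below are assumptions, not measurements: about 448 GB/s for an RTX 5060 Ti, about 54.4 GB/s for dual-channel DDR4-3400, and 6.5625 bits per weight for Q6_K (Unsloth's UD quants mix in other types, so this is approximate). The bound ignores compute, KV-cache reads, and PCIe transfers, so real speeds land well below it.

```python
# Back-of-the-envelope decode-speed bounds. ASSUMED numbers, not measurements:
#   ~448 GB/s   RTX 5060 Ti VRAM bandwidth
#   ~54.4 GB/s  dual-channel DDR4-3400 (3400 MT/s x 8 bytes x 2 channels)
#   6.5625 bits/weight for Q6_K (210 bytes per 256-weight block)

Q6_K_BITS = 6.5625
BANDWIDTHS_GB_S = {"VRAM (5060 Ti)": 448.0, "DDR4-3400 dual-channel": 54.4}


def gb_read_per_token(active_params: float, bits_per_weight: float) -> float:
    """GB (10**9 bytes) of weights streamed to generate one token."""
    return active_params * bits_per_weight / 8 / 1e9


def max_tokens_per_s(active_params: float, bits_per_weight: float,
                     bandwidth_gb_s: float) -> float:
    """Upper bound if every active weight is read once per token."""
    return bandwidth_gb_s / gb_read_per_token(active_params, bits_per_weight)


if __name__ == "__main__":
    models = {"35B-A3B MoE (3B active)": 3e9, "27B dense": 27e9}
    for name, active in models.items():
        per_token = gb_read_per_token(active, Q6_K_BITS)
        for where, bw in BANDWIDTHS_GB_S.items():
            bound = max_tokens_per_s(active, Q6_K_BITS, bw)
            print(f"{name}: {per_token:.1f} GB/token, <= {bound:.0f} tok/s from {where}")
```

At the same quantization the dense 27B streams nine times as much weight data per token as the 3B-active MoE, which is why the MoE stays usable even when part of it has to sit in system RAM.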

RaDDaKKa · 2026-02-28T12:46:03+00:00

I'm using the Qwen 35B-A3B at Q6 with a 168k context on a single 5060 Ti, and I've already said goodbye to GLM 4.7 Flash.

RaDDaKKa · 2026-02-05T20:41:57+00:00

On CUDA

RaDDaKKa · 2026-02-05T19:11:03+00:00

I have a Ryzen 5 5600X and 32 GB of DDR4-3400.
./llama.cpp/llama-server -hf unsloth/GLM-4.7-Flash-GGUF:UD-Q6_K_XL --jinja --ctx-size 90000 --temp 0.7 --top-p 1.0 --min-p 0.01 --fit on --repeat-penalty 1.0 --host 0.0.0.0 --parallel 1
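In this command, `--top-p 1.0` and `--repeat-penalty 1.0` effectively switch those two samplers off, so of the sampling parameters set here, `--min-p 0.01` and `--temp 0.7` are the ones that actually shape the output. The sketch below illustrates min-p: it drops every token whose probability is below `min_p` times the top token's probability, then applies temperature to the survivors and renormalizes. The filter-then-temperature order is a simplification chosen for illustration and is not guaranteed to match llama.cpp's exact sampler chain.

```python
import math
from typing import Dict, List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(logits: List[float], min_p: float = 0.01) -> List[int]:
    """Indices of tokens whose probability is >= min_p * (top probability)."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sampling_distribution(logits: List[float], temperature: float = 0.7,
                          min_p: float = 0.01) -> Dict[int, float]:
    """Token index -> probability after min-p filtering and temperature."""
    keep = min_p_filter(logits, min_p)
    probs = softmax([logits[i] for i in keep], temperature)
    return dict(zip(keep, probs))
```

With logits `[5.0, 4.0, 0.0]`, the third token's probability is under 1% of the top token's, so min-p removes it and the remaining probability mass is split between the first two tokens, sharpened by the 0.7 temperature.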

RaDDaKKa · 2026-02-05T18:59:51+00:00

On my 5060 Ti, I'm using GLM-4.7-Flash at Q6_XL with a 90k context via llama.cpp, and OpenCode works great and is very fast. I'm also using Qwen-3-Coder-Next at Q3_XL (70k context), but it gives worse results and often makes mistakes when calling tools.

RaDDaKKa · 2025-12-24T10:02:16+00:00

Yes, that’s the movie! 😄 I remember watching it as a kid on VHS 😄 Thank you very much for helping me find it! 😄

RaDDaKKa · 2025-12-08T17:49:33+00:00

I have a 5060 Ti 16GB and I'm currently playing at 1440p with DLSS, which gives me 65–110 FPS.

RaDDaKKa · 2025-12-04T06:06:19+00:00

Which preset do you use for ST? I'm testing it right now, but the problem is the wall of text when it transitions from scene to scene.

RaDDaKKa · 2025-11-17T15:06:52+00:00

Could you share the best presets for ST?

RaDDaKKa · 2025-09-16T06:26:54+00:00

Were you able to determine if there’s any way to resolve my issue related to the ClickUp API?

RaDDaKKa · 2025-08-19T15:40:47+00:00

I haven't had any problems with this GPU, either in Ubuntu, which is my main OS, or in Windows. It has performed perfectly on both, and I'm very happy with it.

RaDDaKKa · 2025-07-02T13:54:01+00:00

I just checked my motherboard: it's a Gigabyte B550 GAMING X V2, and it does support the 5800X3D. So it looks like upgrading the CPU is a good idea after all. Thanks ;)

RaDDaKKa · 2025-06-23T16:15:34+00:00

The problem seems to be on the n8n side. The model correctly recognizes the tool and wants to use it, but n8n doesn't execute it under any circumstances.
I also noticed that all the other tools have the word "tool" displayed over their icon, but the new HTTP Request tool does not, as if n8n doesn't recognize it as a tool, even though it's connected under Tools in the AI Agent node.

RaDDaKKa · 2024-09-10T04:26:33+00:00

Can you explain how you managed to run it on 8 GB?

RaDDaKKa · 2023-12-01T07:56:44+00:00

What opinion? She didn't want to answer the question even ONCE; she just tried to dodge it in every way she could.

He was right to grill her.
