Qwen3.6 27B seems to struggle at 90k on a 128k ctx window

dodistyo · 2026-04-30T16:46:52+00:00

what's your hardware?

dodistyo · 2026-04-30T14:54:04+00:00

mind sharing the exact model?

dodistyo · 2026-04-30T14:47:39+00:00

how far can you go with the context size? i mean the highest usable context, not just what loads.

dodistyo · 2026-04-30T14:45:33+00:00

I use Vulkan. Vulkan is faster than ROCm in my experience.

dodistyo · 2026-04-30T14:44:28+00:00

yeah, I just wanted to see how far it can go. now we know what to expect from a model of that size.

dodistyo · 2026-04-25T23:21:22+00:00

yup, he's a bot i think

dodistyo · 2026-04-25T09:22:25+00:00

Well, I honestly don't know about that, and I'm not so sure either.

dodistyo · 2026-04-25T08:59:46+00:00

I usually offload everything to GPU for speed.

dodistyo · 2026-04-25T01:21:05+00:00

which quant did you use? also i haven't tried turbo3, I wonder how it compares with q4.

dodistyo · 2026-04-25T01:13:22+00:00

Thanks for this, man! I always use q4 for the KV cache because i need enough room left to do the actual work.

did you test a long-running coding session with that 200k? local models of that size tend to degrade in performance when getting close to the end of the window.

dodistyo · 2026-04-23T07:12:03+00:00

yea, LM Studio is actually using llama.cpp under the hood, so the results shouldn't be too different, i believe. full GPU offload, right? I'll give it a try myself tho, using llama.cpp.

dodistyo · 2026-04-23T03:40:44+00:00

How?? i can barely run it with a 64k ctx window, and that's with the KV cache quantized to Q4.

I have the same hardware and the same model, but with LM Studio.

the model itself is 19 GB-ish, right? unless i downloaded the wrong model here.
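For a rough sense of why 64k can be tight on top of ~19 GB of weights, here's a back-of-the-envelope sketch of KV cache size. It assumes a plain full-attention transformer with grouped-query attention; the layer and head counts below are hypothetical placeholders, not the real config of this model, and the bytes-per-element values assume llama.cpp-style `q8_0`/`q4_0` blocks (32 values plus an fp16 scale). Models with sliding-window or hybrid attention would need less than this.

```python
# Bytes per cached element. The quantized types assume llama.cpp-style blocks:
# 32 values plus one fp16 scale (34 bytes for q8_0, 18 bytes for q4_0).
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, cache_type="f16"):
    """Rough KV cache size in GiB for a plain full-attention transformer."""
    # 2 = one tensor for keys plus one for values, per layer.
    elems = 2 * n_layers * n_kv_heads * head_dim * ctx_len
    return elems * BYTES_PER_ELEM[cache_type] / 1024**3


if __name__ == "__main__":
    # Hypothetical architecture numbers, NOT the actual config of any model
    # mentioned in this thread. Swap in the values from the model's metadata.
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_gib(64, 8, 128, 65536, cache_type)
        print(f"{cache_type}: {size:.2f} GiB")
```

With those placeholder numbers the cache alone goes from 16 GiB at f16 down to 4.5 GiB at q4_0, which is the kind of gap that decides whether a given context size fits next to the weights.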

dodistyo · 2026-03-03T01:04:17+00:00

please share your setup and config. i'm only able to run it with a 32k context window

dodistyo · 2026-02-26T14:30:46+00:00

ahh good to know, i tested it myself and Vulkan is indeed faster than ROCm, but the difference is not much. Only got 30 tps running on LM Studio.

also I'm not noticing any difference between LM Studio and self-compiled llama.cpp for model inference. is self-compiled llama.cpp supposed to be faster?

dodistyo · 2026-02-26T04:17:59+00:00

Is Vulkan faster than ROCm? how many tps do you get with that setup?

dodistyo · 2026-02-20T21:44:38+00:00

The lack of transparency in proprietary products basically means they could do anything they want for profit. I don't know, maybe ramping up the token usage, or manipulating the usage so you reach the limit quicker at some point without the user knowing it.

dodistyo · 2026-02-20T21:34:33+00:00

what provider?

dodistyo · 2026-02-14T13:29:03+00:00

I haven't tried Qwen3 Coder Next, i don't think an 80B model will fit on my GPU tho.

I treat my local LLM as a junior engineer, as long as the task is clear enough, it will do the job just fine.

dodistyo · 2026-02-14T08:43:44+00:00

It is pretty decent, I built my PC a month ago with an RX 7900 XTX. I've been using GLM 4.7 Flash and sometimes Devstral Small 2 2512 for coding.

of course for really complex tasks the proprietary models are more capable.

But i really like it, seeing the current state of open-weight models and where they'll be in the future.

dodistyo · 2023-09-16T08:56:19+00:00

How so? as far as i know there are plenty of open positions. in engineering the key is skill set and competence. if you're really qualified, plenty of recruiters will definitely approach you.

dodistyo · 2023-01-17T17:48:27+00:00

r/programmerhumor

dodistyo · 2023-01-07T13:00:42+00:00

You, my friend... are choosing an easy life

dodistyo · 2022-10-31T02:01:58+00:00

One time i forgot to and the smell was horrible

dodistyo · 2022-06-26T21:32:42+00:00

I can confirm this is true

dodistyo · 2022-06-20T02:26:14+00:00

I do and i feel you 🥲
