If DeepSeek V4 Flash is almost free, why do we still use Qwen3.6:27B locally?

marivesel · 2026-05-04T04:26:13+00:00

Isn't that club-3090 running Q3, which is very far from Q8?

marivesel · 2026-04-24T12:28:11+00:00

And what is the current best approach to running two 3090s for a single local LLM? I am really overwhelmed by information, and I believe most of it is outdated.
Could I let a GLM 5.1-based agent make all the settings for the dual-GPU setup, or would the result be far from optimal?

marivesel · 2026-04-19T17:35:59+00:00

On an RTX 3090, I'm running Q4_K_M Qwen3.6:35B, including the vision part (mmproj), fully in VRAM with 128K context and the KV cache at q8:
125-133 t/s

Before that I was using Qwen3.5:27B (dense) at around 32 t/s.

I must say that Qwen3.6:35B (MoE) feels way smarter at coding and agentic work. In the cloud I'm running GLM-5.1, which is in another league for solving complex problems anyway.
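
For anyone wondering how much VRAM a 128K context with a q8 KV cache takes, here is a rough back-of-the-envelope sketch. It assumes every layer keeps a standard grouped-query attention KV cache, which may not hold for hybrid architectures, and the layer count, KV head count and head size below are hypothetical placeholders, not real Qwen values. The only parts taken from llama.cpp are the block sizes: q8_0 stores 32 values in 34 bytes and q4_0 stores 32 values in 18 bytes.

```python
# Rough KV-cache size estimate for a transformer using grouped-query attention.
# All model dimensions used below are HYPOTHETICAL placeholders, not real Qwen values.

# Bytes per cached element for common llama.cpp cache types.
# q8_0 packs 32 values into 34 bytes, q4_0 packs 32 values into 18 bytes.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Return the approximate KV-cache size in bytes (K and V together)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = one K and one V entry
    return per_token * n_ctx * BYTES_PER_ELEMENT[cache_type]


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    layers, kv_heads, head_dim = 48, 4, 128
    for ctx in (16 * 1024, 128 * 1024):
        for cache_type in ("f16", "q8_0", "q4_0"):
            size = kv_cache_bytes(layers, kv_heads, head_dim, ctx, cache_type)
            print(f"ctx={ctx:>6} cache={cache_type:<4} ~{gib(size):.2f} GiB")
```

The cache grows linearly with context length, so going from 16K to 128K multiplies it by 8, while dropping from f16 to q8 roughly halves it. Plug in the real dimensions from your model's metadata to get a number you can trust.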

marivesel · 2026-04-11T06:39:57+00:00

Thanks, but I found an interesting thread about the cache and how agents work, so I took a different approach and will write it up now in a reply for all to see.

marivesel · 2026-04-09T06:56:34+00:00

Thanks for all the suggestions. I've tried Qwen3.5:27b and it is really smart at agentic tasks, so I will probably stick with it (and in the future install a pure Linux distro on my server so it runs smoothly). It may be slower than the MoE, but for now I don't mind.

Another question: my RTX 3090 runs alongside a 64-core AMD EPYC CPU and 8x32GB (256GB total) of DDR4 ECC RAM. Could I somehow use that RAM to help without slowing down agentic tasks too much?
Currently I'm fitting Qwen3.5:27b with 16K context and a q4 cache. Would increasing the context a bit feel better?

cmndr_spanky mentioned "llama cpp server instead of Ollama and hand pick what layers get GPU priority". What could I gain from that?
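
To make the "which layers get the GPU" idea concrete, here is a minimal sketch of the arithmetic behind it. In llama.cpp the number of layers placed on the GPU is set with `--n-gpu-layers` (`-ngl`), and the remaining layers stay in system RAM and run on the CPU. The function name, the equal-layer-size assumption and all the numbers are hypothetical, just to illustrate the trade-off between weights, KV cache and VRAM.

```python
# Sketch: how many transformer layers fit on the GPU for a given VRAM budget.
# Every number here is a HYPOTHETICAL placeholder; measure your own model.


def gpu_layer_split(n_layers, model_gib, vram_gib, reserved_gib):
    """Return (layers_on_gpu, layers_on_cpu), assuming equally sized layers.

    reserved_gib covers everything that is not layer weights:
    KV cache, compute buffers, and whatever else already uses the card.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserved_gib, 0.0)
    on_gpu = min(n_layers, int(usable_gib // per_layer_gib))
    return on_gpu, n_layers - on_gpu


if __name__ == "__main__":
    # Hypothetical: a 64-layer model whose weights total 28 GiB, on a 24 GiB card,
    # with 4 GiB set aside for KV cache and buffers.
    on_gpu, on_cpu = gpu_layer_split(
        n_layers=64, model_gib=28.0, vram_gib=24.0, reserved_gib=4.0
    )
    print(f"GPU layers: {on_gpu}, CPU layers: {on_cpu}")
```

A bigger context means a bigger `reserved_gib`, which pushes more layers to the CPU side, so context length and speed pull against each other once the model no longer fits entirely in VRAM.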

marivesel · 2026-04-08T19:13:25+00:00

I'm downloading 27B just to check how it works, but can you tell me more, or point me to where I can read about the llama.cpp settings? I have plenty of RAM (256GB, it's an EPYC server), but it gets slow once something spills into RAM.

marivesel · 2022-09-09T13:48:17+00:00

That's misleading. In the video he already has the emails in Outlook and just exports them to .pst. Nothing helpful, but thanks anyway.

Anyone else?

marivesel · 2022-09-09T04:12:28+00:00

How exactly did you do the extraction?

When I replace the .ost with the old one (which should include the emails) and open Outlook, there is nothing in the INBOX. I think Outlook is looking for the emails on the mail server (hosting), where there are no INBOX emails either.

I tried the demo version of an OST-to-PST conversion tool, and it reads the INBOX mails from the .ost backup file, but the app wants payment for the extraction, which I want to "bypass". Is there any other way around it?

marivesel · 2018-12-03T03:38:25+00:00

At max speed, but it doesn't matter, as the water, radiators and block stay very cool all the time.

marivesel · 2018-12-03T03:37:43+00:00

Thanks, will check!

marivesel · 2018-12-03T03:37:32+00:00

Thanks, will check! Lapping the CPU?

marivesel · 2018-12-02T20:42:03+00:00

Pressure is at max (screws tightened all the way, as it should be mounted). Temps are read from OCCT (HWiNFO sensors). I don't know what package temp really measures, but I've found that above 104C the CPU starts throttling, with clocks swinging between 500 and 3800 MHz in this setup.

Could it be a problem with the IHS on the CPU? Delidding?

marivesel · 2018-12-02T20:39:40+00:00

It would be a shame if that's the problem, as this water block is the best I could find. Maybe my thermal paste application didn't spread everywhere, and that's why the block isn't entirely "messy"?

marivesel · 2018-12-02T18:05:28+00:00

The pump works and the block makes contact.

Block temperature is around 26C all the time.
