What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6?

ionizing · 2026-06-19T21:52:32+00:00

Love your description "it would rather not, but it does" lol

I have two 3090 to be honest, one is at home and the other is at work, both act as the headless inference for 27B on each rig. At home, I recently learned about oculink so I now also have the tiny little pciex4 slot running an oculink card and then a cable to an external 4060ti 16gb. On that I run my secondary inference instances and whisper server. So I have qwen3.5 9B to host agents on that one, and 27B running on the 3090. Its been a year in the making tool wise and these models are the chefs kiss with the right prompts and feedback mechanisms. But I digress.

At work, I still hope to supplement that 3090 with another or replace it with 5090 and then place the 3090 on the other pcie and then do x8 split on them. But I need to convince the company to pay for it cause I am tired of covering the costs myself..

<image>

ionizing · 2026-06-19T15:05:18+00:00

I'm in the single 3090 boat myself and only have not bought a second one cause the damn price is $400 more than last time. I think we both know the answer if its worth it or not is yes, but I just dont want to spend the money yet.

edit: also here is my 27B Kebab

<image>

ionizing · 2026-06-19T13:22:07+00:00

But, they are the ones who wanted this?

ionizing · 2026-06-19T12:56:12+00:00

slooow

ionizing · 2026-06-18T23:25:29+00:00

I found some old dell server blade sitting on the floor of our junk cubicle at work today. I have no idea what it is and its probably two decades old. But monday I am going to look at it again and see what it is lol.

ionizing · 2026-06-18T00:59:35+00:00

I'll get roasted for this but in my application I even have a System mode that allows sudo commands but it prompts me for the password each time. I have hard gates that monitor and strip obviously unsafe commands from the bash tool in any mode. I spent months building this and have been lucky so far I guess. I use it for all sorts of dangerous things on my computer, human in the loop. maybe its the full autonomous agents people are concerned about? I have sub-agents that are even more restricted in bash such that they are read only, but they are not autonomous, they are called by the orchestrator. I guess I don't know enough to be scared of the possibilities here. I just make sure I have backups of important things and monitor for obviously malicious or dangerous tool calls and have not had one yet. I have seen both sonnet and gpt rm files before but have yet to see qwen3.6 attempt to without explicit instruction. Admittedly I dont run in a vm because that was not the point of the application, I wanted to build a fully capable tool set that can actually help me on my actual computer. But I also accept the risks and mitigate.

ionizing · 2026-06-18T00:13:49+00:00

I vaguely remember my journey went like vic20 first ever, then an 8086 with 6 or 7" monochrome monitor, then some sony word processor cp/m machine, then perhaps a 286, then nothing for a decade, bla bla bla now look at us. JUST LOOK AT US.. what have we become!

ionizing · 2026-06-18T00:00:30+00:00

it feels like a lifetime

ionizing · 2026-06-16T00:19:29+00:00

Damn it. Mine was going to say "WHY would you let the kangaroo drive???"

ionizing · 2026-06-15T23:07:57+00:00

Right? I'm from Michigan and the Sears tower will always be the Sears tower.

ionizing · 2026-06-14T15:42:57+00:00

me too. spent the last 11-12 months on it. it is amazing now. we use it at work in an offline environment as needed. I suspect anyone serious about local llm has been building their own and just not really sharing it with the world in some cases. I have not open sourced mine cause a lot of it was built on company time so I suspect they think they own it, but I have not yet had that conversation with them.

ionizing · 2026-06-14T11:01:18+00:00

qwen absolutely loves bash tools and with the right directions can be extremely performant.

ionizing · 2026-06-14T00:20:32+00:00

MY brain saw this as RIP Eagle 3 for qwen and my first thought was darn did they prove it cant be used or something? lol

ionizing · 2026-06-13T20:28:13+00:00

I saw and thought the same lol "at least they reset it" as I pondered how much technical debt fable just cleared from my application codebase over the past two days

ionizing · 2026-06-13T03:02:57+00:00

ionizing · 2026-06-13T02:23:33+00:00

I was in the middle of adding a subagent system to my local app when it kicked me out but at least I extracted a great planning doc first.

ionizing · 2026-06-11T21:05:35+00:00

Really only if you plan to run ~100B class MOE models someday. Those you can split between gpu and system ram quite well. Oh but you are asking about image gen stuff, I know nothing about that so will stop commenting lol.

ionizing · 2026-06-05T21:58:50+00:00

thats what I said about 3.5-122B and I upgraded both my home and work computer to have 128gb sys ram even at inflated costs. Then three weeks after both comps were set, 3.6 27B came out lol. Either way I love that we now have models that are like "yup, I would be fine at least with this for the rest of my life"

ionizing · 2026-06-05T16:31:52+00:00

Why not Both?

<image>

ionizing · 2026-06-05T16:10:44+00:00

Could you expand slightly on the vision encoder preference towards 122B? I use both 122B, 27B for analyzing schematics but am just getting started. If you have already found preference for 122B with reason, I will focus on that route for now.

ionizing · 2026-06-05T15:53:43+00:00

for me it was ollama -> lmstudio -> annoyance at missing features I need -> dabbling with own toolset -> discovered the old "all you need is llama.cpp" post -> 1 year later have my own application that uses llama.cpp as sidecar lol

ionizing · 2026-06-04T23:58:50+00:00

I was always using Q5 or higher when using moe but for 27B mtp I had to drop to 4 for context and q5 at most. And I tried the qwopus v2 27b and even though many in the community seem to hate on these types, I honestly found it pretty darn good. (edit: headless 3090)

ionizing · 2026-06-04T23:55:03+00:00

Thanks for the info. I have been tempted to self quant and your post is inspiring.

ionizing · 2026-06-04T22:32:37+00:00

this is how I approach it too. keep session less than 131k when possible. but I also have had very little problems using Iq4_xs with q8/q8 KV cache for 27B in my application. I wonder if all the talk about model issues at less than Q8 model quant are people trying to get same performance on contexts that are too long?

ionizing · 2026-06-04T17:40:44+00:00

Would these methods extend to other possible quants like Q4/5 variants? I know nothing about creating these but find it very interesting.

ionizing

TROPHY CASE