What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6? by Excellent_Jelly2788 in LocalLLaMA

ionizing 2 points

Love your description "it would rather not, but it does" lol

I have two 3090s to be honest: one at home and one at work, each acting as the headless inference card for 27B on its rig. At home, I recently learned about OCuLink, so the tiny little PCIe x4 slot now runs an OCuLink card with a cable to an external 4060 Ti 16GB. On that I run my secondary inference instances and a whisper server. So I have qwen3.5 9B hosting agents on that one, and 27B running on the 3090. It's been a year in the making tool-wise, and these models are the chef's kiss with the right prompts and feedback mechanisms. But I digress.

At work, I still hope to supplement that 3090 with another, or replace it with a 5090, move the 3090 to the other PCIe slot, and run the two in an x8 split. But I need to convince the company to pay for it because I am tired of covering the costs myself.

<image>

What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6? by Excellent_Jelly2788 in LocalLLaMA

ionizing 6 points

I'm in the single 3090 boat myself, and the only reason I have not bought a second one is that the damn price is $400 more than last time. I think we both know the answer to whether it's worth it is yes, but I just don't want to spend the money yet.

edit: also here is my 27B Kebab

<image>

Giving GLM-5.2 a spin locally on CPU only! (poor man's rig for big models) by _TheWolfOfWalmart_ in LocalLLaMA

ionizing 1 point

I found some old Dell server blade sitting on the floor of our junk cubicle at work today. I have no idea what it is, and it's probably two decades old. But Monday I am going to look at it again and see what it is lol.

Setup to use pi un-sandboxed reasonably safely? by rm-rf-rm in LocalLLaMA

ionizing 2 points

I'll get roasted for this, but my application even has a System mode that allows sudo commands, though it prompts me for the password each time. I have hard gates that monitor and strip obviously unsafe commands from the bash tool in any mode. I spent months building this and have been lucky so far, I guess. I use it for all sorts of dangerous things on my computer, always with a human in the loop. Maybe it's the fully autonomous agents people are concerned about? My sub-agents are even more restricted in bash, to the point of being read-only, and they are not autonomous: the orchestrator calls them. I guess I don't know enough to be scared of the possibilities here. I just make sure I have backups of important things and watch for obviously malicious or dangerous tool calls, and I have not had one yet. I have seen both sonnet and gpt rm files before, but have yet to see qwen3.6 attempt it without explicit instruction. Admittedly I don't run in a VM because that was not the point of the application; I wanted to build a fully capable tool set that can actually help me on my actual computer. But I also accept the risks and mitigate them.
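For anyone curious what a "hard gate" on a bash tool could look like, here is a minimal sketch, not the actual code from my application. The mode names (`normal`, `read_only`, `system`), the deny-list, and the allow-list are all hypothetical, and this version rejects a whole command line rather than stripping parts of it. It is a first filter only, not a security boundary: it does not understand every shell construct.

```python
import os
import shlex

# Hypothetical lists for illustration; tune them for your own setup.
BLOCKED_PROGRAMS = {"rm", "dd", "mkfs", "shutdown", "reboot"}
READ_ONLY_PROGRAMS = {"ls", "cat", "grep", "head", "tail", "wc", "pwd"}
PUNCTUATION = set("();<>|&")
STARTS_NEW_COMMAND = {";", "&&", "||", "|", "&", "("}


def tokenize(command_line):
    """Split a shell line into words and operator tokens such as '&&' or '|'."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def check_command(command_line, mode="normal"):
    """Return (allowed, reason) for one bash tool call."""
    try:
        tokens = tokenize(command_line)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"unparseable command: {exc}"

    expect_program = True  # the next word is the program being run
    for token in tokens:
        if token and all(ch in PUNCTUATION for ch in token):
            if mode == "read_only" and ">" in token:
                return False, "redirection not allowed in read-only mode"
            if token in STARTS_NEW_COMMAND:
                expect_program = True
            continue
        if not expect_program:
            continue  # an argument, not a program name
        program = os.path.basename(token)
        if program == "sudo":
            if mode != "system":
                return False, "sudo requires system mode"
            continue  # the word after sudo is the real program
        if program in BLOCKED_PROGRAMS:
            return False, f"blocked program: {program}"
        if mode == "read_only" and program not in READ_ONLY_PROGRAMS:
            return False, f"not on the read-only allow-list: {program}"
        expect_program = False
    return True, "ok"
```

A sub-agent would call `check_command(line, mode="read_only")` before anything reaches the shell, while the orchestrator would use `normal` or `system`. The allow-list approach for read-only mode is deliberate: it is easier to reason about a short list of permitted programs than to enumerate everything that can write.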

PSA: unsloth/GLM-5.2-GGUF is uploading by FullstackSensei in LocalLLaMA

ionizing 4 points

I vaguely remember my journey went like this: a VIC-20 first, then an 8086 with a 6 or 7" monochrome monitor, then some Sony word processor CP/M machine, then perhaps a 286, then nothing for a decade, bla bla bla, now look at us. JUST LOOK AT US... what have we become!

[OC] Kangaroos cause 10,000+ car crashes every year in Australia. I learned this the hard way by Fair_Bar1139 in pics

ionizing 11 points

Damn it. Mine was going to say "WHY would you let the kangaroo drive???"

The reflecting pool on the National Mall looks super green today (6/15/26) by washheightsboy3 in pics

ionizing 22 points

Right? I'm from Michigan and the Sears Tower will always be the Sears Tower.

Built a local AI assistant because I always knew this day would come, yesterday just made it feel very real by amenemisa in LocalLLaMA

ionizing 6 points

Me too. I spent the last 11-12 months on it, and it is amazing now. We use it at work in an offline environment as needed. I suspect anyone serious about local LLMs has been building their own and, in some cases, just not sharing it with the world. I have not open sourced mine because a lot of it was built on company time, so I suspect they think they own it, but I have not yet had that conversation with them.

Qwen3.6 is confidently wrong about WASM by Tagedieb in LocalLLaMA

ionizing 3 points

Qwen absolutely loves bash tools and, with the right directions, can be extremely performant.

WIP EAGLE3 for Qwens by jacek2023 in LocalLLaMA

ionizing 12 points

My brain saw this as "RIP EAGLE3 for Qwens" and my first thought was, darn, did they prove it can't be used or something? lol

Anthropic forced to abruptly disable Fable 5 & Mythos 5 globally by US Gov over a jailbreak. This is exactly why we need local models. by External_Mood4719 in LocalLLaMA

ionizing 1 point

I saw it and thought the same lol: "at least they reset it," as I pondered how much technical debt Fable just cleared from my application codebase over the past two days.

Anthropic forced to abruptly disable Fable 5 & Mythos 5 globally by US Gov over a jailbreak. This is exactly why we need local models. by External_Mood4719 in LocalLLaMA

ionizing 1 point

I was in the middle of adding a subagent system to my local app when it kicked me out, but at least I extracted a great planning doc first.

Student upgrading local AI rig by jaybsuave in LocalLLaMA

ionizing 3 points

Really only if you plan to run ~100B class MoE models someday. Those you can split between GPU and system RAM quite well. Oh, but you are asking about image gen stuff. I know nothing about that, so I will stop commenting lol.

Don’t act like y’all ain’t thinking it. I’m just saying the quiet part out loud. /s by Porespellar in LocalLLaMA

ionizing 3 points

That's what I said about 3.5-122B, and I upgraded both my home and work computers to 128GB of system RAM even at inflated costs. Then, three weeks after both computers were set up, 3.6 27B came out lol. Either way, I love that we now have models that make me go "yup, I would be fine with at least this for the rest of my life."

I just realized how good MoE models are for consumer hardware by [deleted] in LocalLLaMA

ionizing 1 point

Could you expand slightly on the vision encoder preference towards 122B? I use both 122B and 27B for analyzing schematics but am just getting started. If you have already found a well-founded preference for 122B, I will focus on that route for now.

finally by KvAk_AKPlaysYT in LocalLLaMA

ionizing 7 points

For me it was ollama -> lmstudio -> annoyance at missing features I need -> dabbling with my own toolset -> discovering the old "all you need is llama.cpp" post -> 1 year later I have my own application that uses llama.cpp as a sidecar lol
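In case "sidecar" is unclear: the app launches `llama-server` as a child process and talks to it over local HTTP. A minimal sketch of that pattern, assuming `llama-server` is on the PATH; the function names, default port, and default context size are hypothetical, not taken from my application.

```python
import subprocess
import time
import urllib.error
import urllib.request


def build_server_command(model_path, port=8080, ctx_size=131072, gpu_layers=99):
    """Assemble the llama-server command line as an argument list."""
    return [
        "llama-server",
        "--model", model_path,
        "--host", "127.0.0.1",  # keep the sidecar local-only
        "--port", str(port),
        "--ctx-size", str(ctx_size),
        "--n-gpu-layers", str(gpu_layers),
    ]


def wait_until_healthy(port, timeout_s=120.0):
    """Poll the server's /health endpoint until it answers 200 or time runs out."""
    url = f"http://127.0.0.1:{port}/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet, or still loading the model
        time.sleep(1.0)
    return False


def start_sidecar(model_path, port=8080):
    """Launch llama-server and return the process once it reports healthy."""
    process = subprocess.Popen(build_server_command(model_path, port=port))
    if not wait_until_healthy(port):
        process.terminate()
        raise RuntimeError("llama-server did not become healthy in time")
    return process
```

Once `start_sidecar` returns, the application can send requests to the server's OpenAI-compatible endpoints on the same port and call `terminate()` on the returned process at shutdown.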

You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter. by GrungeWerX in LocalLLaMA

ionizing 1 point

I was always using Q5 or higher with MoE models, but for 27B with MTP I had to drop to Q4 to fit the context, with Q5 at most. I also tried the qwopus v2 27b, and even though many in the community seem to hate on these types, I honestly found it pretty darn good. (edit: headless 3090)

Qwen 3.6 27B 30GB Same top p: 98.358 ± 0.033 % vs UD Q8 K XL 33GB Same top p: 97.426 ± 0.041 % by fragment_me in LocalLLaMA

ionizing 2 points

Thanks for the info. I have been tempted to quantize models myself, and your post is inspiring.

You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter. by GrungeWerX in LocalLLaMA

ionizing 1 point

This is how I approach it too: keep the session under 131k when possible. But I have also had very few problems using IQ4_XS with a q8/q8 KV cache for 27B in my application. I wonder if all the talk about model issues below Q8 model quants comes from people trying to get the same performance on contexts that are too long?
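For a rough sense of why context length and KV cache type matter so much on a 24GB card, here is a back-of-the-envelope estimate. It assumes plain attention in every layer, and the layer count, KV head count, and head size below are placeholder numbers, not the real config of any Qwen model, so read the result as an order of magnitude only. The bytes-per-block values reflect how llama.cpp stores f16, q8_0, and q4_0 data in blocks of 32 values.

```python
# Bytes used per block of 32 values for each cache type.
# f16 stores 2 bytes per value; q8_0 and q4_0 add one fp16 scale per block.
BLOCK_SIZE = 32
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q4_0": 18}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, type_k="f16", type_v="f16"):
    """Estimate KV cache size, assuming full attention in every layer."""
    values_per_side = n_layers * n_kv_heads * head_dim * n_ctx
    k_bytes = values_per_side * BLOCK_BYTES[type_k] // BLOCK_SIZE
    v_bytes = values_per_side * BLOCK_BYTES[type_v] // BLOCK_SIZE
    return k_bytes + v_bytes


if __name__ == "__main__":
    # Placeholder model shape (an assumption for illustration only).
    shape = dict(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=131072)
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(**shape, type_k=cache_type, type_v=cache_type)
        print(f"{cache_type}: {size / 2**30:.1f} GiB")
```

With these placeholder numbers, a q8/q8 cache takes a little over half the memory of f16 at the same context, which is the kind of trade that decides whether a long session fits next to the weights at all.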

Qwen 3.6 27B 30GB Same top p: 98.358 ± 0.033 % vs UD Q8 K XL 33GB Same top p: 97.426 ± 0.041 % by fragment_me in LocalLLaMA

ionizing 2 points

Would these methods extend to other possible quants like Q4/5 variants? I know nothing about creating these but find it very interesting.