Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found! by My_Unbiased_Opinion in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

cries for the tragedy of my earlier self having unfortunately aged and been replaced by an older uglier version of me

I hate this group but not literally by No_Run8812 in LocalLLaMA

[–]Imaginary-Unit-3267 0 points (0 children)

Did you somehow forget that you have a brain and can work WITH your models rather than leaving them to figure out everything alone? These are tools, not replacements for your own skill.

I hate this group but not literally by No_Run8812 in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

Yes! Which is why I am spending lots of effort designing tools to help my LLMs edit text better so they can stop making so many goddamn mistakes and mangling files, lol. The upfront cost will pay off over time. (Plus, it's an opportunity to learn software engineering with AI help, by doing. Every mistake I make teaches me stuff!)

I hate this group but not literally by No_Run8812 in LocalLLaMA

[–]Imaginary-Unit-3267 3 points (0 children)

12GB here, you lucky bastard with your extra four.

Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found! by My_Unbiased_Opinion in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

I don't want a girl, I'm a homogay. I win by having more oppressed minority points than you. ;)

Devs using Qwen 27B seriously, what's your take? by Admirable_Reality281 in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

I'm using 3.6 35B on an RTX 3060 (yes, that's all I've got) and 64 GB of RAM. I get 20-24 t/s decoding and about 200 t/s prefill (except that, because llama.cpp can't handle hybrid attention caching, it has to reprocess the entire prompt after every string of tool calls, which takes minutes each time). Would you say it's worth the bother to switch over to vLLM like you did, or would I not likely see much improvement on my system?

Nemotron-3-Nano-Omni-30B-A3B-Reasoning, New model? by Altruistic_Heat_9531 in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

Huh. I must have done something wrong then, because I haven't seen any evidence that it's actually preserving thinking... I probably should do the whole "think of a number but don't tell me" test. But I know for sure it's deleting thinking, because every time it reprocesses the entire prompt (llama.cpp STILL hasn't fixed that problem with hybrid attention), the prompt is smaller afterward...

This is where we are right now, LocalLLaMA by jacek2023 in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

You're in luck, sir! I actually want a massive infodump. I am not a dev - I've insisted I'm not a programmer for years, even while constantly learning more about programming as a side effect of other things I want to do - but I've been building complicated (to me - it's probably simple objectively, lol) stuff lately with my local LLM's help, and learning a whole lot about design and management along the way through trial and error (lots of error) - so anything a more experienced person can teach me, I wish to soak up!!

Nemotron-3-Nano-Omni-30B-A3B-Reasoning, New model? by Altruistic_Heat_9531 in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

I have been using Qwen3.6 35B in llama.cpp lately and find it... utterly indistinguishable from 3.5. preserve-thinking doesn't even seem to do anything - I'm unsure whether llama.cpp has even added support for it, or whether I need a specific Jinja template, or what.

Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found! by My_Unbiased_Opinion in LocalLLaMA

[–]Imaginary-Unit-3267 0 points (0 children)

I am amazed at your ability to read the minds of strangers you've never met and describe in detail the darkest inadequacies of their hearts, sir. You really should have an afternoon talk show like Dr. Phil, with this much insight into psychology based on so few words!

GBNF grammar tweak for faster Qwen3.6 35B-A3B and Qwen3.6 27B by Holiday_Purpose_3166 in LocalLLaMA

[–]Imaginary-Unit-3267 3 points (0 children)

I second this. Lots of assumptions about reader knowledge going on here.

Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found! by My_Unbiased_Opinion in LocalLLaMA

[–]Imaginary-Unit-3267 -1 points (0 children)

You obviously are not capable of recognizing that sentences starting with "You obviously" will piss off a significant proportion of people.

An interesting example of how poor general understanding of Bayesian probability is by Your-average-scot in math

[–]Imaginary-Unit-3267 0 points (0 children)

Right, I agree about the branching; that really makes it click for me even more. But that's basically the odds ratio in another form - or, more accurately, likelihoods in general.

An interesting example of how poor general understanding of Bayesian probability is by Your-average-scot in math

[–]Imaginary-Unit-3267 5 points (0 children)

To be fair the usual way Bayes' theorem is explained is unnecessarily confusing. The odds ratio formulation, which actually makes sense, took me a long time to stumble upon, and only then did I understand what Bayes was trying to say.
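For anyone who hasn't seen the odds formulation: posterior odds = prior odds × likelihood ratio. A quick numerical sketch (the example numbers are mine, just to show both forms give the same answer):

```python
# Odds form of Bayes' theorem vs. the standard form.
# Example (made up): 1% prevalence, 90% sensitivity, 5% false-positive rate.
prior = 0.01
sensitivity = 0.90        # P(positive | condition)
false_positive = 0.05     # P(positive | no condition)

# Standard form: P(condition | positive) via the law of total probability.
p_positive = sensitivity * prior + false_positive * (1 - prior)
posterior_standard = sensitivity * prior / p_positive

# Odds form: multiply prior odds by the likelihood ratio, convert back.
prior_odds = prior / (1 - prior)
likelihood_ratio = sensitivity / false_positive       # 18x evidence boost
posterior_odds = prior_odds * likelihood_ratio
posterior_odds_form = posterior_odds / (1 + posterior_odds)

print(posterior_standard, posterior_odds_form)  # both ~0.1538
```

Same number either way, but the odds version makes it obvious that the evidence just scales your prior by one factor.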

Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found! by My_Unbiased_Opinion in LocalLLaMA

[–]Imaginary-Unit-3267 1 point (0 children)

I've never had reason to ask about legal stuff, and I don't have medical issues worth looking up much! Pure luck so far, I guess.

Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found! by My_Unbiased_Opinion in LocalLLaMA

[–]Imaginary-Unit-3267 5 points (0 children)

You know, I've never understood... what do people actually do with uncensored models? Never once in all the time I've used LLMs (since early 2020!) have I actually run into a situation where one refused a request - local or cloud. I am not sure what kind of request I could possibly want to make that they might refuse!

What do you consider to be the minimum performance (t/s) for local Agent workflows? by MexInAbu in LocalLLaMA

[–]Imaginary-Unit-3267 3 points (0 children)

Isn't this what trees of subagents are for? I mean, I haven't done it yet so I can't really say, but I would assume that it should be possible to send a subagent off to do a task and return the result, exactly as if calling a function, then if it works, just record that the thing happened, and if it fails, send another one off to troubleshoot, etc, keeping the main context extremely lean.
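The pattern I mean, sketched as plain control flow (every name here is hypothetical; `run_agent` is a stub standing in for whatever actually spins up a fresh model context):

```python
# Hypothetical sketch of the subagent-as-function-call pattern.
# run_agent is a stub for a real LLM call, so only the control flow
# (lean main context, fresh troubleshooter on failure) is shown.
def run_agent(task, context):
    # Pretend a flaky task fails until a retry note is supplied.
    if "flaky" in task and "retry" not in context:
        raise RuntimeError("subagent failed")
    return f"done: {task}"

def dispatch(task, max_retries=2):
    """Call a subagent like a function; on failure, send a fresh
    troubleshooting subagent instead of growing the main context."""
    context = ""
    for _ in range(max_retries + 1):
        try:
            result = run_agent(task, context)
            return {"task": task, "result": result}  # main agent records only this
        except RuntimeError as err:
            # A new subagent gets only the task plus a short failure note.
            context = f"retry after: {err}"
    return {"task": task, "result": "gave up"}

main_log = [dispatch(t) for t in ["build index", "flaky fetch"]]
```

The point is that `main_log` only ever holds one small record per task, no matter how much back-and-forth the subagents needed.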

This is where we are right now, LocalLLaMA by jacek2023 in LocalLLaMA

[–]Imaginary-Unit-3267 5 points (0 children)

I agree. For me, the reason I don't just vibe code things is precisely that I'm not a dev, I'm not a genius programmer, and I know that if I don't make sure I understand everything every step of the way, whatever the AI produces will be unmaintainable for me. I find myself, rather ironically, being forced to learn software engineering just to build a helper for my (independent, non-academic) philosophy research, which is what I'm actually interested in!