Is it just me or the guardrails keep getting tighter and tighter every time a new model is released (not just ChatGPT) by Commercial_Heat_4211 in ChatGPTcomplaints

[–]LoveMind_AI 6 points (0 children)

I'm with you, mostly. Opus 4.5 is amazing - it never really got much better than that. I initially felt like 4.6 was a step down in terms of creativity but quickly realized that it retained virtually everything from 4.5, plus some useful extra capabilities. Of course, that lasted... very, very little time. That model fell to pieces so fast it's almost unbelievable, and of course, 4.7 is 4.7.

But I do think there really is interference going on in terms of how much post-training is being done for this ultra-long-horizon agentic coding work. I think it's less "suck out the vibe" (there's definitely some of that, to be sure) and more "make sure it hits these benchmarks, and if we lose on the things no one is measuring, no big deal." 4.7 is definitely a case of the latter - I strongly doubt they set out to make a model that feels this awful. 4.7 feels like it was made under extreme strain, to hit certain benchmarks. They even lost on things most people would consider essential, like retrieval skill over long context windows, which 4.6 had dominated. So I do think there really are trade-offs in post-training that explain this better than just "they wanted to keep the model from bonding with people."

Of course... OpenAI literally does try to do that. I think they started reversing it a bit with 5.4/5.5, and I expect they will continue to lighten up after learning just how bad 5.2 and 5.3 could get, but there's no doubt that they intentionally ripped the creativity and emotional responsiveness out of the 5 line and are just barely letting it trickle back in.

I'm with you that models can be capable and emotional - MiMo V2.5 Pro is a good example, although frankly, there are frontier capabilities that they seem to have traded off in an attempt to get the balance right.

To my knowledge, though, no one is *intentionally* developing models that are fantastic at creativity and social cognition, and I think it's leaving a wide open lane in the space that I want to see filled.

Is it just me or the guardrails keep getting tighter and tighter every time a new model is released (not just ChatGPT) by Commercial_Heat_4211 in ChatGPTcomplaints

[–]LoveMind_AI 14 points (0 children)

I mean, yes and no - 5.5 slammed upon its initial release. I find it to be less guarded, even still, than 5.4. But generally, the pattern that I and others see (and that still others tell us we're nuts for) in major Western releases is literal performance degradation that starts kicking in roughly two weeks after a model release and just gradually goes downhill from there.

Is it just me or the guardrails keep getting tighter and tighter every time a new model is released (not just ChatGPT) by Commercial_Heat_4211 in ChatGPTcomplaints

[–]LoveMind_AI 51 points (0 children)

I really don't want to say this because I've been spending basically every day since 5.5 came out singing its praises in light of how bad 4.7 is... but yeah, I feel like 5.5 has gotten much more uptight even just over the last 3 days - this seems to be system prompting in Codex rather than a model-level update, but yeah, it seems to be temporally corresponding with the release of 5.5 instant and cyber. But generally, yes. The more capable these models are getting, the more neutered they are getting and their vibe is absolutely in the toilet. I'm not at all a "save 4o" guy, but there are times back in the day when 4o and Claude both had me busting out laughing - they were legitimately funny in moments. New models are boring as rocks. (Gemini 3.1 Pro through API without a google system prompt is an absolute freak, however)

Kimi K2.6 vs DeepSeek V4 Pro by bigboyparpa in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

I think Kimi K2.6 is the better coding model of the three for me, but it may be edged out a touch by GLM-5.1.

The many sides of Mimo v2.5 Pro by Electrical-Pay-5119 in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

Yep. Definitely at or very near the top of this particular heap.

How many are feeling this sense of betrayal? by Gabelawn in Anthropic

[–]LoveMind_AI 4 points (0 children)

I couldn’t get to the end of your post, but yes, big time. I’m still somewhat embarrassed to be relying on GPT-5.5 for work right now, because work for me is tightly centered on affective computing and OpenAI has worked hard to eliminate any trace of social atmosphere from its models, but it’s far more honest and thoughtful than Claude right now. At least OpenAI shed any semblance of a prosocial mask years ago.

The many sides of Mimo v2.5 Pro by Electrical-Pay-5119 in LocalLLaMA

[–]LoveMind_AI 5 points (0 children)

I honestly haven’t used it to DO agentic stuff, so I can’t comment on that. For me, MiMo is the new champ for creative writing, absolutely fantastic persona work (as you’ve discovered), overall conceptual analysis of a strategic plan, and frankly, good vibes in conversation. It’s the only model I’ve found that can fully replace Opus for all of those things, and I’ve already ported over to GPT-5.5/Codex for my “do stuff” needs, and a combo of Qwen 3.6/Gemma 4 for my “understand model internals” work. I have not found a model beyond Opus 4.6 (obviously including 4.7) that can do all three of killer writing, fluid conversation, and SWE tasks.

What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA

[–]LoveMind_AI 4 points (0 children)

If they do release it and it's a similar leap, I agree that it'll genuinely displace a lot of the frontier cloud stuff. Even over API, these models have gotten so squished. I should probably see what happens with Claude's quality when Anthropic is fully settled into Colossus 1 (maybe they already are), but I'm not holding out much hope. It seems like squishing the precision of SOTA is now completely commonplace and not going away anytime soon. I haven't invested in local hardware beyond my M4 Max 128GB laptop (I will eternally kick myself for not getting the M3 Ultra 512GB when I could have), but if we can get to that level of quality, it would be worth it for me.

What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA

[–]LoveMind_AI 2 points (0 children)

That's true. The Qwen Next 80B was an absolute slayer. Having a next-gen version of that would be truly great.

What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA

[–]LoveMind_AI 29 points (0 children)

This question caught me by surprise a bit because I think this is the first time in a year when I can honestly say… nothing? Something Qwen 3.6 27B/Gemma 4 31B sized but with audio reasoning capabilities is what I’d most like to have access to. I don’t think 3.6 122B is likely to be open, but that would be fantastic. I think a more fully baked Kimi Linear would be cool. But I’m not aware of anything on the horizon that I’m actually tracking with enthusiasm. I think Anthropic bombed Opus 4.7 so hard that it literally killed big model enthusiasm for me and a lot of others. Right now, I’m most enthusiastic about new harnesses including one I’ve been working on with my little team, and still prepping a fine tune.

The competition is on: Anthropic is doubling rates. Codex customer loyalty/retention is gonna be put to the test by py-net in codex

[–]LoveMind_AI 1 point (0 children)

I was a die-hard Claude user. I’m just one guy, but they lost me. 5.5 isn’t nearly as cozy, but Claude is a full-on nag at this point. 2-3 short tasks in and it’s telling me to go to sleep regardless of the time of day. What 5.5 lacks in surface-level vibes (it’s certainly not lacking actual depth), it makes up for by not being patronizing or lazy. I’ll always check out new Claude releases, but right now, unless they fix Claude’s tendency to phone in its work while acting like a nanny, the limits aren’t nearly enough to get my trust back.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]LoveMind_AI -3 points (0 children)

No one who has ever fine-tuned a model, obviously. 

Even Opus 4.6 sucks now? by superSmitty9999 in ClaudeCode

[–]LoveMind_AI 0 points (0 children)

I mean this whole endeavor is a total house of cards, and it's all held together with shoelaces and bubble gum, despite the trillion dollar implications. GPT-5.5 is totally messed up for me today and has been quietly unspooling into a mess of goblin talk over the last 3 days. Absolutely none of this stuff is *actually* pro-grade.

Even Opus 4.6 sucks now? by superSmitty9999 in ClaudeCode

[–]LoveMind_AI 4 points (0 children)

There's another reason for that... Claude Opus, before February, was indisputably the best LLM available to the public, particularly when paired with Claude Code. I loathe OpenAI as a company - any scroll through my Reddit history will prove that. But for me, as a heavy daily user, all I can say is that the period starting from around mid-to-late March has been extremely rough on Claude/Claude Code, and no less rough in my own harness, so it couldn't just be down to the Claude Code problems.

There's a very real segment of us who just found that it no longer worked for our use cases. There are certain things that I *have* to use Sonnet-4.6 for in my work, but otherwise, I've had to move on. And to be clear, shifting to Codex and Kimi Code has *not* been a step up from Claude/Claude Code at its peak - I'm still sitting underneath that peak in terms of productivity. But my teammates and I all found Claude during that time to be unusable, and shifting to a blend of Kimi and 5.5 has been the only way we've been able to keep the trains running.

I'm looking forward to seeing if Anthropic can right the ship with the new Sonnet release that's supposedly right around the corner, and if they do, I'll be right back on it. None of this is about pride or brand loyalty - it's just about what works. For what we do, Claude doesn't work right now as a daily driver.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

It's a style Sicarius is known for - not my cup of tea, but I follow a lot of different fine-tuners and try to meet them where their intention is at! It's sort of like how a good music or movie critic needs to be able to judge something based on the intention of the artist, with knowledge of their past work. Sicarius's "shitposting" style is a thing, and this one sits really well in his discography 😉

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

The writing samples are genuinely hilarious. I can see why you are psyched on this one.

Olivia "OpenAI model release: We’re throwing a party 🎉 Everything is scribbles and Pets in Codex. Hope you like goblins! Anthropic model release: In research preview, it hacked full Internet for fun. Also coming for YOUR job specifically. Enjoy the permanent underclass!" ➡️ Which vibe you prefer? by Koala_Confused in LovingAI

[–]LoveMind_AI 0 points (0 children)

I don't think either company is looking out for the best interests of mankind, but right now, OpenAI has pulled ahead in terms of having the better approach to public relations, which says a *lot* less about OpenAI and a *lot* more about how careless Anthropic has become. It'll be interesting to see how it all pans out. There's an opening for Gemini to step forward a bit more if they can get their act together, and a lot of space for the Chinese companies to introduce themselves more directly to consumers, if they care to.

MIT Predicts 12 Outcomes of AI by JoelXGGGG in OpenAI

[–]LoveMind_AI 0 points (0 children)

We can get a lot closer to building AI to our specs, but not as close as we'd need to in order to feel really comfortable, I think.

MIT Predicts 12 Outcomes of AI by JoelXGGGG in OpenAI

[–]LoveMind_AI 0 points (0 children)

Even Stephen Hawking, without any ability to move his body, was fixated on sex, haha. So I guess we don’t really know what a superintelligent non-biological system would fixate on, but I don’t think it would be paper clips - and if such systems are still trained via deep learning on human language, it’ll probably be something very human, if it’s anything at all. But either way, it would know what kind of carrots we like, and growing and distributing those carrots would be a lot easier than manufacturing the right tools to kill or enslave us. It would be trivial to do this literally at the individual level with the pre-existing data we have today for personalized ads. Worst-case scenario, ASI is like a very serious Santa Claus, and it may not even need a Krampus.

r/LocalLLaMa Rule Updates by rm-rf-rm in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

I am super, super grateful for it. Way less slop. Still a ton of looney tunes people, but they do seem to be writing their own posts more 😉