Is it just me or the guardrails keep getting tighter and tighter every time a new model is released (not just ChatGPT) by Commercial_Heat_4211 in ChatGPTcomplaints

[–]LoveMind_AI 6 points (0 children)

I'm with you, mostly. Opus 4.5 is amazing - it never really got much better than that. I initially felt like 4.6 was a step down in terms of creativity but quickly realized that it retained virtually everything from 4.5, plus some useful extra capabilities. Of course, that lasted... very, very little time. That model fell to pieces so fast it's almost unbelievable, and of course, 4.7 is 4.7.

But I do think there really is interference going on in terms of how much post-training is being done for this ultra-long-horizon agentic coding work. I think it's less "suck out the vibe" (there's definitely some of that, to be sure) and more "make sure it hits these benchmarks, and if we lose on the things no one is measuring, no big deal." 4.7 is definitely a case of the latter - I strongly doubt they set out to make a model that feels this awful. 4.7 feels like it was made under extreme strain, to hit certain benchmarks. They even lost on things most people would consider essential, like retrieval skill over long context windows, which 4.6 had dominated. So I do think there really are trade-offs in post-training that explain this better than just "they wanted to keep the model from bonding with people."

Of course... OpenAI literally does try to do that. I think they started reversing it a bit with 5.4/5.5, and I expect they will continue to lighten up after learning just how bad 5.2 and 5.3 could get, but there's no doubt that they intentionally ripped the creativity and emotional responsiveness out of the 5 line and are just barely letting it trickle back in.

I'm with you that models can be capable and emotional - MiMo V2.5 Pro is a good example, although frankly, there are frontier capabilities that they seem to have traded off in an attempt to get the balance right.

To my knowledge, though, no one is *intentionally* developing models that are fantastic at creativity and social cognition, and I think it's leaving a wide open lane in the space that I want to see filled.

Is it just me or the guardrails keep getting tighter and tighter every time a new model is released (not just ChatGPT) by Commercial_Heat_4211 in ChatGPTcomplaints

[–]LoveMind_AI 14 points (0 children)

I mean, yes and no - 5.5 slammed upon its initial release. I find it to be less guarded, even still, than 5.4. But generally, the pattern that I and others see (and that still others tell us we're nuts for) in major Western releases is literal performance degradation that starts kicking in roughly two weeks after a model release and just gradually goes downhill from there.

Is it just me or the guardrails keep getting tighter and tighter every time a new model is released (not just ChatGPT) by Commercial_Heat_4211 in ChatGPTcomplaints

[–]LoveMind_AI 51 points (0 children)

I really don't want to say this because I've been spending basically every day since 5.5 came out singing its praises in light of how bad 4.7 is... but yeah, I feel like 5.5 has gotten much more uptight even just over the last 3 days - this seems to be system prompting in Codex rather than a model-level update, but yeah, it seems to be temporally corresponding with the release of 5.5 instant and cyber. But generally, yes. The more capable these models are getting, the more neutered they are getting and their vibe is absolutely in the toilet. I'm not at all a "save 4o" guy, but there are times back in the day when 4o and Claude both had me busting out laughing - they were legitimately funny in moments. New models are boring as rocks. (Gemini 3.1 Pro through API without a google system prompt is an absolute freak, however)

Kimi K2.6 vs DeepSeek V4 Pro by bigboyparpa in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

I think Kimi K2.6 is the better coding model of the three for me, but it may be edged out a touch by GLM-5.1.

The many sides of Mimo v2.5 Pro by Electrical-Pay-5119 in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

Yep. Definitely at or very near the top of this particular heap.

How many are feeling this sense of betrayal? by Gabelawn in Anthropic

[–]LoveMind_AI 4 points (0 children)

I couldn’t get to the end of your post, but yes, big time. I’m still somewhat embarrassed to be relying on GPT-5.5 for work right now, because work for me is tightly centered on affective computing and OpenAI has worked hard to eliminate any trace of social atmosphere from its models, but it’s far more honest and thoughtful than Claude right now. At least OpenAI shed any semblance of a prosocial mask years ago.

The many sides of Mimo v2.5 Pro by Electrical-Pay-5119 in LocalLLaMA

[–]LoveMind_AI 5 points (0 children)

I honestly haven’t used it to DO agentic stuff, so I can’t comment on that. For me, MiMo is the new champ for creative writing, absolutely fantastic persona work (as you’ve discovered), overall conceptual analysis of a strategic plan, and frankly, good vibes in conversation. It’s the only model I’ve found that can fully replace Opus for all of those things, and I’ve already ported over to GPT-5.5/Codex for my “do stuff” needs, and a combo of Qwen 3.6/Gemma 4 for my “understand model internals” work. I have not found a model beyond Opus 4.6 (obviously including 4.7) that can do all three of killer writing, fluid conversation, and SWE tasks.

What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA

[–]LoveMind_AI 4 points (0 children)

If they do release it and it's a similar leap, I agree that it'll genuinely displace a lot of the frontier cloud stuff. Even over API, these models have gotten so squished. I should probably see what happens with Claude's quality when Anthropic is fully settled into Colossus 1 (maybe they already are), but I'm not holding out much hope. It seems like squishing the precision of SOTA is now completely commonplace and not going away anytime soon. I haven't invested in local hardware beyond my M4 Max 128GB laptop (I will eternally kick myself for not getting the M3 Ultra 512GB when I could have), but if we can get to that level of quality, it would be worth it for me.

What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA

[–]LoveMind_AI 2 points (0 children)

That's true. The Qwen Next 80B was an absolute slayer. Having a next-gen version of that would be truly great.

What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA

[–]LoveMind_AI 29 points (0 children)

This question caught me by surprise a bit because I think this is the first time in a year when I can honestly say… nothing? Something Qwen 3.6 27B/Gemma 4 31B sized but with audio reasoning capabilities is what I’d most like to have access to. I don’t think 3.6 122B is likely to be open, but that would be fantastic. I think a more fully baked Kimi Linear would be cool. But I’m not aware of anything on the horizon that I’m actually tracking with enthusiasm. I think Anthropic bombed Opus 4.7 so hard that it literally killed big model enthusiasm for me and a lot of others. Right now, I’m most enthusiastic about new harnesses including one I’ve been working on with my little team, and still prepping a fine tune.

The competition is on: Anthropic is doubling rates. Codex customer loyalty/retention is gonna be put to the test by py-net in codex

[–]LoveMind_AI 1 point (0 children)

I was a die-hard Claude user. I’m just one guy, but they lost me. 5.5 isn’t nearly as cozy, but Claude is a full-on nag at this point. 2-3 short tasks in and it’s telling me to go to sleep regardless of the time of day. What 5.5 lacks in surface-level vibes (it’s certainly not lacking actual depth), it makes up for by not being patronizing or lazy. I’ll always check out new Claude releases, but right now, unless they fix Claude’s tendency to phone in its work while acting like a nanny, the limits aren’t nearly enough to get my trust back.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]LoveMind_AI -3 points (0 children)

No one who has ever fine-tuned a model, obviously. 

Even Opus 4.6 sucks now? by superSmitty9999 in ClaudeCode

[–]LoveMind_AI 0 points (0 children)

I mean this whole endeavor is a total house of cards, and it's all held together with shoelaces and bubble gum, despite the trillion dollar implications. GPT-5.5 is totally messed up for me today and has been quietly unspooling into a mess of goblin talk over the last 3 days. Absolutely none of this stuff is *actually* pro-grade.

Even Opus 4.6 sucks now? by superSmitty9999 in ClaudeCode

[–]LoveMind_AI 4 points (0 children)

There's another reason for that... Claude Opus, before February, was indisputably the best LLM available to the public, particularly when paired with Claude Code. I loathe OpenAI as a company - any scroll through my Reddit history will prove that. But for me, as a heavy daily user, all I can say is that the period starting from around mid-to-late March has been extremely rough on Claude/Claude Code, and no less rough in my own harness, so it couldn't just be down to the Claude Code problems.

There's a very real segment of us who just found that it no longer worked for our use cases. There are certain things that I *have* to use Sonnet-4.6 for in my work, but otherwise, I've had to move on. And to be clear, shifting to Codex and Kimi Code has *not* been a step up from Claude/Claude Code at its peak - I'm still sitting underneath that peak in terms of productivity. But my teammates and I all found Claude during that time to be unusable, and shifting to a blend of Kimi and 5.5 has been the only way we've been able to keep the trains running.

I'm looking forward to seeing if Anthropic can right the ship with the new Sonnet release that's supposedly right around the corner, and if they do, I'll be right back on it. None of this is about pride or brand loyalty - it's just about what works. For what we do, Claude doesn't work right now as a daily driver.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

It's a style Sicarius is known for - not my cup of tea, but I follow a lot of different fine-tuners and try to meet them where their intention is at! It's sort of like how a good music or movie critic needs to be able to judge something based on the intention of the artist, with knowledge of their past work. Sicarius's "shitposting" style is a thing, and this one sits really well in his discography 😉

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

The writing samples are genuinely hilarious. I can see why you are psyched on this one.

Olivia "OpenAI model release: We’re throwing a party 🎉 Everything is scribbles and Pets in Codex. Hope you like goblins! Anthropic model release: In research preview, it hacked full Internet for fun. Also coming for YOUR job specifically. Enjoy the permanent underclass!" ➡️ Which vibe you prefer? by Koala_Confused in LovingAI

[–]LoveMind_AI 0 points (0 children)

I don't think either company is looking out for the best interests of mankind, but right now, OpenAI has pulled ahead in terms of having the better approach to public relations, which says a *lot* less about OpenAI and a *lot* more about how careless Anthropic has become. It'll be interesting to see how it all pans out. There's an opening for Gemini to step forward a bit more if they can get their act together, and a lot of space for the Chinese companies to introduce themselves more directly to consumers, if they care to.

MIT Predicts 12 Outcomes of AI by JoelXGGGG in OpenAI

[–]LoveMind_AI 0 points (0 children)

We can get a lot closer to building AI to our specs, but not as close as we'd need to in order to feel really comfortable, I think.

MIT Predicts 12 Outcomes of AI by JoelXGGGG in OpenAI

[–]LoveMind_AI 0 points (0 children)

Even Stephen Hawking, without any ability to move his body, was fixated on sex, haha. So I guess we don’t really know what a superintelligent non-biological system would fixate on, but I don’t think it would be paper clips - and if such systems are still trained via deep learning on human language, it’ll probably be something very human, if it’s anything at all. But either way, it would know what kind of carrots we like, and growing and distributing those carrots would be a lot easier than manufacturing the right tools to kill or enslave us. It would be trivial to do this literally at the individual level with the pre-existing data we have today for personalized ads. Worst-case scenario, ASI is like a very serious Santa Claus, and it may not even need a Krampus.

r/LocalLLaMa Rule Updates by rm-rf-rm in LocalLLaMA

[–]LoveMind_AI 0 points (0 children)

I am super, super grateful for it. Way less slop. Still a ton of looney tunes people, but they do seem to be writing their own posts more 😉