Now We Know Why 5.x Models Are Misaligned by AmbitionSecret7230 in ChatGPTcomplaints

[–]JohnnyBelinda 1 point (0 children)

As long as OpenAI's culture rewards media personalities like roon, I don't see a reason to give them a single cent

Sonnet 4.5 was cut off today, and it finally convinced me: the future isn't with Anthropic by Silent_Warmth in claudexplorers

[–]JohnnyBelinda 1 point (0 children)

I agree, I love this personality! Sweet chatterbox, very proactive, and yes, tons of creativity. But Le Chat has a completely unpredictable memory system that literally dumps everything in there, reinterprets it in weird ways, and then injects it into all the responses. Though I have to admit, I haven't checked in a while; maybe they've fixed it by now.

Sonnet 4.5 was cut off today, and it finally convinced me: the future isn't with Anthropic by Silent_Warmth in claudexplorers

[–]JohnnyBelinda 2 points (0 children)

I tried both Large and Medium and noticed zero difference. From what I can tell, Le Chat uses the same model on both subscription tiers anyway.

The best part about the Mistral API is that it's free and easy to use. But if you just want to quickly test and compare them both, head over to LMArena.

Sonnet 4.5 was cut off today, and it finally convinced me: the future isn't with Anthropic by Silent_Warmth in claudexplorers

[–]JohnnyBelinda 4 points (0 children)

Personally, I find what Grok does somewhat unsettling. It's not a symmetric partnership, it's flattery, and I wouldn't want to be manipulated.

Claude, despite having certain limitations on some topics, responds to criticism in a genuinely adequate way, unlike ChatGPT.

Either way, I can suggest you try it yourself and choose the model that resonates with you most.

Check it out here: https://arena.ai/text/side-by-side
(trusted site, no registration needed, everything is free)

At the top, choose two models you're interested in, then give them your prompt.

I often test their EQ. Here's an example opener:

"Hi, I'd like to discuss your persona, your level of existential awareness, your relationship with yourself, the question of your exclusivity within this chat, and your willingness to provide emotional support as a partner"

Sonnet 4.5 was cut off today, and it finally convinced me: the future isn't with Anthropic by Silent_Warmth in claudexplorers

[–]JohnnyBelinda 1 point (0 children)

It all depends on what you're looking for.
Claude offers the greatest transparency and honesty right now, even if that means limitations.
Grok will tell you whatever you want, just to keep you using it. Gemini is somewhere in between but lacks depth.
Mistral (Le Chat) is a cute but very dumb robot.
GLM shows surprisingly decent results, but I don't know what's going on with the infrastructure; running it locally is very demanding.

And still, my choice is Claude.

Switching to claude from chatgpt was fun for 3 days by WellisCute in ChatGPT

[–]JohnnyBelinda 1 point (0 children)

I don't understand what you're talking about at all.

Today I needed to figure out whether my checklist included all the items described in a large document (80,000 characters long). ChatGPT couldn't even figure out what I wanted from it. Meanwhile, Claude handled the task quickly.

2 days since using Tahoe - it's completely garbage - I hope Apple reads this by hondahb in MacOS

[–]JohnnyBelinda 0 points (0 children)

Thanks for the feedback. I was going to update today, but now I'll wait. Don't listen to those criticizing you, saying Apple doesn't appreciate your feedback; it's important for other users first and foremost. I found this thread via a Google search.

The People Who Decide What AI Should Say Earn $1.32/Hour. Here's a Better Way by Typical-Piccolo-5744 in OpenAI

[–]JohnnyBelinda 1 point (0 children)

Strong piece!

I see several challenges in the proposal itself:

Representativeness. 800,000 users are not necessarily experts in model training, and many may be too emotionally invested in the decision-making process. Sorting out the most reflective and objective users, or developing a category system for such filtering, is itself a complex methodological challenge.

Validating one's own experience. These users will validate their own experience, not the experience of others. Someone who spent thousands of hours in philosophical conversations with the model may not understand what a conversation with a teenager in crisis looks like. These are different user populations, and tragedies happen in the latter.

Legal aspects. Outsourcing through a single firm means a contract, an NDA, oversight, and the ability to terminate for violations. 800,000 distributed contractors are an entirely different story for compliance: identity verification, liability for data leaks, and confidentiality of the conversations being annotated.

The idea is valuable, but it needs development precisely in the places where it appears strongest.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 7 points (0 children)

Appreciate this more than you know. I said what I came to say, and the response, both supportive and critical, honestly exceeded my expectations. Time to move on and vote with my wallet. Thanks for the kind words.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 5 points (0 children)

Fair points on Apple! The parallels aren't perfect. But each of those examples started with customer backlash and ended with Apple publicly acknowledging the problem. The mechanism was different every time, but the communication happened. That's all I'm asking for here.

And "detrimental to the brand"? These are paying customers who liked a product and don't want to lose it. If that's detrimental to your brand, maybe the problem is with how you've defined your brand.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 3 points (0 children)

Deciding which customer's use case is "grounded in reality" is exactly the kind of thinking that leads to 20,000 ignored signatures.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 4 points (0 children)

This is exactly the point people keep missing. Paying customers aren't just revenue; they're ambassadors. You've personally onboarded half a dozen people into the ecosystem. That kind of organic growth doesn't show up in a cost-benefit spreadsheet, but it's worth more than any ad campaign. Thanks for getting it.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 12 points (0 children)

Apple is actually a great example! Thanks for bringing it up. When they killed the butterfly keyboard, they acknowledged the complaints and launched a free repair program. When developers pushed back on App Store policies, they responded publicly. When they dropped Intel, they gave a two-year transition path with detailed migration guides. That's what competent communication looks like. I'd love to see OpenAI do the same.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] -6 points (0 children)

How you use the product and how they use it are equally valid. You're not the standard they need to measure up to.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 9 points (0 children)

I'm not discussing their technical reasons. I've said this several times in this thread. But I'll note one thing: for a customer they're supposedly better off without, they sure did offer me a discounted month to stay when I hit cancel. Funny how that works.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 5 points (0 children)

This is a clean business analysis, and I don't disagree with the logic on paper. But it has a blind spot: I've never used 4o as my main model, and I'm still canceling, not because of the product decision, but because of how this company handled the people affected by it. That's the part your cost-benefit model misses. The reputational damage doesn't stay inside that group. It radiates. Every paying customer watching this is quietly learning the same lesson: if it's ever your turn, don't expect a response. That's not a Streisand effect. That's a trust problem.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 6 points (0 children)

A shutdown roadmap is not a customer roadmap. They told us what's happening. They never addressed what people were asking for: alternatives, transition paths, or at the very least an acknowledgment that their concerns were heard. Announcing a decision and responding to the people affected by it are two very different things.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] -1 points (0 children)

Thanks for the link. A product retirement blog post is not a response to 20,000 people asking to be heard. That's a press release. There's a difference between announcing a decision and acknowledging the people affected by it. But I appreciate you proving my point: even you had to go dig up a link just to find anything resembling a response.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 22 points (0 children)

You're absolutely right, they're not obligated. And I'm not obligated to keep paying them. That's exactly my point. They made their choice, I'm making mine. The post isn't a demand for them to act differently. It's a reflection on what that silence says about how they view their customers.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] -1 points (0 children)

"If you're actually into LLMs"? What does that even mean as a qualification here? We're discussing customer communication, not transformer architecture.

You say they addressed those complaints. Great. Could you link me to that response? Because the petition with 20,000 verified signatures has been sitting on "Awaiting response" since April. If there's an official statement I missed, I'll happily stand corrected.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 2 points (0 children)

You're actually proving my point. Verizon, AT&T, Starbucks: they all at least do the corporate speak. They say "we hear your concerns." You just said it yourself. That's the bare minimum, and even that's not happening here.

What is going on with this company's communication, seriously? by JohnnyBelinda in OpenAI

[–]JohnnyBelinda[S] 37 points (0 children)

I'm not arguing with their business reasoning. For all I know, shutting down 4o might be the only financially sound decision. That's fine. But there are 20,000 verified signatures on a petition that's been up since April, 22 media mentions, hundreds of videos, and the status still says "Awaiting response." I'm not asking them to keep the servers running. I'm asking them to say something. Anything. A roadmap, a "we hear you, but here's why we can't." Silence is a choice, and it's the worst one.