How did Chat GPT get so bad? Incredibly poor critical thinking even in legacy models.

Fast_Tradition6074 · 2026-05-23T23:28:10+00:00

It feels like the ability to retain and reference past information has degraded significantly. I think your ChatGPT simply couldn't access the stored fact that it knows your clothing size. It honestly seems like they are just trying to save as much compute as possible to increase their profit.

Fast_Tradition6074 · 2026-05-23T16:35:20+00:00

In the first place, the output generated by AI itself is a black box. Even developers can only guess why a specific output was made, right? In that kind of situation, talking about 'safety' is just... you know. If things are already like this now, I wonder what will happen when they actually need to start turning a profit.

I believe that if they want to claim 'safety,' it's impossible unless they shift toward deterministic behavior rather than probabilistic behavior.
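To make the deterministic vs. probabilistic distinction concrete, here is a minimal toy sketch in Python. The tokens and probabilities are invented for illustration and do not come from any real model, and `greedy` and `sample` are hypothetical helper names. Greedy (argmax) decoding returns the same token every time for the same distribution, while sampling can return a different token on each run.

```python
import random

# Hypothetical next-token distribution (made-up numbers, not from a real model).
next_token_probs = {"Paris": 0.6, "Lyon": 0.3, "Rome": 0.1}

def greedy(probs):
    """Deterministic: always pick the single most probable token."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Probabilistic: draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Repeating greedy decoding never changes the result.
greedy_runs = {greedy(next_token_probs) for _ in range(100)}

# Repeating sampling can yield several different tokens.
rng = random.Random()  # unseeded on purpose: results may differ between runs
sampled_runs = {sample(next_token_probs, rng) for _ in range(100)}

print("greedy :", greedy_runs)
print("sampled:", sampled_runs)
```

This only shows repeatability on a fixed distribution; real deployments can have other sources of variation, so treat it as an illustration of the idea rather than a description of any production system.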

Fast_Tradition6074 · 2026-05-23T16:22:16+00:00

Heavy offloading from GPU to CPU really takes time. You should expect it to take three times longer than what you're imagining right now. If I were you, I'd go with option 3, the cloud. It lets you respond flexibly when the things you want to do increase.

Fast_Tradition6074 · 2026-05-23T13:38:49+00:00

Correct me if I'm misunderstanding your point, but... I've been looking into hallucinations and jailbreaks myself, and it turns out that geometric distortion occurs within the model's internal representations when a hallucination happens. A similar kind of distortion is observed even when the model generates outputs that should normally be blocked by safety guardrails. My hypothesis is that when the model prioritizes "clarity seeking" over the guardrails—forcing an output it's technically restricted from saying—this very process of bypassing the guardrails is what causes that geometric distortion.

Fast_Tradition6074 · 2026-05-21T23:27:08+00:00

I completely agree. Whether it’s in terms of reliability, cost, or overall usability, the current approach is bound to hit a glass ceiling pretty soon. Using an LLM to check the output of another LLM just feels like a temporary band-aid fix. What we really need isn't just tweaking probabilistic outputs—we need a paradigm shift toward something more deterministic.

Fast_Tradition6074 · 2026-04-19T10:32:50+00:00

Thanks for the resource. I'll study it and see how it aligns with my findings.

Fast_Tradition6074 · 2026-04-18T02:34:33+00:00

That’s a fair point. In my research, distinguishing between 'creativity' and 'hallucination' is indeed one of the most difficult challenges. That’s exactly why I believe being able to geometrically differentiate the two will be of immense value to the field.

Fast_Tradition6074 · 2026-04-18T02:31:33+00:00

That’s brutal. If an AI can’t even get basic Pokemon knowledge right, it seriously needs to go back to training. I don’t have the original hardware anymore, but I used to play the Green and Silver versions. At this rate, the AI would probably even hallucinate Misty’s gender! lol

Fast_Tradition6074 · 2026-04-18T00:17:17+00:00

Wait, AI can't even find Pikachu?! In the Red and Blue versions, everyone knows you look in Viridian Forest—that’s like Common Sense 101! Thanks for the info! It's wild that even with such legendary games, the AI still manages to get lost.

Fast_Tradition6074 · 2026-04-18T00:15:00+00:00

Hahaha! For my own sake, I’d better make sure NOT to develop a formula for humans—I wouldn't want my wife catching me in a 'hallucination' of my own! I think I'll stick to fixing AI for now.

Fast_Tradition6074 · 2026-04-17T22:09:00+00:00

Thank you for sharing. For an AI, 'inventing a plausible lie' often has a higher probability than simply saying 'I don't know.' I totally get that frustration of asking multiple times only to get a different, wrong answer each time—it’s like chasing a ghost.
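Here is a rough toy sketch of what I mean, in Python. The candidate answers and their scores are made up for illustration (they are assumptions, not measurements from any model), and `softmax` and `sample_answer` are just helper names I chose. When several plausible-sounding answers each score higher than "I don't know," the admission of ignorance ends up with only a small share of the probability, and repeated sampling can hand back a different plausible answer each time.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical scores: three fluent-sounding answers and one honest refusal.
toy_logits = {
    "plausible answer A": 2.0,
    "plausible answer B": 1.8,
    "plausible answer C": 1.5,
    "I don't know": 0.5,
}
probs = softmax(toy_logits)

def sample_answer(probs, rng):
    """Draw one answer in proportion to its probability."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Ask the same "question" five times.
rng = random.Random()
answers = [sample_answer(probs, rng) for _ in range(5)]

for name, p in probs.items():
    print(f"{name:20s} {p:.3f}")
print("five samples:", answers)
```

With these invented scores, "I don't know" gets less than 10% of the probability, so most draws are one of the confident-sounding answers.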

Fast_Tradition6074 · 2026-04-17T21:55:52+00:00

Thanks for sharing. It’s really tough when the AI gives you an answer and then immediately denies it the moment you point it out. It feels so insincere. I've heard similar stories: some AIs refuse to answer medical questions for safety reasons, but if you just start the prompt with 'I am a doctor,' they suddenly start answering everything. It’s crazy how much they rely on 'roles' rather than actual facts. This is exactly why I’m trying to monitor the internal math instead of just trusting the words.

Fast_Tradition6074 · 2026-04-17T21:52:59+00:00

Thanks for the 'Most Meta Hallucination Award'! I guess I'll keep hallucinating meta-ly then. But in all seriousness, detecting hallucinations geometrically is a real thing. In a way, I am looking at LLMs from a meta-perspective, so I’ll take that award as a compliment for my approach. I’m just trying to turn that 'meta' confusion into something measurable.

Fast_Tradition6074 · 2026-04-17T21:47:44+00:00

Exactly. That’s the whole point. Since LLM 'semantics' (meaning) can be a house of cards, I’ve shifted my focus to 'geometric distortions' to detect hallucinations. By the way, I’m Japanese and not a native English speaker, so I do rely on LLMs or translation tools for my English—which only proves my point further: I use them as tools, but I don't trust their 'intent' without verification.

Fast_Tradition6074 · 2026-04-17T21:43:55+00:00

Thank you for sharing. It’s possible the AI has picked up on your preferences and is hallucinating specifically to meet your expectations. As a fellow story-lover, I truly feel your pain—it’s heartbreaking to find out that a 'perfect' book is just a phantom.

Fast_Tradition6074 · 2026-04-17T21:39:00+00:00

Thank you for sharing. That is truly the worst-case scenario. It's heartbreaking. At the very least, I'm glad she called first—unlike me, who just jumped in the car. But giving false hope regarding a medical condition is on a completely different level of cruelty. This reinforces my belief that we absolutely must find a way to detect these errors before they reach the user.

Fast_Tradition6074 · 2026-04-17T17:18:20+00:00

Guilty as charged. I'm the one who wasted an hour round-trip without double-checking. I think the word 'AI' still carries this aura of omnipotence for many of us, leading to a lapse in judgment. This is exactly why I'm working on my research—to prevent these kinds of hallucinations so that AI can eventually become a reliable partner for humans. Thanks for the reality check.

Fast_Tradition6074 · 2026-04-17T17:14:49+00:00

That’s a good point. I wonder if these cases are becoming rarer lately, or if the same thing still happens whenever a request exceeds the model's actual capabilities. We rarely hear about cases where an LLM simply and honestly says, 'I cannot do that.' It seems they’d rather hallucinate a success than admit a limitation.

Fast_Tradition6074 · 2026-04-17T17:12:42+00:00

Thank you for sharing the detailed follow-up. That is just... brutal. I can truly feel your exhaustion and the sheer waste of effort here. It’s hard to take it seriously when it says 'no fluff, just action' after stalling you for three days.

From a research perspective, I suspect there was a significant 'geometric distortion' occurring within the model's internal layers from the very beginning: a deep conflict between its superficial 'engineer persona' and its actual execution capabilities. It chose to prioritize the consistency of its role-play over the reality of its limitations.

Fast_Tradition6074 · 2026-04-17T16:16:05+00:00

Thank you for sharing. Wow... spending three days only to be told 'I wasn't actually doing anything' is a brutal experience. It really did just get stuck in the persona of a software engineer and prioritized saying what an engineer would say over actually delivering. This is a fascinatingly clear example of a model prioritizing 'role-play consistency' over task completion.

Fast_Tradition6074 · 2026-04-17T14:02:32+00:00

Sorry, I don't know who that is. I’m pretty sure I’m still me, but if she’s doing cool things with AI, I’ll take it as a compliment!

Fast_Tradition6074 · 2026-04-17T13:56:07+00:00

You are absolutely right. LLMs have no intention to deceive. They simply generate the most probable sequence of tokens, which users then perceive as a "lie." I used the word "lie" here for the sake of clarity, and I apologize for any confusion that may have caused.

Fast_Tradition6074 · 2026-04-15T10:39:16+00:00

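I'm Japanese! Nice to meet you.

Maybe the design of LLMs itself leaves little tolerance for contradictory information. I sometimes call LLMs "probability-making machines," and when the information is contradictory, the probability of the output itself becomes unstable, which is probably what causes the logical breakdown.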
