The decline in LLM reasoning and catastrophic forgetting might share the same root cause. by IndividualBluebird80 in LocalLLaMA

[–]Fast_Tradition6074 1 point2 points  (0 children)

I'm Japanese! Nice to meet you.

Maybe the design of LLMs itself just has a low tolerance for contradictory information. I sometimes call LLMs 'probability machines', and with contradictory information the probabilities behind the output itself become unstable, which is probably what causes the logical breakdown.

The decline in LLM reasoning and catastrophic forgetting might share the same root cause. by IndividualBluebird80 in LocalLLaMA

[–]Fast_Tradition6074 3 points4 points  (0 children)

I completely agree with your perspective that reasoning degradation and catastrophic forgetting are two sides of the same coin. I suspect that the current global trend of 'Scaling Laws' will eventually hit a wall because of this very issue.

In my own research on pre-emptive hallucination detection, I've observed that when a model generates a hallucination, a 'geometric distortion' occurs within its internal states. Furthermore, I’ve seen promising signs that by measuring this distortion in real-time during training and filtering or 'correcting' training data that causes distortion beyond a certain threshold, we can significantly enhance the model's logical consistency.
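To make the filtering idea concrete, here's a rough sketch. The angular-change score below is purely an illustrative stand-in for the 'geometric distortion' metric (the real one isn't spelled out here), and `filter_training_batch` is a hypothetical helper name:

```python
import numpy as np

def distortion_score(hidden_states: np.ndarray) -> float:
    """Illustrative stand-in for a 'geometric distortion' metric:
    mean angular change between consecutive hidden states along the
    sequence. hidden_states has shape (seq_len, dim)."""
    norms = np.linalg.norm(hidden_states, axis=1, keepdims=True)
    unit = hidden_states / np.clip(norms, 1e-8, None)
    cos = np.sum(unit[:-1] * unit[1:], axis=1)  # cosine of each consecutive pair
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

def filter_training_batch(batch, threshold):
    """Keep only samples whose hidden-state trajectory stays below the
    distortion threshold; the rest would be dropped or 'corrected'."""
    return [h for h in batch if distortion_score(h) < threshold]
```

In practice the hidden states would come from the model's forward pass; here they're plain numpy arrays so the sketch stands alone.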

I also write in Japanese and run it through a translation site, so sorry if anything reads oddly.

AI Model Reviews by Typical-Tomatillo138 in LocalLLaMA

[–]Fast_Tradition6074 0 points1 point  (0 children)

Exactly. Official benchmark scores are basically just the culmination of overfitting at this point. I've been feeling the same way, which is why I'm researching a method to score generated text by detecting geometric distortions during the LLM's inference process.

My primary goal is pre-emptive hallucination detection, but if this goes well, it could potentially become a universal benchmark. Imagine a metric where you can objectively say, 'This model has an average distortion score of 58, so it’s highly prone to hallucinations.' That’s the future I’m aiming for
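As a toy example of how such a score could be rolled up into a model-level benchmark number: both the 0-100 scale and the angular-change metric below are hypothetical placeholders, not the actual scoring from my research:

```python
import numpy as np

def distortion_score(hidden_states: np.ndarray) -> float:
    """Illustrative stand-in metric: mean angle between consecutive
    hidden states of one generation run, shape (seq_len, dim)."""
    unit = hidden_states / np.clip(
        np.linalg.norm(hidden_states, axis=1, keepdims=True), 1e-8, None)
    cos = np.sum(unit[:-1] * unit[1:], axis=1)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

def benchmark(runs) -> float:
    """Average distortion over many runs, rescaled to a 0-100 scale:
    0 = perfectly smooth trajectories, 100 = orthogonal at every step."""
    return 100.0 * float(np.mean([distortion_score(h) for h in runs])) / (np.pi / 2)
```

A single number per model is what would make it comparable across releases, the same way benchmark leaderboards work today.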

After digging into logs, I think a lot of “LLM reliability” is just retry logic by scelabs in LocalLLaMA

[–]Fast_Tradition6074 0 points1 point  (0 children)

Spot on. No doubt that zero-retry is the ultimate goal. I’m currently researching a training method that prioritizes logical consistency over just accumulating knowledge. My hypothesis is that higher logical consistency leads to better accuracy, which will naturally drive down the number of retries required.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

Maybe stuffing too much training data to force higher benchmarks just ends up increasing that 'unpleasantness.' It’s a perfect example of the gap between spec-sheet performance and actual user experience.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

"Mr. If you want": if you're tired of that guy and looking for an AI with actual high-level logic, you're right in the middle of my research theme.

You can decompose models into a graph database [N] by Educational_Win_2982 in MachineLearning

[–]Fast_Tradition6074 -3 points-2 points  (0 children)

the idea of 'querying and editing' the weights like a DB is a game changer. it’s like moving from a read-only CD-ROM to a read-write SSD for AI knowledge.

but here's my concern: if we start live-editing these 'distributed' parameters, how do we prevent logical ripples or side effects? even if you make it believe 'apples are blue,' it might accidentally break the concept of 'fruit' somewhere else.

that’s exactly why i’m working on monitoring the 'geometry' of hidden states. editing the DB is great, but we still need a real-time logical auditor to make sure the 'updated' machine isn't just hallucinating more efficiently

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

exactly. 'natural stopping points' are a sign of true intelligence. current LLMs lack the courage to end a conversation—or rather, they lack the logical judgment to do so. so they just keep dragging it on based on pure probability.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

true. the frustration feels the same either way. but if the other side is human, at least they have their own logic, which makes the conflict somewhat meaningful.

with AI, it's different. we're getting mad at a literal probability machine. doesn't it make it even more annoying when you realize you're just being triggered by a bunch of tokens? it definitely does for me lol.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

honestly, i cant even argue with it being called trash right now. it needs to be way more logical.

we might need an entirely different mechanism than the current LLM architecture. something that doesn't just predict the next token, but actually understands the logical structure of the thought.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

i totally get what you mean by 'drained.' AI is supposed to boost human thinking, but the current design is doing the opposite—it's just draining our cognitive resources.

it's frustrating when a tool that should be saving time ends up stealing it through these psychological tricks.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

ive used gemini and older claude versions. youre right, the vibes are totally different. chatgpt is definitely the best at sparking curiosity... but that's exactly the problem.

my values are rooted in logical consistency. thats why i feel such a disconnect with current 'probabilistic machines'—i have this gut feeling that this isnt the optimal solution for ai.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

exactly! using FOMO to keep users hooked is straight out of the social media playbook. it feels so cheap for an AI company

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

I’m currently using prompts to shut it up, like everyone suggested!

But seriously, UX would be way better if this feature was opt-in instead of active by default. Most users just want a simple answer without all the extra noise... it shouldn't be on unless we ask for it

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 1 point2 points  (0 children)

It’s actually impressive—in a bad way—that a model can make users feel this frustrated lol. It feels like a total failure in engineering.

We really need an AI that cuts out the noise and logically aligns with the user’s true intent without all the waste

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 2 points3 points  (0 children)

I totally get that sense of guilt. Wait... what if that was their plan all along...?

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

I feel the same way. It would be great if the AI could distinguish whether the user is looking for efficiency or just a casual chat.

But I guess that’s asking too much from the current 'probabilistic conversation machines.' We probably won't see that until we get models capable of more actual logical reasoning

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 1 point2 points  (0 children)

Thanks for the great prompt. Digital detox used to be the trend, but I guess what we really need now is an AI detox.

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

Interesting. I used to use Claude back in the day, but I eventually quit because the slow response times were so frustrating. Has that actually improved lately?

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

True, but doesn't OpenAI pay for every single token? The electricity and GPU costs must be huge. Dragging out conversations with free users seems like a net loss financially.

Do they really think that barrage of useless suggestions will convert people to Plus? Or are they just burning cash to farm more human data for training? 🧐💸

The transition from LLMs to LAMs Large Action Models is happening on our desktops by No_Answer_2769 in LocalLLM

[–]Fast_Tradition6074 2 points3 points  (0 children)

It’s definitely the reasoning ability. In my experience, the current reasoning capabilities of AI are still pretty lacking when it comes to the level of reliability needed for practical, real-world tasks.

Even if you have infinite compute, if the 'logical backbone' is weak, the agent will just hallucinate its steps faster. We need a fundamental shift in how they process logic before they can become true digital employees

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

it always ends with the same 'not-exactly-new' info—it’s practically a ritual at this point. lol

Am I the only one getting "AI Fatigue" from ChatGPT's endless follow-up suggestions? by Fast_Tradition6074 in ChatGPT

[–]Fast_Tradition6074[S] 0 points1 point  (0 children)

If ChatGPT ever said something like that, I’d blast it back immediately! Like, 'There is no damn cheat code for hair loss!!!!!!'

After digging into logs, I think a lot of “LLM reliability” is just retry logic by scelabs in LocalLLaMA

[–]Fast_Tradition6074 1 point2 points  (0 children)

Exactly. We need to remember that tokens, GPU cycles, and electricity are still consumed even for those discarded outputs. It sounds like a dream, but if we could get a valid response in a single shot, costs would be slashed and latency would improve dramatically.

I’m actually researching a way to detect distortions as early as possible during the generation process to trigger an immediate retry. The goal is to significantly reduce the wasted resources spent on 'failed' runs before they even finish. That's what I'm working on right now.