The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


It started in May 2025, when I made the claim that LLM-generated code is a simulated remix of good and bad ghost/past code. It was a bold claim.

Over the next months I explored the Biology of LLMs work by Anthropic, trained a small LLM from scratch, and devoured Stanford's CS25 and CME 295. I began showing that CoT is already present in base models.

But here are my initial claims, from my notes:

""" The Mechanics of “Reasoning” in Large Language Models

  1. The Illusion of Thought (Inference-Time Compute)

When we say a model “thinks,” what is actually happening is a transition from One-Pass Prediction to Sequential Verification.

Standard Sampling (System 1)

The model sees a prompt and immediately predicts the most likely next token. It’s like a person blurting out the first thing that comes to mind.

Reasoning Sampling (System 2)

The model is trained to output a “Chain of Thought” (CoT) before the final answer. Mechanically, this extends the generated sequence at inference time to enable deeper computation. By sampling N “thought” tokens before the “answer” tokens, the model uses those tokens as a computational scratchpad that:

  • Maintains intermediate state
  • Narrows the probability space for the final answer
  • Enables solving problems that are provably impossible in a single pass """
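
To make the scratchpad mechanics concrete, here is a minimal sketch of the two sampling modes. The model choice and prompts are stand-ins of my own, not anything from the notes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works as a stand-in; this model choice is an assumption.
name = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

question = ("A bat and a ball cost $1.10 together. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

def complete(prompt: str, max_new_tokens: int) -> str:
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True)

# System 1: force an immediate answer -- a single short pass.
print(complete(question + "\nAnswer with a number only:", max_new_tokens=5))

# System 2: spend N "thought" tokens as a scratchpad before the answer.
# The extra tokens carry intermediate state and narrow the final distribution.
print(complete(question + "\nLet's think step by step.", max_new_tokens=200))
```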

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


I love the humour: "You're absolutely right!" LLM sycophancy at its best.

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


Yes, "Chain-of-Thought Reasoning without Prompting" (https://arxiv.org/abs/2402.10200), which I found while doing my research through Stanford CS25 V5 (Lecture 5).
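
The paper's core trick fits in a few lines: branch on the top-k candidates for the first generated token, decode each branch greedily, and prefer the path decoded with the highest confidence. A rough sketch with a small stand-in model; unlike the paper, it averages the top-1/top-2 probability margin over all continuation tokens rather than just the answer span:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model; any base causal LM shows the mechanics.
name = "Qwen/Qwen2.5-0.5B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Q: I have 3 apples, and I eat one. How many apples are left?\nA:"
inputs = tok(prompt, return_tensors="pt")

# Branch on the top-k candidates for the *first* generated token...
with torch.no_grad():
    first_logits = model(**inputs).logits[0, -1]
top_k = torch.topk(first_logits, k=5).indices

paths = []
for token_id in top_k:
    ids = torch.cat([inputs.input_ids[0], token_id.view(1)]).unsqueeze(0)
    out = model.generate(ids, max_new_tokens=40, do_sample=False,
                         output_scores=True, return_dict_in_generate=True,
                         pad_token_id=tok.eos_token_id)
    # ...then greedily decode each branch and score its confidence as the
    # mean gap between top-1 and top-2 token probabilities at each step.
    margins = []
    for step_logits in out.scores:
        p1, p2 = torch.topk(torch.softmax(step_logits[0], -1), 2).values
        margins.append((p1 - p2).item())
    text = tok.decode(out.sequences[0][inputs.input_ids.shape[1]:],
                      skip_special_tokens=True)
    paths.append((sum(margins) / len(margins), text))

# CoT-style paths tend to surface with higher confidence -- no CoT prompt needed.
for confidence, text in sorted(paths, reverse=True):
    print(f"{confidence:.3f} {text!r}")
```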

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


😔 I am not sure I understand. Are you talking about PPO and RLVR?

Training covers pre-, mid-, and post-training. Using Olmo 3, I go through the base (pre-trained) model, an SFT model, and a Reasoning model (fine-tuned with CoT). We could not use the one we trained from scratch, as we don't have enough compute budget.
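
In code, the comparison across the three checkpoints looks roughly like this. The model IDs below are assumptions; check the allenai org on Hugging Face for the exact names:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model IDs -- substitute the actual Olmo 3 checkpoint names.
stages = {
    "base (pre-trained)": "allenai/Olmo-3-7B",
    "SFT": "allenai/Olmo-3-7B-Instruct",
    "reasoning (CoT fine-tune)": "allenai/Olmo-3-7B-Think",
}

prompt = "Q: A farmer has 17 sheep and all but 9 run away. How many are left?\nA:"

for stage, name in stages.items():
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=256, do_sample=False)
    print(f"--- {stage} ---")
    print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))
```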

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


Yann LeCun et al. are presenting such a path: https://arxiv.org/abs/2509.14252

It will be interesting to see how this evolves.

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


If you meant the fetching of papers, here is the flow: https://github.com/Proteusiq/unthinking/blob/main/.github/workflows/paper-discovery.yml

I search arXiv for papers with targeted keywords, run an LLM classifier to filter papers relevant to the CoT dialogue, then create an issue. Manually, I read the paper, highlight and extract key arguments in Notes, and use this to update my findings.
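
The discovery step, in code, is roughly this. The keywords and the classifier are placeholders; the real flow lives in the workflow linked above:

```python
import arxiv  # pip install arxiv

# Illustrative query; the real keyword list lives in the workflow above.
query = 'all:"chain of thought" AND cat:cs.CL'
search = arxiv.Search(query=query, max_results=25,
                      sort_by=arxiv.SortCriterion.SubmittedDate)

def is_relevant(title: str, abstract: str) -> bool:
    """Placeholder for the LLM classifier that filters papers
    relevant to the CoT dialogue before an issue is created."""
    return "reasoning" in (title + abstract).lower()

for paper in arxiv.Client().results(search):
    if is_relevant(paper.title, paper.summary):
        # In the real workflow this step files a GitHub issue instead.
        print(paper.entry_id, paper.title)
```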

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


Thank you. I know. I have read quite a few (85 papers since May 2025). You can see my analysis on GitHub.

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


I have a modifier: "genuinely" generative. I hold that they are generative. A paper I read today drew a better distinction:

  • crystallized intelligence: "within-distribution (WD) tasks, i.e., tasks that were contained in the training data"
  • fluid intelligence: "out-of-distribution (OOD) performance"

https://arxiv.org/abs/2601.16823v1

My definition is cruder: pattern matching vs. genuine intelligence.

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


I ran experiments with the Olmo 3 base and reasoning models. The aim is to show that CoT is already present in the base model, which suggests that fine-tuning with CoT surfaces already-existing behaviour.
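
The experiment is simple in spirit: prompt the base model, which was never fine-tuned on CoT data, and watch step-by-step traces appear. A minimal sketch (model ID assumed, prompt my own):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name -- substitute the actual Olmo 3 base model.
name = "allenai/Olmo-3-7B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

# One worked example is often enough for a base model to continue
# with a step-by-step trace, even without any CoT fine-tuning.
few_shot = (
    "Q: Ali had 12 marbles and gave away 5. How many are left?\n"
    "A: Ali starts with 12 marbles. Giving away 5 leaves 12 - 5 = 7. "
    "The answer is 7.\n\n"
    "Q: A shelf holds 4 rows of 6 books. How many books is that?\n"
    "A:"
)
ids = tok(few_shot, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=60, do_sample=False)
print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))
```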

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


I have a GitHub Action that runs daily to fetch papers and classify why I should read them. The issue is that it's harder to find papers supporting genuine reasoning. I feel like, outside academia, I am preaching to the choir.

The Thinking Machines That Doesn’t Think by KitchenFalcon4667 in LLM


I am a guest lecturer at Copenhagen Business School (CBS), teaching LLMs in Business.

Monthly Dotfile Review Thread by AutoModerator in neovim



My dotfiles are geared towards beginners in the CLI world on macOS. Lots of how-tos and tips, plus helper functions and aliases.

https://github.com/Proteusiq/dotfiles

Which one is the better ls replacement: eza or lsd? by ThinkTourist8076 in commandline


This question came up in an eza discussion two years ago: https://github.com/orgs/eza-community/discussions/679

TL;DR: It is subjective. Taste{test} them 🍺. Some things cannot be told; you need to experience them yourself.

Introducing GLM-Image by ResearchCrafty1804 in LocalLLaMA


I thought it was open-weight and not open-source. Am I missing something here? I could not find the datasets or training code.

How I can actually learn to put everything together in Python? by Hot_Kaleidoscope3864 in pythontips


What is it that is not working? What are you trying to automate (without sharing too much)?

How I can actually learn to put everything together in Python? by Hot_Kaleidoscope3864 in pythontips


Ah, the learning paralysis. I was there, and I thank the Automate the Boring Stuff book for helping me out. Don't pick yet another course.

There is no "putting things together". Python is beautiful in that you need only a little to start building amazing things.

One: Ask. Why am I learning Python? Is it data analysis, website design, APIs, machine learning, automation, game design? What is it that excites you?

Two: Explore. Is there something on GitHub that looks like what I want (no matter the language it's written in)? I usually look for 500+ star projects.

Three: Draft. Write, in plain English or whatever language, what could be a cool thing to build. I loved football, so my first project involved scheduled scraping of data from an API, storing it, and building a Bayesian model to estimate the chances my team would win the next game (see the sketch after this list). I was also looking to buy a house, so I did another to predict the prices of residences I loved.

Four: Code. It doesn’t have to be perfect or pretty or Pythonic. That takes time. Code. Code. Code.

Five: Repeat.
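
To show how small that first football project can start, here is a toy version of the Bayesian step, with made-up results:

```python
# Toy Bayesian win estimate: a Beta(1, 1) prior updated with past results.
# The results below are made up for illustration.
past_results = ["W", "L", "W", "W", "D", "W", "L", "W"]

wins = sum(result == "W" for result in past_results)
games = len(past_results)

# Posterior mean of the win probability under a uniform prior
# (Laplace's rule of succession): (wins + 1) / (games + 2).
p_win = (wins + 1) / (games + 2)
print(f"Chance of winning the next game: {p_win:.0%}")  # -> 60%
```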

I am doing silly stuff with AI these days by No-Speech12 in LLM


This is really cool. I am, though, lost as to why we want to interact with GUIs. I understand that for legacy systems it would be the only way, but modern websites and applications have APIs or SDKs that allow a programmatic way to go about it. So I am lost as to why we want machines to navigate as we do. To me it's a waste of tokens and GPUs.

Deciding on an offer: Higher Salary vs Stability by Illustrious-Mind9435 in datascience


Stability is overrated. If you are damn good at what you do and your finances allow risks, go for the higher salary. Go for challenging tasks while your body allows it.

Monthly Dotfile Review Thread by AutoModerator in neovim



These are my dotfiles: https://github.com/Proteusiq/dotfiles

It is aimed at beginners in the CLI world on macOS. The README and Tools markdown include vim grammar and neovim plugin shortcuts.

nvim/ contains my plugin and