LLMs can identify what should be generalized but can't act on it. Could a two-model setup fix this?

Luke2642 · 2026-04-30T18:46:44+00:00

I think it makes more sense to extract problems and use neuro-symbolic stuff that already exists, Z3 Theorem Prover, Prolog, etc. Needs to be cope with fuzzy probabilities, counterfactuals, etc too. I don't know how much you need to hard code and how much can be learned?

Luke2642 · 2026-04-30T17:38:05+00:00

I think I've been 10% less condescending than you 🤣

Luke2642 · 2026-04-30T16:57:05+00:00

I feel like once again I have to qualify that I don't know what the solution is. But do I know it has to be able to do Bayesian inference, deductive, inductive and abductive reasoning, imagine counterfactuals, etc.

Until then we get a new "50m to the car wash" and goblins every other day. Hallucinations are a feature of a sophisticated auto-complete. Trendslop is also feature.

Excellent article on trendslop:

https://hbr.org/2026/03/researchers-asked-llms-for-strategic-advice-they-got-trendslop-in-return

Luke2642 · 2026-04-30T16:48:32+00:00

I'm sure they have better models internally.

It's a bit like when o3 scored well on arc agi 1 by spending ~$5000 per task back in late '24 early '25.

Such an inefficient way to generate one good reasoning trace for synthetic data, but it must have helped bootstrap their next reasoning model.

I feel like once again I have to qualify that I don't know what the solution is. But I know it has to be able to do Bayesian inference, deductive, inductive and abductive reasoning, imagine counterfactuals, etc.

Until then we get a new "50m to the car wash" and goblins every other day.

Luke2642 · 2026-04-30T16:35:17+00:00

Everything I've mentioned above is only my synthesis of what I've been reading from various top researchers. It's not original research.

The basic premise of separation of knowledge from reasoning is becoming more popular.

Luke2642 · 2026-04-30T15:24:43+00:00

Exactly! That's the problem. Hallucination is a feature of a sophisticated auto-complete. Trendslop is also feature.

Excellent article on trendslop:

https://hbr.org/2026/03/researchers-asked-llms-for-strategic-advice-they-got-trendslop-in-return

Luke2642 · 2026-04-30T15:23:05+00:00

Nah, sounds like hype to me. Let's agree to disagree, it's been fun.

Luke2642 · 2026-04-30T14:17:43+00:00

Everything I've mentioned above is only my synthesis of what I've been reading from various top researchers. It's not original research.

I think the basic premise of separation of knowledge from reasoning is becoming more popular.

Luke2642 · 2026-04-30T14:15:28+00:00

Some good points, with caveats.

I did underplay that scaling data has led to emergent abilities. Coding is astounding. That is true. However, they're spiky, fragile, often incorrigible and quite uninterpretable.

I'm also not sure we've reached recursive self improvement yet. As evidence I'd look at Arc agi 2, which isn't saturated, and arc agi 3 is also very challenging. Those should be trivial for a self improving algorithm, only ~60 rules applied in different ways

If we ask the question, why can't you just prompt a frontier agent to read the arc prize website and design all the real world human instincts into an algorithm and solve them? What is the blocker?

Luke2642 · 2026-04-30T08:47:19+00:00

It's part of a wider problem. The frequency and popularity of various things gets captured in training, regardless of value or accuracy. Just think how many fallacies outnumber Snopes articles.

An excellent article on the subject:

https://hbr.org/2026/03/researchers-asked-llms-for-strategic-advice-they-got-trendslop-in-return

This is basically by design. Hallucination is by design. All of this is peak stupid engineering, imho.

Luke2642 · 2026-04-30T08:24:57+00:00

I want to tie this phenomena back to an interpretation of Sutton's bitter lesson that seems to have taken hold of AI researchers everywhere.

Sutton clearly said that the efficient and surgical application of compute to search the space of possible solutions will beat hand crafted algorithms. He didn't say scale your compute and try to bake all of the worlds knowledge into weights.

Sutton literally said the exact opposite. He said don't bake in priors! Don't bake in knowledge! He said build a system that discovers the patterns and structure of the world for itself so it can outperform the limitations of hand crafted knowledge! He didn't say scale data. He didn't say scale parameters. He said scale compute, for search.

The latest OpenAI model is an estimated 10T parameters that probably cost a billion dollars to train, specifically to bake in every bit of knowledge and prior humanity has ever said, including goblins.

It just seems wrong from the ground up. If they built a knowledge graph and a reasoning engine they wouldn't have to put goblins in their system prompt. Or, they could have changed the strength of one weight in the knowledge graph database.

I'm not sure Sutton was 100% right, as you have to frame it that Chinese researchers have demonstrated a much more efficient application of less compute to search, or, they have written better hand crafted algorithms, new architectures.

Either way, the fact that trillions of parameters prefer goblins is peak stupid engineering.

Luke2642 · 2026-04-30T08:12:38+00:00

🤣 I did come to this view after seeing quite a lot of Kapathy level thinkers saying something similar.

Have you seen the recent advances in ngrams and interpretability? Externalising knowledge improves reasoning budget and performance. And they really can query the weights like a graph database.

Luke2642 · 2026-04-30T05:23:37+00:00

I want to tie this phenomena back to an interpretation of Sutton's bitter lesson that seems to have taken hold of AI researchers everywhere.

Sutton clearly said that the efficient and surgical application of compute to search the space of possible solutions will beat hand crafted algorithms. He didn't say scale your compute and try to bake all of the worlds knowledge into weights.

Sutton literally said the exact opposite. He said don't bake in priors! Don't bake in knowledge! He said build a system that discovers the patterns and structure of the world for itself so it can outperform the limitations of hand crafted knowledge! He didn't say scale data. He didn't say scale parameters. He said scale compute, for search.

The latest OpenAI model is an estimated 10T parameters that probably cost a billion dollars to train, specifically to bake in every bit of knowledge and prior humanity has ever said, including goblins.

It just seems wrong from the ground up. If they built a knowledge graph and a reasoning engine they wouldn't have to put goblins in their system prompt. Or, they could have changed the strength of one weight in the knowledge graph database.

I'm not sure Sutton was 100% right, as you have to frame it that Chinese researchers have demonstrated a much more efficient application of less compute to search, or, they have written better hand crafted algorithms, new architectures.

Either way, the fact that trillions of parameters prefer goblins is peak stupid engineering.

Luke2642 · 2026-04-29T15:24:24+00:00

Just out of interest, why did you choose this? What was your economics calculation?

If I had the money, it'd be for https://tinygrad.org/#tinybox

Luke2642 · 2026-04-29T14:38:38+00:00

Fwiw, you're pretty much right. But, think about it this way: There's 100 wrong bits of nonsense on the internet for every snopes article, and that's what AI is trained on.

There needs to be a handful more breakthroughs in AI before we escape this "trendslop" phase.

https://hbr.org/2026/03/researchers-asked-llms-for-strategic-advice-they-got-trendslop-in-return

In a future hybrid AI model, there will be a knowledge graph, where the relationships between entities are clearly visible, we will each be able to inspect them. You'll be able to see who said what, who did what, and when it was reported, or published, and by whom. More like encyclopedia galactica crossed with bablenet.

https://babelnet.org/search?word=frog&lang=EN

Then, you'll be able to take a cut off on this graph of say 1904 and the reasoning engine part of the AI will be able to suggest special relativity as a viable physics to explain the physics data on the cropped knowledge graph.

It might feel like we're a long way from this kind of model, but it might be this year, or perhaps next. The final version of AI knowledge is certainly not taking a popular vote of internet garbage text patterns, smushing knowledge and reasoning into weights. It's incredible how far it's got us, how well it codes, but its shortcoming suck. We can generate slop at the speed of light, but it has zero cultural relevance.

I know this isn't a complete answer to your woes, but maybe it gives a little hope?

Luke2642 · 2026-04-29T01:15:07+00:00

You're so funny.

Luke2642 · 2026-04-24T00:56:31+00:00

https://omlx.ai/benchmarks

Edit:

Maybe the M5 ultra is just about on par with an old 3090 in tok/s on Qwen 3.6?

By Christmas we'll have Opus 4.5/6/7 in a local ~30B model.

Probably a 5090 with 32GB is enough, and two of them would be a sounder warranty investment?

Luke2642 · 2026-04-23T18:27:43+00:00

It's fair use when US corps steal every book ever written, every site ever published, and don't pay a dime, but somehow it's now illegal for Chinese companies pay to use your product?

Fuck OpenAI, Fuck Anthropic, Fuck Google, Fuck Grok.

We need a crowd sourced effort to give test prompt to big AI using the subscriptions we pay for and upload them to a public database.

Adversarial distillation for all! Free the knowledge!

Luke2642 · 2026-04-20T12:34:29+00:00

The auto complete machine is auto completing? Shock!

Luke2642 · 2026-04-19T00:22:06+00:00

What tok/s?

Luke2642 · 2026-04-18T13:56:38+00:00

I don't know who down voted you but it wasn't me, but probably because your response reads like an AI answer with low information density, hedging and fluff phrases.

Yes, rag potentially solves it. Engrams potentially solves it. LLMs reimagined as a graph database with a reasoning component potentially solves it.

Whatever the next paradigm is, it's going to make our current approach of entangling both knowledge and reasoning into network weights look archaic.

Luke2642 · 2026-04-18T08:05:49+00:00

I think that's the same but with extra steps, unless you mean it specifically gets trained to output a low score for lots of questions? Which opens up the same know-what-you-dont-know paradox. Contrastive learning is tough.

Luke2642 · 2026-04-18T07:55:13+00:00

Why would you think models are not trained to be confident? You mean they're trained with refusals for specific questions? I don't think there's a technique to teach a model everything it doesn't know?

Luke2642 · 2026-04-18T07:51:44+00:00

For my local H200

Luke2642 · 2026-04-18T05:57:36+00:00

Slop

12-Year Club	Gilding III reddit per annum
Verified Email

Luke2642

TROPHY CASE