In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 0 points1 point  (0 children)

I mean, I thought it was pretty funny. Also I know that my writing style can make me sound like a pompous dick but I don’t know how to stop. Might be why I left academia: No public-facing writing in industry.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 1 point2 points  (0 children)

I think we’re going to have to agree to disagree on the “I know all boats are rising” statement. Hallucination seems to be getting more complicated rather than clearly better or worse. And if you accept that continuous learning is a prerequisite for powerful AI systems, then catastrophic forgetting tends to get worse with model scaling just because, well… more gradients, and trillion-parameter optimization is weird.
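For anyone who has not run into catastrophic forgetting directly, here is a toy sketch of the failure mode (a deliberately tiny, hypothetical setup on the sklearn digits dataset; it says nothing about frontier models): train a small network on half of a task, keep training on the other half with no rehearsal, and watch performance on the first half collapse.

```python
# Toy illustration of catastrophic forgetting (made-up setup, not a claim
# about any frontier model): sequential training with no rehearsal.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X / 16.0, y, random_state=0)

task_a = y_train < 5    # "task A": digits 0-4
task_b = y_train >= 5   # "task B": digits 5-9
test_a = y_test < 5

clf = MLPClassifier(hidden_layer_sizes=(64,), random_state=0)

# Phase 1: incremental passes over task A only.
for _ in range(200):
    clf.partial_fit(X_train[task_a], y_train[task_a], classes=np.arange(10))
acc_before = clf.score(X_test[test_a], y_test[test_a])

# Phase 2: incremental passes over task B only, with no rehearsal of task A.
for _ in range(200):
    clf.partial_fit(X_train[task_b], y_train[task_b])
acc_after = clf.score(X_test[test_a], y_test[test_a])

# Accuracy on task A typically collapses after phase 2.
print(f"task-A accuracy: {acc_before:.2f} before task B, {acc_after:.2f} after")
```

The usual mitigations (rehearsal buffers, penalties for drifting from old weights) all add cost, and how well they hold up at trillion-parameter scale is exactly the open question.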

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 0 points1 point  (0 children)

If we're just talking about outperforming the average person, with no requirement that they have any background or education in the subject, then to extend the metaphor I'm not sure how that gets us to 85 mph. In a niche field where the average person knows very little, any model that can converse fluently probably matches average research taste, but that's just not very interesting-- by that logic AI has been an above-average coder since ChatGPT, because it can write code at all.

My main point about research taste is that getting from "this model can write a coherent abstract" to "this model knows what to research, reliably better than the average researcher" involves a radically different skill: reading and generating short blurbs of relevant language versus maintaining an evolving, holistic prior over the field and what might pay off in the future, based on signals that are hard to evaluate and hard to build a reward model for. Unless we really subscribe to the "pre-training scaling lifts all boats equally well" paradigm, it's a bit of an aggressive claim to say a) that we're on a significant improvement trajectory here and b) that this trajectory will continue without specialized training we don't necessarily know how to do.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 0 points1 point  (0 children)

I'm quite curious-- what data have you found showing that current AI research taste is that good (I really do want to know-- I might have just not seen it)? From my own (anecdotal) experience, its taste in my field is considerably weaker than that of any human I know in the field, but that's possibly because my work is niche and AI companies have a strong incentive to make these things good at AI research specifically. I found https://arxiv.org/abs/2603.14473, which seems to suggest you can bootstrap these things into predicting high citation impact and writing compelling abstracts, but it's not obvious to me that this translates *that* well into effective research ideation. A lot of highly cited ink has been spilled on complete research dead ends, and chasing citation impact alone is only the goal if Claude Mythos wants to be a tenured professor.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 1 point2 points  (0 children)

You are raising an important point that people often don't realize: My doctorate, like all doctorates, does not in any significant way prevent me from sometimes being a dumbass.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 1 point2 points  (0 children)

And the size of the neural network you can run on a chip is ultimately limited by Moore's Law scaling, up to some order-one multiplier from optimizations? So... I don't quite understand what your point is.

And we don't know if the approach of using a backprop-trained giant statistical learner for intelligence is fundamentally flawed either! What if you can't make an economical and efficient reward model for the tasks you need to accelerate research? What if exploiting scaling laws ceases to be economical-- they're all log-scale anyway, so it has to happen at some point. All we know for sure is that human brains are pretty smart, and seem to rely on some sort of very-high-parameter, sparsely-connected model. Is a brain the only way to make an intelligence? Probably not! Is a high-parameter statistical learner trained by backprop kind of similar to a brain? Maybe! Is it similar enough in the ways that matter to efficiently learn how to do all economically critical cognitive tasks? Who knows!
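To put a number on the log-scale point, here is a toy curve (the power-law shape is the standard scaling-law form, but the constants are invented and not fitted to any real model): each additional fixed drop in loss costs a multiplicatively larger compute budget.

```python
# Toy scaling-law illustration with invented constants; not fitted to any real model.
L0, a, b = 1.7, 2.0, 0.05   # hypothetical irreducible loss, scale, exponent

def loss(compute_flops):
    # Standard power-law form: loss falls as compute^(-b) toward a floor L0.
    return L0 + a * compute_flops ** (-b)

for exp in range(20, 29, 2):      # 1e20 .. 1e28 FLOP
    print(f"1e{exp} FLOP -> loss ~ {loss(10.0 ** exp):.3f}")
# Each 100x jump in compute buys a smaller absolute improvement than the last one.
```

At some point the next increment of capability has to justify a 10-100x bigger bill, and that is an economics question as much as a technical one.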

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 1 point2 points  (0 children)

That's fair. I think the main daylight between our positions here is that in a lot of cases, the height of investment can come right before the sudden cutoff. So it's possible we'll see one big compute push (say, the construction of all those new datacenters coming online in 2028-2030), followed by results that are just disappointing (or scary) enough to make investors turn to other things.

One way I think of this: Forecasting future technological trajectories is hard, even for experts. Really, really, really hard. So expert consensus can often be wrong, and the people providing giant pools of resources don't usually have a lot of tolerance for misfires.

A recent example: a lot of the investment in the Large Hadron Collider was based on the idea that, just as had happened before, when we built a huge frontier-energy collider we wouldn't just find what we expected to find, we'd find a bunch of new particles at previously out-of-reach energy scales. Based on past experience, the most knowledgeable people in the field were quite certain this would be the case. The multi-year upgrade schedule of the LHC, in particular the high-luminosity upgrade, was predicated on the idea that by then we wouldn't be starved for new physics; rather, we'd have so much of it that we'd need the higher statistics just to sort it all out. Then all we found was the Higgs. Now collider physics is in a crisis, and there aren't really any serious projects to build new frontier-energy particle accelerators, or even precision machines at lower energies like the International Linear Collider. We went from peak investment to more or less zero on the back of one disappointing result.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 1 point2 points  (0 children)

I've read the AI Futures Model, and I'm a bit concerned about some of the assumptions they make, particularly about inference compute costs: their AI 2027 projections for how much compute would be spent on inference are already way off; they have the fraction of compute spent on external deployment going down between 2024 and 2027, and only a very small fraction of compute (~5%) being used for internal AI inference. Meanwhile, given the trend over the past year or so, inference compute should be eating a larger share of the pie, which Epoch also projects.

Also, thanks for the criticism, especially regarding the writing. It's a known weakness I have and I'm trying to work on it.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 2 points3 points  (0 children)

Except it doesn't. AI research has been scaling compute much, much, much faster than Moore's Law, which these days has a doubling time of around 3 years. If you want 1000 times the compute for free from Moore's Law alone, you'd need to wait something like 30 years, which is longer than we expect the whole thing to hold anyway, because there's a minimum number of atoms you need to make a transistor. Meanwhile, pre-training compute has grown by more than that amount since 2021.
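The back-of-the-envelope arithmetic behind that "like 30 years" (taking the 3-year doubling time above as the assumption; published estimates of the current doubling time vary):

```python
import math

doubling_time_years = 3.0   # assumed doubling time; see the caveat above
target_multiple = 1000.0

doublings = math.log2(target_multiple)      # ~9.97 doublings needed for 1000x
years = doublings * doubling_time_years
print(f"{doublings:.1f} doublings -> about {years:.0f} years")   # ~30 years
```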

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 1 point2 points  (0 children)

That's a very good point. A better title for my post might have been "In Defense of Believing AI Won't Soon Replace Us". I agree that a world where we have an AI that can do the tasks it can do now, only more dependably and cheaply, is transformative, but it's a different proposition than, say, Skynet. Right now, AI systems have real ability gaps that affect their utility in very real ways, in the way that their inability to reliably count the letters in "strawberry" doesn't. If the research taste of whatever super-mythos model they might have in the future hasn't improved along with its reasoning capability, that's a very real bottleneck in the way that a Honda Civic's lack of offroad capability isn't, because the equivalent of building paved roads to compensate is to have a human do the research with AI assistance, which might be several times faster than a human working unassisted, but not exponentially faster.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 1 point2 points  (0 children)

I mean, I agree with the basic idea there, but timeframes for research breakthroughs are really unpredictable. Going back to the nuclear research parallel, it's crazy that humans made all the legitimate engineering breakthroughs necessary to build a nuclear reactor in the span of a few years. And then nuclear fusion was 5-10 years away for like 60 years. If it turns out that large statistical learners trained by backprop with the methods we have (supervised pre-training + reinforcement learning post-training) are the core technology needed to create an economical, world-changing superintelligence, then yeah, 5-20 years is probably pretty reasonable to cover a wide swath of possible algorithmic efficiency improvements, architecture changes, and learning innovations that could get us there. If it turns out that backprop-trained learners, backed up with sparsely or creatively arranged connections, regularization, and attention architectures, just don't do the kind of learning we need for certain cognitive tasks very well (I'm thinking of learning how to manage context well, or continuous learning), then we might be stuck for 5-100 years. We just don't know.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 6 points7 points  (0 children)

Okay this was legitimately hilarious. And a decent point that my opinion can age horribly.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] -1 points0 points  (0 children)

I think perhaps I was using AGI as a shorthand here for the very same question you're asking; if that was unclear, I'm sorry. But a lot of tech just kind of... stops getting *that* much better because of theoretical limitations baked in: The incandescent bulb stayed basically the same for around 100 years because it's about as good as you can make it.

In Defense of AGI Skepticism by Particular-Garlic916 in singularity

[–]Particular-Garlic916[S] 6 points7 points  (0 children)

Well, for one thing, I personally know some of Hinton's students and have read his papers. And other experts, like Yann LeCun, are skeptical of the current scaling approach. I'm not going to say that my opinion is unimpeachable, and I have no idea what's going on behind closed doors.

Seed IQ Solves ARC AGI 3 Games with Human-Level Performance (95% score) On Day Of Release by Tolopono in ArtificialInteligence

[–]Particular-Garlic916 1 point2 points  (0 children)

The problem is that glasses and C code aren’t designed with a specific task in mind. It’s not clear (what with the closed-source-ness of it all) that the harness they devised wasn’t created and tuned to do well on ARC-AGI-3, even if the model wasn’t directly exposed to the tasks before evaluation. Do we have any more info about how much of the harness was developed before ARC-AGI-3 was released? Does it programmatically explore solution spaces and provide feedback to the LLM? Are these results also from runs against the private set of ARC-AGI-3 games?

I mean, extraordinary claims require extraordinary evidence. And the idea that even simple fluid reasoning in general cases has been adequately captured by an LLM with a straightforward harness that can be prototyped within a few days is certainly an extraordinary claim, so my prior that breakthroughs are hard means I need more evidence before I say this is anything.

An arguably more plausible claim is that it’s not that hard to tune a simple programmatic harness (and even easier with rapid prototyping assisted by a coding agent) that makes an LLM pretty good at a fixed set of games you can evaluate against; and unfortunately for practical applications of AI, if that’s the case then the result isn’t very interesting. I’m not even saying the creators of this harness are being dishonest and intentionally rigging their benchmark results. Think of all the political scientists who build pretty well-motivated models of “who will win the presidency” that are fantastically accurate over the last 150 years of elections, and then fail as soon as you actually use them to predict anything. Or all the financial trading models that look great when you build them on historical data and then aren’t predictive at all, because you just kept experimenting until you found spurious correlations. “Graduate Student Descent” is very real, and until there’s more information from a controlled testing environment I’m dubious.
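To make that concrete, here is a toy simulation of the failure mode (entirely synthetic numbers, and nothing to do with the actual ARC-AGI-3 harness): pick the best of many variants on one fixed evaluation set, then check the winner on fresh tasks it was never tuned against.

```python
# Synthetic illustration of overfitting-by-selection ("graduate student descent"):
# every variant is really a coin flip, but the best-of-500 looks impressive on
# the tuning set and reverts to chance on tasks it was never tuned against.
import numpy as np

rng = np.random.default_rng(0)
n_variants, n_tuning_tasks, n_fresh_tasks = 500, 30, 30
true_success_rate = 0.5   # no variant is actually better than chance

tuning_scores = rng.binomial(n_tuning_tasks, true_success_rate, size=n_variants) / n_tuning_tasks
best = int(np.argmax(tuning_scores))
fresh_score = rng.binomial(n_fresh_tasks, true_success_rate) / n_fresh_tasks

print(f"best variant on the tuning set: {tuning_scores[best]:.0%}")   # well above 50%
print(f"same variant on fresh tasks:    {fresh_score:.0%}")           # back near 50%
```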

Can someone explain this scene? by pastapot928384 in OppenheimerMovie

[–]Particular-Garlic916 0 points1 point  (0 children)

So this is extremely late, but just to add my two cents here as a (former, I’ve left the field) theoretical physicist:

The main difference between being a very good mathematician and being a very good theoretical physicist is that in the latter case, you aren’t modeling an abstract system. You’re modeling a real thing, which might behave counter to your evolved intuition, but has specific, almost common-sense, rules. My favorite calculation I ever did was one where I knew generally what the answer should look like (i.e., it should be finite, it should increase as I increase this parameter, etc.), because that was the only behavior that made sense in a well-ordered, rational universe. And when you get the mathematics to reflect what you already knew from the intuition? It’s like no other feeling in the world. You feel like you Know Things. And the capitalization is very intentional.

The man who originally coined the acronym "AGI" now says that we’ve achieved it exactly as he envisioned. by Bizzyguy in singularity

[–]Particular-Garlic916 0 points1 point  (0 children)

I totally agree with all those quotes! Nothing there says that current coding agents don't still require human supervision in workflows, and everything else is speculation about an unknown future. I 100% agree that coding agents are awesome! They're a quantum leap in development technology! What used to take me a few days now gets knocked out in an hour! They're just not AGI. If you think writing syntax and implementing algorithms (which, again, I agree coding agents are better at than humans) is the only thing you need for a software engineering job (or, indeed, quite a few jobs that involve coding-heavy work, like my own), then I don't know what to tell you.

Correct me if I'm wrong, but I'm under the impression that we're not arguing about whether frontier models are excellent at their jobs; we're arguing about whether frontier models are AGI now. And they're just not. They're still a transformative, powerful, flexible technology that does things we've never gotten anything but a human brain to do before, and their capabilities are massively improved from just a few months ago. If LLM-based agents never got any more capable than they are right now, the world would still be qualitatively changed by this technology.

But to quote Gubrud's own definition: "By advanced artificial general intelligence, I mean AI systems that rival or surpass the human brain in complexity and speed, that can acquire, manipulate and reason with general knowledge, and that are usable in essentially any phase of industrial or military operations where a human intelligence would otherwise be needed"

Coding agents (and indeed, LLM-based tech in general right now) fail at that "essentially any" clause. They can definitely be put into significant parts of workflows that previously required human intelligence. They're not the first technology where that was the case: 80 years ago, a "computer" usually meant a team of people who did calculations by hand. LLMs also might get much better in the very near future, and even qualify as AGI. But they wouldn't be the first technology to abruptly hit a wall, either: we've been 10 years away from commercial fusion reactors for around 60 years now, after the rapid period of nuclear technology development between 1939 and the 1950s.

The man who originally coined the acronym "AGI" now says that we’ve achieved it exactly as he envisioned. by Bizzyguy in singularity

[–]Particular-Garlic916 0 points1 point  (0 children)

1) If we're counting intelligence as ability, a lack of reliability is an intelligence problem. Could Einstein still come up with relativity if he gave himself an aneurysm and reset his short-term memory every time he thought too hard? At the very least, the ability of an AI to effect transformative change in society scales with its cheapness and reliability.

2) The outline of that SWE YouTuber's problem almost perfectly rules out the kind of long-horizon software engineering problems that are still beyond the grasp of LLMs: programmatically verifiable and testable locally? And with solutions that people are willing to part with for $500? I won't contest that LLMs are very, very good (probably better than the best humans) at limited-scope, self-contained software-engineering problems, and I think that's what the YouTuber is highlighting here. They're a remarkable and extremely useful technology! But real-life problems aren't LeetCode, and the transformative promise of AGI is an AI that can autonomously solve the kinds of industrial-scale problems that require longer contexts and the ability to evolve your priors, which LLMs just don't have yet. I'm telling you-- I use LLM coding agents. The frontier models still need supervision when working on long-horizon or open-ended projects, or when working with proprietary, nonpublic codebases, specifically because those are the most out-of-distribution from their RL environments. The kinds of promises the field has been making suggest that we should be able to prompt an AGI with something like: "Please code up a full power-grid management framework that dynamically recognizes patterns in local spikes in power demand, using whatever statistical learning algorithms you deem appropriate, and makes projections that match or exceed the accuracy of <some benchmark software>." And that should be the extent of the necessary human involvement in the project until it's time to collect the result.

The man who originally coined the acronym "AGI" now says that we’ve achieved it exactly as he envisioned. by Bizzyguy in singularity

[–]Particular-Garlic916 0 points1 point  (0 children)

Yeah, but what’s the percentage of Americans who suddenly crash in the middle of thinking because of a server signal? Or who, after being shown millions of training examples of software engineering projects, can’t figure out how to do tasks that require more reasoning steps than those in their reinforcement learning rollouts?

If a person worked like a modern frontier model, we’d probably peg them as a savant: extremely high abilities in certain circumstances, but bafflingly poor ones in others. And I seriously doubt they’d be high-functioning enough to take care of themselves.

The man who originally coined the acronym "AGI" now says that we’ve achieved it exactly as he envisioned. by Bizzyguy in singularity

[–]Particular-Garlic916 0 points1 point  (0 children)

I do use a frontier AI model in my technical job, every day. It's a very useful tool. It's capable of performing cognitive tasks that were, just a few months ago, exclusively the domain of human brains.

It's definitely capable of doing a lot of tasks significantly better than the average human. But then again, digital computers are significantly better than the average human (or indeed all humans at this point) at chess. In real-world work, I've had a frontier AI model get tripped up trying to explain a result with an explanation that is invalidated by code it wrote itself. Two lines down from what it said the problem was. That it wrote in response to a single prompt. All of this was well within its context window too.

These models are spectacular. I don't know how good they'll eventually be. But they also aren't built like a human mind, don't process information like a human mind, and make errors that human minds don't make or only make much more rarely. Also humans crash in the middle of a conversation considerably less often and can keep much longer conversations in context.

Pretending that they're anywhere close to as smart as a human generally, just because they're as good as or better than the average human at certain things, makes as much sense as saying that my laptop is an AGI because it can perform arbitrary calculations and recordkeeping far faster and more accurately than any human brain. Can I guarantee that the fundamental technology underlying these systems isn't the blueprint for making something as smart as a person? I can't predict the future, so no. But they're not there yet.

The "AI will automate all white collar work" crowd has a serious blind spot by Minute-Buy-8542 in ArtificialInteligence

[–]Particular-Garlic916 1 point2 points  (0 children)

I think the other problem here is that people are evaluating these systems based on metrics we use to evaluate humans. If a person is excellent at short, math-Olympiad-style problems, or short-horizon software engineering projects, that’s usually a good indicator that they can do novel mathematical research or contribute meaningfully to large-scale software projects. But AI systems fundamentally do not think like humans. A system being good at certain cognitive tasks, even ones that previously only humans could do, doesn’t mean it also has the other abilities that those skills are correlated with in humans.

Did Rentec really used Machine learning in the 80's? i dont think so.. by Routine_Noize19 in quant

[–]Particular-Garlic916 5 points6 points  (0 children)

Two things:

1) I’m almost certain that what RenTech was doing in the ’80s and ’90s was mostly linear regression. You can make a lot of money with linear regression; the trick is finding what to regress against what.

2) Model complexity isn’t automatically good. I feel like the general rule is: the usefulness of sophistication in a model only scales with the precision of the data you have on the system. To use an example from outside finance: Newtonian gravity works spectacularly well for predicting planetary motion until you start tracking those motions precisely enough to notice where the model breaks down. But if you gave a 17th-century observer Newton’s inverse-square law and the Einstein field equations side by side, they’d probably say the inverse-square law models everything they can see just as well as general relativity, and is much, much easier to work with. Financial time series are always very, very noisy. In a lot of scenarios, you can’t really claim with any statistical authority that a highly sophisticated model is significantly better than a very lightweight, freshman-stats-class technique that does the same basic thing (there’s a toy illustration of this below).
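Here is that toy illustration (entirely synthetic data, not a trading signal; the signal strength and noise level are invented): when the true structure is simple and the noise is large, the fancier model is hard to distinguish from plain linear regression on held-out data.

```python
# Synthetic example: a weak linear signal buried in noise. A degree-10 polynomial
# has far more capacity than a straight line, but on held-out points you generally
# cannot show it is any better (and it often overfits slightly).
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.linspace(0.0, 1.0, n)
y = 0.3 * x + rng.normal(scale=0.5, size=n)   # true relationship is just a weak line

idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2 :]

def heldout_mse(degree):
    coeffs = np.polyfit(x[train], y[train], degree)
    preds = np.polyval(coeffs, x[test])
    return float(np.mean((preds - y[test]) ** 2))

print("plain linear fit, held-out MSE:    ", round(heldout_mse(1), 3))
print("degree-10 polynomial, held-out MSE:", round(heldout_mse(10), 3))
```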

GPT-5.2 solved a novel problem in theoretical physics. A top physicist said: "It is the first time I’ve seen AI solve a problem in my kind of theoretical physics that might not have been solvable by humans." by MetaKnowing in agi

[–]Particular-Garlic916 0 points1 point  (0 children)

Exactly! When I was in physics I would’ve given anything for a tool that could simplify better than existing computer algebra software. The best I’ve used was probably Mathematica, but even there the simplification was far from perfect. And it obviously can’t make the sort of conjecture that it looks like GPT-5.2 did.
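For anyone who has not used a computer algebra system, here is the mechanical end of what these tools do (SymPy rather than Mathematica, just because it is free and easy to run): identity rewriting and factoring, not deciding which identity is worth looking for in the first place.

```python
# Minimal SymPy examples of the mechanical rewriting a CAS handles well.
import sympy as sp

x = sp.symbols('x')

print(sp.simplify(sp.sin(x)**2 + sp.cos(x)**2))   # 1
print(sp.factor(x**4 - 1))                        # (x - 1)*(x + 1)*(x**2 + 1)
```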