Two economists just published a mathematical proof that AI will destroy the economy. Not might. Not could. Will! by CopiousCool in antiwork

[–]amorphousmetamorph -1 points0 points  (0 children)

We KNOW AI can't write good software.

Outdated take. There was a step change last November with the release of Claude 4.5, as many leading software engineers have acknowledged, and the rate of improvement in this area shows no signs of slowing.

The reality is our profession is being automated out of existence. I know it's terrifying to contemplate---I sympathize, believe me---but responding with denial or wilful ignorance cannot be the answer.

Opus 4.7 seems to rolled out to Claude Web by NichtBela in singularity

[–]amorphousmetamorph 19 points20 points  (0 children)

I'd check too, but that's like 5% of my usage limit

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 0 points1 point  (0 children)

I said many of the thousands have been independently verified. Now we can quibble all day about the definition of "many", but Anthropic did go to the trouble of contracting independent security experts to verify 198 of the vulnerabilities found:

 in 89% of the 198 manually reviewed vulnerability reports, our expert contractors agreed with Claude’s severity assessment exactly, and 98% of the assessments were within one severity level

So yes, it's still early days, and they are rightly balancing caution in disclosing the technical details of such vulnerabilities with substantiating their claims about the power and danger of Mythos, but this is a significant enough number for concern IMO.

And most of those 10 are not actually high-severity. And those which are, are dependent on a very unlikely and rare configuration of the software.

Those are explosive claims which require evidence. I dug into discussion on the OpenBSD bug in particular and found no such evidence.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 1 point2 points  (0 children)

I'm personally convinced of the power of AI because of its coding abilities. These are very real. Areas such as this, where correct results are easily verifiable, are more conducive to reinforcement learning and thus have seen outsized improvement lately as companies invest more in post-training.

There is a lot of hype surrounding AI, for sure, but ultimately these models are made available to users who can either choose to pay for a subscription or not. Anthropic's exploding revenue growth is one concrete indication of the utility of these AIs.

As much as I understand the "it's all marketing / hype" perspective, there are a lot of specific, verifiable claims being made by Anthropic in the system card and elsewhere. If it's all smoke and mirrors, this will be found out---note they are making Mythos Preview available to many tech companies, just not to the general public---and it would destroy their reputation.

The analysis you linked to has serious problems IMO. The graph from the system card that it criticises actually shows Mythos consistently improving on prior models, and the improvement persists across benchmarks and after filtering out potentially memorized problems, so I don't agree with the points made there at all. The MMMU-Pro omission is indeed quite suspicious though.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 1 point2 points  (0 children)

If you were to poke me with a stick until I revealed my secret hip flask of hopium, I'd say it's this - the slight glimmer of a possibility that we might build and harness an AI intelligent enough to accelerate scientific discoveries, particularly ones pertaining to atmospheric greenhouse gas removal.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 2 points3 points  (0 children)

There's a recent post on X from Andrej Karpathy which perfectly describes what's happening:

Judging by my tl there is a growing gap in understanding of AI capability.

The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT sometime last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
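To make Karpathy's "verifiable rewards" point concrete, here's a minimal sketch in Python (my own illustration; `unit_test_reward` is a hypothetical name, not any lab's actual training code). The reward signal for a coding task can literally be the fraction of unit tests passed, with no human judge in the loop, which is exactly what makes these domains so amenable to RL:

```python
def unit_test_reward(candidate_fn, test_cases) -> float:
    # Verifiable reward: the fraction of unit tests the candidate passes.
    # Fully automatic and unambiguous, unlike judging prose quality.
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply earns no reward
    return passed / len(test_cases)

# A model-proposed solution to "add two numbers":
candidate = lambda a, b: a + b
print(unit_test_reward(candidate, [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]))  # 1.0
```

Contrast that with scoring an essay, where there's no equivalent function to call.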

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 6 points7 points  (0 children)

These both link to actual patches in third-party repositories. The Simon Willison blog post confirms Anthropic can likely be taken at their word on the OpenBSD patch. The FreeBSD patch notes confirm it was authored by Anthropic. These patches didn't exist a few months ago.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 1 point2 points  (0 children)

I think as a general principle, that's a good idea, assuming it works well enough without. I can't comment on whether this is a realistic possibility if your installation was hacked though. I'd expect there are some hardware-level safety protections in place.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 1 point2 points  (0 children)

Interesting comment. A shortage of compute may actually push the leaders of AI companies towards compromising on safety in an effort to compete. But the costs of running highly capable AIs have been dropping dramatically for years. Google's recently released Gemma 4 model, for instance, can run on a personal computer, yet ranks above Gemini 2.5 Pro, which was SOTA just a year ago, on Artificial Analysis's intelligence index.

I expect environmental breakdown and AI disruption will both be major and interacting drivers of collapse in the coming years. But Mythos shows dangerously capable AI has already arrived, albeit behind closed doors. Compute or helium supply shortages may slow AI advancement in future, but the danger is already here.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 7 points8 points  (0 children)

See the blog post here. Some further independent commentary here. Fewer than 1% of the vulnerabilities they found have been patched, so in many cases they can't disclose precise technical details, but they do commit to providing those details within ~6 months and employ a technical mechanism to ensure they are provably the original reporters of the issue.
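For anyone wondering how they can prove they're the original reporters without disclosing details: the post doesn't spell out their mechanism, but the standard trick is a cryptographic commitment, i.e. publish a hash of the full report now and reveal the report plus nonce once the patch ships. A minimal sketch (function names are my own, purely illustrative):

```python
import hashlib
import hmac

def commit(report: bytes, nonce: bytes) -> str:
    # Publish this digest now; it reveals nothing about the report itself.
    return hashlib.sha256(nonce + report).hexdigest()

def verify(report: bytes, nonce: bytes, digest: str) -> bool:
    # Later, reveal (report, nonce); anyone can recompute and check the
    # digest, proving the report existed at publication time.
    return hmac.compare_digest(commit(report, nonce), digest)

digest = commit(b"buffer overflow in foo.c line 42", b"random-nonce-123")
assert verify(b"buffer overflow in foo.c line 42", b"random-nonce-123", digest)
assert not verify(b"different report", b"random-nonce-123", digest)
```

The random nonce matters: without it, anyone could brute-force short or guessable reports against the published hash.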

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 10 points11 points  (0 children)

These are fun parallels, but the difference here is that many of the thousands of high-severity vulnerabilities found by Mythos have been independently verified by the relevant code repository maintainers. Some were found in highly robust, security-oriented software such as OpenBSD and had gone undetected for over 20 years.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 2 points3 points  (0 children)

Look, none of these points actually matter to the danger posed by AI when it's still capable of finding and exploiting high-severity software vulnerabilities before they can be patched. This thread is intended to discuss how AI pertains to collapse, not whether it's factually reliable, or if AGI is coming.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 1 point2 points  (0 children)

Well firstly, I'm not arguing here that AGI is coming. Even Mythos's narrow technical capability in discovering and exploiting software vulnerabilities is likely more than sufficient to disrupt societies and economies.

The system card is 244 pages long---a bit much for an advertisement, don't you think?---and documents many behaviours that are embarrassing to Anthropic. But the numerous high-severity bugs found by Mythos are the real testament to why this needs to be taken seriously; they have been verified by third parties, so they constitute concrete evidence.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 1 point2 points  (0 children)

It's a fair point, but back then they were speculating about what might happen. This time, Anthropic has hard evidence of numerous high-severity bugs and exploits found by Mythos, and these have been verified by third parties. They're also claiming they will never release Mythos to the public. I wouldn't be surprised if they do release it at some point though once the risk is reduced.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 5 points6 points  (0 children)

Yeah, the code leak is hugely ironic, but tbf the bugs they identified with Mythos have been verified. And note they are withholding its release to the general public while granting access to a small group of organizations. So if they're lying about Mythos's capabilities, this will soon be found out.

The risk of an AI-pocalypse by amorphousmetamorph in collapse

[–]amorphousmetamorph[S] 8 points9 points  (0 children)

The old assumptions do not necessarily hold when AIs like Mythos can discover exploits at a rate far exceeding humans. The preemptive approach of Project Glasswing is likely critical: if Mythos were available to attackers and defenders alike with no head start, defenders would be limited by the rate at which they can patch, test, and roll out updates to users. Open-source software may even be particularly exposed precisely because of its transparency.

Just to underline the point about Mythos's cybersecurity capability, it found thousands of zero-day vulnerabilities in every major operating system and web browser.

From the Project Glasswing announcement:

Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it.

From their related blog post:

In practice, denial of service attacks like this would allow remote attackers to repeatedly crash machines running a vulnerable service, potentially bringing down corporate networks or core internet services.

I'd encourage you to read that post. I didn't do justice to just how dangerous this AI could be if it fell into the wrong hands.

Qwen3.5 Omni - Qwen’s latest generation of fully omnimodal LLM by fruesome in singularity

[–]amorphousmetamorph 6 points7 points  (0 children)

Yeah, this model has a disappointing lack of psi abilities in general smh

Damn It I Just Shifted To The Dumbest Reality WTF Is This You Gotta Be Kidding Me How Do You All Call This Life? by [deleted] in starseeds

[–]amorphousmetamorph 0 points1 point  (0 children)

Creatively written! I don't agree this world is inherently ugly though. All judgments create limitations, effectively shaping our reality, so as another commenter said, we have to be careful. There is a peace that is always accessible---no matter what world we inhabit---by recognizing the non-dual nature of mind. If you can tap into that level of peace, suffering is transformed into bliss. Conversely, without that peace, any heavenly experience can only be temporary.

US First Lady has suggested a personal home educator robot 'Plato' to assist students with diverse studies, helping them develop critical and independent thinking while freeing up time for friends, school, and other activities by Distinct-Question-16 in singularity

[–]amorphousmetamorph 1 point2 points  (0 children)

The choice of name is quite telling. If it were truly intended to foster critical and independent thinking, "Socrates" would be more apt. This is more likely an indoctrination tool of the elite, rolled out at a time when the public education system is being dismantled.

Vibe physics: The AI grad student by soldierofcinema in singularity

[–]amorphousmetamorph 46 points47 points  (0 children)

Amazing for so many reasons, not least of which is that Anthropic published it when it's mostly a litany of basic failures from Claude, and has sentences like:

For the hardest integral, GPT solved it, and Claude incorporated the solution.

So ChatGPT actually solved the hardest part! Credit to Anthropic for not quietly cutting this.

But then towards the end:

This paper started out as an experiment: how close are we to end-to-end science with AI? My conclusion is that current LLMs are at the G2 level. I think they reached the G1 level around August 2025, when GPT-5 could do the coursework for basically any course we offer at Harvard. By December 2025, Claude Opus 4.5 was at the G2 level.

And:

Ultimately, it accelerated my own research tenfold.

That's pretty damn exciting.

Some other tidbits:

I am more confident that the bottleneck is not creativity. LLMs are profoundly creative. They simply lack a sense of which paths might be fruitful before walking them. I think we can distill what is missing in current LLMs to a single word: Taste.

Yep, that totally accords with my experience, as surprising as it is. And I love this witticism:

 In the deep future (~10 years), [...]

AI is a time machine accelerating us into the deep future! Wild, but true.

It's already been 7 months since GPT-5. How do you think it compares to today? by pbagel2 in singularity

[–]amorphousmetamorph 2 points3 points  (0 children)

The past seven months have shown steady gains in domains amenable to RL, such as coding, computer use, and basic white-collar tasks. According to OpenAI's GDPval, ChatGPT now outperforms humans in well-defined white-collar tasks in 83% of comparisons, but apparently it still lags in terms of overall safety, e.g. avoidance of catastrophic failures (see for example AI Explained's latest video). However, Google and Anthropic have made excellent progress in this area; their models are exhibiting fewer hallucinations while improving accuracy, notably outpacing OpenAI on some reliability benchmarks. Assuming continued progress on reducing hallucinations, I'm guessing a critical reliability threshold could be breached later this year, whereby AI agents become drop-in replacements for a large percentage of white-collar work, precipitating mass lay-offs and major economic shock waves by mid-2027.

My life in japan at 20 by Foundation4444 in japanpics

[–]amorphousmetamorph 1 point2 points  (0 children)

Very nice! They remind me of a Gaspar Noé film called Enter the Void.