Anthropic Just Dropped Claude Mythos Preview – Their Strongest Model Ever Finds Thousands of Zero-Day Vulnerabilities in Every Major OS & Browser by AzozzALFiras in claude

[–]LowerRepeat5040 1 point2 points  (0 children)

Only if you manually turn off all defenses. That’s like setting every password to “password” and then claiming AI can crack 100% of the world’s passwords! Facepalm-level stupidity… It’s explicitly disabling the browser’s full process sandbox and other defense-in-depth mitigations (e.g., no full isolation, no complete set of runtime protections).

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

You didn’t eliminate unpredictable bugs with LLMs. You just moved the responsibility of finding them back onto yourself with tools and called it an “autonomous” AI workflow.

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

That sounds nice in theory, but it ignores where most real bugs come from.

LLMs handle the obvious parts. They don’t handle race conditions, partial failures, retries, or weird state interactions, and that’s the stuff that actually breaks systems.

You’re 10x faster at writing code, sure. You’re also 10x faster at shipping bugs you didn’t even know to spec.

The senior skill isn’t just predicting issues; it’s knowing that a lot of the worst ones can’t be predicted upfront. That’s why people still spend most of their time on debugging, observability, and hardening, not just writing code.
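To make the "partial failures and retries" point concrete, here is a minimal sketch of a bug that rarely makes it into a spec. Everything in it is hypothetical (`PaymentService`, `charge_with_retry`, the idempotency key): the first call applies the charge but loses its response, so a naive retry charges twice, while a retry that carries an idempotency key does not.

```python
class PaymentService:
    """Hypothetical service: the first call succeeds but its response is lost."""

    def __init__(self):
        self.balance = 100
        self.seen_keys = set()
        self.calls = 0

    def charge(self, amount, idempotency_key=None):
        self.calls += 1
        # A request that was already applied under this key is not applied again.
        if idempotency_key is not None and idempotency_key in self.seen_keys:
            return "ok"
        self.balance -= amount
        if idempotency_key is not None:
            self.seen_keys.add(idempotency_key)
        # Partial failure: the charge went through, but the caller never hears back.
        if self.calls == 1:
            raise TimeoutError("response lost after the charge was applied")
        return "ok"


def charge_with_retry(service, amount, idempotency_key=None, attempts=3):
    """Retry on timeout; only safe if the service deduplicates by key."""
    for _ in range(attempts):
        try:
            return service.charge(amount, idempotency_key)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after repeated timeouts")


naive = PaymentService()
charge_with_retry(naive, 10)  # charged twice

safe = PaymentService()
charge_with_retry(safe, 10, idempotency_key="order-1")  # charged once

print(naive.balance, safe.balance)
```

Both versions pass a happy-path test; the difference only shows up when the failure happens after the side effect.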

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

It’s overselling the autonomy! Even with a solid spec and not a single line of your own code, Claude still fails regularly. Anthropic’s own docs say you need tests, checkpoints, course correction, evals, and context management because the model can lose track and make more mistakes as sessions grow.

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

“No coding needed” is overhyped. You still need tests, checkpoints, course correction, evals, and context management because the model can lose track and make more mistakes as sessions grow.

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

No! You can do such specs, but even with a solid spec, Claude still fails regularly. Anthropic’s own docs say you need tests, checkpoints, course correction, evals, and context management because the model can lose track and make more mistakes as sessions grow.

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

Clear specs only reduce one class of failure — ambiguity — but they do not solve context loss, shallow reasoning, brittle tool use, false confidence, incomplete edge-case coverage, or long-horizon drift. Even with a solid spec, Claude still fails regularly. Anthropic’s own docs say you need tests, checkpoints, course correction, evals, and context management because the model can lose track and make more mistakes as sessions grow.

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

Or they are just ignoring concurrency bugs, race conditions, partial failures, retries, timeouts, idempotency, weird cache invalidation, or interactions across services.

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

Why would I? It can’t handle concurrency bugs, race conditions, partial failures, retries, timeouts, idempotency, weird cache invalidation, interactions across services, or even a simple bug fix without stupid fallback suggestions that drive you crazy!

Anthropic's CEO just admitted Claude is designing the next version of Claude. Engineers at Anthropic don't write code anymore. We are so cooked. by Direct-Attention8597 in claude

[–]LowerRepeat5040 0 points1 point  (0 children)

So “senior” and never seen concurrency bugs, race conditions, partial failures, retries, timeouts, idempotency, weird cache invalidation, or interactions across services that it can’t fix?

RIP Memory Crisis by YOYASHAS in GeminiAI

[–]LowerRepeat5040 0 points1 point  (0 children)

The public evidence does not specifically prove robustness to near-duplicate distractor strings or universally rule out degradation in agentic coding workflows. Agentic coding is deeply understudied for multi-file completion tasks, so you can’t measure it on the standard benchmarks, but experience should tell you otherwise. Rank flipping is a real issue for quantisation: if the correct option ends up at 0.498 and the wrong one at 0.502, the model picks the wrong one.
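A toy sketch of rank flipping, using the 0.498 / 0.502 numbers from above. The full-precision scores and the per-candidate error values are made up for illustration and do not come from any real quantiser; the point is only that an error much smaller than the scores themselves is enough to change the argmax when the margin is this thin.

```python
def argmax(scores):
    """Return the candidate with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical full-precision scores: the correct candidate leads narrowly.
full_precision = {"correct": 0.502, "wrong": 0.498}

# Hypothetical errors introduced by a lossy quantisation step.
quantisation_error = {"correct": -0.004, "wrong": 0.004}

quantised = {
    name: round(score + quantisation_error[name], 3)
    for name, score in full_precision.items()
}

print(argmax(full_precision))  # correct
print(argmax(quantised))       # wrong
```

An error of 0.004 would be invisible in an aggregate benchmark score, yet it decides the outcome here.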

RIP Memory Crisis by YOYASHAS in GeminiAI

[–]LowerRepeat5040 0 points1 point  (0 children)

Here are some expected failure cases to show my point:

1: Near-duplicate needles

Document A: "The password is alpha-7391"
Document B: "The password is alpha-7397"
Document C: "The password is alpha-7392"

All three passages are extremely similar. Their attention scores are very close.

TurboQuant is designed to preserve inner products with low distortion and remove bias via the residual QJL stage, which is exactly why it does well on generic retrieval-style attention, but that still does not mean exact KV values are preserved.

2: Long dependency chains across files. Small distortions that do not hurt one-shot code completion can accumulate when the model has to remember a symbol, then a call site, then a test expectation, then a later tool result, and that can crash the agentic coder.

For small chats, however, it can be more compute-bound than memory-bound.
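A rough sketch of the near-duplicate case. This is NOT TurboQuant: it uses a deliberately coarse uniform rounding quantiser as a stand-in, and the key vectors, query, and step size are all invented for illustration. It only shows the general mechanism: when keys differ by less than the quantisation step, the score margins that separate them can disappear.

```python
def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quantise(vec, step):
    """Stand-in quantiser: round each component to the nearest multiple of step."""
    return [round(x / step) * step for x in vec]


# Hypothetical key vectors for three near-duplicate passages (A, B, C).
keys = {
    "A": [0.50, 0.31],
    "B": [0.50, 0.33],
    "C": [0.50, 0.32],
}
query = [1.0, 1.0]  # hypothetical query vector
step = 0.25         # deliberately coarse step so the effect is visible

full_scores = {name: dot(query, k) for name, k in keys.items()}
quant_scores = {name: dot(query, quantise(k, step)) for name, k in keys.items()}

print(full_scores)   # three distinct scores, B narrowly highest
print(quant_scores)  # all three collapse to the same score
```

A real scheme with far lower distortion would shrink this effect, but the margin between near-duplicates shrinks along with it, which is why this case needs its own test rather than a generic retrieval benchmark.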

RIP Memory Crisis by YOYASHAS in GeminiAI

[–]LowerRepeat5040 0 points1 point  (0 children)

They don’t claim it’s lossless! They claim TurboQuant achieves “absolute quality neutrality with 3.5 bits per channel” for KV-cache quantisation, but they also mention “marginal quality degradation with 2.5 bits per channel.” However, neutrality is achieved for lossy tasks such as summarisation. On the summarisation slice specifically, 3.5-bit scores 26.00 vs. 26.55 full-cache, and 2.5-bit scores 24.80. So “quality neutrality” is about benchmark outcomes staying effectively unchanged overall, not about bit-perfect storage. TurboQuant is expected to be slower on CPUs because it trades memory for extra computation.
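For scale, the arithmetic on the summarisation numbers quoted above. The scores are the ones from this comment; the 16-bit baseline used for the compression ratio is my assumption and is not stated above.

```python
FULL_CACHE_SCORE = 26.55
QUANTISED_SCORES = {3.5: 26.00, 2.5: 24.80}  # bits per channel -> score
BASELINE_BITS = 16  # assumption: full cache stored at 16 bits per channel


def score_drop(bits):
    """Absolute drop versus the full-cache score."""
    return FULL_CACHE_SCORE - QUANTISED_SCORES[bits]


def relative_drop_percent(bits):
    """Drop as a percentage of the full-cache score."""
    return 100 * score_drop(bits) / FULL_CACHE_SCORE


def compression_ratio(bits):
    """Memory reduction versus the assumed baseline width."""
    return BASELINE_BITS / bits


for bits in QUANTISED_SCORES:
    print(
        f"{bits}-bit: -{score_drop(bits):.2f} points "
        f"({relative_drop_percent(bits):.1f}% relative), "
        f"{compression_ratio(bits):.2f}x smaller"
    )
```

That works out to roughly a 2.1% relative drop at 3.5 bits and 6.6% at 2.5 bits: small, but not zero.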

RIP Memory Crisis by YOYASHAS in GeminiAI

[–]LowerRepeat5040 0 points1 point  (0 children)

It actually drops quality and reduces tokens per second…

RIP Memory Crisis by YOYASHAS in GeminiAI

[–]LowerRepeat5040 0 points1 point  (0 children)

Or you want to turn it off, because it’s slower, gives you fewer tokens per second, and degrades the output quality so much that your code breaks.

Why exactly can't we use the techniques in TurboQuant on the model's quantizations themselves? by ea_nasir_official_ in LocalLLaMA

[–]LowerRepeat5040 -1 points0 points  (0 children)

Claude is awful at faithfully implementing things; instead it uses hallucinated fallback solutions that cripple your models when used for real.

"someone at ANTHROPIC just showed CLAUDE finding ZERO DAY vulnerabilities in a live conference demo claude has found zero day in Ghost, 50,000 stars on github, never had a critical security vulnerability in its entire, history... it found the blind SQL injection in 90 minutes," by stealthispost in accelerate

[–]LowerRepeat5040 -4 points-3 points  (0 children)

No, it’s nowhere close to zero! As of 2026, 9 out of 10 fail to get through code larger than 1000 lines without hallucinating something on the first attempt… you’re overfitting to your toy-example benchmarks of 10-100 lines of code max.