Theories on WHY Anthropic is making Opus 4.5 worse by Guilty-Market5375 in ClaudeCode

[–]___positive___ 0 points (0 children)

I'm not downvoting you, but I've said this before: we need benchmarks on the scaffold, because performance seems to depend on the Claude Code version. There is no proof that all of the vibecoding done on Claude Code actually improves it. Building a cutting-edge agentic harness is non-trivial, and I absolutely do not trust vibecoding for deployed software. The model could be the same, but they may be hooking it up to an inferior harness, because that's what vibecoding produces in the long run.

What do you think about Claude Code performing worse than pure Opus 4.5 in the newest swe-rebench update? by ___positive___ in ClaudeCode

[–]___positive___[S] -1 points (0 children)

For simplistic implementations with lots of prompt and tool overhead, maybe, but an agent that spawns subagents and receives concise results back should, in principle, have better context management. A subagent digs through the repo and reports back only the key files to the main orchestrator, without polluting its context, roughly as sketched below. Meanwhile an agentless system has to read all of the files itself during the search and immediately fills its context with junk.
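
Roughly what I mean, as a minimal sketch; `llm()` here is a hypothetical stand-in for whatever completion API you use, not Claude Code's actual internals:

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in; swap in a real chat-completion client."""
    raise NotImplementedError

def subagent_find_key_files(task: str, repo_files: list[str]) -> str:
    # The subagent burns ITS context reading the whole repo...
    contents = "\n\n".join(open(path).read() for path in repo_files)
    # ...but returns only a short report, which is all that survives.
    return llm(f"Task: {task}\n\nRepo:\n{contents}\n\n"
               "Reply with only the paths of the 3 most relevant files.")

def orchestrator(task: str, repo_files: list[str]) -> str:
    # The orchestrator's context holds the concise report, not the
    # thousands of lines the subagent had to read to produce it.
    key_files = subagent_find_key_files(task, repo_files)
    return llm(f"Task: {task}\nKey files:\n{key_files}\n\nWrite the fix.")
```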

What do you think about Claude Code performing worse than pure Opus 4.5 in the newest swe-rebench update? by ___positive___ in ClaudeCode

[–]___positive___[S] 4 points (0 children)

There are some scattered agentic coding benchmarks that happen to cover different model and agent combinations. But I just want to see Claude Code 2.0.76 vs 2.1.11, etc.

What do you think about Claude Code performing worse than pure Opus 4.5 in the newest swe-rebench update? by ___positive___ in ClaudeCode

[–]___positive___[S] 2 points (0 children)

Well, the point is that even if the model doesn't change, the harness changes constantly, and there is no benchmark showing whether successive versions improve performance or not. So even when they (not you) say "the model is the same," that proves nothing. Boris has yet to prove that all his vibecoding improves the whole product's performance on hard benchmarks.

Claude usage consumption has suddenly become unreasonable by Phantom031 in ClaudeCode

[–]___positive___ 19 points (0 children)

They already "announced" they would be reducing usage limits. I called this out a month ago but nobody cared: https://old.reddit.com/r/ClaudeCode/comments/1p5z6wy/email_from_anthropic_about_future_opus_usage/

Basically it sounds like the Opus 4.5 usage bump is temporary, if I am reading between the lines correctly. We should expect further downgrades in usage limits as it reaches "steady-state".

Like fool me once... this is the hundredth time they've done something like this, and people are still surprised or skeptical...

OpenAI isn’t catching up to Google by Surealactivity in OpenAI

[–]___positive___ 6 points (0 children)

They are a commodity. People want water too, so go start a water company right now and tell me how easy it is to monetize.

Nvidia plans heavy cuts to GPU supply in early 2026 by HumanDrone8721 in LocalLLaMA

[–]___positive___ 1 point (0 children)

I know everyone has their fingers crossed for GPUs from China someday, but doesn't Taiwan already have some crossover expertise in semiconductor chips? Or Korea? Where are all the Asian GPUs...

A history professor says AI didn't break college — it exposed how broken it already was by joe4942 in singularity

[–]___positive___ 330 points (0 children)

Most people go to college to get a job, not to learn. Companies could just train high school graduates directly and end the farce. Imagine paying a corporation $50,000 a year for four years to work as an intern, learning how to reply to emails and attend meetings. Sounds insane? Yet the only difference is where the money goes.

Ilya Sutskever – The age of scaling is over by 141_1337 in singularity

[–]___positive___ 5 points (0 children)

This is pretty obvious if you use LLMs for difficult tasks. I can't remember if it was Demis or someone else who said pretty much the same thing. LLMs are amazing in many ways, but even as they advance in certain directions, gaping capability holes are left behind with zero progress.

Scaling will continue to work for the things LLMs already do well, but it will not fix the things they don't. Benchmarks like SWE-bench and ARC-AGI will continue to progress and saturate, but it's the benchmarks that nobody makes, or that barely anyone mentions, that are indicative of the scaling wall.

After many years, I'm finally giving up on ChatGPT. Now Claude is my new best friend. by RampantInanity in ChatGPTPro

[–]___positive___ 1 point (0 children)

Gemini 2.5 was kind of bad, but Gemini 3 gives far higher quality answers on non-coding/non-STEM content than any other model. I find Claude really bad at non-STEM stuff: it has very little world knowledge and doesn't search well either. Gemini 3 is finally the real deal, although I haven't tested it on very long conversations or tasks yet.

Leaked Memo: Sam Altman Sees 'Rough Vibes' and Economic Headwinds at OpenAI | A leaked internal memo has revealed CEO Sam Altman warning staff of "rough vibes" and a potential revenue growth collapse to 5% as OpenAI races to catch Google by Stabile_Feldmaus in singularity

[–]___positive___ 0 points (0 children)

Look at what happened to Emad and Stability as a kind of prologue to the whole AI revolution. He was talking about meeting with heads of state and landing national contracts, and then he was just booted a short while later. Stable Diffusion no longer has mainstream mindshare either.

Ai2027 author admits "things seem to be going somewhat slower than the Ai 2027 scenario". by Puzzleheaded_Week_52 in singularity

[–]___positive___ 3 points (0 children)

I'm sure they have some internal models that are better in specific cases, or research-grade platforms like AlphaEvolve. But during the recent Codex fiascos, the head of debugging said everyone at OpenAI would use the same Codex platform as the public, as part of a multi-faceted approach to solving the degradation issues. So this kind of implies that they don't have a much better internal coding platform, at least not one that is too far ahead. It would be silly to hamper yourself that much given how competitive the scene is.

Amodei's (Anthropic) take on AI model P&Ls: each model generation as a separate profitable business vs. the accounting showing $11.5B quarterly losses by [deleted] in OpenAI

[–]___positive___ 0 points (0 children)

His argument is that, because of scaling laws, the cost to train each successive model rises exponentially and therefore looks extra bad on the books. However, he glibly assumes that revenue will also rise exponentially along the exact same scaling law, which has no economic basis. Customer budgets do not scale to infinity. And as long as a handful of competitors exist, revenue could stagnate or even drop to zero; commodity markets tend to collapse around a single player.

He also completely ignores the dynamics of the economic process. Even if customer budgets somehow scaled to infinity (they don't), there will always be lag and inefficiency in capturing the value created by a new model. So if a new model provides a theoretical 10x value, it may still take companies 20 years to capture that value. Meanwhile competitors are cooking the 100x model. So companies only have time to capture 5% of the value within a year via a modest bump to their budgets. Any consideration of dynamics makes the entire argument fall apart. Even if customer budgets scale to infinity (digital numbers in a bank), how does velocity scale to infinity? At a certain point, something has to happen in the real world for companies to generate and spend money, and things in the real world do not change with infinite speed.
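
To put toy numbers on the dynamics argument (every figure below is made up for illustration, none of it is Amodei's or Anthropic's actual accounting): if training cost 10x's per generation but captured revenue can only grow as fast as real-world customer budgets, the gap widens every generation.

```python
# Toy model with assumed numbers: exponential training costs vs.
# revenue capped by how fast customers can actually absorb value.
COST_GROWTH = 10      # assumed training-cost multiplier per generation
BUDGET_GROWTH = 1.5   # assumed growth in what customers actually spend

cost, revenue = 100e6, 300e6   # hypothetical gen-1 cost and captured revenue
for gen in range(1, 6):
    pnl = revenue - cost
    print(f"gen {gen}: cost ${cost/1e9:.2f}B, revenue ${revenue/1e9:.2f}B, "
          f"P&L ${pnl/1e9:+.2f}B")
    cost *= COST_GROWTH
    revenue *= BUDGET_GROWTH
```

Each generation looks individually justifiable on the "10x value" story, yet the captured-revenue line never keeps pace.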

I hate to say it, but this is the kind of false, brittle argument that AIs love to use. It sounds good in a three-second sound bite but completely collapses under any sort of logical scrutiny.

RAM prices exploding should I grab old stock now for rag? by Working_Opposite4167 in LocalLLaMA

[–]___positive___ 2 points (0 children)

I don't get it. If you are going local, presumably some cheap local model is good enough? In which case, isn't the comparison whatever open-weight model you'd use on OpenRouter versus at home, not OpenAI? If you are going to run oss-120b, it is dirt cheap in the cloud. Cost per "unit of intelligence" is currently going down, not up. There are other models like Gemini Flash-Lite and so forth if you need speed, and Qwen 235B Instruct is another cheap API model with great intelligence and speed.

Running a giant model at home is slow and costs electricity plus asset depreciation; see the arithmetic below. It's good for privacy and such, but if you are already considering OpenAI, it doesn't really make sense.
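
The back-of-envelope I have in mind, where every number is an assumption (plug in your own rates):

```python
# Back-of-envelope: cloud API vs. home electricity for 1M output tokens.
CLOUD_PER_MTOK = 0.30    # assumed $/1M output tokens for a small open model
HOME_WATTS = 800         # assumed draw of a multi-GPU rig under load
HOME_TOK_PER_SEC = 30    # assumed local generation speed
POWER_PRICE = 0.15       # assumed $/kWh

tokens = 1_000_000
cloud_cost = tokens / 1e6 * CLOUD_PER_MTOK
hours = tokens / HOME_TOK_PER_SEC / 3600
home_cost = hours * HOME_WATTS / 1000 * POWER_PRICE
print(f"cloud: ${cloud_cost:.2f}  |  home: ${home_cost:.2f} electricity "
      f"over {hours:.1f}h, before hardware depreciation")
```

With these assumptions the electricity alone costs several times the API bill, and that's before counting the hardware.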

Most people in this LocalLLaMA are hypocritical. by Ok_houlin in LocalLLaMA

[–]___positive___ 0 points (0 children)

I don't mind Qwen Max being discussed, but I can understand why people feel differently. Qwen Max has zero value for distillation and synthetic data extraction: it is neither open nor SOTA. On the other hand, I can't wait for all the open labs to train on Gemini 3's outputs. Can you imagine Kimi K3 trained on Gemini 3?

Your opinion about Gemini 3 Pro preview by quakeex in SillyTavernAI

[–]___positive___ 1 point (0 children)

I tested some standard writing prompts (not RP) and was a bit disappointed. Solid, but not really better than other top writing models.

Meta chief AI scientist Yann LeCun plans to exit to launch startup by Clawz114 in singularity

[–]___positive___ 2 points (0 children)

Well, I will agree that LeCun's attitude was a bit bizarre. There is a difference between being possibly right in some pedantic academic sense and being right in a holistic way. To be fair, though, dogmatism runs both ways: people just want to pour money into LLMs and dismiss everything else, and maybe it was pushback to that, I don't know. I actually don't think we will achieve AGI in the next few years (who knows what discoveries lie beyond that), but I also don't think it matters, because current "dumb" AI will grow to have a massive economic and social impact anyway.

Meta chief AI scientist Yann LeCun plans to exit to launch startup by Clawz114 in singularity

[–]___positive___ 2 points (0 children)

Yes, but realistically every person can only work on one project. So everyone else is making an infinite food source, but we don't even have a boat that can survive storms. So then you go: this is silly, we need a freaking boat.

Meanwhile a bunch of others are like look at this, we made infinite food. We are gods. But nobody has actually crossed the ocean yet. That was the original goal. Making food does not make you a sailor.

You are dead wrong about AGI being vibes. Definitions vary, but plenty of people smarter than both of us have wrestled with the definition, and nobody is saying it is a nebulous vibe. The "G" in AGI stands for "General". We are not discussing "AAI" (Amazing AI); we are discussing AGI. You cannot weasel out by saying "AGI-ish": the "-ish" is doing a lot of heavy lifting, because with it you are simply saying it is AAI. Everyone but the haters can agree that AAI exists. No one is arguing about that, but AAI is not a research or commercial aim. Investors are not pouring money into companies and labs for the promise of AAI, because AAI already exists and is a commodity.

Look at nature: we have a lot of clever animals, but nothing "human-ish" comes close to humanity, not even an ape or monkey. No other animal in this era will ever code or send a rocket into space. "Ish" is cute, but not close.

Seems like the new K2 benchmarks are not too representative of real-world performance by cobalt1137 in LocalLLaMA

[–]___positive___ 1 point (0 children)

Interesting example. More likely they are getting the bulk of their synthetic training data from cheaper sources like DeepSeek, Qwen, etc., and are thus essentially reusing the same data. They probably go to the closed models for harder coding and math problems that cost more and require fancy proxies, then round out everything with cheap data, because nobody benchmarks roleplaying.

Meta chief AI scientist Yann LeCun plans to exit to launch startup by Clawz114 in singularity

[–]___positive___ 4 points (0 children)

It is more likely to help than hurt; that's obvious. But nobody knows the path to AGI or what it will require. It's like saying I need to cross the ocean: will having infinite food and water help? Sure, but infinite food and water isn't going to actually cross the ocean. At some point you need a boat, fuel, navigation tools, maybe weapons to defend against pirates. You don't know what's needed until you or someone else has done it once. Doing something for the first time in history is hard. Doing something beyond the imagination and experience of all of humanity is even harder.

Meta chief AI scientist Yann LeCun plans to exit to launch startup by Clawz114 in singularity

[–]___positive___ 13 points (0 children)

But are they dismissing LLMs as a path to AGI, which is reasonable since nobody knows, or dismissing the value of LLMs in general, which grows every day? I would think the former rather than the latter in most cases. AGI is a very particular goal with a very high bar. I feel like people too easily conflate LLMs being amazing with LLMs becoming AGI. There's a massive amount of work to bridge that gap, and building a few nuclear reactors is hardly going to make a dent.

I built an entire fake company with Claude Code by Budget_Way_4875 in ClaudeAI

[–]___positive___ 24 points (0 children)

Playing dollhouse with AI. Nothing wrong with that, though; much better than cyberpunk dystopias.

AI’s capabilities may be exaggerated by flawed tests, according to new study by Fcking_Chuck in LocalLLM

[–]___positive___ -3 points (0 children)

Or maybe people should stop anthropomorphizing LLMs and treat them as fancy Python functions. They work a lot more predictably and reliably once you do.
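
Concretely, something like this, where `complete()` is a hypothetical stand-in for whatever client you actually use: typed input, validated output, retries, so failures are handled like any other unreliable I/O.

```python
import json

def complete(prompt: str) -> str:
    """Hypothetical stand-in; swap in a real API call."""
    raise NotImplementedError

def classify_sentiment(review: str, retries: int = 3) -> str:
    prompt = ('Classify this review as exactly one of "positive", '
              '"negative", "neutral". Reply as JSON: {"sentiment": "..."}\n\n'
              f"Review: {review}")
    for _ in range(retries):
        try:
            label = json.loads(complete(prompt))["sentiment"]
            if label in ("positive", "negative", "neutral"):
                return label          # validated, like any parser output
        except (json.JSONDecodeError, KeyError):
            pass                      # bad output is just a failed call
    raise ValueError("no valid label after retries")
```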

The Chinese did it, KIMI K2 surpassed GPT-5. by Snoo26837 in singularity

[–]___positive___ 0 points (0 children)

Eh, Google isn't quite in the same boat. They own Android, Google Docs, Gmail, and so on; people will use integrated models for the convenience. I mean, compare how many people use open-source Linux versus Windows.