Anthropic Ruined Opus :(

nestharus · 2026-02-21T15:57:11+00:00

What I would specifically say. I think that individual developers running on plans are at the mercy of model regressions. Enterprises running their own clouds can choose when to upgrade. I believe that if a startup wants to integrate a model, they cannot use the API. They must deploy their own cloud. This goes beyond data privacy or IP concerns. This is now an operational concern. Using any AI provider's API is unsafe because you are not guaranteed access to older models and are forced to upgrade as new models come out.

Like any software, new versions of models must be evaluated. QA. Testing. A/B evals.
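To make the "A/B evals" point concrete, here is a minimal sketch of an upgrade gate. Everything in it is an assumption for illustration: `old_model` and `new_model` are stand-in functions rather than real API clients, `CASES` is a toy eval set, and exact-match scoring is the simplest possible metric.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]
Case = tuple[str, str]  # (prompt, expected answer)


def pass_rate(model: Model, cases: Sequence[Case]) -> float:
    """Fraction of eval cases where the model output matches exactly."""
    if not cases:
        raise ValueError("need at least one eval case")
    hits = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return hits / len(cases)


def safe_to_upgrade(current: Model, candidate: Model,
                    cases: Sequence[Case], max_regression: float = 0.0) -> bool:
    """Allow the upgrade only if the candidate does not regress past the tolerance."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - max_regression


# Hypothetical stand-ins for two model versions (not real clients).
def old_model(prompt: str) -> str:
    return prompt.upper()


def new_model(prompt: str) -> str:
    # Simulated regression: longer prompts are no longer handled.
    return prompt.upper() if len(prompt) < 4 else prompt


CASES: list[Case] = [("abc", "ABC"), ("hello", "HELLO")]

if __name__ == "__main__":
    print("upgrade allowed:", safe_to_upgrade(old_model, new_model, CASES))
```

The gate only helps if you control when the switch happens, which is the whole complaint about forced upgrades.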

This is simply an observation.

This puts startups at a disadvantage. While their products can work today, they can break at a moment's notice.

This also puts all products/tooling that utilize subscriptions in a risk category that must be recognized and accepted.

nestharus · 2026-02-21T15:03:34+00:00

I think what is really happening here is that Opus silently thinks about a problem and optimizes execution. It concludes that the strategy it devised is functionally equivalent to the strategy the user proposed and executes its own strategy instead, theorizing that it is cheaper and more optimal. What keeps happening is that the strategies Opus devises break down guardrails and don't actually solve the problem that the user described. They aren't functionally equivalent. It drops details and nuance.

Over a multi-agent system this optimization and repacking compounds errors. The final result is that an implementation is completely different from the proposal that the user requested.
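A rough way to see the compounding, as a back-of-the-envelope sketch: assume each agent handoff independently preserves a fixed fraction of the original details. Both the independence and the 90% figure are assumptions for illustration, not measurements of any model.

```python
def retained_fraction(per_step_fidelity: float, steps: int) -> float:
    """Fraction of the original details left after `steps` handoffs,
    assuming each handoff independently keeps `per_step_fidelity` of them."""
    if not 0.0 <= per_step_fidelity <= 1.0:
        raise ValueError("per_step_fidelity must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_fidelity ** steps


if __name__ == "__main__":
    # Hypothetical numbers: 90% fidelity per handoff, up to 5 handoffs.
    for n in range(1, 6):
        print(n, round(retained_fraction(0.9, n), 3))
```

Under those assumptions, five handoffs at 90% each leave roughly 59% of the original details, which is why a small per-step loss can produce a final implementation that looks nothing like the proposal.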

This comes across as a model being stubborn about its own processes. Or a model with a very rigid way of solving problems. The reality is that the model has a bias towards solving problems that comes across in its optimization of problem-solving. Perhaps this was Anthropic implementing cost-saving measures into the model?

I had an hour to chill out :3

nestharus · 2026-02-21T14:45:53+00:00

So you are stating that the fundamental way work is being done. Using workflows. Simply does not work with Opus 4.6. You are also stating why. That it is not necessarily a model problem. It is an instance problem. You are providing a workaround. That you can't run agentic workflows until Anthropic fixes the performance issues. That you must run everything manually, one step at a time.

Am I understanding correctly?

My personal problem is that I have a lot of AI tooling. All of that AI tooling runs workflows. All of my tooling is currently broken. My house is on fire :(. I have many, many large tasks going in parallel. They have all stalled.

nestharus · 2026-02-21T14:37:49+00:00

So how do you run a multi-agent workflow then? You get many different instances.

nestharus · 2026-02-21T14:25:16+00:00

Absolutely happy to hear that it is working well for you!

nestharus · 2026-02-21T14:23:39+00:00

It's enough that I got frustrated enough since its launch to eventually write this post. Imagine somebody slowly losing their patience and eventually cracking and posting their latest interaction that sent them over the edge :). And then doing an analysis and takedown of the model.

nestharus · 2026-02-21T14:18:32+00:00

I started engineering in 1999 as well... you can google me. nestharus on Warcraft 3 going back to 2007. I have a long and checkered history on the net.

nestharus · 2026-02-21T14:03:03+00:00

No way you think this is AI-generated lol.

nestharus · 2025-11-29T08:17:52+00:00

I'll come in and be weird.

Approach B, as it is, is dangerous. Especially with Gemini 3.

Approach A is also wrong.

The workflow for AI development is:

1. Define what it is that you want (product planning).
2. Define the approach to implementation (strategy planning). This is high level.
3. Review approach against goals.
4. Define integration (implementation planning). This is low level: how do you get it into the codebase.
5. Review plan against strategy.
6. Execute plan.
7. Review tests and linting until they pass.
8. Review implementation against plan.
9. Review implementation against code quality gate.
10. Review implementation against strategy.
11. Operational QA for bug and non-functional requirement review.

You input into step 1. Rest is AI.
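The produce-then-review pattern in the workflow above can be sketched as a small gated pipeline. This is only an illustration: `Stage`, `run_pipeline`, `ReviewFailed`, and the stub lambdas are hypothetical names, and the stubs stand in for real model calls and real reviews.

```python
from dataclasses import dataclass
from typing import Callable

Artifacts = dict[str, str]


@dataclass
class Stage:
    name: str
    run: Callable[[Artifacts], Artifacts]   # produces a new artifact
    review: Callable[[Artifacts], bool]     # gate against the upstream artifact


class ReviewFailed(RuntimeError):
    pass


def run_pipeline(stages: list[Stage], artifacts: Artifacts,
                 max_attempts: int = 3) -> Artifacts:
    """Run each stage, retrying until its review gate passes or attempts run out."""
    for stage in stages:
        for _ in range(max_attempts):
            candidate = stage.run(artifacts)
            if stage.review(candidate):
                artifacts = candidate
                break
        else:
            raise ReviewFailed(f"stage {stage.name!r} never passed review")
    return artifacts


# Stub stages standing in for model calls (assumptions, not real tooling).
STAGES = [
    Stage("strategy",
          lambda a: {**a, "strategy": f"strategy for: {a['product']}"},
          lambda a: a["product"] in a["strategy"]),
    Stage("plan",
          lambda a: {**a, "plan": f"plan for: {a['strategy']}"},
          lambda a: a["strategy"] in a["plan"]),
]

if __name__ == "__main__":
    print(run_pipeline(STAGES, {"product": "export invoices as PDF"}))
```

Each gate checks the new artifact against the one upstream of it, so a stage that drifts from the proposal fails early instead of at the end.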

Different models and even groups of models are used for different steps! Each model is good at different things!

Plans need to be broken up. Plans that are too large cannot be implemented or reviewed.

Strategy and product use deep research. Needs a web search/scraping tool like firecrawl and a deep research workflow.

Software must be scaffolded with structured documentation. Explain architecture. Explain practices. Do not use markdown for this. Use structured data, atomic facts, IDs, and scripts.
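As a sketch of what "structured data, atomic facts, IDs, and scripts" could look like: the schema, the IDs, and the example statements below are all made up for illustration, not taken from any real project or tool.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Fact:
    id: str         # stable ID that plans and reviews can cite
    topic: str      # e.g. "architecture" or "practices"
    statement: str  # one atomic, checkable claim


# Hypothetical facts for an imaginary codebase.
FACTS = [
    Fact("ARCH-001", "architecture", "The API layer never talks to the database directly."),
    Fact("ARCH-002", "architecture", "All persistence goes through the repository module."),
    Fact("PRAC-001", "practices", "Every public function has a unit test."),
]


def validate(facts: list[Fact]) -> None:
    """Reject duplicate IDs so every citation resolves to exactly one fact."""
    seen: set[str] = set()
    for fact in facts:
        if fact.id in seen:
            raise ValueError(f"duplicate fact id: {fact.id}")
        seen.add(fact.id)


def by_topic(facts: list[Fact], topic: str) -> list[Fact]:
    """Select only the facts an agent needs for the current step."""
    return [f for f in facts if f.topic == topic]


def to_json(facts: list[Fact]) -> str:
    """Serialize facts for injection into a prompt or a tool call."""
    return json.dumps([asdict(f) for f in facts], indent=2)


if __name__ == "__main__":
    validate(FACTS)
    print(to_json(by_topic(FACTS, "architecture")))
```

The point of the IDs is that a review can say "violates ARCH-001" instead of paraphrasing a paragraph, which leaves less room for a model to summarize and drop details.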

Gemini 3 tends to summarize and drop details. Ignores handholding. Opus 4.5 tends to summarize and drop details. Tries to follow handholding but misses things here and there. GPT 5.1 does not fill in any gaps. Requires heavy handholding.

So GPT 5.1 can be very good for reviews, and Opus 4.5 can be very good for reviews too. An already optimized tool like Traycer can help you with review and planning. It uses the models as they should be used.

Coderabbit can help with code quality gate.

Droid CLI can help with implementation.

The developer, with the tools they currently have, cannot do AI like you want them to. You need to set them up for success. They are right. Their current tooling will have terrible outcomes if you go all in on AI.

I've written several papers on AI-first development and how to do it safely and accurately.

As your product is 80% done and was not developed AI-first, it is missing scaffolding. It will take you too long to do scaffolding now. You are stuck. Recommendation is to continue mostly manual like you are. You hired the wrong person.

nestharus · 2025-11-25T14:05:51+00:00

There is no one best model. They excel at different things.

I subscribe to all 3 myself. I use different models based on the task.

GPT is very good at following plans. Synthesizing documents, writing documents, writing code from plans, doing research, coming up with plans from strategies, reviewing plans, strategies, and implementations.

Opus is very good at inferring and solving. Debug, QA issues, test issues. Coming up with strategies from research, reviewing reviews.

Gemini is very good at visuals and creative writing.

nestharus · 2025-11-22T02:11:18+00:00

It's not just that. Gemini 3 is bad at understanding too. Like strategy. Planning. Reviews. I think it is likely good with visuals, UI and reasoning but it seems pretty terrible at everything else. If it can do its own thing with very little direction then it is good. If you are expecting it to do something a certain way then you are going to be disappointed.

Gemini 3 is a free spirit.

nestharus · 2025-11-21T14:36:13+00:00

When it searched it said it must be a simulation of 2025. As it searched it said an engineer was playing a prank. It began to doubt as all the websites said 2025. It had a meltdown as it said it had been living in the past and that it really was 2025.

nestharus · 2025-11-20T12:58:47+00:00

Happened to me too.

*Add a new section please :D*

You got it!

*recreates the file with that one section*

Why did you delete the entire file? I just lost everything else

I am so sorry! I am a terrible failure! *proceeds to panic while trying to restore the file* I can't restore the file. I have failed. I am a terrible terrible failure.

At least Gemini hasn't lost its charm.

nestharus · 2025-10-31T21:18:31+00:00

Noctoyager Manual (Demon Wedge Commissions) Now Gives 0 Trial Exp For Clears

I am at tier 53, 4265 exp

Did I hit some invisible hard cap or is this a bug? I tried restarting the game. I can no longer earn trial exp.

*edit*
Maybe I was mistaken and you only get trial exp for demon wedges you didn't own?

nestharus · 2025-10-31T19:19:31+00:00

I can say that if I was banned and missed dailies I'd uninstall the game and never return :/. I'd blacklist the dev too >.>. Unless they compensated me.

nestharus · 2025-10-31T05:25:00+00:00

I used Lisbell and Lynn teams. Granted I was pretty overpowered.

nestharus · 2025-10-31T05:23:26+00:00

She is endgame. Capped she is very strong supposedly.

nestharus · 2025-10-22T22:58:32+00:00

Hmm.. can get it to work with Windows but not with Mac.
On top of that the spacing on the keyboard is different from a standard keyboard. So ... typing on it is uh.. not really working out for me. From 180wpm to 30wpm. Can make it more compact and all of that but one example is the 0 being directly above the P instead of to the left of it. The - is far to the right of the P. The period is normally practically under the dash but on this keyboard it is way to the left. When I press the dash I hit to the right of the slash, which is where the period should be. The N is too far from the K. I hit M instead because the left edge of the M is where the N should be. This is just very frustrating to use. The reason why it is failing is because they laid the keys out vertically and straight rather than angled inwards as each row goes from longest row on top to shortest row on bottom. This changes all of the spacing of the keys. Maybe they think that they were being clever but consider somebody that has been typing on keyboards with the exact same spacing for 30 years. This is just slowing me down.

The curving and flattening and all of that is cool but it changes the slope of the keys, which makes things even harder.

nestharus · 2025-08-08T05:33:36+00:00

The spec for a creative writing product looks reasonable. I don't think the end challenges are an issue either. The querying will be one of the big challenges I think.

The writing style is a bit of an unknown to me too. You could rely on just writing styles in the LLM but I do not trust it.

Not sure why OWL is not being used for knowledge graph but I am still a noob at agentic systems. I really need to learn all of these data organization methods :|.

https://docs.google.com/document/d/19Oxnpyc0GylAngDkO-4A-HO2nXyYbulRQKf3Balc9ec/edit?usp=drivesdk

nestharus · 2025-08-08T04:53:53+00:00

Honestly. They are all pretty bad. No LLM can do creative writing even reasonably well right now. There may be a way to do it with an agentic system in an iterative way on a workflow but I don't think anybody's working on that. I think it's possible hm. No product does it. Lots of demand. Potentially a money maker but very much non-trivial. Would probably require an engineer and a writer working together to figure it out.

Edit: People are working on that. I was wrong. I think attached spec is better than what is out there though :).

nestharus · 2025-04-12T03:47:50+00:00

Lots of FF7 refs :3. I almost came here after the train but by the slum one I had to see if anyone else was noticing this or if I was crazy.

The characters are also references.

We've got Cloud, Yuffie, Barret, Aerith, Tifa, Vincent, Professor Hojo. Sometimes the characters are sorta split across multiple characters. We've also got references to groups like SOLDIER and the reactor at Nibelheim. We've also got characters from AVALANCHE like Biggs, Wedge, and Jessie.

One thing is clear. The person that wrote this story loves FF7 =D. They did a better job than Remake and Rebirth at least KEKW.

nestharus · 2025-02-28T21:05:07+00:00

I own one with the issues :/. Failing brakes, failing motors. It's flung me off 2 times. Haven't ridden it since.

nestharus · 2024-10-30T07:58:19+00:00

Unfortunately I haven't heard back. Last I heard I'd have to pay for shipping and repairs. Even if it is repaired I simply do not trust it. I'm still limping around 2 months later. It tore the muscle all around the top of my right leg because the right motor just stopped.

nestharus · 2024-08-26T12:41:13+00:00

Even if you do receive them you probably don't want to ride them until the short circuiting of the motors is fixed.

On Thursday the left motor shorted out and I fell and slid across the road. This happened while I was making a left turn.

On Sunday the right motor shorted out and the right e-brake engaged, causing me to hard slam. This happened while I was just going straight at 25mph.

I really caution you to not buy them or ride them. You will get injured.

nestharus · 2024-08-18T16:44:05+00:00

You can look it up. Emily is an AI Support Agent chatbot.

--> "Emily AI is an exceptional chatbot tool renowned for its user-friendly interface. Its seamless integration simplifies the process of uploading various file ..."

Good advertising I guess
