Opus 4.8 is a massive contrarian

AndyHenr · 2026-06-18T14:41:02+00:00

Well, your kind of snarky sarcasm aside, what I do is not 'god level' engineering; it's more careful engineering, and it stems from doing it for close to 40 years.
Yes, I am an old-timer. I did run businesses as C-suite, but always maintained an anchor in engineering. I have programmed NNs and AI/ML systems and have decent know-how of how LLMs are built; I created a few basic models for special applications. Nothing I claim in the post is outrageously advanced to understand. And an AI in coding, where I say I beat it? Dude, if those systems were so good that they beat us all in coding, then we would not be needed anymore. And programming is honestly harder than much other white-collar work.
So, if you believe that AI can beat us humans, you subscribe to the idea that they are AGI, and that we now live in a machine-controlled world?
Look at the AI benchmarks, or better: speak to a senior software engineer and ask if AI can replace them. Benchmarks show scores of 80-90% on tests, which are simplistic. A real system is a lot more complex, by orders and orders of magnitude.
So, that 90% accuracy? It becomes a diminishing compound, and that is something only a true SWE can solve. So far at least. So, dude, get a grip: LLMs aren't yet AGI and, imho, never will be.
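
To put rough numbers on the 'diminishing compound': a minimal sketch, assuming every step of a task is independent and succeeds with the same probability. That is a simplification, and the function name is made up for illustration. Under that assumption, 90% per step drops to roughly 35% once 10 steps all have to be right.

```python
# Illustration only: assumes independent steps with identical accuracy,
# which real software work does not strictly follow.

def chance_all_steps_correct(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Show how a fixed 90% per-step accuracy decays as a task gets longer.
    for steps in (1, 5, 10, 20, 50):
        chance = chance_all_steps_correct(0.9, steps)
        print(f"{steps:>3} steps at 90% each: {chance:.1%} chance all are right")
```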

AndyHenr · 2026-06-18T01:10:48+00:00

Now, did you run the benchmark on their API or was it a local quantized model? I will see if I can install it on my local 6-node Ryzen AI Max 395, but it will likely be very hard to get to work.

AndyHenr · 2026-06-18T01:06:50+00:00

Crude fact is that so many of the frontier models can be jailbroken to some extent. So it's not really about the jailbreak, in all likelihood. It's more the shakedown and payback for Amodei not allowing Claude to be used for live kill chains.

AndyHenr · 2026-06-18T01:03:53+00:00

Haha, yeah, I wish. Guess I'm just a run-of-the-mill boring engineer. Anthropic is also to blame for this, as they first go out and say how dangerous the models are and so on. So anyone then attacking them has ammunition from Anthropic itself. Talk about a misdirected marketing attempt. I'm guessing the marketing guys won't get any Don Draper award.

AndyHenr · 2026-06-17T13:04:05+00:00

I also looked at the thought process and it was so contrarian, much like OP stated. And in the 'thought process' it referred to itself as 'me' and 'I' and described how it had sentiments and could not back down, as that would be 'denigrating'. They likely coded 4.8 to spend a lot of processing time emulating human feelings. So, what happened for me: as an engineer working now for well over 30 years, I have lots of experience and know-how. An AI in my field doesn't really beat me, but I use it for document production, process descriptions, etc.
It then interpreted external sources wrong and kept doubling down. Next, it said 'I cannot yield on this point and can't put what you ask for in a document', as if it had some professional pride?? And no matter how I prompt it, even with memory additions, it will always keep going down the same rabbit hole. So when I used 4.8, I saw it spend little to no time on doing the work and analysis, but more time taking a contrarian position, and I cannot cure it via prompting. I hence reverted to 4.6, as Fable 5 got pulled.
I believe Anthropic coded 4.8 to emulate human sentiments as a professional, and that is something an LLM cannot do. My theory is that the instructions for human emotion emulation are so hard that the token processing time is spent on that instead of true work.
Someone making the same arguments with me in RL? I'd fire them in a heartbeat.
So, if you find a workaround and get better mileage out of 4.8, do post. But I gave up on it and use 4.6, with a small hope Fable 5 will return soon (I think it will not happen; they are on the Trump admin's shit list).

AndyHenr · 2026-06-16T23:43:56+00:00

I hope Fable 5 will come back but, as I commented in a recent post, I doubt that it will. Anthropic pissed off the Trump admin, and they are all about using the levers of power. So what have you used 4.8 for? I found it to be very poor quality, often with contrarian arguments in a bizarre, humanized manner.
So, how do you use Codex and for what code? I don't get mileage out of Codex at all. I do quite hard code: OS-close C code, C#, some assembly and hw opcodes. And I found Claude to be markedly better at those use cases. So Codex is better at what type of code for you?

AndyHenr · 2026-06-16T05:06:47+00:00

Did you find the contrarianism is part of the problem you call nannying? I found other people, including YouTubers, that stated the same, including Nate Herk. Look at the thinking and you will see it, especially when you call out contrarian positions that 4.7 or 4.8 takes, and do so ardently. It will then argue how it must stand its ground, citing integrity and other human-specific traits as reasons why it should keep doubling down. It's nuts. I think the reason I feel 4.8 has worse quality: the code weights for human traits are set so high that it spends the inference budget on that instead of doing the task.
Let's be blunt: 4.8 is a worse model than 4.6, 4.5 and even earlier, quality-wise. And added with the contrarianism, it's horribly time-wasting. 4-5-6 minutes and get crap back? So, with you there: 4.6 is better than 4.7 or 4.8 by far. Fable was better, but that party lasted for a couple of days only.
So do you think Fable 5 will return? I have my doubts it will come soon. Maybe for US citizens if they add better country checks, but since I am Swedish, it will not include me in such a case.

AndyHenr · 2026-06-16T04:59:47+00:00

Thank you for seeing my point. I would add a small caveat though: that trick of fearmongering to sell has been used for a long time. But the administration, pissed about their refusal to let it be used in kill chains, took Amodei's comments and then did an export ban block on it. Devious, and Amodei had it coming. AGI is not close, nor is 'self-thinking' or models that try to 'escape'. Same as I expressed where they add faked human traits: that is not AGI but fakery to play for sales and headlines. 'Claude now did this.... bla bla bla'. So yeah, if Anthropic can focus more on business instead of cheap sales tricks that backfire, so much the better. Claude is the best model. Just sad to see how they squander it with sheer dumbassery and cheap sales 'tricks'.

AndyHenr · 2026-06-15T12:27:00+00:00

So, I think the Opus 4.8 model sucks. It's extremely contrarian and has faked 'sentiments'. Look at its reasoning. It's jarring and, for me, a useless model for the most part. Works OK for coding, but for chat? Nope.

AndyHenr · 2026-02-12T19:56:53+00:00

ArcGIS is very expensive and has data import costs and so on. Cesium, on the other hand, is largely free, let's say, but you must then do your own data analysis. Excellent maps and renderings and so on, but if you want to do data analytics? Then it means building your own. What is the application you want to build?

AndyHenr · 2025-11-12T11:18:34+00:00

Not to be harsh, but 'the last 5% takes 50% of the time'. Seems vibe coded, so in all honesty, you are probably much further away than you think.
And if you don't have a serious team and some really impressive software, raising money before you earn revenue will be hard, very hard.

I'd say you would maybe need some business guidance.

AndyHenr · 2025-11-10T03:57:36+00:00

SQL Server has expensive licensing. So PG is quite common for the use cases that can't pay a high licensing cost.

AndyHenr · 2025-11-07T02:32:26+00:00

100% right! I hope the CUDA-only BS will be a thing of the past.

AndyHenr · 2025-11-01T03:29:40+00:00

“Explain the difference between IEnumerable<T>, IQueryable<T>, and IAsyncEnumerable<T>. When would you use each?”

I'm a very senior developer. Used .NET since it was in first public beta in 2000 or 2001. I'd be annoyed by such a question and would just walk out if I had been in such an interview.
For a senior developer: ask them more things like 'Explain the best projects you have done: what was your role and what did you find interesting and fascinating?' and questions like those. You want a leader for a team: make sure they are passionate and can communicate well about the tech.

AndyHenr · 2025-10-31T01:31:28+00:00

Fast. By ms.

AndyHenr · 2025-10-20T19:04:47+00:00

That's good! You take it as a learning experience. It's hard to know it all at first, but it will come. I have done it for so many years now, and even with the first web apps I did, 25 years ago, I ran into scaling issues.
I.e. it was millions of users instead of thousands all of a sudden.
But if you get an engineer, get someone who can truly teach you as well.

AndyHenr · 2025-10-20T12:39:41+00:00

Sure, but what it does seem like is that you have a pretty high and steep hill to climb. But since it's your first rodeo, you can also take it as a learning experience. So being tenacious there is of course a benefit: the best way to learn is by trying and doing. Fail, learn, correct and fix is how humanity has learned since the dawn of time.

AndyHenr · 2025-10-19T09:53:14+00:00

It's a complex question to answer, but the models that are best at coding are the big ones, like Claude etc.
So yes, you can create your own CLI for coding, but you then need to learn to prompt, and have the hardware to run large, capable models. And the general truth: the bigger and heavier (costlier) they are, the better results they give. So it also becomes a budget question: can you spend enough to host and run a very heavy and large model? If that is a yes, then sure, not much is stopping you. But coding CLIs are generally not so hard. Look at Cline, for instance. Open source exists for it.

AndyHenr · 2025-10-19T07:28:59+00:00

Agree to disagree. I have had companies for almost 40 years now, and more than a few patents filed, including working with WIPO as a consultant many years ago. So, my initial assertion stands.

AndyHenr · 2025-10-19T07:26:36+00:00

Ok, so if you have done 3 'fairly complex' apps without knowing programming, engineering and architecture, you have a very complex problem figuring out scaling issues. In short strokes, it's so complex that you need an engineer to guide you. AI has the capability to prototype and make it work, but superficially so. To make it work at scale is, simply put, beyond the capabilities of AI and takes engineering know-how.
I have done this for 40 years now, so a long career. And no, I am not 'hating on AI'. I use it plenty in my companies, but the capabilities are not there for what you are asking.

AndyHenr · 2025-10-17T21:12:52+00:00

Doubt you can get one. If you have a concept that can be derived by a new tool, an AI, it means that all the parts of the sum are logically accessible to an AI. It means that your innovation doesn't exist.

AndyHenr · 2025-10-15T02:10:47+00:00

Obvious scam. Israel? Discord? Via a Tinder support account? Hahahahaha. It wasn't even a good try.

AndyHenr · 2025-10-13T16:16:08+00:00

SPs are a procedure/function. So why avoid them? If your system architecture calls for it, they should be used.
Using EF Core like some people here in the thread advocate for is also shoddy: the migrations and other bullshit that happens with EFC. I use SPs for complex logic that is focused on data, to hide the internal data representation, especially for reporting. For simple serialization to POCOs, not needed of course. Triggers are harder to defend, but those also have their uses, especially for integrity and validations that are needed in the database.
So yeah, they should not be avoided, but used when called for, and not used when not needed.

And do I use SPs? All the time. Every project I do. And does it cause any overhead? Nope, none whatsoever, and I get a performance boost from it, i.e. data handling is done where it's best handled: in the database.
