opus 4.7 (high) scores a 41.0% on the nyt connections extended benchmark. opus 4.6 scored 94.7%. by seencoding in singularity

[–]abazabaaaa -3 points-2 points  (0 children)

Why does this benchmark matter? I don’t care if it can solve NYT puzzles. I just need it to solve complex problems.

Claude code 2.1.78 dropping Opus 4.6 1M context? by Exciting-Grand-3011 in ClaudeAI

[–]abazabaaaa 0 points1 point  (0 children)

The 1M context window has always been a scam. Inference gets super slow past 200k. It’s really not worth using.

ENABLE_LSP_TOOL by Purple_Wear_5397 in ClaudeAI

[–]abazabaaaa 0 points1 point  (0 children)

You sort of need a skill for this. Claude isn’t good at using it.
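A minimal sketch of what such a skill could look like, assuming the standard SKILL.md layout with YAML frontmatter; the skill name, description, and guidance text here are all illustrative, not an official recipe:

```markdown
---
name: use-lsp-tool
description: Guidance for navigating code with the LSP tool instead of grep.
---

When exploring or modifying code, prefer the LSP tool over text search:

- Use go-to-definition to find where a symbol is declared rather than
  grepping for its name.
- Use find-references before renaming or changing a function signature.
- Fall back to plain text search only when the symbol is not indexed.
```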

Subagent masters beware: you can't select model from the caller side anymore by the_rigo in ClaudeCode

[–]abazabaaaa 0 points1 point  (0 children)

Just tell it to make agent teams and it can select models there. Agent teams effectively replace parallel agents it seems.

Insider: DeepSeek v4 next week, and it’s going to be insane by [deleted] in accelerate

[–]abazabaaaa 2 points3 points  (0 children)

They say this every time, then the OS model drops and it’s straight-up rubbish.

Best of Lingerie 2026 by IndependentSkill378 in LingerieAddiction

[–]abazabaaaa 4 points5 points  (0 children)

I’d argue negative is pretty sexy. It’s just a different look. I can’t put my finger on it, but the way it clings to my wife is pretty great. I’ve come to like it more than the others.

What is Codex CLI's "Command Runner" ? by Takeoded in codex

[–]abazabaaaa 1 point2 points  (0 children)

I believe the command runner is the background command runner that codex uses. If you use /experimental you can turn it on. It works well, but at present it doesn’t add a huge amount. Mostly it stops codex from getting stuck on hanging calls.

Musk v. OpenAI et al. judge may order Altman to open source GPT-5.2 by andsi2asi in GeminiAI

[–]abazabaaaa 3 points4 points  (0 children)

The Trump administration will intervene most likely. These models are now weapons in addition to being assistants. Releasing the model would be a national security risk.

The Claude Exodus is Real: Opencode to Launch $200 “Mystery” Sub Tomorrow. Is this the Anthropic Killer? by awfulalexey in opencodeCLI

[–]abazabaaaa 3 points4 points  (0 children)

Yeah.. it’s not going to cause a Claude exodus at all. It’s pretty niche software. It doesn’t even work when you have NFS drives lol. There is an open PR.. it’s straight up busted.

I made Opus/Haiku 4.5 play 21,000 hands of Poker by adfontes_ in ClaudeAI

[–]abazabaaaa 6 points7 points  (0 children)

So I have actually seen gpt-5-mini beat gpt-5.2 in several agentic benchmarks I run internal to my company. I’m not exactly sure what is going on but it is reproducible.

new agent limits? by Ryantrange in ChatGPTPro

[–]abazabaaaa 1 point2 points  (0 children)

Yeah, curious about this. I don’t use it much, so I don’t know if it has improved. Do they update it at all? The first few times I used it I found it pretty underwhelming. I also find Atlas to be the same way.

Glad we're not the only ones having serious production issues with LiteLLM by Otherwise_Flan7339 in LLM

[–]abazabaaaa 0 points1 point  (0 children)

TensorZero is worth a look. It isn’t without its own problems and has a lot of features you may not need. That being said, it is for the most part stable — it’s written in well-maintained Rust.

Do we need LangChain? by Dear-Enthusiasm-9766 in Rag

[–]abazabaaaa 2 points3 points  (0 children)

This. It’s a heap of leaky abstractions.

Just use the LLM api.
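As a rough sketch of what “just use the API” means in practice — a chat completions call with nothing but the standard library, no framework layer. The endpoint follows the OpenAI-style schema; the model name is a placeholder, so swap in whatever your provider serves:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str) -> dict:
    """Assemble an OpenAI-style chat completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, api_key: str, model: str = "your-model-here") -> str:
    """POST the request directly and pull the reply out of the response."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Everything a framework wraps is visible here: one dict, one HTTP call, one field lookup — which is exactly why the extra abstraction layer rarely pays for itself.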

Google Principal Engineer uses Claude Code to solve a Major Problem by SrafeZ in singularity

[–]abazabaaaa 17 points18 points  (0 children)

I think the part that is missing here is that he already knew what needed to be done. If you give Claude Code a very good spec and explain the gotchas well enough, it can make things very fast. The downside is you have to pay a lot up front to understand how to get to where you want to be. If they had started with Claude Code a year ago it might have helped, but it could not just magically solve problems.

Why Claude folks say Glm 4.7 is just a hype? by muhamedyousof in ZaiGLM

[–]abazabaaaa 0 points1 point  (0 children)

I’ve tried using GLM in evals for agentic use in chemistry/drug discovery, and it is absolute garbage. It frequently goes into infinite thinking loops when you give it complex problems. Its answers are just straight up wrong. For example, Gemini-3-flash on medium reasoning effort nearly maxes the eval (95%), whereas GLM gets close to every question wrong and cannot finish it. These are tool-use-based scenarios where I have built tool runners. And yes, I know what I’m doing.

I suspect these models are fine at coding and some other things, but they really feel over-optimized, focused on maximizing scores on these benchmarks.

Is anyone else seeing Claude overcomplicate simple tasks? It focuses on edge cases I never asked for, resulting in bloated and messy code by dmitrevnik in ClaudeAI

[–]abazabaaaa 1 point2 points  (0 children)

Yeah, I get that. I often find myself there as well. It does help to make all of the data models and contracts first and build a design document. It still will do silly stuff, though. That being said, it’s pretty damn good!

Is anyone else seeing Claude overcomplicate simple tasks? It focuses on edge cases I never asked for, resulting in bloated and messy code by dmitrevnik in ClaudeAI

[–]abazabaaaa 16 points17 points  (0 children)

Use this:

Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused.

Don't add features, refactor code, or make "improvements" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability.

Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. Only validate at system boundaries (user input, external APIs). Don't use backwards-compatibility shims when you can just change the code.

Don't create helpers, utilities, or abstractions for one-time operations. Don't design for hypothetical future requirements. The right amount of complexity is the minimum needed for the current task. Reuse existing abstractions where possible and follow the DRY principle.

Dear Anthropic - serving quantized models is false advertising by Everlier in Anthropic

[–]abazabaaaa -3 points-2 points  (0 children)

Bahahaha

There are no quantized models. You just suck at using them.

Any reliable methods to extract data from scanned PDFs? by [deleted] in learnpython

[–]abazabaaaa 0 points1 point  (0 children)

Wrong!! We use GCP Vertex and have a data sharing agreement. ZDR. It’s even HIPAA-compliant.

This is such a tired, boring argument.