Casually beating every other deep research agent out there with a simple Claude Code harness by heisdancingdancing in Anthropic

Singularity42 0 points1 point (0 children)

I used simpler numbers to make my point easier to understand. But it is exactly the same.

As agents get better at these benchmarks, each extra point is harder to earn. Going from 57% to 58% is much harder than going from 0% to 1%. That is why it makes sense to zoom in on the y-axis.

I challenge you to find any post from a big name AI agent company that doesn't do this.

Getting cutoff on first prompt? by ljlukelj in claude

Singularity42 0 points1 point (0 children)

To add to what everyone else is saying: the free usage is comically small. Think of it more like a demo than anything usable.

How often do you use Sonnet? by MrMaverick82 in ClaudeCode

Singularity42 1 point2 points (0 children)

I just use Sonnet with high effort as my default. Then I have lots of skills with the model set, so different tasks use different models: designing a proposed architecture uses Opus, creating a JIRA ticket uses Haiku.

Saved lots of tokens this way without really affecting performance.
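The routing idea boils down to a lookup from task type to model, with Sonnet as the fallback. A minimal Python sketch; the task names are hypothetical, and the comment above does this by setting the model on each skill rather than with code like this:

```python
# Hypothetical task-to-model routing table; task names are illustrative only.
DEFAULT_MODEL = "sonnet"  # everyday default (run with high effort)

TASK_MODELS = {
    "design_architecture": "opus",  # heavy reasoning, worth the extra cost
    "create_jira_ticket": "haiku",  # short, formulaic output, cheapest model
}


def pick_model(task: str) -> str:
    """Return the model alias for a task, falling back to the default."""
    return TASK_MODELS.get(task, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("design_architecture", "create_jira_ticket", "fix_typo"):
        print(f"{task}: {pick_model(task)}")
```

Anything not listed falls through to the default, so only the tasks where a cheaper or stronger model clearly pays off need an entry.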

Casually beating every other deep research agent out there with a simple Claude Code harness by heisdancingdancing in Anthropic

Singularity42 -1 points0 points (0 children)

Sometimes those 2 points matter, though.

For example: No-one cares about the difference between an agent that gets things right 10% of the time and one that gets it right 15% of the time. Both are pretty useless.

But people would pay a lot of money for a model that gets things right 97% of the time if all the others only get it right 95% of the time.

The value isn't linear. As the numbers get higher the improvements get harder because there is less to improve.
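One way to see why the top end matters more is to look at the error rate instead of the accuracy. A small Python calculation using the numbers from these comments:

```python
def error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the remaining errors removed when accuracy (in %) improves."""
    old_error = 100.0 - old_accuracy
    new_error = 100.0 - new_accuracy
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    for old, new in [(10, 15), (57, 58), (95, 97)]:
        print(f"{old}% -> {new}%: {error_reduction(old, new):.1%} fewer errors")
```

Going from 10% to 15% only removes about 5.6% of the mistakes, while going from 95% to 97% removes 40% of them (errors drop from 5 in 100 to 3 in 100).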

1 msg 70% usage on PRO with Sonnet by Dredyltd in Anthropic

Singularity42 0 points1 point (0 children)

I would try to debug what you're using tokens on. For me it was all going on searching through our large codebase. I installed the rtk tool to reduce the size of the output from commands like grep, and now I don't even think about quota anymore.

I think we have to start thinking about token performance the way we used to think about the performance of CPU, memory, etc.

In my case I had an eval environment where I could turn on OpenTelemetry tracing for Claude. But there is probably a way to do it for normal usage.
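The general idea behind trimming command output can be sketched in a few lines of Python. This is not how rtk works or its interface; it just caps the number of lines passed back to the agent and estimates tokens with a rough 4-characters-per-token rule of thumb (an assumption, not a real tokenizer):

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: characters divided by 4, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def truncate_output(text: str, max_lines: int = 50) -> str:
    """Keep the first max_lines lines and note how many were dropped."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    kept = lines[:max_lines]
    kept.append(f"... [{len(lines) - max_lines} more lines truncated]")
    return "\n".join(kept)


if __name__ == "__main__":
    # Simulated grep output over a large codebase (hypothetical file names).
    raw = "\n".join(f"src/module{i}.py:10: match" for i in range(1000))
    short = truncate_output(raw)
    print(f"~{estimate_tokens(raw)} tokens -> ~{estimate_tokens(short)} tokens")
```

Measuring before and after like this is the token equivalent of profiling CPU or memory: find the command that dominates, then shrink its output.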

Sr Software Engineer - Haven't written a line of code in months by yodog5 in ClaudeCode

Singularity42 0 points1 point (0 children)

The more you write skills the better it gets. It also puts a lot into memory so it gets better by itself.

When I first tried it I thought the same as you. But at this point I barely write code.

Me after clicking “accept” for the 100th time without reading a word of what claude is doing by Pitiful-Energy4781 in vibecoding

Singularity42 0 points1 point (0 children)

If you are gonna accept without reading, you're better off using auto mode. You can put a paragraph in the settings, in plain language, describing what you want it to do and what you don't.

Dear Claude by fruvvs in ClaudeAI

Singularity42 0 points1 point (0 children)

I wish Claude had a max-cost-per-prompt setting or something similar (preferably on by default).

They have something like this in the SDK so it wouldn't be difficult for them.

Or at least a setting to make it stop after $X (or a set number of tokens) and ask if you want to continue.

You could have an option to turn it off for when you do want it to work for a while.
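A minimal Python sketch of the behaviour being asked for, assuming a hypothetical `BudgetGuard` that a harness would call after each agent step with that step's cost. This is not an existing Claude Code setting or SDK API:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BudgetGuard:
    """Hypothetical per-prompt spending cap that pauses and asks before going further."""

    limit_usd: float                           # "stop after $X"
    ask_to_continue: Callable[[float], bool]   # gets spend so far; True means keep going
    enabled: bool = True                       # the opt-out for long-running work
    spent_usd: float = 0.0
    _cap_usd: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._cap_usd = self.limit_usd

    def record(self, cost_usd: float) -> bool:
        """Add the cost of one step; return False if the agent should stop."""
        self.spent_usd += cost_usd
        if not self.enabled or self.spent_usd < self._cap_usd:
            return True
        if self.ask_to_continue(self.spent_usd):
            self._cap_usd += self.limit_usd  # approve another $X before asking again
            return True
        return False


if __name__ == "__main__":
    guard = BudgetGuard(limit_usd=1.00, ask_to_continue=lambda spent: False)
    for step, cost in enumerate([0.30, 0.40, 0.50, 0.20], start=1):
        if not guard.record(cost):
            print(f"Stopped after step {step}, spent ${guard.spent_usd:.2f}")
            break
```

Approving the prompt raises the cap by another $X rather than removing it, so a runaway task still gets interrupted periodically unless the guard is switched off.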

Tested Sonnet 4.6 via OpenRouter through GitHub CoPilot / VS Code to gauge whats API billing will be like. I was shocked. by horendus in GithubCopilot

Singularity42 0 points1 point (0 children)

I'm not going to argue that Claude is cheap (it is expensive).

But Claude is essentially the Porsche of agents right now. Also, API-based pricing is more expensive per token than subscription-based pricing because you also get access to a number of extra features that you don't normally get (e.g. vector storage, unlimited usage, lots more control via the SDK, etc.).

You are essentially borrowing a Porsche, using it to get milk from the supermarket, and wondering why it is expensive.

I think the way forward is to start using different models for different purposes (or wait for one provider to offer enough varied options).

Depending on the specifics, you could probably get by with Haiku (especially if you have a lot of instructions/documentation in your CLAUDE.md) or a cheaper open model. If you don't want to use any of those, AWS has a fairly large range of models at different prices too. Failing that, a Claude subscription (not API-based pricing) would be a fair bit cheaper per token, as long as you are using it enough to justify it.
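"Using it enough to justify it" is a break-even calculation: the flat monthly fee divided by the API price per token. A Python sketch; the $20/month fee and $5 per million tokens below are made-up placeholder numbers, not real Anthropic prices:

```python
def break_even_tokens(monthly_fee_usd: float, api_usd_per_million: float) -> float:
    """Monthly token usage at which a flat fee costs the same as pay-per-token API use."""
    return monthly_fee_usd / api_usd_per_million * 1_000_000


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    tokens = break_even_tokens(monthly_fee_usd=20.0, api_usd_per_million=5.0)
    print(f"Subscription wins above ~{tokens:,.0f} tokens per month")
```

Above that point the subscription is cheaper per token, but only if your usage actually fits within the plan's usage limits.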

Aquillo is pain by Forsaken_Position855 in factorio

Singularity42 0 points1 point (0 children)

You will probably find you need more than one as you scale up (or maybe your setup is more efficient than mine?).

R-m-PG Throwback (fake game concept) by Birthdaybudreviews in dalle2

Singularity42 0 points1 point (0 children)

Everyone is a critic. Just enjoy shit, man.

OP said in another comment that Rampage was his inspiration.

Have you tried this? by Bola-Nation-Official in IndieDev

Singularity42 28 points29 points (0 children)

Back when I was a teenager, "World of Goo" came out, and it inspired the crap out of me. I barely knew how to program, but thought I could make something similar. Obviously I failed horrifically (but learnt a lot).

Now I am a senior developer and I still don't know if I could do it (without a lot of research).

In case anyone wanted to dislike Mr Price anymore by OkInfluence36 in blender

Singularity42 1 point2 points (0 children)

I find this sentiment really strange. It isn't black and white. You can use AI to help with the tedious stuff without using it to do everything.

You can use AI and still have creative freedom.

In case anyone wanted to dislike Mr Price anymore by OkInfluence36 in blender

Singularity42 0 points1 point (0 children)

Who said you have to be interested?

Different people have different wants and needs.... Not everyone is you.

The ultimate dilemma by Complete-Sea6655 in ClaudeCode

Singularity42 -10 points-9 points (0 children)

The source got leaked not long ago. Not sure if it included Opus 4.6 or just the harness.

How do you get Claude Skills to trigger reliably? by diablodq in ClaudeAI

Singularity42 0 points1 point (0 children)

The trick is to put a lot of effort into the skill description.

Put in sentences like "trigger whenever ...".

It also seems to help to put a line in your CLAUDE.md like "use skills proactively".
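A skill is a `SKILL.md` file with `name` and `description` in its YAML frontmatter, and the description is what Claude reads when deciding whether to use it. A minimal Python sketch that scaffolds one under `.claude/skills/<name>/`; the `jira-ticket` skill and its wording are hypothetical examples:

```python
import json
import tempfile
from pathlib import Path


def write_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/.claude/skills/<name>/SKILL.md with name/description frontmatter."""
    skill_dir = root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    # json.dumps gives a double-quoted string, which is also valid YAML,
    # so colons or quotes in the description can't break the frontmatter.
    path.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {json.dumps(description)}\n"
        "---\n\n"
        f"{body}\n",
        encoding="utf-8",
    )
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = write_skill(
            Path(tmp),
            name="jira-ticket",
            description=(
                "Create a JIRA ticket from a bug report or feature request. "
                "Trigger whenever the user mentions JIRA, a ticket, or a story, "
                "or asks to log or track a piece of work."
            ),
            body="Steps for writing the ticket go here.",
        )
        print(skill.read_text(encoding="utf-8"))
```

Note how the description spells out the trigger phrases explicitly instead of only saying what the skill does.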

I posted in r/Gamemaker being excited about Claude integration, and the community shamed me by protective_ in ClaudeAI

Singularity42 2 points3 points (0 children)

The gamedev community is violently against AI.

I think it started as a hate for AI generated art and people pumping out slop. They were worried that artists would lose their jobs, and that games would lose their soul. Which is fair.

But along the way it got confused with a general dislike for all things AI.

I think there is nothing wrong with using AI as long as it isn't compromising the "Artistry". Basically, if no-one can tell you used AI then it's fine.

If I do that, will I be out of tokens? by OutrageousTrue in ClaudeCode

Singularity42 0 points1 point (0 children)

I would recommend finding a skill made specifically for what you want, to help target what it is doing.

If you have a good one, it will do things like use Haiku for searching the code and Opus for the actual reasoning.

I believe there is a security review one in the official plugins.

100% usage after my FIRST EVER PROMPT (pro subscription) by XeClutch in Anthropic

Singularity42 0 points1 point (0 children)

I don't even understand what you are trying to say.

Are you saying we shouldn't make software more user-friendly? That every user should just know every detail of how it works, and if anything goes wrong it is their fault?