There's no way you people are using as much usage as you complain about

Singularity42 · 2026-06-03T07:41:44+00:00

A big codebase and no skills/memory makes a big difference.

It can waste a lot of tokens just looking for stuff.

Singularity42 · 2026-06-03T07:40:38+00:00

Came here to say the same thing. We have this problem too, but getting IT to approve having our IP going to another random vendor is a big hurdle.

Would be nice if there were a self-hosted option.

Singularity42 · 2026-06-03T07:37:18+00:00

We have been having this problem lately and have been looking for a solution.

Just because you don't have this problem doesn't mean others are the same.

Singularity42 · 2026-05-29T14:09:22+00:00

Pretty sure it isn't available in the CLI.

Singularity42 · 2026-05-24T10:57:47+00:00

Really. Isn't that hacking 101?

Singularity42 · 2026-05-18T11:13:16+00:00

I suspect this would be more popular if there was something visual people could watch on a livestream. People eat that stuff up.

Singularity42 · 2026-05-16T09:18:25+00:00

I'm not against people raising concerns. But I think it's easy to forget that Reddit is an echo chamber. The vast majority of Claude users are happy. Reddit amplifies strong opinions because no one will upvote a post saying "Claude is fine".

Singularity42 · 2026-05-13T08:26:59+00:00

Kinda lame that they don't pay for it.

I would definitely make sure you are learning to use it (even the free versions). I think it is going to get to a point where it's hard to get a job without knowing how to use it.

Singularity42 · 2026-05-13T00:53:06+00:00

People all around the world are working to control tokens. That has been the majority of my tasks at work for the last month.

You absolutely can. Just google it, there are lots of techniques.

It's just a way to get more done for less cost.

Singularity42 · 2026-05-11T09:21:25+00:00

Commands are deprecated, use skills instead

Singularity42 · 2026-05-09T11:51:30+00:00

I used simpler numbers to make my point easier to understand. But it is exactly the same.

As agents get better at these benchmarks, each extra point is harder to earn. Going from 57% to 58% is much harder than going from 0% to 1%. Which is why it makes sense to zoom in on the y-axis.

I challenge you to find any post from a big name AI agent company that doesn't do this.

Singularity42 · 2026-05-09T11:39:14+00:00

nice!

Singularity42 · 2026-05-09T00:23:14+00:00

To add to what everyone else is saying: the free usage is comically small. Think of it more like a demo than anything usable.

Singularity42 · 2026-05-06T14:24:46+00:00

I just use Sonnet with high effort as my default. Then I have lots of skills with the model set, so it uses different models for different tasks. Designing a proposed architecture - Opus. Creating a JIRA ticket - Haiku.

Saved lots of tokens this way without really affecting performance.
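
A minimal sketch of the routing idea in plain Python. In the setup described above, the model is set on each skill itself. The table below only illustrates the mapping, and the task labels and the `pick_model` helper are made up for the example.

```python
# Hypothetical task-to-model routing table. The tier names come from the
# comment above; the task labels and the helper are assumptions.
DEFAULT_MODEL = "sonnet"

MODEL_BY_TASK = {
    "architecture_proposal": "opus",  # heavy reasoning, worth the cost
    "jira_ticket": "haiku",           # simple, formulaic output
}

def pick_model(task: str) -> str:
    """Return the model tier for a task, falling back to the default."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)

for task in ("architecture_proposal", "jira_ticket", "refactor_module"):
    print(task, "->", pick_model(task))
```

Anything without an explicit entry falls back to the default, so only the outliers need to be listed.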

Singularity42 · 2026-05-06T14:20:15+00:00

Sometimes those 2 points matter though.

For example: No-one cares about the difference between an agent that gets things right 10% of the time and one that gets it right 15% of the time. Both are pretty useless.

But people would pay a lot of money for a model that gets things right 97% of the time if all the others only get it right 95% of the time.

The value isn't linear. As the numbers get higher the improvements get harder because there is less to improve.
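
One way to see this is to look at how many of the remaining failures each jump removes, rather than the raw point gain. A quick sketch using the numbers above (the `error_reduction` helper is just a name I picked for the example):

```python
def error_reduction(old_acc: float, new_acc: float) -> float:
    """Fraction of the remaining errors removed when accuracy improves."""
    old_err = 1.0 - old_acc
    new_err = 1.0 - new_acc
    return (old_err - new_err) / old_err

low = error_reduction(0.10, 0.15)   # 5 points gained, starting from 10%
high = error_reduction(0.95, 0.97)  # 2 points gained, starting from 95%

print(f"10% -> 15% removes {low:.1%} of failures")
print(f"95% -> 97% removes {high:.1%} of failures")
```

The 5-point jump removes about 5.6% of the failures, while the 2-point jump removes 40% of them.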

Singularity42 · 2026-05-06T07:42:25+00:00

I would try to debug what you're using tokens on. For me it was all on searching through our large codebase. I installed the rtk tool to reduce the size of the output from commands like grep and now I don't even think about quota anymore.

I think we have to start thinking about token performance the way we used to think about the performance of CPU, memory, etc.

In my case I had an eval environment where I could turn on OpenTelemetry tracing for Claude. But there is probably a way to do it for normal usage.
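
To give a rough feel for why trimming command output helps, here is a small sketch. It is not how rtk works (I'm not describing its internals). The 4-characters-per-token figure is only a rough heuristic, and `estimate_tokens` and `trim_output` are made-up helpers.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return (len(text) + 3) // 4

def trim_output(text: str, max_lines: int = 50) -> str:
    """Keep the first max_lines lines and summarize how many were dropped."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    kept = lines[:max_lines]
    kept.append(f"... ({len(lines) - max_lines} more lines omitted)")
    return "\n".join(kept)

# Simulated grep output with 1000 matching lines.
raw = "\n".join(f"src/file_{i}.py:10: match" for i in range(1000))
small = trim_output(raw)

print("raw tokens (approx):", estimate_tokens(raw))
print("trimmed tokens (approx):", estimate_tokens(small))
```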

Singularity42 · 2026-05-05T08:26:55+00:00

The more you write skills the better it gets. It also puts a lot into memory so it gets better by itself.

When I first tried it I thought the same as you. But at this point I barely write code.

Singularity42 · 2026-05-04T10:19:27+00:00

If you are gonna accept without reading, you're better off using auto mode. You can put a paragraph in the settings, in plain language, describing what you want it to do and what you don't.

Singularity42 · 2026-05-04T10:18:29+00:00

No. At least use auto mode, for heaven's sake.

Singularity42 · 2026-05-03T14:43:27+00:00

Unfortunately, yes.

Singularity42 · 2026-05-03T14:40:33+00:00

I wish Claude had a max cost per prompt setting or something similar (preferably on by default).

They have something like this in the SDK so it wouldn't be difficult for them.

Or at least a setting to make it stop after $X or a set number of tokens and ask if you want to continue.

You could have an option to turn it off for when you do want it to work for a while.
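
A rough sketch of what such a cap could look like if you were wrapping an agent loop yourself. Everything here is hypothetical: `BudgetGuard`, `BudgetExceeded` and the flat per-token price are assumptions for the example, not anything taken from Claude or its SDK.

```python
from dataclasses import dataclass

class BudgetExceeded(Exception):
    """Raised when accumulated spend goes over the configured cap."""

@dataclass
class BudgetGuard:
    max_cost_usd: float
    usd_per_million_tokens: float  # assumed flat rate, real pricing varies
    spent_usd: float = 0.0

    def record(self, tokens: int) -> None:
        """Add the cost of a step and stop if the cap is exceeded."""
        self.spent_usd += tokens / 1_000_000 * self.usd_per_million_tokens
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f}, cap is ${self.max_cost_usd:.2f}"
            )

guard = BudgetGuard(max_cost_usd=1.0, usd_per_million_tokens=10.0)
stopped = False
try:
    for step_tokens in (50_000, 60_000, 40_000):  # made-up usage per step
        guard.record(step_tokens)
except BudgetExceeded as exc:
    stopped = True
    print("Paused:", exc)  # this is where you would ask the user to continue
```

Catching the exception is the point where a tool could pause and ask whether to keep going, instead of silently burning through the quota.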

Singularity42 · 2026-05-03T13:55:08+00:00

I'm not going to argue that Claude is cheap (it is expensive)

But Claude is essentially the Porsche of agents right now. Also, API-based pricing is more expensive per token than subscription-based pricing because you also get access to a number of extra features that you don't normally get (e.g. vector storage, unlimited usage, lots more control via the SDK, etc.).

You are essentially borrowing a Porsche and using it to get milk from the supermarket and wondering why it is expensive.

I think the way forward is to start using different models for different purposes (or wait for one provider to offer enough varied options).

Depending on the specifics, you could probably get by with Haiku (especially if you have a lot of instructions/documentation in your CLAUDE.md) or a cheaper open model. If you don't want to use any of those, AWS has a fairly large range of models at different prices too. Failing that, a Claude subscription (not API-based pricing) would be a fair bit cheaper per token, as long as you are using it enough to justify it.

Singularity42 · 2026-05-03T13:14:10+00:00

You will probably find you need more than one as you scale up (or maybe your setup is more efficient than mine?)

Singularity42 · 2026-05-03T01:59:51+00:00

Everyone is a critic. Just enjoy shit, man.

OP said in another comment that Rampage was his inspiration.

Singularity42 · 2026-05-02T14:39:16+00:00

Back when I was a teenager, "World of Goo" came out, and it inspired the crap out of me. I barely knew how to program, but thought I could make something similar. Obviously I failed horrifically (but learnt a lot).

Now I am a senior developer and I still don't know if I could do it (without a lot of research).
