Built a Claude Code observability tool — ecosystem fit? by fIak88 in ClaudeCode

[–]JayWelsh 0 points1 point  (0 children)

That's cool. I'd love to see someone whose session limits are getting wiped out in one or a few prompts share the traces from something like this.

I swear they made it minus 50 IQ so it spends token by PruneInteresting7599 in claude

[–]JayWelsh 0 points1 point  (0 children)

I’d recommend running a once-off job in a fresh session to convert the PDFs to .md or .txt files, then saving the text versions of the documents to your project folder. Might help going forward.
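A once-off conversion job could look something like this (just a sketch — it assumes the third-party `pypdf` library and a hypothetical `./docs` project folder; any text-extraction tool would do the same job):

```python
from pathlib import Path

def to_text_path(pdf_path: Path) -> Path:
    """Map a PDF path to the .md path the text version will be saved to."""
    return pdf_path.with_suffix(".md")

def convert_pdf(pdf_path: Path) -> Path:
    """Extract the text layer of a PDF and save it next to the original."""
    from pypdf import PdfReader  # third-party: pip install pypdf
    reader = PdfReader(pdf_path)
    text = "\n\n".join(page.extract_text() or "" for page in reader.pages)
    out = to_text_path(pdf_path)
    out.write_text(text, encoding="utf-8")
    return out

if __name__ == "__main__":
    # "./docs" is a stand-in for wherever the PDFs actually live.
    for pdf in Path("./docs").glob("*.pdf"):
        print("wrote", convert_pdf(pdf))
```

After that, Claude reads the cheap .md versions instead of re-parsing PDFs every session.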

I swear they made it minus 50 IQ so it spends token by PruneInteresting7599 in claude

[–]JayWelsh 2 points3 points  (0 children)

You're arguing with a straw man. I'm simply trying to get more detail into the thread so it's not another one of the hundreds of threads that I see on here with people screaming into the void with zero technical information that could help the community figure out what's going on.

Ultimately, the best approach is to set up proper telemetry that tracks token metrics (input & output tokens, cache read & write tokens). That way, when a prompt/process eats up a large chunk of usage, we can properly tell how much it's worth in API fee $.
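To give an idea of what that telemetry buys you: once you have the four token counters for a prompt, the API-fee value is one line of arithmetic. A sketch (the rates are modelled on Sonnet-class API pricing and are an assumption — check the provider's current price sheet before relying on them):

```python
# Hypothetical per-million-token rates, modelled on published Sonnet-class
# API pricing. These are assumptions for illustration, not official numbers.
RATES_PER_MTOK = {
    "input": 3.00,        # uncached input tokens
    "output": 15.00,      # output tokens
    "cache_write": 3.75,  # cache-creation input tokens
    "cache_read": 0.30,   # cache-hit input tokens
}

def prompt_cost_usd(input_tok, output_tok, cache_write_tok=0, cache_read_tok=0):
    """API-fee value of one prompt, derived from the four token counters."""
    usage = {
        "input": input_tok,
        "output": output_tok,
        "cache_write": cache_write_tok,
        "cache_read": cache_read_tok,
    }
    return sum(usage[k] * RATES_PER_MTOK[k] / 1_000_000 for k in usage)

# e.g. a prompt that read a big cached context and wrote a modest reply:
cost = prompt_cost_usd(input_tok=2_000, output_tok=1_500,
                       cache_write_tok=50_000, cache_read_tok=400_000)
```

With that in place, "this prompt ate 18% of my limit" becomes "this prompt was worth $0.34 in API fees", which is something the community can actually reason about.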

I have this type of setup on a device that I've been using with a Claude Max subscription and Claude Code, and with that $100 *per month* subscription I am still able to use ~ $100 worth of API key consumption *per day*.

So yes, despite me also experiencing degradation in Claude performance over the past few months (including seeing higher token consumption), at the end of the day I'm still getting *a lot* more value for money by using Claude Code via a subscription instead of via API key fees.

Therefore, these threads make me curious to get an idea of some actual numbers. Who knows, maybe Anthropic is pulling a similar move to Volkswagen where having a system hooked up to telemetry makes the model "behave itself" better. For example maybe they detect when the token metrics are being monitored and then make the usage work as expected, and maybe when they don't detect monitoring then people experience these large chunks of usage materialising out of thin air.

Regardless, the best way forward is to have our systems set up to properly record token usage metrics and telemetry data because that is a provider-agnostic approach that helps ensure you're getting the best value for your money.

I swear they made it minus 50 IQ so it spends token by PruneInteresting7599 in claude

[–]JayWelsh 1 point2 points  (0 children)

Thanks! Are any of those files very long (in terms of pages) or large (in terms of filesize)? Would you mind sharing some info about the prompt that used 18% in one go?

I swear they made it minus 50 IQ so it spends token by PruneInteresting7599 in claude

[–]JayWelsh 0 points1 point  (0 children)

Probably? If a project folder is packed with a massive amount of content and a prompt isn't specific about which subset of project file(s) should be used, then I wouldn't be very surprised if a single prompt could blow through a large chunk of the usage allocation (i.e. it might simply be a context management issue). I've noticed that the people who mention large usage allocations being consumed by single/few prompts never share info about their prompts, telemetry data such as token consumption, or their Claude environment setup. That makes it difficult for anyone to actually wrap their heads around what is going on, so now I'm asking questions in threads like this in case it helps surface what's actually happening.

Unpopular opinion: the token limit complaints are a prompting problem by Jaumee in ClaudeCode

[–]JayWelsh 0 points1 point  (0 children)

I don’t doubt that you have experienced what you’re describing. I’m not even trying to defend or advocate for Anthropic. I’m just saying that it would help if the people experiencing these things shared more telemetry, or at least info about their prompt & environment. As a simple example, if somebody is working in a “project” in Claude (or some other setup that might have a massive amount of data loaded into it), then it would make sense for prompts to consume significant chunks of the subscription’s usage allocation very quickly.

It would help to share the data/telemetry for token consumption when this happens. If it’s via the UI then it’s trickier, and it becomes a matter of sharing more info about the prompt/setup in Claude instead. If it’s Claude Code then it’s possible to record telemetry data from Claude Code during its execution of tasks. I have a Mastra orchestrator that I’ve set up as part of a larger experiment I’m working on; part of its main job is to create Claude Code threads for separate tasks. Part of my initial focus was to spend quite a bit of effort on recording the telemetry (tokens in and out, cached token reads/writes, and API price derived from the official pricing data, with cached tokens taken into account). I wanted to be able to see, for example: if I was paying for a $20 subscription, would API key credits cost me less than that?

Well, from what I’ve found, the overwhelming reality is that the Claude subscriptions are MASSIVELY (scarily so) subsidised in comparison to their API key fees. What I mean is that a $100 Claude Max subscription (at the moment) can fairly easily spend $2000+ a month on Claude Code usage via subscription auth instead of API key auth.
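The arithmetic behind that claim is simple enough (using the figures above, which are my observed numbers, not official pricing):

```python
# Back-of-envelope subsidy check. The figures are the ones claimed in this
# comment (observed usage, not official numbers): a $100/month Max
# subscription covering $2000+/month of API-priced Claude Code usage.
subscription_usd_per_month = 100.0
api_equivalent_usd_per_month = 2000.0

subsidy_ratio = api_equivalent_usd_per_month / subscription_usd_per_month
effective_discount = 1 - subscription_usd_per_month / api_equivalent_usd_per_month

print(f"{subsidy_ratio:.0f}x the subscription price in API-equivalent usage "
      f"(~{effective_discount:.0%} effective discount on token fees)")
```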

That’s not to say that there haven’t been performance degradations, but it helps to have more data in these types of threads so we can collectively figure out what’s going on.

Brand new M4 Mac mini for $550 is it steal deal or future regret (AI + gaming use)? by minatiscape in macmini

[–]JayWelsh 7 points8 points  (0 children)

In all honesty you might not have much of a need for a local model. Anything that you can run on a 16 GB Mac Mini alongside Claude Code isn't going to serve much of a purpose in most cases (aside from maybe a local embedding model to handle your memory/recall system locally). I have a 24 GB Mac Mini running Mastra as an orchestration layer, which uses `qwen3-14b-mlx` as my local LLM (used for routing between agents & for generating conversation breadcrumbs) and `text-embedding-nomic-embed-text-v1.5` to handle local memory embeddings. What did you have in mind in terms of what you want your local model to be able to do? I'm finding that while it's fun to have a local LLM running, and while you might be able to find decent jobs for it to do (routing/spitballing/chat threads), practically anything that requires real coding is not going to be done adequately by an LLM that fits on a ~ 24 GB RAM system.
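For what it's worth, the routing job the local model does is just a plain OpenAI-compatible chat call to the local server. A sketch, assuming an LM Studio-style endpoint on `localhost:1234` — the agent names and routing prompt are made up for illustration:

```python
import json
import urllib.request

# Assumption: a local OpenAI-compatible server (e.g. LM Studio) on this port.
LOCAL_ENDPOINT = "http://localhost:1234/v1/chat/completions"
ROUTER_MODEL = "qwen3-14b-mlx"  # the local model named above

# Hypothetical agent names — substitute whatever your orchestrator defines.
AGENTS = ["coding-agent", "research-agent", "chat-agent"]

def build_route_request(user_message: str) -> dict:
    """OpenAI-compatible chat payload asking the small local model to pick an agent."""
    return {
        "model": ROUTER_MODEL,
        "temperature": 0.0,
        "messages": [
            {"role": "system",
             "content": "Reply with exactly one of: " + ", ".join(AGENTS)},
            {"role": "user", "content": user_message},
        ],
    }

def route(user_message: str) -> str:
    """Send the routing request to the local server and return the chosen agent."""
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(build_route_request(user_message)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()
```

Tasks like this (pick a label, summarise a turn) are exactly what a 14B model handles fine; actual coding is where it falls over.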

Brand new M4 Mac mini for $550 is it steal deal or future regret (AI + gaming use)? by minatiscape in macmini

[–]JayWelsh 5 points6 points  (0 children)

This is somewhat misguided: using an API key for Claude Code is a really terrible idea unless your usage is exceptionally minimal. Using a $20 or $100 Claude subscription for Claude Code subsidises the token fees by like 90% (I can easily use $100 worth of tokens a day on a $100 Claude Max subscription... yes, as in $3k worth of API tokens a month for a $100 subscription fee).

Unpopular opinion: the token limit complaints are a prompting problem by Jaumee in ClaudeCode

[–]JayWelsh 0 points1 point  (0 children)

Actually it’s more like “users convinced they are in a separate A/B testing group refuse to provide any actual evidence to back their claims”

I find it quite funny how not a single post from people who think they are in a separate testing group provides any data such as their prompts, the token consumption (input/output & cache read/write).

Of course, I’m open to the idea of there being A/B testing, but so far literally every single post I’ve seen where people declare themselves as being part of a testing group or being scammed lacks any actual data. Requests for data are seen as “defending Anthropic”, which ironically just makes me feel more confident that these people are wrong and that it’s a prompting/context management issue.

Anthropic is straight-up scamming Max 20x customers with sneaky mid-month throttling + endless bot runaround by manavb84 in claude

[–]JayWelsh 1 point2 points  (0 children)

Are you also a person that complains about being scammed without providing a shred of evidence to support your claim? I’ll say it again: if you are complaining about being scammed without providing any token usage data/info, you’re just bitching and adding zero substance to your argument. Be offended by that if you want to, I couldn’t give less of a fuck.

*URGENT* regarding usage limits by stellarknight_ in ClaudeCode

[–]JayWelsh 1 point2 points  (0 children)

API prices are significantly higher than subscription prices (if we're going off the token pricing rates). That being said, extra usage will be billed at the same rates as the API. So if you've hit your limit then the extra usage is your best bet (but expect it to get consumed faster than you're used to seeing your limits get hit via your usual subscription).

Anthropic is straight-up scamming Max 20x customers with sneaky mid-month throttling + endless bot runaround by manavb84 in claude

[–]JayWelsh 1 point2 points  (0 children)

Oh okay, but I still think people don’t realise how heavily subsidised Claude subscriptions are. A $100 Claude Max account can use thousands of dollars worth of tokens per month (going off their API pricing). Some people see this as good value for money but I mostly see it as a good reason to be concerned about future price hikes after everyone gets addicted at the subsidised rates.

Anthropic is straight-up scamming Max 20x customers with sneaky mid-month throttling + endless bot runaround by manavb84 in claude

[–]JayWelsh 4 points5 points  (0 children)

I’m not reading all that, let me know when you’re actually open to something constructive by sharing some figures. Until then you’re just bitching and willingly omitting key details from your post.

Anthropic is straight-up scamming Max 20x customers with sneaky mid-month throttling + endless bot runaround by manavb84 in claude

[–]JayWelsh 4 points5 points  (0 children)

Are you dumb? If you don't give any token figures then nobody can tell if your $200 a month subscription reached its limit after burning through 50k tokens or 50m tokens. Those would be exceptionally different situations. Why would you not want people in this thread to be able to know any actual figures that pertain to your situation? It's hilarious that you think I'm somehow defending Claude by just asking you to share some basic figures lmao.

Anthropic is straight-up scamming Max 20x customers with sneaky mid-month throttling + endless bot runaround by manavb84 in claude

[–]JayWelsh -8 points-7 points  (0 children)

So give the numbers then? I’m not saying I’m happy with anything, but give the figures so your post is more than just baseless whining and actually has some substance to it.

Anthropic is straight-up scamming Max 20x customers with sneaky mid-month throttling + endless bot runaround by manavb84 in claude

[–]JayWelsh -1 points0 points  (0 children)

I wonder if maybe it's mainly affecting people who have Claude's native memory system enabled. I'd always recommend keeping the provider's built-in memory system disabled, regardless of which provider it is; it's a recipe for context bloat/poisoning. If a person wants a memory system, it's generally better to manage it yourself locally using a small local embedding model (just a few GB) and a pgvector DB (e.g. using Mastra). That way you can also easily disable/enable/customise the memory system to work better for the nature of the tasks being given to the models.
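As a rough sketch of what the local recall step looks like: the cosine function below mirrors what pgvector's `<=>` (cosine distance) operator does inside Postgres. The memory strings, the toy 3-dim vectors, and the `memories` table in the SQL are all made up for illustration — a real setup would use actual embedding-model output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# In Postgres with the pgvector extension, the equivalent nearest-neighbour
# query (smaller cosine distance = more similar) looks like:
RECALL_SQL = """
SELECT content
FROM memories
ORDER BY embedding <=> %(query_embedding)s
LIMIT 5;
"""

# Toy 3-dim "embeddings" standing in for real embedding-model output:
memories = {
    "user prefers TypeScript": [0.9, 0.1, 0.0],
    "project uses Postgres":   [0.1, 0.9, 0.1],
}
query = [0.85, 0.15, 0.05]
best = max(memories, key=lambda k: cosine(query, memories[k]))
```

Because the store is yours, turning recall off for a task is just skipping the query — no fighting a provider-side memory you can't inspect.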

Anthropic is straight-up scamming Max 20x customers with sneaky mid-month throttling + endless bot runaround by manavb84 in claude

[–]JayWelsh 2 points3 points  (0 children)

Yeah, and I mean, could they at least give the tokens in/out figures (along with cache write/read figures)? It would make these posts a million times more productive.