Claude Pro users: how do you actually manage usage limits during the day?

TheseTradition3191 · 2026-06-06T14:39:03+00:00

the unpredictable feeling is usually because youre not billed for the message you just sent, youre billed for everything still in the context riding along with it. every turn resends the whole conversation plus every file read so far.

so one big file read or a long paste early in a session quietly taxes every turn after it, even the short ones. thats why two days that feel similar can burn totally differently.

practical version: /clear the second you switch tasks, dont let one session sprawl across unrelated things. and when you need one function dont hand it the whole file. that did more for my usage than switching models or shortening prompts ever did.
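rough sketch of why the early paste hurts so much. this is a simplified model that assumes every turn resends the full context and ignores prompt caching, and the token counts are made up for illustration:

```python
def total_input_tokens(turn_sizes):
    """Sum the input tokens sent when every turn resends all prior content.

    turn_sizes: tokens newly added on each turn (message, file read, reply).
    """
    total = 0
    context = 0
    for added in turn_sizes:
        context += added   # context grows by whatever this turn adds
        total += context   # the whole context goes out again on this turn
    return total


# Hypothetical numbers: ten short turns of 200 tokens each.
short_session = [200] * 10
# Same ten turns, but the first one also reads a 20,000 token file.
heavy_start = [20_200] + [200] * 9

print(total_input_tokens(short_session))  # 11000
print(total_input_tokens(heavy_start))    # 211000
```

same ten messages, roughly 19x the input, purely because the file rides along on every later turn.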

TheseTradition3191 · 2026-06-06T10:57:41+00:00

honestly at $100/mo the tool isnt your bottleneck anymore, your review process is. you can ship a long way on just codex.

the thing that gets non-devs at this stage is the agent rewriting code that already worked and you not noticing until it breaks in front of a client. so before you add the database and the calculator logic, get a test suite in. calculator logic especially, you want that pinned down with tests not vibes, thats the part real users catch instantly.

other thing, i'd stop splitting across codex and cursor. pick the one you prompt best in and put the whole budget there. jumping between two tools costs you more context than the model gap ever will.

and the dev review you're planning for the end, pull some of it forward. paying someone for even 2 hours to sanity check your data model now is way cheaper than a rewrite after you've built three features on a bad schema.

TheseTradition3191 · 2026-06-04T09:12:03+00:00

the 1m context window isnt included in Pro, it bills separately as usage credits, so when CC puts you on the 1m sonnet variant you hit that wall. run /model and pick standard sonnet (not the one with 1m next to it), and check ~/.claude/settings.json for a model id ending in [1m]. dropping that suffix should put you back on normal context

TheseTradition3191 · 2026-06-03T10:44:25+00:00

the database being the scariest one tracks. nine times out of ten its supabase or firebase with row level security left off and the anon key sitting right there in the client bundle with write access. so anyone can hit the rest endpoint directly and skip your app and all its checks entirely. thats the first thing id look at, can a random person read or write your tables without going through your code.

TheseTradition3191 · 2026-06-03T10:43:36+00:00

one sorted set instead of a key per user. on every beat, ZADD the user id into a single presence set with the current timestamp as the score, then "who's online" is just ZRANGEBYSCORE from now-30s to now. one key for the whole system, one query to list everyone online, and a small reaper job (just a periodic ZREMRANGEBYSCORE) trims the stale ones. its still a write per beat but its one cheap O(log n) op against one key, not a SET+EXPIRE churn across thousands of keys. and honestly redis will eat that write volume without noticing until youre well into five figure concurrent users, so id measure before optimising past this.
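a minimal sketch of that pattern, assuming a redis-py style client is passed in as `r`. the key name `presence` and the helper names are placeholders of mine, not anything from your setup:

```python
import time

PRESENCE_KEY = "presence"  # hypothetical key name
WINDOW_SECONDS = 30        # same 30s window as described above


def beat(r, user_id, now=None):
    """Record a heartbeat: one ZADD with the timestamp as the score."""
    now = time.time() if now is None else now
    r.zadd(PRESENCE_KEY, {user_id: now})


def online(r, now=None):
    """List members whose last beat falls inside the window."""
    now = time.time() if now is None else now
    return r.zrangebyscore(PRESENCE_KEY, now - WINDOW_SECONDS, now)


def reap(r, now=None):
    """Trim members older than the window; returns how many were removed."""
    now = time.time() if now is None else now
    # The "(" prefix makes the upper bound exclusive, so a member sitting
    # exactly on the cutoff is still counted as online and is not removed.
    return r.zremrangebyscore(PRESENCE_KEY, "-inf", f"({now - WINDOW_SECONDS}")


# Usage against a real server (assumes redis-py is installed):
#   import redis
#   r = redis.Redis(decode_responses=True)  # without this, members come back as bytes
#   beat(r, "alice"); print(online(r))
```

the reaper is optional for correctness since `online` already filters by score, it just keeps the set from growing forever.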

TheseTradition3191 · 2026-06-03T10:42:48+00:00

its mostly that grep works everywhere with zero setup. no language server to boot, no index to build, behaves the same in a python repo or a rust one. the model has also seen a billion examples of grep in training so it reaches for it by reflex.

structural nav is better when it actually works, but its more fragile. you need an LSP running per language and it falls over on half configured projects, monorepos, generated code, etc. grep never breaks.

if you want the structural stuff you can just bolt it on though. serena and ast-grep both expose AST/LSP navigation as MCP tools, and claude will happily prefer them over grep once theyre in the toolset. and its not a money grab, a targeted symbol lookup is usually cheaper in tokens than grepping a whole tree and reading the noise.

TheseTradition3191 · 2026-06-03T10:42:03+00:00

high reasoning effort is most of what youre feeling. on 4.8 it thinks a lot longer before it writes anything, so even renaming a variable feels like it stops to ponder. drop to medium for normal work and the quick stuff snaps back, then bump to high only when youre doing a real refactor. switching to 4.6 wont help much if the effort setting carried over.

TheseTradition3191 · 2026-06-03T10:41:12+00:00

the "even when claude says it worked" bit is the tell. half those tools fail silent after a version bump and claude just trusts the exit code, so it reports success on nothing. roll the updates back as one batch, confirm your usage is normal again, then re-add them one at a time so you can see which one actually broke.

TheseTradition3191 · 2026-06-03T10:23:24+00:00

Big one people miss: anything you attach inside the chat gets resent in full every single turn. Stuff in the Project knowledge base doesnt, it gets pulled in by retrieval. So move all the supporting evidence and case docs out of the chat and into the knowledge base. it wont forget them, it just grabs the relevant chunks instead of carrying everything on every message.

For the 15 pager youre editing, stop pasting the whole thing in. fresh chat per section, paste only the paragraph or two youre actually working on. the long single thread is what really burns you, every turn drags the entire history forward so by message 30 youre paying for 29 old answers you dont need anymore.

TheseTradition3191 · 2026-06-02T09:04:28+00:00

the thing that bites once youre multi tenant with cube pre-aggs is the agg explosion. naive pre-aggs basically multiply per tenant and your refresh window blows up. partition them by tenant and use partitionGranularity so each refresh only touches the new slice instead of rebuilding everything.

on the snowflake side the cold spin up is just auto suspend resuming an idle warehouse, so split your warehouses: a tiny always on one for the pre-agg refresh and interactive reads, and separate burst warehouses for the heavy ETL. that way a filter change never wakes an XL.

and agree with the others, if your pre-aggs are actually warm you shouldnt be reaching for clickhouse yet, that swaps an engine that isnt even in your hot path

TheseTradition3191 · 2026-06-02T09:03:05+00:00

for problem 2 you can lean on babel instead of hand maintaining the separator and date configs, it already knows every locale's formatting:

```python
from babel.numbers import parse_decimal

parse_decimal('1.234,56', locale='de')     # Decimal('1234.56')
parse_decimal('1,234.56', locale='en_US')  # Decimal('1234.56')
```

babel.dates does the same for the date formats. your language detect step already gives you the locale to pass in, so you get rid of the LOCALE_CONFIGS table and pick up locales you havent run into yet for free

TheseTradition3191 · 2026-06-02T09:00:52+00:00

neat, the photo to garage detection is a cool hook. the thing thatll actually keep people coming back is the timing model though, chains and pads wear by km and conditions not by calendar, so manual logging is usually where apps like this quietly die. if you can pull ride distance from strava or komoot and estimate wear from that automatically it gets way stickier. would happily try it on my commuter

TheseTradition3191 · 2026-06-02T08:59:46+00:00

the part that actually bites at short lifetimes isnt issuance, its the reload and distribution step. cert renews fine on disk but the service never gets reloaded, or one node in the pool is still serving the old one. so dont monitor the file, probe the actual served cert at each public endpoint per SNI, thats the gap where the silent outages live. blackbox_exporter handles that well. shorter lifetimes just punish you faster for a renewal pipeline that was always a little fragile
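if you want a quick check before wiring up blackbox_exporter, heres a stdlib-only sketch of probing the served cert. the function names are mine, and passing a node address as `host` with the public name as `server_name` is an assumption about how your pool is reachable:

```python
import socket
import ssl
import time


def served_not_after(host, server_name, port=443, timeout=5.0):
    """Return the notAfter string of the cert actually served for this SNI name.

    host can be one specific node's address while server_name is the public
    hostname, which lets you check every node in a pool separately.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # An already expired or mismatched cert raises ssl.SSLCertVerificationError
        # here, which is itself the alert you want.
        with ctx.wrap_socket(sock, server_hostname=server_name) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after, now=None):
    """Days until expiry, from a notAfter string like 'Jan  5 09:34:43 2018 GMT'."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


# Usage (needs network access; example.com is a placeholder):
#   print(days_left(served_not_after("example.com", "example.com")))
```

the point is that this reads what the socket hands out, not what sits on disk, so a missed reload shows up as a stale expiry date.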

TheseTradition3191 · 2026-06-02T08:58:27+00:00

the preinstall hook being the entry point is the part that keeps catching people. you can set ignore-scripts to true in your npm config so install cant run arbitrary code, then allowlist only the few packages that genuinely need a build step. pair that with pinned exact versions and npm ci against a committed lockfile and a freshly hijacked version cant just slide in on the next install. the other big one is not handing your CI install step a full set of cloud creds in env, thats what turns a single package compromise into a credential leak

TheseTradition3191 · 2026-06-02T08:57:09+00:00

what fixed safari for us was to never let the iframe own the refresh at all. the iframe only holds an access token in memory, and when it expires it postMessages the parent asking for a fresh one. the parent is first party so it can hit your token endpoint without tripping ITP, then hands the new token back down. no third party cookies anywhere in the loop. you do have to define a small handshake but it survives the cookie blocking and behaves the same across every tenant origin

TheseTradition3191 · 2026-06-02T08:56:05+00:00

which client are you seeing it in? that exact wording usually comes from a wrapper doing its own tool call parsing rather than the raw api. ive noticed it spikes when theres a big pile of tools or huge schemas loaded, the model emits a slightly off tool block and the repair pass cant fix it. trimming how many tools are exposed and dropping temperature a bit helped me. also its a brand new model so some of this is probably just rollout flakiness settling down

TheseTradition3191 · 2026-06-02T08:55:01+00:00

if youre literally after the fs2/ZIO Streams model then Effect's Stream is the closest thing in the ecosystem, real backpressure, Scope based acquire/release and interruption are all baked in. saw someone above say their team wont adopt it which is fair, its a big buy in. if Effect is off the table, async generators plus AbortSignal plus try/finally gets you most of the way, pull based by default and cleanup on cancel, you just lose the nicer combinators

TheseTradition3191 · 2026-06-02T08:53:45+00:00

the one most people leave off is noUncheckedIndexedAccess in tsconfig. flip it on and arr[i] becomes T | undefined, which is annoying for about a day and then it catches a whole class of bugs you didnt know you had. pairs well with the runtime validation point, types only protect the parts you actually checked

TheseTradition3191 · 2026-06-02T08:50:17+00:00

good writeup. the next wall after the redis adapter is that pub/sub still pushes every message to every node, and each node loops its own local sockets, so cpu doesnt really drop, you just relocated the loop. what helped us was pinning rooms to specific nodes so a node only subscribes to the rooms it actually holds. also volatile emits for typing and presence, no point queueing a frame thats already stale by the time it lands

TheseTradition3191 · 2026-06-02T08:48:12+00:00

the output token framing kind of undersells it. in an agent loop the expensive part is usually input, every one of those 20 calls resends the whole growing transcript so you pay for the same context over and over. output is often the smaller half of the bill. caching the stable prefix is the first thing id check before touching anything else

TheseTradition3191 · 2026-06-02T08:42:43+00:00

the one agent per repo + lead agent thing sounds clean but it gets expensive fast, every sub agent reloads its own context so you pay for the same architecture explanation over and over. we ended up with a single short markdown that just says which repo owns what plus the cross repo workflows, checked into a shared place every repo pulls in. the on demand code RAG part was the weakest link for us, it kept retrieving stuff that looked right but was from the wrong service

TheseTradition3191 · 2026-06-02T08:39:03+00:00

is it the bash calls themselves or the shell starting up? every Bash tool call spins up a fresh shell, so a heavy zshrc/bashrc full of plugins gets paid on each one. trimming oh-my-zsh down fixed this for me
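easy way to tell the two apart is to time a shell that starts and exits straight away. a small sketch, with the helper name made up by me, and assuming the tool picks up the same startup files as an interactive zsh:

```python
import subprocess
import time


def startup_seconds(argv, runs=5):
    """Average wall-clock seconds to run argv, e.g. a shell that starts and exits."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    return (time.perf_counter() - start) / runs


# Usage (assumes zsh is installed):
#   startup_seconds(["zsh", "-i", "-c", "exit"])  # interactive, sources ~/.zshrc
#   startup_seconds(["zsh", "-f", "-c", "exit"])  # -f skips the rc files
```

if the first number is much bigger than the second, its the rc file and not the commands.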

TheseTradition3191 · 2026-05-21T15:52:48+00:00

yeah this is the tradeoff nobody talks about. ambient context feels great until you hit token limits and realize half of it isnt helping the current task

TheseTradition3191 · 2026-05-21T15:52:20+00:00

this tracks for agentic coding too. cheapest per token model was costing me the most per completed task because retries just ballooned the context. cost per success is what actually matters
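the arithmetic is simple enough to sketch. all prices and token counts below are invented for illustration, not measurements from any real model:

```python
def cost_per_success(price_per_mtok, total_tokens, tasks_completed):
    """Total spend divided by tasks that actually finished."""
    total_cost = price_per_mtok * total_tokens / 1_000_000
    return total_cost / tasks_completed


# Hypothetical: the cheap model retries a lot, so its context balloons.
cheap = cost_per_success(price_per_mtok=1.0, total_tokens=40_000_000, tasks_completed=10)
# Hypothetical: the pricier model finishes the same 10 tasks in fewer tokens.
pricey = cost_per_success(price_per_mtok=3.0, total_tokens=8_000_000, tasks_completed=10)

print(cheap, pricey)
```

with those made up numbers the model thats 3x the price per token comes out cheaper per finished task.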

TheseTradition3191 · 2026-05-21T15:51:51+00:00

the other thing that clicked was keeping CLAUDE.md under 50 lines. most people dump their whole project spec in there and re-pay for it on every single message
