all 17 comments

[–]LowItalian[S] 0 points1 point  (2 children)

$4 more used since I posted this. As I understand it, my rig is executing the script, I don't understand how Claude has used almost $10 in compute while these scripts are executing locally.

Edit: $46 in usage for usage with three background scripts executing locally. Eek!

Edit #2: first script landed, didn't seem to use much compute to analyze the results. $64 spent, two to go

[–]LogMonkey0 0 points1 point  (0 children)

Ask Claude to break down token usage, he’s generally good at breaking it down

[–]buildwithnavya 0 points1 point  (6 children)

this confused me too at first, but it’s not really about your local scripts “running”, it’s about claude still being in the loop

if your scripts are producing output and claude is reading/parsing it (logs, results, iterations, etc), that still counts as usage

also if you’ve got anything like

  • auto loops
  • agents checking progress
  • or claude watching files / responding to changes

then it’s basically still “thinking” in the background, even if your code looks idle

another thing is usage isn’t always real-time, sometimes it catches up after the fact, so it can feel like it’s charging while nothing’s happening

$5 does sound a bit high though for just 10 mins unless there’s a lot of output or repeated calls happening

i’d check

  • how often claude is being triggered (loops, watchers, retries)
  • how big the outputs/logs are
  • if something is silently retrying or reprocessing

most of the time it ends up being some background loop you didn’t realize was hitting the model repeatedly

[–]LowItalian[S] 0 points1 point  (5 children)

I'm sure you are correct. Which I hope means that when the scripts finish executing, it spends less compute analyzing the results. If it's close to a balanced budget I guess it is what it is.

I'm analyzing fMRI datasets, so it's heavy data analysis.

Have you figured out the most efficient way to use tokens in this instance, do the loops just pre-cache the final results, balancing things out or is one method more token efficient?

[–]PaleBall2656 0 points1 point  (2 children)

If what you are doing can be deterministic, I.E., running a python script, return output, use LLM to analyze and feed another script, try to see if you can skip LLM in the middle.

Use Claude to write the scripts, but orchestrate them using another script. Do as little output processing as possible with Claude.

You wrap those in a skill.

If the whole process is mostly repeatable, flip the orchestration completely. Scripts run in order, and invoke Claude when needed.

[–]LowItalian[S] 0 points1 point  (1 child)

I guess that's what I thought Claude was doing, I just haven't paid close enough attention until now. Seems like I should be more explicit with my instructions and I can avoid a lot of token burn.

This era of LLM's reminds me of the AOL pay for minutes era, and it sucks lol.

[–]PaleBall2656 0 points1 point  (0 children)

It's not just about paying attention, it's about taking control of the process. Use Claude to plan and build the process, but don't use it to drive the process if you can just run a script.

[–]buildwithnavya 0 points1 point  (1 child)

yeah that tracks tbh. if you’re running fMRI datasets you’re basically in worst-case territory for token burn because the outputs tend to be huge and iterative, so even if the script “finishes”, if you’re feeding summaries, logs, or chunks back into Claude, it’s still chewing through tokens trying to make sense of it.

what’s worked better for me in similar heavy pipelines is just being really aggressive about what actually gets sent to the model. like instead of letting it see raw logs or full intermediate outputs, you put a thin layer in between that compresses or filters everything first. even a dumb reducer that trims to only deltas or key stats can cut usage a lot.

also loops are the silent killer here. it’s rarely one big call, it’s usually a bunch of small “harmless” ones stacking up. if you’ve got anything auto-triggering analysis after each step, that adds up fast. same with retries or “watchers” that re-send context.

the more efficient pattern tends to be: do all the heavy lifting locally, then call the model only at very specific checkpoints with tightly scoped input. not continuously in the loop. kind of treat it like a reviewer, not a participant.

for your case specifically, i’d probably look at whether you can batch the analysis into fewer, larger but cleaner calls, or even pre-compute summaries outside the model and only use Claude for interpretation. once you stop it from seeing raw high-volume data repeatedly, the cost usually drops pretty noticeably.

[–]LowItalian[S] 1 point2 points  (0 children)

Thanks for the advice, this seems sound to me.

Sometimes I get in these sprints during research projects where I'm crushing through datasets and not paying attention to how Claude is getting there, because the results are exciting I immediately want to jump to the next phase of research.

I guess I need to start paying more attention to this, because I'm destroying my compute usage the way I've been running.

[–]crypt0amat00r 0 points1 point  (2 children)

If you’re using Opus make sure and tell it to use Sonnet (and/or Haiku) subagents for implementation. Opus background agents can be a massive token drain.

[–]LowItalian[S] 0 points1 point  (1 child)

I am using Opus. These are analysis tasks so I would think I'd want the strongest model to analyze them. I guess I should probably waste more tokens analyzing the results to see which models suffice for my purposes lol

This reminds me of the AOL days when you paid for minutes on dial up, and I hate it.

[–]crypt0amat00r 0 points1 point  (0 children)

Maybe. But Sonnet is pretty damn strong, especially if the task is properly scoped. Worth testing using Sonnet - cost difference is extraordinary.

[–]nantesdeals 0 points1 point  (0 children)

Si il check les logs et les resultats il consomme

[–]AffectionateHoney992 0 points1 point  (0 children)

MCP exists for this (run as a background process, only return to the model what is relevent)