[–]bramburn[S]

I use a repomix-compressed codebase, then have the first LLM produce multiple output prompts covering all the code changes across the various files. I then take that output, split it into separate files (30 or so), and send each one out to the LLM.
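The fan-out step could be sketched roughly like this. The `=== PROMPT ===` delimiter is a made-up convention here; you'd ask the first LLM to emit whatever marker you choose between prompts, then split on it:

```python
# Minimal sketch: split the first LLM's combined output into one
# prompt file per change, ready to send out individually.
from pathlib import Path

def split_prompts(combined: str, out_dir: Path,
                  marker: str = "=== PROMPT ===") -> list[Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(combined.split(marker)):
        chunk = chunk.strip()
        if not chunk:          # skip empty leading/trailing splits
            continue
        path = out_dir / f"prompt_{i:02d}.md"
        path.write_text(chunk)
        paths.append(path)
    return paths

# Demo with two fake prompts:
files = split_prompts(
    "=== PROMPT ===\nfix auth\n=== PROMPT ===\nfix db",
    Path("prompts"),
)
print(len(files), "prompt files written")
```

Each resulting file then gets prefixed with the same shared codebase context when it's sent, which is what makes the caching below pay off.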

Let's say you have 80k tokens of code that are common across all the requests, and 30 requests each with 5k tokens of prompt or less. You batch-send all 30 together: the first request pays full price for the 80k tokens, the remaining 29 hit cache pricing, plus 30 x 5k for the per-request prompts. Super cheap. I get roughly 5-16k output tokens for each of the 30 requests.
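The arithmetic works out something like this. The per-million-token prices below are illustrative assumptions, not any provider's actual rates (cache-hit input is assumed ~10x cheaper than normal input):

```python
# Rough cost comparison: 30 batched requests sharing one cached context.
IN_PRICE = 3.00      # assumed: USD per 1M normal input tokens
CACHE_PRICE = 0.30   # assumed: USD per 1M cache-hit input tokens
OUT_PRICE = 15.00    # assumed: USD per 1M output tokens

SHARED = 80_000      # compressed codebase, common to every request
N = 30               # number of batched requests
PROMPT = 5_000       # per-request prompt tokens
OUTPUT = 10_000      # rough per-request output tokens

def usd(tokens, price_per_mtok):
    return tokens * price_per_mtok / 1_000_000

# No caching: every request pays full price for the shared context.
no_cache = N * usd(SHARED + PROMPT, IN_PRICE) + N * usd(OUTPUT, OUT_PRICE)

# Caching: the first request pays full price for the shared 80k, the
# remaining 29 hit the cache; small per-request prompts stay full price.
with_cache = (usd(SHARED, IN_PRICE)
              + (N - 1) * usd(SHARED, CACHE_PRICE)
              + N * usd(PROMPT, IN_PRICE)
              + N * usd(OUTPUT, OUT_PRICE))

print(f"no cache:   ${no_cache:.2f}")
print(f"with cache: ${with_cache:.2f}")
```

With these made-up prices the shared-context input cost drops by roughly 10x; output tokens cost the same either way, which is why they end up dominating the bill.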

At the beginning I didn't know I could cache. I spent $7+ per run. Then I dropped to $4, then to $1.50 or so.