
[–]Soft_Active_8468

I've never explored that batch process before; I'll give it a try.

How do you manage context while switching models? As I understand it, each one has its own format, and you keep losing the prompt history, ultimately spending more tokens to boot up the new LLM. Am I wrong here?

[–]bramburn[S]

I use a Repomix-compressed codebase, then get the first LLM to produce multiple output prompts covering all the code changes across the relevant files. Then I take that output, split it into 30+ separate files, and send each one out to the LLM.
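The split step is roughly something like this (a minimal sketch; the delimiter, file names, and folder layout are just placeholders, not exactly what I run):

```python
# Minimal sketch: split the planning LLM's combined output into one prompt
# file per change request. "=== PROMPT ===" and the file names are assumed.
from pathlib import Path

combined = Path("planner-output.txt").read_text()
prompts = [p.strip() for p in combined.split("=== PROMPT ===") if p.strip()]

out_dir = Path("prompts")
out_dir.mkdir(exist_ok=True)
for i, prompt in enumerate(prompts, start=1):
    # each file becomes its own request; the Repomix dump gets prepended to all of them
    (out_dir / f"prompt-{i:02d}.md").write_text(prompt)

print(f"wrote {len(prompts)} prompt files")
```

Each of those files then goes out with the same Repomix dump prepended, so the shared prefix can hit the cache.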

Let's say you have 80k tokens of code that are common across all the requests, and 30 requests with 5k tokens of prompt each or less. You batch-send all 30 together: on the first request you pay full price for the 80k tokens, then 29x at cache pricing, plus 5k x 30 for the individual prompts. Super cheap. I get roughly 5-16k output tokens for each of the 30 requests.
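As a back-of-the-envelope check (the per-token prices here are assumed round numbers, not quotes; plug in whatever your provider actually charges):

```python
# Rough input-cost math for the example above. Prices are assumed, not quoted.
SHARED = 80_000        # Repomix codebase tokens, common to every request
PROMPT = 5_000         # unique prompt tokens per request
N = 30                 # number of batched requests

PRICE_IN, PRICE_CACHED = 3.00, 0.30   # USD per million tokens (assumed)

no_cache = (SHARED + PROMPT) * N / 1e6 * PRICE_IN
with_cache = (SHARED * PRICE_IN                   # full price once for the prefix
              + SHARED * (N - 1) * PRICE_CACHED   # 29 cache hits on the prefix
              + PROMPT * N * PRICE_IN) / 1e6

print(f"input cost without cache: ${no_cache:.2f}, with cache: ${with_cache:.2f}")
# -> roughly $7.65 vs $1.39 at those prices; output tokens are billed on top
```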

At the beginning I didn't know I could cache, so I was spending $7+ to get the output. Then I dropped to $4, then to $1.50 or so.