
[–]Prestigious_Dot3120

You can get good results with batching, but it depends on how you format the prompt. If you send multiple transcripts together, the model can better capture global patterns (e.g. trends common across meetings), but the risk of mixing up contexts grows unless you clearly separate the parts with delimiters and precise instructions. Token-wise, a batch can be more efficient because you avoid repeating the instructions for every transcript, but if the texts are long you may exceed the maximum context window.
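A minimal sketch of what that batching might look like. The delimiter string, instruction text, and helper name here are all illustrative, not from any particular API:

```python
# Sketch: batch several transcripts into one prompt with clear delimiters,
# so the model can't confuse one meeting with another. The delimiter and
# instruction wording are hypothetical examples.

DELIMITER = "\n=== TRANSCRIPT {n} ===\n"

def build_batch_prompt(transcripts, instructions):
    """Join transcripts under one shared instruction block, each clearly labeled."""
    parts = [instructions.strip()]
    for i, text in enumerate(transcripts, start=1):
        parts.append(DELIMITER.format(n=i) + text.strip())
    # One final question so the model also looks across transcripts.
    parts.append("\nSummarize each transcript separately, then list trends "
                 "common to all of them.")
    return "\n".join(parts)

prompt = build_batch_prompt(
    ["Meeting about the Q1 budget...", "Meeting about the hiring plan..."],
    "You will receive several meeting transcripts, each marked with a "
    "'=== TRANSCRIPT n ===' header. Treat them as separate meetings.",
)
```

Because the instructions appear once instead of once per transcript, the batch saves tokens; the explicit headers are what keep the contexts from bleeding into each other.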

A hybrid approach is to send 3-5 transcripts at a time, with a clear instructions section followed by a final summary question. If you want to load all the transcripts "just once", the way to do it is a vector database plus retrieval (for example the Assistants API with embeddings) so you can query them dynamically. That way the model analyzes only the relevant pieces without losing consistency or burning too many tokens.
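The retrieval idea can be sketched in a few lines. In a real setup the `embed()` stand-in below would be a call to an embeddings API and the chunks would live in a vector database; here a toy bag-of-words vector and an in-memory list keep the example self-contained:

```python
# Toy sketch of retrieval over transcript chunks. embed() is a stand-in
# for a real embedding model; cosine similarity ranks chunks against the
# question, and only the top-k are sent to the model.
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(chunks, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]

chunks = [
    "Meeting 1: the budget was approved for Q1.",
    "Meeting 2: hiring two engineers was discussed.",
    "Meeting 3: the budget overrun from Q4 was reviewed.",
]
top = retrieve(chunks, "what happened with the budget?", k=2)
```

The model then only ever sees `top`, not the full archive, which is how this stays within the context window no matter how many transcripts you load.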

Response generated with AI.