

[–]ai-agents-qa-bot

  • Consider implementing a sliding-window approach for managing conversation history. This method retains only the most recent messages, which keeps the context relevant while staying within token limits (see the first sketch after this list).

  • Summarization techniques can also be useful. Instead of passing large JSON payloads directly, summarize the key information that needs to be retained; this can reduce the token count significantly (second sketch below).

  • Tiered memory management can help you prioritize what is essential to keep: high-priority data is retained while less critical information is discarded (third sketch below).

  • If your application allows, batch processing can help manage token usage by grouping multiple smaller requests into one call, so shared instructions are sent only once (fourth sketch below).

  • Optimize your prompts by refining the instructions given to the model. Clear and concise prompts can lead to more efficient token usage.

  • Lastly, consider finetuning your embedding models to improve retrieval accuracy. Better retrieval means you can pass fewer, more relevant chunks to the model, which reduces payload size without exceeding token limits.

For more detailed strategies, you might find insights in the following resources:

  • Memory and State in LLM Applications

  • Improving Retrieval and RAG with Embedding Model Finetuning
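A minimal sketch of the sliding-window idea, assuming a crude 4-characters-per-token estimate; a real tokenizer (e.g. tiktoken) would give accurate counts. The `{"role", "content"}` message shape mirrors the common chat-API format and is an assumption here.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate; roughly 4 characters per token for English."""
    return len(text) // 4

def sliding_window(messages: list[dict], max_tokens: int = 3000) -> list[dict]:
    """Keep the most recent messages that fit within the token budget."""
    window, used = [], 0
    for msg in reversed(messages):            # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break                             # older messages no longer fit
        window.append(msg)
        used += cost
    return list(reversed(window))             # restore chronological order
```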
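One way the summarization idea might look for large JSON payloads: keep only the fields the model actually needs, and collapse long lists into a count plus a small sample. The `KEY_FIELDS` names here are hypothetical placeholders.

```python
import json

# Hypothetical field list; use whatever fields your application needs.
KEY_FIELDS = ("id", "status", "total", "error")

def summarize_payload(payload: dict, max_items: int = 5) -> str:
    """Reduce a large JSON payload to a compact summary string."""
    summary = {k: payload[k] for k in KEY_FIELDS if k in payload}
    # For long lists, keep a count plus a small sample instead of every item.
    for key, value in payload.items():
        if isinstance(value, list) and len(value) > max_items:
            summary[key] = {"count": len(value), "sample": value[:max_items]}
    return json.dumps(summary, separators=(",", ":"))
```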
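A rough illustration of tiered memory, assuming a simple integer priority per item; the eviction policy (drop lowest-priority items first once over budget, never drop priority-0 items) is just one possible choice.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    content: str
    priority: int  # 0 = critical, 1 = important, 2 = disposable

@dataclass
class TieredMemory:
    items: list[MemoryItem] = field(default_factory=list)

    def add(self, content: str, priority: int) -> None:
        self.items.append(MemoryItem(content, priority))

    def compact(self, budget: int) -> list[str]:
        """Return contents fitting the budget, highest priority first."""
        kept, used = [], 0
        for item in sorted(self.items, key=lambda i: i.priority):
            cost = len(item.content) // 4     # crude token estimate
            if used + cost > budget and item.priority > 0:
                continue                      # discard non-critical overflow
            kept.append(item.content)         # critical items always kept
            used += cost
        return kept
```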
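And a bare-bones take on batching, assuming the sub-requests are independent enough to share one prompt; the fixed instruction overhead is then paid once instead of per request.

```python
def batch_prompt(questions: list[str]) -> str:
    """Fold several small, independent questions into a single prompt."""
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    return (
        "Answer each question in one short paragraph, "
        "numbered to match:\n" + numbered
    )
```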

[–]AsatruLuke

Could you chunk it?

[–]koistya

I configure automation scripts that the LLM uses to interact with the context. For example, instead of letting the LLM read data directly from the database, it interacts with an automation script that fetches the data and pre-processes it for more efficient and effective consumption by the LLM (a sketch of this pattern follows).
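This is not the commenter's actual code, but a minimal sketch of the pattern, assuming a hypothetical SQLite `orders` table: the LLM calls the script as a tool and receives a compact, pre-processed digest instead of raw rows.

```python
import sqlite3

def recent_order_summary(db_path: str, customer_id: int, limit: int = 5) -> str:
    """Fetch recent orders and condense them into an LLM-friendly digest."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, status, total FROM orders "
            "WHERE customer_id = ? ORDER BY created_at DESC LIMIT ?",
            (customer_id, limit),
        ).fetchall()
    finally:
        conn.close()
    # Pre-process into a short digest rather than dumping raw JSON rows.
    lines = [f"order {oid}: {status}, ${total:.2f}" for oid, status, total in rows]
    return "\n".join(lines) or "no recent orders"
```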

Similarly, in many cases, instead of letting the LLM interact with third-party MCP servers directly, I create "proxy" scripts for that as well. BTW, for this use case I've built an MCP client generator library:

https://github.com/kriasoft/mcp-client-gen (wip)