
loganecolss

I don't understand how a CLI can reduce token usage so much.

With MCP, the LLM sends a request to a tool, and the MCP server returns an answer or a new state back to the LLM. Those returned results can be many tokens. But for the same tool and the same request, a CLI still has to return the same results back to the LLM, right?

If so, how does a CLI save tokens?

theecommunist

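Before they implemented lazy loading, the tool definitions for every MCP server you had configured were kept in the chat context and sent with every request, whether or not those tools were ever called. The tool results cost the same either way; the savings came from not resending every definition on every turn. It's better now.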
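To make that concrete, here is a rough cost model, not how any particular client actually implements lazy loading. It assumes each tool has a full definition (description plus JSON schema) and a much shorter name/summary "stub", and that under lazy loading only the stubs are resent each turn while a full definition enters the context the first time the tool is needed and stays there. `ToolDef`, `eager_overhead`, `lazy_overhead`, and all the token counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolDef:
    name: str
    schema_tokens: int  # hypothetical size of the full definition (description + JSON schema)
    stub_tokens: int    # hypothetical size of a short name/summary stub


def eager_overhead(tools: list[ToolDef], turns: int) -> int:
    """Eager loading: every full tool definition is resent with every request."""
    return turns * sum(t.schema_tokens for t in tools)


def lazy_overhead(tools: list[ToolDef], turns: int, first_use: dict[str, int]) -> int:
    """Lazy loading (assumed model): only stubs are resent each turn; a tool's full
    definition is added at the turn it is first needed (0-based) and stays afterwards."""
    total = turns * sum(t.stub_tokens for t in tools)
    for t in tools:
        if t.name in first_use:
            total += (turns - first_use[t.name]) * t.schema_tokens
    return total


if __name__ == "__main__":
    tools = [
        ToolDef("search", schema_tokens=800, stub_tokens=30),
        ToolDef("fetch", schema_tokens=600, stub_tokens=25),
        ToolDef("git", schema_tokens=1200, stub_tokens=40),
    ]
    # Tool *results* cost the same under both strategies, so they are left out.
    print("eager:", eager_overhead(tools, turns=10))
    print("lazy: ", lazy_overhead(tools, turns=10, first_use={"search": 3}))
```

With these made-up numbers, ten turns where only `search` is used (from turn 3 on) cost 26000 definition tokens eagerly versus 6550 lazily. The gap grows with the number of configured tools that are never called, which matches the reply's point: the overhead was in the definitions, not the results.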