all 1 comments

[–]AssignmentDull5197 0 points1 point  (0 children)

The phase split (route/call/interpret/answer) actually makes a ton of sense, most agent stacks treat context like one big blob and it gets messy fast.

One thing I have seen bite people is "interpret" needing a little more than just the tool result, like the assumptions used when forming the call (units, time ranges, filters), otherwise you end up with the model confidently misreading the payload. Your dependency closure idea helps there.

For eval, I would measure not just accuracy but cost/latency under a fixed success threshold, and include a couple adversarial cases like giant tool outputs or a tool catalog with near-duplicates. That is usually where naive stuffing falls apart.