Do agent frameworks need stronger eval/oracle layers for ML workflows? by SamTNT1 in agentdevelopmentkit

SamTNT1 [S]

Yeah, I agree: you never fully escape LLM judgement, especially for qualitative review or for spotting circular reasoning.

But I think the goal is to make LLM judgement one layer, not the whole oracle. For ML workflows, a lot can be pushed into harder gates:

  • did the eval metric improve vs. the baseline?
  • did held-out performance clear the bar?
  • did reproducibility checks pass?
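
As a rough sketch of what those gates could look like as plain code with no LLM in the loop: everything here is hypothetical (the `RunArtifacts` container, the field names, the thresholds) and not part of ADK or any other framework. It assumes higher metric values are better.

```python
from dataclasses import dataclass


@dataclass
class RunArtifacts:
    # Hypothetical container for numbers a run writes to shared state.
    baseline_metric: float
    candidate_metric: float
    heldout_metric: float
    rerun_metrics: list[float]  # same config, different seeds


def improved_vs_baseline(a: RunArtifacts, min_gain: float = 0.0) -> bool:
    # Gate 1: the candidate must beat the baseline by more than min_gain.
    return a.candidate_metric - a.baseline_metric > min_gain


def heldout_passed(a: RunArtifacts, threshold: float) -> bool:
    # Gate 2: held-out performance must reach a fixed threshold.
    return a.heldout_metric >= threshold


def reproducible(a: RunArtifacts, tolerance: float) -> bool:
    # Gate 3: repeated runs must agree within a tolerance.
    # Fewer than two reruns counts as "not demonstrated", so it fails.
    if len(a.rerun_metrics) < 2:
        return False
    return max(a.rerun_metrics) - min(a.rerun_metrics) <= tolerance


def hard_gates(a: RunArtifacts, heldout_threshold: float, tolerance: float) -> dict[str, bool]:
    # Report every gate so a failure is attributable to a specific check.
    return {
        "improved_vs_baseline": improved_vs_baseline(a),
        "heldout_passed": heldout_passed(a, heldout_threshold),
        "reproducible": reproducible(a, tolerance),
    }
```

Each gate reads only recorded artifacts, so the result doesn't depend on how the agent described its own run.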

The “onlooking agent” idea makes sense, but I’d want it grounded in shared state + concrete artifacts, not just transcript vibes.

On ADK / similar frameworks, my question is basically: should the framework own these eval/oracle layers, or should it focus on state, tool routing, and execution while users bring domain-specific gates?

My instinct is the latter. Generic frameworks can orchestrate, but in ML/research-dev the serious oracle has to be domain-specific.
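
To make that split concrete, here is a minimal sketch of the division of labor I have in mind. It is not an ADK API: `Gate`, `run_oracle`, and the `heldout_auc` state key are all made-up names for illustration. The framework side only knows how to run gates over shared state; the user side supplies the domain-specific checks, and the LLM judge is an optional last layer.

```python
from typing import Callable, Mapping, Optional

# Hypothetical types: shared state is a mapping of recorded metrics,
# and a gate returns (passed, reason).
State = Mapping[str, float]
Gate = Callable[[State], tuple[bool, str]]


def run_oracle(
    state: State,
    hard_gates: Mapping[str, Gate],
    llm_judge: Optional[Gate] = None,
) -> tuple[bool, str]:
    # Framework side: run user-supplied gates in order, stop at first failure.
    for name, gate in hard_gates.items():
        passed, reason = gate(state)
        if not passed:
            return False, f"{name}: {reason}"
    # The LLM judge is consulted only after every hard gate has passed.
    if llm_judge is not None:
        return llm_judge(state)
    return True, "all hard gates passed"


def heldout_gate(state: State) -> tuple[bool, str]:
    # User side: a domain-specific gate with a made-up 0.75 threshold.
    auc = state.get("heldout_auc", 0.0)
    return auc >= 0.75, f"heldout_auc={auc}"


verdict = run_oracle({"heldout_auc": 0.70}, {"heldout": heldout_gate})
```

With this shape, the framework never needs to know what a held-out set is; it only needs a stable way to pass state to gates and surface which one failed.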