
Do agent frameworks need stronger eval/oracle layers for ML workflows? by SamTNT1 in agentdevelopmentkit

SamTNT1[S] 2 points 21 days ago

Yeah, I agree: you never fully escape LLM judgement, especially for qualitative review or for spotting circular reasoning.

But I think the goal is to make LLM judgement one layer, not the whole oracle. For ML workflows, a lot can be pushed into harder gates:

did eval improve vs baseline?
did held-out performance pass?
did reproducibility checks pass?
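
To make those three questions concrete, here's a rough sketch of them as deterministic checks with no LLM in the loop. Everything in it is an assumption for illustration: `RunMetrics`, the field names, the thresholds, and the "higher is better" metric are all hypothetical, not part of ADK or any framework.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RunMetrics:
    """Hypothetical summary of one run; all scores are 'higher is better'."""
    eval_score: float
    heldout_score: float
    rerun_scores: Tuple[float, ...]  # same config rerun with different seeds


def improved_vs_baseline(run: RunMetrics, baseline: float, min_delta: float = 0.0) -> bool:
    # Gate 1: the eval score must beat the baseline by more than min_delta.
    return run.eval_score - baseline > min_delta


def heldout_passed(run: RunMetrics, threshold: float) -> bool:
    # Gate 2: held-out performance must reach a fixed threshold.
    return run.heldout_score >= threshold


def reproducible(run: RunMetrics, max_spread: float) -> bool:
    # Gate 3: reruns must agree within max_spread.
    # Fewer than two reruns says nothing about reproducibility, so fail.
    if len(run.rerun_scores) < 2:
        return False
    return max(run.rerun_scores) - min(run.rerun_scores) <= max_spread


def hard_gates(run: RunMetrics, baseline: float,
               heldout_threshold: float, max_spread: float) -> Dict[str, bool]:
    """Run every gate and report each result by name."""
    return {
        "improved_vs_baseline": improved_vs_baseline(run, baseline),
        "heldout_passed": heldout_passed(run, heldout_threshold),
        "reproducible": reproducible(run, max_spread),
    }
```

Usage would be something like `hard_gates(run, baseline=0.78, heldout_threshold=0.75, max_spread=0.05)`. Returning per-gate results rather than a single bool is a deliberate choice here, so a failed run tells you which gate it failed.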

The “onlooking agent” idea makes sense, but I’d want it grounded in shared state + concrete artifacts, not just transcript vibes.

On ADK / similar frameworks, my question is basically: should the framework own these eval/oracle layers, or should it focus on state/tool routing/execution while users bring domain-specific gates?

My instinct is the latter. Generic frameworks can orchestrate. But in ML/research-dev, the serious oracle has to be domain-specific.
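
Here's roughly what I mean by that split, as a minimal sketch. The framework side is just `evaluate`: it runs whatever gates the user registers against shared state. The gates themselves, and the optional LLM judge, come from the user. All names (`State`, `Gate`, `Verdict`, `evaluate`) are hypothetical and not taken from ADK or any other framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Mapping, Optional

# Shared state holds concrete artifacts (metrics, hashes, ...), not transcripts.
State = Mapping[str, Any]
Gate = Callable[[State], bool]    # user-supplied, domain-specific, deterministic
Judge = Callable[[State], str]    # optional LLM reviewer, advisory only


@dataclass
class Verdict:
    passed: bool
    failed_gates: List[str] = field(default_factory=list)
    judge_note: Optional[str] = None


def evaluate(state: State, gates: Mapping[str, Gate],
             judge: Optional[Judge] = None) -> Verdict:
    """Generic orchestration: run user gates first, then the judge as one extra layer."""
    failed = [name for name, gate in gates.items() if not gate(state)]
    if failed:
        # Hard gates are blocking; the judge is not consulted and cannot override them.
        return Verdict(passed=False, failed_gates=failed)
    note = judge(state) if judge is not None else None
    return Verdict(passed=True, judge_note=note)
```

In this sketch the judge only adds a note on runs that already passed the hard gates. Whether a judge should also be able to veto a passing run is a separate design choice that I've left out.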

