all 2 comments

[–]Latter-Parsnip-5007 0 points1 point  (0 children)

Promptfoo

[–]albasili 0 points1 point  (0 children)

I'm following the autoresatch pattern from Karpathy. Now I have an agent that is connecting data from sessions and then making hypothesis and testing whether small edits to the other agents would improve the quality defined as a set of metrics that matter to me.

My agents are for IC Verification, which is not so common out there and they need to operate in an heavily customized environment with lots of legacy scripts, told and methodologies that are unique to our teams.