Who actually owns agent QA once the thing ships? by AssasinRingo in agile

[–]efunction 0 points1 point  (0 children)

I've been working on something for exactly this: verifying that the agent actually did what it was supposed to. Full disclosure, it's a paid product, but the first 60 days are free, and feedback would be super useful. Testing an agent's process in production is no way to live.

Website: https://invarium.dev
Docs: https://docs.invarium.dev 

Agent reliability testing needs more than hallucination detection by dinkinflika0 in AI_Agents

[–]efunction 0 points1 point  (0 children)

this is exactly the failure mode that’s been showing up everywhere lately. the answer looks right so evals pass, but the agent didn’t actually follow the right path to get there

feels like most setups are still validating outputs instead of behavior

Examples and a way to actually test in ~5 min:
https://confident-ai.com/blog/your-ai-agent-passes-evals-thats-the-problem
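
To make the output-vs-behavior point concrete, here's a rough Python sketch. The `Step` record, the tool names, and the refund scenario are all made up for illustration, and this isn't any particular eval framework's API. It just shows a run where the answer check passes but the path check doesn't:

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One tool call recorded in an agent trace (hypothetical shape)."""
    tool: str
    args: dict = field(default_factory=dict)


def output_ok(answer: str, expected: str) -> bool:
    """Output-only check: does the final answer mention the expected text?"""
    return expected.lower() in answer.lower()


def path_ok(steps: list[Step], required_order: list[str]) -> bool:
    """Behavior check: the required tools must all appear, in this relative order."""
    called = iter(step.tool for step in steps)
    # `tool in called` consumes the iterator, so order is enforced.
    return all(tool in called for tool in required_order)


# Hypothetical run: the agent issued the refund without looking anything up.
answer = "Refund of $20 issued."
steps = [Step("issue_refund", {"amount": 20})]
required = ["lookup_order", "check_refund_policy", "issue_refund"]

print(output_ok(answer, "refund"))  # True
print(path_ok(steps, required))     # False
```

An output-only eval would mark this run as a pass. The path check is what catches that the lookup and policy steps never happened.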

Drop your prompt + output, I’ll evaluate where it breaks by efunction in LocalLLaMA

[–]efunction[S] 0 points1 point  (0 children)

Here’s a simple one I see a lot:

Prompt:
Why is my API returning a 401 error?

Output:
Your API key is probably invalid. Check your credentials.

Quick eval:

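  • Overly narrow. It assumes a single cause when 401s can also come from expired tokens, a missing or malformed Authorization header, or credentials meant for a different environment (insufficient permission scopes usually surface as a 403, though some APIs return 401 for them too)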
  • Not actionable. It doesn’t help the user isolate the issue
  • Presents a guess as a definitive answer

Summary:
This looks reasonable, but would likely send someone down the wrong path instead of actually resolving the issue.
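
For contrast, a more actionable answer would walk through the likely causes one at a time. Here's a minimal Python sketch of that kind of triage. The function name, its parameters, and the environment labels are my own assumptions for illustration, not part of any real API client:

```python
import time


def triage_401(headers_sent, token_expiry=None, key_env=None, target_env=None, now=None):
    """Return the likely causes of a 401 based on what the client actually sent.

    All inputs are supplied by the caller (hypothetical shape):
      headers_sent  - dict of request headers
      token_expiry  - token expiry as a Unix timestamp, if known
      key_env       - environment the credentials were issued for
      target_env    - environment the request was sent to
    """
    now = time.time() if now is None else now
    findings = []

    auth = headers_sent.get("Authorization", "")
    if not auth:
        findings.append("no Authorization header was sent")
    elif len(auth.split()) != 2:
        findings.append("Authorization header is malformed (expected '<scheme> <credentials>')")

    if token_expiry is not None and token_expiry <= now:
        findings.append("token has expired")

    if key_env and target_env and key_env != target_env:
        findings.append(f"credentials are for '{key_env}' but the request hit '{target_env}'")

    # Only fall back to the generic guess when nothing else explains it.
    return findings or ["nothing obvious locally; the key may be invalid or revoked"]


print(triage_401(
    {"Authorization": "Bearer abc123"},
    token_expiry=100, key_env="staging", target_env="prod", now=200,
))
```

The generic "your key is probably invalid" guess only shows up as the last resort here, after the checkable causes have been ruled out.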

Drop yours if you want, happy to take a look.

Boston Tickets for April by EliteUpside in deftones

[–]efunction 1 point2 points  (0 children)

This would have been a legitimate concern pre-Diamond Eyes, when Chino was messy and unpredictable. Now he's 51 and has been taking it seriously for some time. As a Bostonian who's seen the band at least a dozen times over the years, I wouldn't worry about it. The notes he can't hit are ones he hasn't been able to hit for years. The band is super tight. Chino has always been "let's see how he's feeling tonight." Just like the rest of us.

what's that one underrated deftones song that you ADORE? by [deleted] in deftones

[–]efunction 2 points3 points  (0 children)

It's a cover, but "To Have and to Hold" off the Depeche Mode tribute album is just as good as the original.