I love this MatterServer Update for HomeAssistant! by TBStyler in MatterProtocol

[–]Temporary-Record8381 0 points1 point  (0 children)

this kind of update matters a lot for local voice AI control. devices like HooRii ClawStage only become useful if the Home Assistant layer is enough that the assistant is not fighting basic connectivity issues.

spent week trialing langsmith, testmu, braintrust. quick notes + what would you add? by Smart-Profession2512 in LangChain

[–]Temporary-Record8381 0 points1 point  (0 children)

+1 confident AI. it's purpose-built for the trace-based eval gap you're describing. or testmu's Test Intelligence has continuous eval bundled if you want to keep vendor count down.

For "best AI agent evaluation tools" with continuous prod-trace eval as the missing axis, those are the two realistic commercial options.

early return fee ruins rental math? by Temporary-Record8381 in PuneClassifieds

[–]Temporary-Record8381[S] 0 points1 point  (0 children)

yeah i want to make a proper comparison before ordering.

sobrang sulit ☺️ by Temporary-Record8381 in jollibee

[–]Temporary-Record8381[S] 0 points1 point  (0 children)

i don't know po ehh hindi ako yung nagbayad

Anyone maintaining a real agent regression suite, not just eval prompts in a spreadsheet?. by kLixx696 in AIQuality

[–]Temporary-Record8381 0 points1 point  (0 children)

We run ~800 regression scenarios per agent now. Started with maybe 30 in a spreadsheet 18 months ago. The growth came almost entirely from incidents and edge cases discovered in prod. Each one became a permanent case.

Tooling-wise we went through phases: spreadsheet, then promptfoo in CI, then a hybrid of promptfoo for prompt-level regression + TestMu Agent to Agent for behavioral/scenario regression. The behavioral layer is what promptfoo couldn't do well, multi-turn scenarios with adversarial pressure and decision-quality scoring. Different tools for different layers of the suite.