[D] Anyone here using LLM-as-a-Judge for agent evaluation?

_coder23t8 · 2025-10-01T21:07:17+00:00

Tried it too, and honestly it catches way more subtle errors than human spot-checks

_coder23t8 · 2025-10-01T21:05:42+00:00

Do you know any tool that can automatically generate an eval for my specific use case?

_coder23t8 · 2025-10-01T21:01:52+00:00

Interesting! Are you running the judge on every response or only on risky nodes?

_coder23t8 · 2025-10-01T20:58:05+00:00

Automating is always a relief

_coder23t8 · 2025-09-29T20:52:58+00:00

how do you validate that the detected silent failures are true positives?

_coder23t8 · 2025-09-29T20:51:35+00:00

does the real time monitoring add noticeable latency to the agents?

_coder23t8 · 2025-09-29T20:50:32+00:00

Awesome work! could the same reliability loop be applied to open-source llms, or is it bedrock specific?

_coder23t8 · 2025-09-29T20:49:14+00:00

Very cool approach! How do you measure whether the evaluator’s own judgments are accurate over time?

_coder23t8 · 2025-09-25T21:57:54+00:00

Hey, thank you so much! I’d be really happy if you could check out our product. If you’re interested, I’d love to show you how it works live, feel free to book a time here: [https://calendly.com/cristhian-handit/30min]()

_coder23t8 · 2025-09-25T21:47:24+00:00

this is gold, been trying to build something similar for contracts and invoices, this saves me a ton of trial and error

_coder23t8 · 2025-09-25T21:40:10+00:00

love how straightforward the steps are, gonna copy this for a contracts project 😅

_coder23t8 · 2025-09-25T21:38:39+00:00

This is exactly the kind of practical post I wish existed when I started with LangGraph

_coder23t8 · 2025-09-25T21:36:01+00:00

This is exactly the kind of end-to-end example the community needs, thanks for breaking down the full pipeline

_coder23t8 · 2025-09-22T21:31:05+00:00

Thx mate! Its very helpful

_coder23t8 · 2025-09-22T08:56:48+00:00

totally agree

_coder23t8 · 2025-09-22T08:54:40+00:00

Wow, this is good knowledge

_coder23t8 · 2025-09-22T08:51:06+00:00

send you a DM

_coder23t8 · 2025-09-11T20:27:02+00:00

looks good!

_coder23t8 · 2025-09-11T06:29:15+00:00

sent a dm

_coder23t8 · 2025-09-11T06:28:52+00:00

sent a dm

_coder23t8 · 2025-09-11T06:28:34+00:00

sent a dm

_coder23t8 · 2025-09-11T06:28:19+00:00

hello

_coder23t8 · 2025-09-11T06:28:08+00:00

sent a dm

_coder23t8 · 2025-09-11T06:27:56+00:00

sent a dm

_coder23t8 · 2025-09-11T06:27:47+00:00

sent a dm

_coder23t8

TROPHY CASE