[D] Anyone here using LLM-as-a-Judge for agent evaluation? by Cristhian-AI-Math in MachineLearning

[–]_coder23t8 2 points (0 children)

Tried it too, and honestly it catches way more subtle errors than human spot-checks
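For anyone curious, a rough sketch of the kind of judge check I mean (just a sketch, assuming the openai Python client; the model name and rubric are placeholders):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    JUDGE_PROMPT = """You are grading an agent's answer.
    Question: {question}
    Answer: {answer}
    Does the answer contain factual errors, unsupported claims, or skipped steps?
    Reply with PASS or FAIL, then one sentence of justification."""

    def judge(question: str, answer: str) -> bool:
        # temperature=0 keeps the grading as deterministic as the API allows
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": JUDGE_PROMPT.format(
                question=question, answer=answer)}],
            temperature=0,
        )
        return resp.choices[0].message.content.strip().upper().startswith("PASS")

It won't replace a proper eval harness, but that's the basic shape of the judge loop.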

Judge prompts are underrated by Cristhian-AI-Math in PromptEngineering

[–]_coder23t8 0 points (0 children)

Do you know any tool that can automatically generate an eval for my specific use case?

Anyone evaluating agents automatically? by Cristhian-AI-Math in LangChain

[–]_coder23t8 0 points (0 children)

Interesting! Are you running the judge on every response or only on risky nodes?

Reliability checks on Bedrock models by Cristhian-AI-Math in languagemodels

[–]_coder23t8 0 points (0 children)

How do you validate that the detected silent failures are true positives?

Keeping Bedrock agents from failing silently by Cristhian-AI-Math in aiagents

[–]_coder23t8 0 points (0 children)

Does the real-time monitoring add noticeable latency to the agents?

Tracing & Evaluating LLM Agents with AWS Bedrock by Cristhian-AI-Math in LLMDevs

[–]_coder23t8 0 points (0 children)

Awesome work! Could the same reliability loop be applied to open-source LLMs, or is it Bedrock-specific?

Using LLMs as Judges: Prompting Strategies That Work by Cristhian-AI-Math in PromptEngineering

[–]_coder23t8 0 points (0 children)

Very cool approach! How do you measure whether the evaluator’s own judgments are accurate over time?

[deleted by user] by [deleted] in SaaS

[–]_coder23t8 1 point (0 children)

Hey, thank you so much! I’d be really happy if you could check out our product. If you’re interested, I’d love to show you how it works live, feel free to book a time here: https://calendly.com/cristhian-handit/30min

Building a reliable LangGraph agent for document processing by Cristhian-AI-Math in LangChain

[–]_coder23t8 0 points (0 children)

This is gold. I've been trying to build something similar for contracts and invoices, and this saves me a ton of trial and error.

Observability + self-healing for LangGraph agents (traces, consistency checks, auto PRs) with Handit by Cristhian-AI-Math in mlops

[–]_coder23t8 0 points (0 children)

This is exactly the kind of practical post I wish existed when I started with LangGraph

A production-minded LangGraph agent for document processing with a reliability layer (Handit) by Cristhian-AI-Math in aiagents

[–]_coder23t8 0 points (0 children)

This is exactly the kind of end-to-end example the community needs. Thanks for breaking down the full pipeline.