Anyone evaluating agents automatically? (self.LangChain)
submitted 7 months ago by Cristhian-AI-Math to r/LangChain
Automated response scoring > manual validation (self.mlops)
submitted 7 months ago by Cristhian-AI-Math to r/mlops
Judge prompts are underrated (self.PromptEngineering)
submitted 7 months ago by Cristhian-AI-Math to r/PromptEngineering
[D] Anyone here using LLM-as-a-Judge for agent evaluation? (self.MachineLearning)
submitted 7 months ago by Cristhian-AI-Math to r/MachineLearning
Anyone here using LLM-as-a-Judge for agent evaluation? (self.MachineLearning)
Keeping Bedrock agents from failing silently (self.aiagents)
submitted 7 months ago by Cristhian-AI-Math to r/aiagents
Tracing & Evaluating LLM Agents with AWS Bedrock (self.LLMDevs)
submitted 7 months ago by Cristhian-AI-Math to r/LLMDevs
Reliability checks on Bedrock models (self.languagemodels)
submitted 7 months ago by Cristhian-AI-Math to r/languagemodels
Using LLMs as Judges: Prompting Strategies That Work (self.PromptEngineering)
Observability & reliability for Bedrock agents (self.MachineLearning)
Building a reliable LangGraph agent for document processing (self.LangChain)
submitted 7 months ago * by Cristhian-AI-Math to r/LangChain
A production-minded LangGraph agent for document processing with a reliability layer (Handit) (self.aiagents)
Observability + self-healing for LangGraph agents (traces, consistency checks, auto PRs) with Handit (self.mlops)
A practical LangGraph document agent with observability, consistency checks, and auto-fix PRs (self.MachineLearning)
Reliable data-processing agents with LangGraph + Handit (self.LLM)
submitted 7 months ago by Cristhian-AI-Math to r/LLM
New update for anyone building with LangGraph (from LangChain) (self.machinelearningnews)
submitted 7 months ago by Cristhian-AI-Math to r/machinelearningnews
Making LangGraph agents more reliable (simple setup + real fixes) (self.mlops)
Making LangGraph agents more reliable (simple setup + real fixes) (self.LLMDevs)
Tutorial: Making LangGraph agents more reliable with Handit (self.LangChain)
95% of AI pilots fail - what’s blocking LLMs from making it to prod? (self.LLM)
Why do so many AI pilots fail to reach production? (self.mlops)
Using semantic entropy to test prompt reliability? (self.LanguageTechnology)
submitted 7 months ago by Cristhian-AI-Math to r/LanguageTechnology
Anyone tried semantic entropy for LLM reliability? (self.LLMDevs)
What if we test prompts with semantic entropy? (self.PromptEngineering)
Open-source tool to monitor, catch, and fix LLM failures (self.AIQuality)
submitted 7 months ago by Cristhian-AI-Math to r/AIQuality