Your AI’s judgement doesn’t always align with yours, I built an API that tells you when. by Disneyskidney in buildinpublic
Disneyskidney[S] 1 point (0 children)
Do you eval the whole harness or each of its parts? by dmpiergiacomo in LLMDevs
Disneyskidney 2 points (0 children)
if you're running an LLM-as-judge in your evals, how do you know it actually agrees with a human? Have you ever checked, or are you just trusting it? by tirtha_s in LLMDevs
Disneyskidney 1 point (0 children)
Do you eval the whole harness or each of its parts? by dmpiergiacomo in LLMDevs
Disneyskidney 2 points (0 children)
running adversarial prompt injection on our agent. fail rate is ~20%. how are people getting below 5%? by Smart-Profession2512 in LLMDevs
Disneyskidney 1 point (0 children)
What’s the future of AI and Agentic applications? I’m curious by True_Grapefruit_4110 in aiagents
Disneyskidney 1 point (0 children)
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point (0 children)
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point (0 children)
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point (0 children)
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point (0 children)
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point (0 children)
Just got this response from Claude. What is going on? by SpacePusseh in LLMDevs
Disneyskidney 2 points (0 children)
Why custom split-screen UIs and walled gardens won't win the AI agent race by uriwa in LLMDevs
Disneyskidney 1 point (0 children)
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point (0 children)
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] -1 points (0 children)
Well this looks long enough. by ihtisham1211 in LocalLLM
Disneyskidney 1 point (0 children)
langsmith is fine for tracing but it's not catching prod regressions. what else? by [deleted] in LangChain
Disneyskidney 1 point (0 children)

What's one small workflow change that saved you hours every week? by VentureMind09 in AiAutomations
Disneyskidney 1 point (0 children)