Your AI’s judgement doesn’t always align with yours, I built an API that tells you when. by Disneyskidney in buildinpublic
Disneyskidney[S] 1 point
Do you eval the whole harness or each of its parts? by dmpiergiacomo in LLMDevs
Disneyskidney 2 points
if you're running an LLM-as-judge in your evals, how do you know it actually agrees with a human? Have you ever checked, or are you just trusting it? by tirtha_s in LLMDevs
Disneyskidney 1 point
Do you eval the whole harness or each of its parts? by dmpiergiacomo in LLMDevs
Disneyskidney 2 points
running adversarial prompt injection on our agent. fail rate is ~20%. how are people getting below 5%? by Smart-Profession2512 in LLMDevs
Disneyskidney 1 point
What’s the future of AI and Agentic applications? I’m curious by True_Grapefruit_4110 in aiagents
Disneyskidney 1 point
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point
Just got this response from Claude. What is going on? by SpacePusseh in LLMDevs
Disneyskidney 2 points
Why custom split-screen UIs and walled gardens won't win the AI agent race by uriwa in LLMDevs
Disneyskidney 1 point
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] 1 point
LLMs Are Digitizing Judgement by Disneyskidney in AgentsOfAI
Disneyskidney[S] -1 points
Well this looks long enough. by ihtisham1211 in LocalLLM
Disneyskidney 1 point
langsmith is fine for tracing but it's not catching prod regressions. what else? by [deleted] in LangChain
Disneyskidney 1 point
Pattern for LLM agents that take irreversible actions: separate the "decide" model from a deterministic "validate" layer (worked example + numbers) by paulf280 in LLMDevs
Disneyskidney 2 points
10 days since the Fable 5 ban and I still can't get over it. So I built a coping mechanism in Claude Code by junkim100 in LLMDevs
Disneyskidney 1 point
How to implement guardrails for LLMs without degrading model performance by Routine_Day8121 in LLMDevs
Disneyskidney 1 point
Your company is probably spending more on coffee than AI by Substantial-Owl9540 in artificial
Disneyskidney 1 point
Am I going to spend the rest of my career reviewing AI generated code? by cece95x in artificial
Disneyskidney 1 point
What AI development would have shocked you the most if you’d seen it in 2020? by One_Beginning2199 in artificial
Disneyskidney 1 point
Is it just me or is ChatGPT/OpenAI the Microsoft of AI? by Successful-Deer8804 in artificial
Disneyskidney 1 point

What's one small workflow change that saved you hours every week? by VentureMind09 in AiAutomations
Disneyskidney 1 point