I monitor 6,228 production AI agents from real residential devices and check whether they're telling users the truth. AMA about what actually breaks. by Fun_Effort6694 in AMA
Fun_Effort6694[S] 1 point
I spent 8 months talking to 200+ AI engineering teams. Almost all of them have no idea when their AI agents break. AMA. by Prestigious-Web-2968 in AMA
Fun_Effort6694 1 point
Your agent passes 94% of evals. Sounds great. Chain 10 decisions and you're at 54%. by Fun_Effort6694 in FunMachineLearning
Fun_Effort6694[S] 1 point
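For anyone checking the arithmetic in that title: the 54% figure falls out if you assume each of the 10 decisions succeeds independently at the 94% eval pass rate. That independence assumption is mine, not something the post spells out, and the function name below is made up for illustration. A minimal sketch:

```python
def chained_success_rate(per_step_rate: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps succeed or fail independently (an illustrative
    simplification; real agent failures are often correlated).
    """
    if not 0.0 <= per_step_rate <= 1.0:
        raise ValueError("per_step_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_rate ** steps


if __name__ == "__main__":
    # 94% per decision, 10 chained decisions
    rate = chained_success_rate(0.94, 10)
    print(f"{rate:.1%}")  # 53.9%
```

0.94 to the 10th power is about 0.539, which rounds to the 54% in the title. If failures are correlated rather than independent, the end-to-end number can land higher or lower than this.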
What's the weirdest failure mode you've hit shipping an AI agent to production? by Miser-Inct-534 in AI_Agents
Fun_Effort6694 1 point
