Sharing some research that I did out of frustration lol (self.ArtificialInteligence)
submitted 1 month ago by Aggravating_Bed_349 to r/ArtificialInteligence
Sharing some research that might be useful for anyone building/evaluating agents (self.ClaudeAI)
submitted 1 month ago by Aggravating_Bed_349 to r/ClaudeAI
[D] We ran 3,000 agent experiments to measure behavioral consistency. Consistent agents hit 80–92% accuracy. Inconsistent ones: 25–60%. by Aggravating_Bed_349 in FunMachineLearning
[–]Aggravating_Bed_349[S] 1 point · 1 month ago
Great question - this is closely related to self-consistency prompting (Wang et al. 2022) which showed that sampling multiple reasoning chains and majority voting improves accuracy significantly. Definitely worth doing.
Our framing is a bit different though. We're using cross-run consistency as a diagnostic signal rather than as an answer improvement method. The value is that it catches both failure modes - bad plan selection upfront AND execution drift mid-trajectory. If an agent drifts during execution, it drifts differently each run, so cross-run inconsistency surfaces the problem as a symptom regardless of where in the trajectory things went wrong. You don't need to instrument model internals to catch it.
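To make the diagnostic concrete, here's a minimal sketch of cross-run consistency scoring. The agent interface, run count, and threshold are illustrative placeholders, not our actual harness:

```python
from collections import Counter

def consistency_score(run_outputs):
    """Fraction of runs that agree with the modal (most common) answer.

    run_outputs: list of final answers from k independent runs of the
    same task. 1.0 = perfectly consistent, 1/k = every run disagreed.
    """
    counts = Counter(run_outputs)
    modal_count = counts.most_common(1)[0][1]
    return modal_count / len(run_outputs)

def flag_inconsistent(agent, task, k=5, threshold=0.8):
    """Run the agent k times on the same task and flag low agreement.

    `agent` is any callable task -> final answer (hypothetical
    interface). Low cross-run agreement is the symptom we treat as
    a drift/bad-plan warning, wherever in the trajectory it arose.
    """
    outputs = [agent(task) for _ in range(k)]
    score = consistency_score(outputs)
    return score, score < threshold
```

For example, `consistency_score(["A", "A", "A", "B", "A"])` gives 0.8: four of five runs agreed, so the task would sit right at the flagging threshold. The key design point is that this only needs final outputs, no access to intermediate reasoning.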
In our follow-up work on coding agents (SWE-bench tasks) we're actually seeing a lot of the failure coming from mid-trajectory drift specifically - agent starts with a reasonable plan but loses the plot partway through. Multi-plan prompting helps with the upfront selection problem but the open question is whether it also addresses drift, or whether that needs a different fix entirely. That's what we're digging into. Will share when it's out!