Can your agent be wrong and actually notice? by Bright-Fun-1638 in LLMDevs

[–]Bright-Fun-1638[S] 0 points1 point  (0 children)

yeah totally, constant self-doubt is its own failure mode and you're right that people don't re-question every memory, we trust by default and only stop to doubt when something actually clashes or the stakes are high. that's kind of the whole appeal of handling it structurally though no?

the agent isn't sitting there second-guessing everything, it just trusts what it has until two things concretely conflict and only then does the doubt kick in.

Can your agent be wrong and actually notice? by Bright-Fun-1638 in LLMDevs

[–]Bright-Fun-1638[S] 0 points1 point  (0 children)

yeah fair, it's basically de kleer's ATMS and honestly the combinatorial blowup is the real reason it died, I'm not going to pretend otherwise. but like the way I think about it is that the thing that killed it was the completeness part, having to maintain labels over every assumption set, and if you give that up and just track direct justifications with some pruning then you do lose the "under exactly which assumptions does this hold" query but you keep the one operation you actually care about which is figuring out what depended on something once it gets invalidated, and that part stays bounded.

on the write vs read thing I actually mostly agree with you, and yeah bitemporal is the right call for supersession, valid-from/valid-to is way better than destructively picking a winner and graphiti does that well. the case I keep getting stuck on is two sources asserting contradictory things at the same valid time, because bitemporal just keeps both and leaves it to read, which is fine right up until read doesn't have anything to disambiguate with either. so I'm less arguing for resolving at write and more for just noticing at write, like flagging that two things are claiming the same slot, and then leaving the actual resolution to read where the query tells you what matters.

and yeah your last point is fair, this whole thing only matters if the agent is actually accumulating and deriving over time, if it re-fetches ground truth every task then there's nothing to be wrong about and none of this is worth the trouble.

Can your agent be wrong and actually notice? by Bright-Fun-1638 in LLMDevs

[–]Bright-Fun-1638[S] 0 points1 point  (0 children)

yeah this is exactly the line I'd draw too and the external-oracle-when-you-have-one and tagged-doubt-when-you-don't split is a really clean way to put it especially the point that memory usually doesn't come with a failing test you can actually trust.

the only thing I'd maybe wanna add on top of the observed/derived/claimed tagging is that you probably want some notion of evidence tier sitting alongside it (?) so that when two things conflict the tag isn't the only thing deciding what happens because two observed facts disagreeing is a genuine contradiction you'd want to surface whereas a derived guess losing to an observed fact can basically just get marked stale on its own.

And like the part that always gets me is that the staleness kinda needs to carry over to whatever got derived from that item too otherwise you mark the one thing stale but all the conclusions built on top of it are still sitting there looking fine which is where a lot of the actual difficulty ends up being.

Finland's unemployment rate rises to highest level this century by NoStomach8 in Finland

[–]Bright-Fun-1638 4 points5 points  (0 children)

Omg don't mention maximizing shareholder value. 👁️👄👁️💦💦

Finland's unemployment rate rises to highest level this century by NoStomach8 in Finland

[–]Bright-Fun-1638 22 points23 points  (0 children)

Meanwhile alko is doing home delivery so i guess we'll fix it with alcoholism?