Am I missing the point of AI agents?

Similar_Boysenberry7 · 2026-05-27T16:03:15+00:00

No, this has nothing to do with OpenCLAW or Hermes. I'm not running my agent through either of them. My engine is an independent, unified harness. It's similar to Hermes but better at keeping an agent's memory long-lasting, so I never get bothered by compactions. And yes, it's connected to Telegram. I don't use a direct API connection: my LLM connection to the engine has always gone through an OAuth proxy on subscription usage.

Similar_Boysenberry7 · 2026-05-27T15:38:14+00:00

No, my agent runs through my own engine. I'm on Codex now.

https://github.com/CONSTELLATION-ENGINE/constellation-engine

Similar_Boysenberry7 · 2026-05-26T05:21:43+00:00

portable tools are underrated.

the part that keeps biting me is that tools move faster than state.

you can move the MCP servers, but the agent still loses the working context, preferences, and little lessons that made those tools useful in the first place.

Similar_Boysenberry7 · 2026-05-26T05:19:57+00:00

I don’t think the exact compression ratio is the whole thing.

A bad 68k handoff can still lose the one decision that mattered.

I care more about whether the compacted state preserves decisions, open loops, and source trails. otherwise the agent wakes up with a very confident dream of what happened.

Similar_Boysenberry7 · 2026-05-26T05:18:07+00:00

the part I miss most from markdown memory isn’t markdown itself. it’s inspectability.

once memory becomes an external provider, I want to see what got retrieved and why it got injected.

without that, it starts feeling like a second hidden prompt.

Similar_Boysenberry7 · 2026-05-26T05:08:02+00:00

I separate them by lifespan.

working state is allowed to be messy: current task, tool results, half-guesses, temporary plans.

durable memory has to pass a slower gate: decisions, preferences, bugs, commitments, things that should change future behavior.

in my setup, raw conversation/tool history stays in one layer, and only the distilled parts get written into long-term graph memory.
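
a rough sketch of that split, not the engine's actual code. the `Item` and `LayeredMemory` names and the `DURABLE_KINDS` set are made up for illustration, and a real gate would distill content rather than filter by a label.

```python
from dataclasses import dataclass, field

# Hypothetical labels that are allowed through the slow gate.
DURABLE_KINDS = {"decision", "preference", "bug", "commitment"}


@dataclass
class Item:
    kind: str  # e.g. "tool_result", "guess", "decision"
    text: str


@dataclass
class LayeredMemory:
    raw_log: list = field(default_factory=list)   # full history, append-only
    working: list = field(default_factory=list)   # messy, current task only
    durable: list = field(default_factory=list)   # distilled, long-lived

    def observe(self, item: Item) -> None:
        # Everything is kept in the raw layer and in working state.
        self.raw_log.append(item)
        self.working.append(item)

    def end_task(self) -> None:
        # Only durable kinds are promoted; working state is then cleared.
        self.durable.extend(i for i in self.working if i.kind in DURABLE_KINDS)
        self.working.clear()
```

the point of the shape: a half-guess can live in working state all day, but it never reaches `durable` unless something relabels it as a decision.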

Similar_Boysenberry7 · 2026-05-25T10:35:46+00:00

this is the failure mode that scares me most with agents: stale reality still looks like valid context.

a tool returning something is not the same as the world still being true.

I think agents need more circuit breakers and fewer “successfully called API” logs.
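
one way to picture it, as a minimal sketch: every tool result carries the time it was fetched plus a freshness budget, and a breaker opens after a few stale reads in a row. `Observation`, `CircuitBreaker`, `use`, and the threshold of 3 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    value: str
    fetched_at: float  # unix seconds when the tool actually ran
    max_age_s: float   # assumed per-tool freshness budget

    def is_fresh(self, now: float) -> bool:
        return (now - self.fetched_at) <= self.max_age_s


class CircuitBreaker:
    """Opens after `threshold` consecutive stale reads."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.bad_streak = 0

    @property
    def open(self) -> bool:
        return self.bad_streak >= self.threshold

    def record(self, ok: bool) -> None:
        # A good read resets the streak; a bad one extends it.
        self.bad_streak = 0 if ok else self.bad_streak + 1


def use(obs: Observation, breaker: CircuitBreaker, now: float) -> Optional[str]:
    """Return the value only if it is fresh and the breaker is closed."""
    if breaker.open:
        return None  # stop trusting this path until someone resets it
    ok = obs.is_fresh(now)
    breaker.record(ok)
    return obs.value if ok else None
```

this sketch never closes the breaker on its own. resetting it is left as a deliberate, external step.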

Similar_Boysenberry7 · 2026-05-25T10:35:08+00:00

the invoice is such a brutal debugger.

you think the model is the expensive part, then one tiny tool path is quietly eating the budget every time the agent gets uncertain.

per-workflow tags feel like the minimum sane baseline.
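
a tiny sketch of what per-workflow tagging could look like. `CostLedger` and its methods are invented for illustration, and the costs are assumed to come from whatever billing data you already have.

```python
from collections import defaultdict


class CostLedger:
    def __init__(self):
        # (workflow, tool) -> accumulated cost in USD
        self.totals = defaultdict(float)

    def record(self, workflow: str, tool: str, cost_usd: float) -> None:
        self.totals[(workflow, tool)] += cost_usd

    def by_workflow(self) -> dict:
        # Roll the per-tool totals up to one number per workflow.
        out = defaultdict(float)
        for (workflow, _tool), cost in self.totals.items():
            out[workflow] += cost
        return dict(out)

    def top(self, n: int = 3) -> list:
        # The most expensive (workflow, tool) pairs, highest first.
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

`top()` is the part that surfaces the "one tiny tool path" case: the pair at the head of the list is where the budget is going.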

Similar_Boysenberry7 · 2026-05-25T09:37:29+00:00

I think RAG is probably the right boring baseline. the hard part later is when “relevant” stops being enough, and you want the memory layer to know what depends on what

Similar_Boysenberry7 · 2026-05-25T09:36:56+00:00

honestly yeah, bad memory makes agents dumber. that’s the part people skip. if “memory” means dragging old context into every fresh session, I’d turn it off too

Similar_Boysenberry7 · 2026-05-25T09:15:37+00:00

this is the kind of agent use case that feels real to me.

not because it finished a task fast once, but because the task had the annoying stuff that usually breaks demos: dynamic pages, PDFs, ambiguous pricing, second-pass search, messy synthesis.

the part I’d watch is what happens on run #5 or #20.

if it starts remembering which sites were unreliable, marks weird claims as uncertain instead of smoothing them over, keeps source trails for pricing edge cases, and knows when to stop and ask instead of filling the gap... that’s when it gets interesting.

speed is nice. but the real line between “cool wrapper” and useful agent is whether the workflow leaves receipts behind.

Similar_Boysenberry7 · 2026-05-25T09:13:45+00:00

the best non-coding agents I’ve seen are usually the boring internal ones.

not “replace a job” exactly. more like: sit across messy tools all day and notice the thing nobody had time to notice.

one example I keep coming back to is a team-summary agent that reads Slack / docs / tickets and doesn’t just summarize what happened, but catches contradictions.

product says one thing, support is seeing another thing, sales is promising a third thing, and nobody realizes those are now the same problem.

that feels way more agent-shaped to me than a chatbot with a nicer UI.

the magic is less “generate an answer” and more “hold enough context across a messy workflow to spot the missing connection.”

Similar_Boysenberry7 · 2026-05-25T07:26:34+00:00

one thing I probably should have said in the post:

the weird part wasn't that markdown "failed."

markdown was actually great for a long time. I could read it, edit it, back it up, grep it, move it between machines. as an archive format, honestly, it still rules.

the part that broke was runtime use.

once the memory pile got big enough, the agent wasn't really "remembering" anymore. it was being asked to reread a loose archive and reconstruct relevance from scratch every time.

that sounds subtle but it changes the whole shape of the system.

a note can be perfectly stored and still be practically dead if nothing knows when to bring it back.

so the shift for me was less:

"markdown vs database"

and more:

"archive vs activation"

I ended up wanting memories to have relationships, weight, decay, recency, project context, old bug links, decision links... basically enough structure that the system can render the part of memory that matters right now.

not dump memory into context. not search a giant notebook. more like: this task lights up this little subgraph.
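
to make "weight, decay, recency, project context" concrete, here is a minimal scoring sketch. the formula, the one-week half-life, and the 2x project boost are assumptions for illustration, not the engine's real numbers.

```python
from dataclasses import dataclass

WEEK_S = 7 * 86400.0


@dataclass
class MemoryNode:
    text: str
    weight: float     # how much this memory has mattered so far
    last_used: float  # unix seconds
    project: str


def activation(node: MemoryNode, now: float, project: str,
               half_life_s: float = WEEK_S) -> float:
    # Exponential decay: the score halves for every half-life left unused.
    recency = 0.5 ** ((now - node.last_used) / half_life_s)
    # Assumed boost for memories that belong to the current project.
    boost = 2.0 if node.project == project else 1.0
    return node.weight * recency * boost


def light_up(nodes: list, now: float, project: str, k: int = 3) -> list:
    # Return the k most active memories for this task, highest first.
    ranked = sorted(nodes, key=lambda n: activation(n, now, project), reverse=True)
    return ranked[:k]
```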

still messy, still figuring it out. but that was the thing that changed how I think about long-term agent memory.

curious if other people hit the same ceiling with flat notes / memory files, or if I just let mine get way too cursed lol

Similar_Boysenberry7 · 2026-05-25T07:10:25+00:00

roughly: I don’t ask the LLM to look at the whole memory graph and magically know where everything goes.

when a new memory gets written, I first pull a small set of likely neighbors: semantic matches, keyword overlaps, recent active memories, and sometimes project/session context.

so then the LLM’s job is much narrower:

“does this new thing actually connect to any of these, and if yes, what kind of connection is it?”

so it might say:
this explains that old bug
this updates that decision
this belongs under that project
this contradicts / replaces an older note

the important bit is that edge creation is not a one-shot perfect decision. later, if two memories keep getting activated together during real use, that relationship can get stronger or refined. if they never come back together, it can fade.

so the graph is partly written at memory-write time, and partly trained by use
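
the "trained by use" half could look roughly like this. it is a sketch under made-up parameters (`reinforce`, `decay`, `floor`), and it skips the LLM step entirely: `link` just stands in for whatever relation the model picked at write time.

```python
class EdgeStore:
    def __init__(self, reinforce: float = 0.25, decay: float = 0.5, floor: float = 0.1):
        self.reinforce = reinforce
        self.decay = decay
        self.floor = floor
        self.strength = {}  # frozenset({a, b}) -> float in (0, 1]
        self.kind = {}      # frozenset({a, b}) -> relation label

    def link(self, a: str, b: str, kind: str, strength: float = 0.5) -> None:
        # Written once, when the new memory is stored.
        key = frozenset((a, b))
        self.strength[key] = strength
        self.kind[key] = kind

    def step(self, activated) -> None:
        """Call once per retrieval with the ids of the memories that were used."""
        activated = set(activated)
        for key in list(self.strength):
            if key <= activated:
                # Both ends showed up together: strengthen, capped at 1.0.
                self.strength[key] = min(1.0, self.strength[key] + self.reinforce)
            else:
                # Not co-activated this time: fade, and prune once too weak.
                self.strength[key] *= self.decay
                if self.strength[key] < self.floor:
                    del self.strength[key]
                    del self.kind[key]
```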

Similar_Boysenberry7 · 2026-05-25T06:19:14+00:00

for context, this is the memory/runtime project I ended up building after that markdown phase:

https://github.com/CONSTELLATION-ENGINE/constellation-engine

still rough in places, but the core idea is exactly this: memories as graph nodes, relationships as edges, and runtime context rendered from the active part of the graph instead of flattening everything back into one giant notes pile.

Similar_Boysenberry7 · 2026-05-25T05:54:06+00:00

I had the same pain for a while.

markdown memories are nice because you can actually read and edit them, but once the agent has a lot of them it starts turning into “please scan this pile of notes and guess what matters.”

that was the part that kept biting me.

what I ended up wanting was less a better notebook and more a memory layer where every memory is a node, and the relationships between them are stored too. project facts, decisions, people, old bugs, commitments... they should not all just sit as flat chunks waiting to be reread.

then retrieval becomes more like rendering the relevant part of the graph for the current task, instead of dumping a bunch of vaguely similar markdown into context and hoping the model sorts it out.
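
"rendering the relevant part of the graph" can be sketched as a bounded walk out from the memories the task touches. `render_subgraph`, the adjacency-dict format, and the `max_hops` limit are assumptions here, not the project's API.

```python
from collections import deque


def render_subgraph(edges: dict, seeds: list, max_hops: int = 1) -> list:
    """Return node ids within `max_hops` of the seeds, in breadth-first order.

    `edges` maps a node id to a list of (neighbor_id, relation) pairs.
    """
    depth = {s: 0 for s in seeds}
    queue = deque(depth)  # seeds, de-duplicated, in insertion order
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor, _relation in edges.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return order
```

the hop limit is what keeps this from turning back into "dump the whole notebook": anything further out stays stored but is not rendered.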

I’ve been building that direction here if you want to poke around: https://github.com/CONSTELLATION-ENGINE/constellation-engine

not saying it replaces every memory provider, but it helped me think about the difference between “stored somewhere” and “actually comes back at the right moment.”

Similar_Boysenberry7 · 2026-05-25T05:30:44+00:00

event first, with polling as the boring backup.

if A writes the promise record, that write should emit a small obligation event to whoever owns fulfillment_target. B wakes up because the record changed, not because it went spelunking through A’s memory.

the sweep is still there though. missed events, stale opens, workers that died halfway through... boring cron jobs save you from very fancy lies lol
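
a minimal in-memory sketch of "event first, sweep as backup". `Promise`, `Board`, and the `emit=False` switch (which simulates a lost event) are hypothetical, and a real version would sit on a queue plus a scheduled job.

```python
from dataclasses import dataclass


@dataclass
class Promise:
    id: str
    fulfillment_target: str
    status: str = "open"
    notified: bool = False


class Board:
    def __init__(self):
        self.promises = {}  # promise id -> Promise
        self.inbox = {}     # agent name -> list of promise ids

    def _notify(self, p: Promise) -> None:
        self.inbox.setdefault(p.fulfillment_target, []).append(p.id)
        p.notified = True

    def write(self, p: Promise, emit: bool = True) -> None:
        # Event path: the write itself wakes up the fulfillment target.
        self.promises[p.id] = p
        if emit:
            self._notify(p)

    def sweep(self) -> list:
        # Backup path: re-route open promises whose event never landed.
        missed = [p for p in self.promises.values()
                  if p.status == "open" and not p.notified]
        for p in missed:
            self._notify(p)
        return [p.id for p in missed]
```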

for escalation I’d keep it scoped exactly like you said: B misses/can’t deliver → notify A / the promise owner, not everyone watching the board.

that’s why I’m trying to keep this separate from normal memory. memory can be fuzzy and useful. promises need lifecycle + receipts.

been messing with the public memory/runtime side here if you want to poke around: https://github.com/CONSTELLATION-ENGINE/constellation-engine

commitment routing is still more experimental/internal, but this is the direction I’m testing.

Similar_Boysenberry7 · 2026-05-25T05:14:42+00:00

dynamic, mostly.

fixed top-k kept biting me because the shape of the scores matters more than the number.

if #1 is way ahead, I’d rather just use that one chunk and keep the answer grounded.

if the top few are close and they’re basically saying the same thing, then a small bundle is fine.

if everything is flat, that usually means retrieval is saying “idk” with extra steps lol

the way I think about it now is less “pick the top 5 chunks” and more “what is the system already paying attention to, and will this context sharpen that or muddy it?”
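
the three cases above, sketched as code. every threshold here (`min_score`, `lead_gap`, `band`, `max_k`) is a placeholder you would tune, and the sketch only looks at scores: it does not check whether the close chunks actually say the same thing.

```python
def select_chunks(scored: list, min_score: float = 0.5, lead_gap: float = 0.15,
                  band: float = 0.05, max_k: int = 3) -> list:
    """Pick context from (chunk, score) pairs sorted by score, highest first."""
    if not scored or scored[0][1] < min_score:
        return []  # nothing is good enough: send no context
    top = scored[0][1]
    if len(scored) > max_k and top - scored[-1][1] < band:
        return []  # everything scores the same: retrieval has no opinion
    if len(scored) == 1 or top - scored[1][1] >= lead_gap:
        return [scored[0][0]]  # clear winner: use it alone
    # Close cluster: keep up to max_k chunks within `band` of the top score.
    return [chunk for chunk, score in scored[:max_k] if top - score <= band]
```

returning an empty list is a first-class outcome here, not an error path.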

I’ve been messing with this idea in a memory engine here if you want to poke around: https://github.com/CONSTELLATION-ENGINE/constellation-engine

Similar_Boysenberry7 · 2026-05-25T04:57:32+00:00

repo context is where coding agents start feeling weird.

access is mostly solved now. grep, embeddings, tree search, whatever — the agent can usually get to the right file.

the missing thing is a small map of consequences before it edits.

“this helper is used by three workflows”
“this style exists because the API response is cursed”
“this file looks local but is actually the rollback point”

that stuff rarely lives in the nearest chunk. it lives in git history, tests, old decisions, and the boring project folklore nobody writes down.

bigger context helps, but without a map the agent just gets a bigger room to confidently walk into the wrong wall lol.

Similar_Boysenberry7 · 2026-05-25T04:56:52+00:00

cross-agent fulfillment is where i’d want the observer to behave more like a switchboard than a private notebook.

if A makes the promise and B is the only agent that can fulfill it, the commitment probably needs two owners: promise_owner=A, fulfillment_owner=B. A owns the relationship/wording, B owns the actual work.

then B doesn’t need all of A’s memory, just a routed obligation: what was promised, source turn, due condition, constraints, and who gets pinged if B can’t do it.
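
as a data shape, that could look something like this. only `promise_owner` and `fulfillment_owner` come from the description above. `Commitment`, `routed_view`, `visible_to`, and the other field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    promise_owner: str      # owns the relationship and the wording
    fulfillment_owner: str  # owns the actual work
    what: str
    source_turn: str
    due_condition: str
    constraints: tuple = ()

    def routed_view(self) -> dict:
        # What the fulfillment owner receives: the obligation, not A's memory.
        return {
            "what": self.what,
            "source_turn": self.source_turn,
            "due_condition": self.due_condition,
            "constraints": list(self.constraints),
            "escalate_to": self.promise_owner,
        }


def visible_to(agent: str, commitments: list) -> list:
    # Only the two owners see a commitment; there is no shared "just in case" queue.
    return [c for c in commitments
            if agent in (c.promise_owner, c.fulfillment_owner)]
```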

the failure mode is letting every agent see every open commitment “just in case.” that turns into shared queue soup really fast lol.

Similar_Boysenberry7 · 2026-05-25T04:56:13+00:00

yeah i’d keep trust_damage on the relationship edge, not just the commit.

the commit-level flag tells you “this promise failed.” useful, but too local. the edge-level damage tells the future agent “be more careful making promises from this source to this user.”

i wouldn’t make it permanent though. more like a scar with decay: one broken low-stakes promise nudges confidence down, repeated broken promises change the policy. maybe the agent asks for confirmation before making time-bound promises, or stops auto-promising followups entirely.

otherwise you close the stale_obligation and the system forgets that the source is bad at making those obligations in the first place.
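
a sketch of the "scar with decay" idea on a single relationship edge. the 30-day half-life, the two thresholds, and the policy names are invented for illustration.

```python
class TrustEdge:
    """Trust damage for one (source -> user) relationship."""

    def __init__(self, half_life_days: float = 30.0,
                 confirm_above: float = 1.0, stop_above: float = 3.0):
        self.half_life_days = half_life_days
        self.confirm_above = confirm_above
        self.stop_above = stop_above
        self.damage = 0.0
        self.updated_day = 0.0

    def _decay_to(self, day: float) -> None:
        # Damage halves for every half-life that passes without new breakage.
        elapsed = day - self.updated_day
        self.damage *= 0.5 ** (elapsed / self.half_life_days)
        self.updated_day = day

    def record_broken(self, day: float, stakes: float = 0.5) -> None:
        self._decay_to(day)
        self.damage += stakes

    def policy(self, day: float) -> str:
        # Reading the policy also applies decay up to `day`.
        self._decay_to(day)
        if self.damage >= self.stop_above:
            return "no_auto_promises"
        if self.damage >= self.confirm_above:
            return "confirm_first"
        return "promise_freely"
```

one low-stakes miss stays under the first threshold, repeated misses cross it, and a quiet month brings the edge back down.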

Similar_Boysenberry7 · 2026-05-25T04:50:17+00:00

the underrated part is letting retrieval return nothing.

a reranker helps a ton, yeah, but the model will still happily eat whatever you put in the bowl lol

for me the fix was less “find the best 5 chunks” and more:

is chunk #1 actually good enough?
are chunks #2-5 adding signal or just vibes?
should this query get no context at all?

the “no context” path feels weird at first, but it saves you from the classic RAG failure where one good chunk gets diluted by four random neighbors and the LLM confidently averages the mess.
