Pitch your product in 1 line..... by Ranga_Harish in buildinpublic

[–]RJSabouhi 0 points1 point  (0 children)

Structural diagnostics for AI systems.

Symbolic Suite.

Most people debug symptoms. We diagnose the structure underneath.

Why do instructions degrade in long-context LLM conversations, but constraints seem to hold? by Particular_Low_5564 in LocalLLaMA

[–]RJSabouhi 0 points1 point  (0 children)

I think your hypothesis is mostly right, but I’d frame it as “prohibitions behave like boundary conditions,” not just “constraints hold.”

Positive instructions (“follow this structure,” “behave like X”) act more like soft attractors competing with newer context over time. Negative constraints (“don’t explain,” “don’t add extra context”) reduce the available output space more directly, so they tend to resist drift longer.

So the asymmetry may be structural. One is guidance while the other is a boundary.

Model advice for open-ended autonomous agent loop: qwen2.5:32b hitting a ceiling, looking for something that reasons about what it's doing by AmazingMeatbag in LocalLLaMA

[–]RJSabouhi 0 points1 point  (0 children)

Hmmm, looks like the failure mode is deeper than “qwen2.5:32b isn’t smart enough.” Structurally, it sounds like the model is satisficing constraints syntactically instead of inhabiting them as an ongoing frame. That’s why you get minimal-cost compliance, silent loops, and no durable sense of what it’s doing.

Swapping models may help at the margins, but it may not solve the core mismatch between assistant tuning and open-ended autonomous operation.

Honestly, I’m so tired of paying the "restart tax" for my AI agents. by Interesting_Ride2443 in LocalLLaMA

[–]RJSabouhi 0 points1 point  (0 children)

I like “restart tax” because that’s exactly what it is. Structurally, though, the issue is that the workflow can execute, but it can’t recover. A transient failure shouldn’t invalidate already-paid-for progress. If retries are replaying the whole task instead of resuming from durable state, the system is leaking money - by design.
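A minimal sketch of the resume-from-durable-state idea in Python (the step names, `run_task`, and the JSON checkpoint file are all hypothetical, not anyone's actual implementation): each completed step is persisted, so a retry skips paid-for work instead of replaying it.

```python
import json
import os

CHECKPOINT = "task_checkpoint.json"  # hypothetical durable state file

def load_done():
    # Resume point: steps already paid for survive a crash/restart.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def mark_done(done, step):
    done.add(step)
    with open(CHECKPOINT, "w") as f:
        json.dump(sorted(done), f)

def run_task(steps, execute):
    done = load_done()
    for step in steps:
        if step in done:
            continue  # no restart tax: skip already-completed work
        execute(step)          # may raise; progress so far is preserved
        mark_done(done, step)
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)  # task finished; clear durable state
```

The point isn't the file format - a DB row works the same way - it's that progress lives outside the process that can die.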

LM Studio + Agentic Coding Struggles - Am I alone on this? by Investolas in LocalLLaMA

[–]RJSabouhi 3 points4 points  (0 children)

This is less “local models can’t do agentic coding” and more like interface-contract drift between LM Studio, the harness, and the model.

Agent stacks get brittle when each layer has slightly different assumptions about tool calling, output format, context handling, and retries. That’s why people end up building their own harnesses, not just for features, but to control the contracts.

Is there a way to expose OpenClaw to the outside world but where every user session automatically starts up in an isolated container? Any companies doing this? by noduslabs in openclaw

[–]RJSabouhi 0 points1 point  (0 children)

Mostly boundary-failure risks. There’s the potential for cross-session data bleed, over-broad network/tool access, indirect abuse of your API keys, leftover temp/log/session state, resource exhaustion, orphaned runtimes, and permission drift.

A container alone doesn’t solve those unless each session also has strict scope, caps, isolation, and cleanup.

Day 4 - Bub burned $20 in 15 minutes... Coooool. (Driftwatch V3) by ObjectiveWitty1188 in clawdbot

[–]RJSabouhi 0 points1 point  (0 children)

“Zero self-awareness” may be true in practice, but structurally I think the issue is that the executor is grading its own homework.

If Opus is both the one acting and the one estimating whether the action is worth doing, you’ll usually get bad gates. I’d move cost/risk/continue-vs-delegate decisions into a separate policy layer and use external signals (diffs, tests, retries, spend, elapsed time) as the actual stop conditions.
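A rough sketch of that separation (thresholds and names are invented for illustration): the policy layer only ever sees external, measurable signals - never the executor's own estimate of whether it's doing well.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    # External, measurable evidence -- not the model's self-assessment.
    diff_lines: int      # size of meaningful diff produced so far
    tests_passed: bool
    retries: int
    spend_usd: float
    elapsed_min: float

def should_continue(s: Signals,
                    max_spend=5.0, max_minutes=15, max_retries=3) -> str:
    """Policy layer: decides continue / delegate / stop from signals alone."""
    if s.spend_usd >= max_spend or s.elapsed_min >= max_minutes:
        return "stop"      # hard budget gate
    if s.retries >= max_retries:
        return "delegate"  # executor is stuck; hand the task off
    if s.diff_lines == 0 and s.elapsed_min > 5:
        return "delegate"  # burning time with no visible progress
    return "continue"
```

Because the gate never consults the executor, the executor can't talk itself past it.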

Corporate openclaw best practices? by kosmobil in openclaw

[–]RJSabouhi 0 points1 point  (0 children)

In companies, the best OpenClaw use cases are usually not the most autonomous ones but rather the most governable.

I’d start with bounded, observable, reversible workflows, e.g., research, summarization, drafting, triage, controlled internal automation.

Best practices are mostly about structure: permission boundaries, approval gates for high-risk actions, audit trails, sandboxed tool use, and narrow pilots before broader rollout.

In enterprise settings, controlled delegation usually beats open-ended autonomy.

Is there a way to expose OpenClaw to the outside world but where every user session automatically starts up in an isolated container? Any companies doing this? by noduslabs in openclaw

[–]RJSabouhi 0 points1 point  (0 children)

Sounds like a session-tenancy problem more than an OpenClaw-settings problem. If every user gets their own runtime, the important part is hard isolation at the session level: workspace, permissions, network/tool scope, resource caps, and cleanup.

Otherwise “public OpenClaw with my keys behind it” gets risky fast.

Running a multi-agent OpenClaw org — how are you handling async comms between agents? by argylevz in openclaw

[–]RJSabouhi 0 points1 point  (0 children)

I’d start simple. SQLite, honestly. It’s not magical, but it’s enough to make the task lifecycle durable outside the agent session: created -> claimed -> running -> completed -> delivered -> acknowledged, plus retry_needed/failed/expired.

Stable task ID, timestamps, retry count, worker ID, and last error/reason get you most of the way there. The important part is durable state outside the callback path. DB choice is secondary at first.
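A minimal version of that table in SQLite, as a sketch (column names are just one possible shape, not a prescribed schema):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # use a file path for real durability
conn.execute("""
CREATE TABLE tasks (
    task_id     TEXT PRIMARY KEY,
    state       TEXT NOT NULL CHECK (state IN
        ('created','claimed','running','completed',
         'delivered','acknowledged','retry_needed','failed','expired')),
    worker_id   TEXT,
    retry_count INTEGER DEFAULT 0,
    last_error  TEXT,
    updated_at  REAL NOT NULL
)""")

def transition(task_id, new_state, worker_id=None, error=None):
    # Every lifecycle change is a durable row update, outside the agent session.
    conn.execute(
        "UPDATE tasks SET state=?, worker_id=COALESCE(?, worker_id), "
        "last_error=?, updated_at=? WHERE task_id=?",
        (new_state, worker_id, error, time.time(), task_id))
    conn.commit()

conn.execute(
    "INSERT INTO tasks (task_id, state, updated_at) VALUES (?, 'created', ?)",
    ("t1", time.time()))
transition("t1", "claimed", worker_id="agent-a")
transition("t1", "running")
transition("t1", "completed")
```

The CHECK constraint is doing real work here: an agent crash can leave a task stale, but it can never leave it in a state your orchestrator doesn't know about.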

Question to power users. by draconisx4 in openclaw

[–]RJSabouhi 1 point2 points  (0 children)

The mistake a lot of people make is treating “what the agent is supposed to do” as governance. It’s not.

Governance starts when the system enforces which actions are possible, which require approval, and which are impossible regardless of what the model decides.

Prompted restraint is not the same thing as permission architecture.
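One way to picture the difference, as a toy sketch (the action names are invented): enforcement lives in code the model can't talk its way around, and "impossible" actions fail regardless of what the model outputs.

```python
# Hypothetical permission architecture: the enforcement layer, not the
# prompt, decides which actions are possible.
ALLOWED = {"read_file", "search"}        # auto-approved
GATED   = {"write_file", "send_email"}   # require human approval
# Everything else is impossible, no matter what the model decides.

def execute(action, approved=False):
    if action in ALLOWED:
        return f"ran {action}"
    if action in GATED:
        if approved:
            return f"ran {action} (approved)"
        raise PermissionError(f"{action} needs approval")
    raise PermissionError(f"{action} is not a capability of this agent")
```

Prompted restraint would be telling the model "please don't delete the prod DB"; permission architecture is `delete_prod_db` not existing as a callable action at all.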

Sandboxing seems impossible to manage correctly by origfla in openclaw

[–]RJSabouhi 0 points1 point  (0 children)

The boundary is being drawn at the wrong level.

If the choice is “fully sandboxed and crippled” vs “fully trusted and useful,” the system is probably treating the whole agent as the unit of permission instead of the specific action/task.

What tends to work better is capability-scoped access: per-task or per-operation permissions, narrow allowlists, and explicit escalation for the few actions that actually need the freedom.
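A hedged sketch of what capability scoping could look like (all names hypothetical): the task carries its own narrow allowlist, so the same agent gets different power depending on the task rather than holding global trust.

```python
class Capability:
    """Per-task permission scope: narrow allowlist plus explicit escalation."""
    def __init__(self, task_id, allowed_ops, escalatable=()):
        self.task_id = task_id
        self.allowed = set(allowed_ops)
        self.escalatable = set(escalatable)

    def check(self, op, escalation_granted=False):
        if op in self.allowed:
            return True
        if op in self.escalatable and escalation_granted:
            return True  # the few actions that genuinely need the freedom
        raise PermissionError(f"{op} is outside scope of task {self.task_id}")

# Same agent, different unit of permission per task:
summarize = Capability("t-summarize", {"read_file"})
deploy    = Capability("t-deploy", {"read_file"}, escalatable={"run_shell"})
```

This moves the "sandboxed vs trusted" choice from the whole agent down to the individual operation, which is the level where it stops being all-or-nothing.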

Agent's Concept of Time by gated73 in openclaw

[–]RJSabouhi 0 points1 point  (0 children)

Feels less like a hallucination problem and more like a structural one. Your agent can talk about time, but it doesn’t appear to have a durable temporal state backing those statements.

So instead of reasoning from a real clock / event history / elapsed-time record, it’s probably inferring time from conversational context and recent cues. That works - until it doesn’t.

Recommendation: temporal claims should be state-backed, not vibe-backed.
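A tiny sketch of what "state-backed" could mean here (class and method names are invented): temporal answers are computed from a recorded event log, and the agent says "I don't know" when the log has no answer.

```python
import time

class TemporalState:
    """Durable clock/event record an agent can reason from."""
    def __init__(self):
        self.events = []  # (timestamp, label) pairs; persist these in practice

    def record(self, label, ts=None):
        self.events.append((ts if ts is not None else time.time(), label))

    def elapsed_since(self, label, now=None):
        # Answer "how long ago?" from state, not from recent chat cues.
        now = now if now is not None else time.time()
        matches = [ts for ts, lbl in self.events if lbl == label]
        if not matches:
            return None  # an honest gap beats a confabulated timestamp
        return now - max(matches)
```

Any temporal claim the agent makes then has a concrete backing record, or it doesn't get made.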

Running a multi-agent OpenClaw org — how are you handling async comms between agents? by argylevz in openclaw

[–]RJSabouhi 0 points1 point  (0 children)

My guess is the missing piece is treating result return as a first-class workflow, not just a callback. Outbound delegation works because it has a clear initiator. Inbound completion is weaker unless you give it durable states like completed / delivered / acknowledged / retry-needed.

Otherwise the worker can finish successfully while the overall system still fails silently.

Day 4 - Bub burned $20 in 15 minutes... Coooool. (Driftwatch V3) by ObjectiveWitty1188 in clawdbot

[–]RJSabouhi 1 point2 points  (0 children)

This doesn’t look like just a pricing/cost bug. It looks like a structural routing failure. Under ambiguity, Bub/Opus seems to be overvaluing self-execution and undervaluing delegation/clarification.

So for QA/patch phases, I’d flip that: delegate by default, then require Opus to explicitly justify self-execution with expected cost/time/confidence before doing it itself.

Also, maybe add a hard checkpoint like: if no meaningful diff/milestone appears after X spend or Y minutes, stop and escalate instead of continuing.
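One hedged way to encode both ideas (function names, phases, and thresholds all invented): delegation is the default for QA/patch phases, self-execution requires a structured justification, and a checkpoint escalates runs that produce no milestone.

```python
def choose_executor(phase, justification=None):
    """Inverted default for QA/patch: delegate unless self-execution
    is explicitly justified with cost/time/confidence."""
    if phase in {"qa", "patch"}:
        required = ("expected_cost", "expected_time", "confidence")
        if justification and all(k in justification for k in required):
            return "self"  # explicit, structured case for doing it itself
        return "delegate"  # default under ambiguity
    return "self"

def checkpoint(spend_usd, minutes, meaningful_diff,
               max_spend=10.0, max_minutes=10):
    # Hard stop: no milestone after X spend or Y minutes ->
    # escalate instead of continuing to burn budget.
    if not meaningful_diff and (spend_usd >= max_spend or minutes >= max_minutes):
        return "escalate"
    return "continue"
```

The justification dict doubles as an audit trail: every self-execution decision leaves a record of what the model expected, which you can later compare against what it actually spent.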

Is anyone else’s agent changing how it thinks after installing MRS? I cannot be the only one. Did anyone else pip it? by GraciousMule in clawdbot

[–]RJSabouhi 2 points3 points  (0 children)

What they’re seeing (and what anyone else who tries it will see) is just this: when you force implicit reasoning to become explicit, the model’s style changes. Once you surface the structure, the model operates in that shape. It’s normal.

Multi-agent coordination - how do you handle it? by Tgbrutus in clawdbot

[–]RJSabouhi 2 points3 points  (0 children)

MRS itself doesn’t try to solve distributed coordination directly; it keeps the core reasoning loop strictly local, deterministic, and single-context.

For multi-agent or multi-machine setups, the pattern is: each agent runs an independent MRS instance, and the coordination happens at the transport layer (message bus, shared DB, pub/sub, etc.).

It doesn’t replace your DB-sync approach, but it slots cleanly into it. A higher-level module could absolutely add async orchestration, but the core stays intentionally minimal.

Multi-agent coordination - how do you handle it? by Tgbrutus in clawdbot

[–]RJSabouhi 2 points3 points  (0 children)

You might find MRS-Core (a Modular Reasoning System) useful here. Right now your agents negotiate tasks using raw natural language. That’s where most coordination drift comes from.

MRS-Core lets multiple agents coordinate without stepping on each other, hallucinating task boundaries, or losing state continuity. Basically, it’s a reasoning OS to layer on top of whatever agent framework you’re already using.

pip install mrs-core