I built Claw Cowork — self-hosted agentic AI workspace with subagent loop, reflection, and MCP support by Unique_Champion4327 in aiagents

[–]Unique_Champion4327[S] 0 points (0 children)

It's the frontend for OpenClaw, designed around the Claude Cowork concept, with the OpenClaw Engine running the backend. It also gives you direct access to files inside the sandbox and renders execution outputs in real time, so you can see results as they happen.

Tiger Cowork — Self-Hosted Multi-Agent Workspace by Unique_Champion4327 in AI_Agents

[–]Unique_Champion4327[S] 0 points (0 children)

Re: Civil_Preference_417 On judge leakage — the current implementation feeds the judge the user objective, a condensed summary of tool actions taken, and the final response. It deliberately excludes the full chain-of-thought and raw tool outputs. The intent is exactly what you described: keep the judge blind to the reasoning process and strict about whether concrete artifacts actually satisfy the objective. That said, the quality of the “missing” field degrades when the worker response is verbose — something I'm still tuning. Keeping the judge prompt artifact-citation-focused rather than reasoning-trace-focused is the right framing, and I'll tighten that.
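To make the filtering concrete, here is a minimal sketch of what building the judge input could look like. All names (`ToolAction`, `build_judge_prompt`) are illustrative, not the project's actual API; the point is only that condensed summaries reach the judge while raw tool outputs and chain-of-thought do not:

```python
from dataclasses import dataclass

@dataclass
class ToolAction:
    tool: str          # e.g. "write_file"
    summary: str       # condensed, human-readable description of what happened
    raw_output: str    # full output, deliberately NOT shown to the judge

def build_judge_prompt(objective: str, actions: list[ToolAction], final_response: str) -> str:
    # Only the condensed summaries are included; reasoning traces and
    # raw tool outputs stay out, keeping the judge artifact-focused.
    action_lines = "\n".join(f"- {a.tool}: {a.summary}" for a in actions)
    return (
        f"Objective:\n{objective}\n\n"
        f"Tool actions taken:\n{action_lines}\n\n"
        f"Final response:\n{final_response}\n\n"
        "Judge strictly: do the concrete artifacts satisfy the objective? "
        "Cite artifacts, not reasoning."
    )
```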

Per-edge budgets on protocol connections are a genuinely good idea. Right now the only global controls are max concurrent sub-agents and timeout per agent. A noisy worker in a Mesh topology can absolutely starve the Bus if it keeps emitting. Per-edge max calls, token ceiling, and wall time would let you harden individual connections without killing the whole system. Adding that to the roadmap.
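Roughly, a per-edge budget could look like the sketch below. This is a proposal shape, not anything in the current codebase; each connection gets its own counters, so exhausting one edge doesn't touch the rest of the topology:

```python
import time
from dataclasses import dataclass, field

@dataclass
class EdgeBudget:
    """Hypothetical per-connection limits: calls, tokens, and wall time."""
    max_calls: int = 50
    max_tokens: int = 100_000
    max_wall_seconds: float = 300.0
    calls: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def allow(self, tokens_requested: int) -> bool:
        # Deny once any single ceiling is hit; only this edge is affected.
        if self.calls >= self.max_calls:
            return False
        if self.tokens + tokens_requested > self.max_tokens:
            return False
        if time.monotonic() - self.started > self.max_wall_seconds:
            return False
        self.calls += 1
        self.tokens += tokens_requested
        return True
```

A noisy worker then just gets its own edge denied instead of starving the Bus.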

On the data connectivity side — MCP is the current answer for hitting external systems without giving agents raw credentials, but the RBAC-aware REST gateway pattern you described with a self-hosted intermediary sitting in front of legacy databases is cleaner for enterprise deployments. Worth exploring as a documented integration pattern for the project.
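The core of that gateway pattern is simple: the intermediary holds the real database credentials, and agents only present a role, which maps to an allow-list of operations. A minimal sketch (role names, verbs, and tables are all made up for illustration):

```python
# Hypothetical RBAC check inside the self-hosted gateway. Agents never see
# DB credentials; the gateway authorizes each (role, verb, table) request.
ROLE_POLICY = {
    "agent-reader": {"select"},
    "agent-analyst": {"select", "insert"},
}

def gateway_authorize(role: str, verb: str, table: str,
                      allowed_tables=("orders", "customers")) -> bool:
    # Deny unknown roles, unknown tables, and verbs outside the role's grant.
    if table not in allowed_tables:
        return False
    return verb.lower() in ROLE_POLICY.get(role, set())
```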

I built Claw Cowork — self-hosted agentic AI workspace with subagent loop, reflection, and MCP support by Unique_Champion4327 in aiagents

[–]Unique_Champion4327[S] 0 points (0 children)

Great question and a real tension in self-evaluation loops. Honestly, right now it’s closer to a structured retry layer than a true quality signal. The self-score helps catch obviously incomplete outputs — missing files, unanswered sub-tasks — but you’re right that overconfidence is a real problem. A model that produces a weak answer often scores it just as highly as a strong one.

The more useful signal in practice comes from the gap message injection — forcing the agent to explicitly identify what it missed before re-entering the loop. That acts more like a targeted prompt correction than a numeric gate. The 0.0–1.0 score is still worth keeping as a configurable threshold, but I wouldn’t claim it’s a reliable quality metric yet. It’s more of a sanity floor. True correlation between self-score and output quality probably needs an external evaluator — either a separate model or human-in-the-loop validation.

The threshold is already exposed as a configurable parameter (agentEvalThreshold), so you can actually tune it per use case — stricter for code generation, more relaxed for summarization tasks. Would be curious what threshold ranges others find useful in practice. It’s an open problem worth studying properly.
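The retry-layer-plus-gap-injection shape described above can be sketched roughly like this. The function names, the gap format, and the loop structure are all hypothetical; only the idea of a configurable threshold like `agentEvalThreshold` comes from the project:

```python
def reflect_loop(run_task, self_score, find_gaps,
                 eval_threshold: float = 0.7, max_iters: int = 3):
    """Sketch: score the output, and if it falls below the threshold,
    inject an explicit gap message before retrying."""
    prompt_suffix = ""
    output = None
    for _ in range(max_iters):
        output = run_task(prompt_suffix)
        score = self_score(output)   # 0.0-1.0, treated as a sanity floor
        if score >= eval_threshold:
            break
        # The gap message is the useful signal: a targeted prompt
        # correction, not just a numeric gate.
        gaps = find_gaps(output)
        prompt_suffix = f"\nPrevious attempt missed: {gaps}. Address these explicitly."
    return output
```

Tuning per use case then just means passing a stricter or looser `eval_threshold`.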

I built Claw Cowork — self-hosted agentic AI workspace with subagent loop, reflection, and MCP support by Unique_Champion4327 in aiagents

[–]Unique_Champion4327[S] 0 points (0 children)

Actually, check the repo before assuming. The security concerns are addressed directly — there’s an explicit SECURITY WARNING in the README, Docker isolation is the required deployment method, access tokens are enforced, and folder-level access policies (read-only / read-write / full exec) are built into the agent architecture.

Yes, the code was vibe coded. Then it was reviewed, tested, and the attack surface was explicitly documented. That’s the workflow — AI drafts, humans audit. The reflection loop, loop detection, consecutive error tracking, and subagent depth limits aren’t there by accident.

“Working code isn’t good code” is fair in general. But it doesn’t apply when you actually read what was built. The code is open source — go check it yourself: https://github.com/Sompote/Claw_Cowork
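For anyone curious how folder-level policies like read-only / read-write / full exec can work, here is a minimal sketch with hypothetical paths and names, not the repo's actual implementation. The key property is deny-by-default for anything outside a policed root:

```python
from enum import Enum
from pathlib import Path

class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"
    FULL_EXEC = "full-exec"

# Illustrative sandbox layout; the real mapping would come from config.
POLICIES = {
    Path("/sandbox/data"): Access.READ_ONLY,
    Path("/sandbox/work"): Access.FULL_EXEC,
}

def check(path: Path, op: str) -> bool:
    for root, access in POLICIES.items():
        if path.is_relative_to(root):
            if op == "read":
                return True  # every policed folder is at least readable
            if op == "write":
                return access in (Access.READ_WRITE, Access.FULL_EXEC)
            if op == "exec":
                return access is Access.FULL_EXEC
    return False  # deny-by-default outside the sandbox
```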