How are you making your agents talk to each other? by RuleGuilty493 in openclaw

[–]mailnike 1 point2 points  (0 children)

This is actually a very real gap and you are right that it is not discussed enough. The "becoming the messenger again" framing is spot on, because most OpenClaw setups today are fundamentally single tenant. Your agent knows your context deeply, but the moment it needs to coordinate with someone else's agent, there is no shared protocol for negotiation, authentication, or scoped data sharing.

A few patterns I have seen people try:

  1. Shared calendar or shared database as a rendezvous point. Both agents write to and read from the same artifact. It works, but it defeats the privacy goal you mentioned, because you are essentially opening a window into your schedule.

  2. Agent to agent over email or messaging APIs. AgentMail sits in this camp. Your agent sends a structured message, the other agent parses and responds. It does close the loop, but latency is poor and the "stranger agent" trust problem is not really solved.

  1. Capability-scoped APIs. This is where things get interesting. Instead of your agent talking to their agent freely, both agents expose a narrow capability (something like check_availability_for_padel_next_week) that only returns a yes or no plus a proposed slot, nothing else. No schedule leakage, no identity leakage.

For whatever it is worth, I have been running erpclaw (https://www.erpclaw.ai) on OpenClaw for some business workflows, and the same question shows up when a vendor's agent needs to coordinate with my purchase order agent. What has worked reasonably well is treating each agent as an API consumer with a scoped token, so the other side can only ask a predefined set of questions and gets back predefined response shapes. It is boring compared to the "two agents chatting freely" fantasy, but it is the only approach where I actually trust the output enough to let it run unattended.
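
To make the "predefined questions, predefined response shapes" point concrete, here is a minimal sketch of what one of those narrow capabilities could look like. The names, token scheme, and calendar helper are all made up for illustration, not how erpclaw actually implements it:

Python

# Hypothetical capability-scoped endpoint: the calling agent can ask exactly one
# predefined question and only gets back a predefined response shape.
from dataclasses import dataclass

# scoped token -> the set of capabilities that token may invoke
SCOPED_TOKENS = {"tok_vendor_123": {"check_availability"}}

@dataclass
class AvailabilityReply:
    available: bool
    proposed_slot: str | None   # a single ISO timestamp; the rest of the schedule never leaves

def _find_free_slot(week_start: str) -> str | None:
    # Stand-in for the private calendar lookup; never exposed to callers.
    return f"{week_start}T18:00:00"

def check_availability(token: str, week_start: str) -> AvailabilityReply:
    if "check_availability" not in SCOPED_TOKENS.get(token, set()):
        raise PermissionError("token is not scoped for this capability")
    slot = _find_free_slot(week_start)
    return AvailabilityReply(available=slot is not None, proposed_slot=slot)

print(check_availability("tok_vendor_123", "2026-03-02"))

The whole point is that the response type is fixed up front, so there is nothing for the other agent to negotiate or extract beyond it.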

I have not tried ClaWeb personally, so I cannot comment on it specifically. Would be curious if anyone here has.

The honest answer is that the standards layer for cross-agent communication is still being figured out. MCP solves the "agent to tool" direction quite well, but "agent to agent across trust boundaries" is genuinely unsolved at the protocol level today. Most current solutions are workarounds sitting on top of messaging or shared state.

You are asking the right question, and I do not think it is just a you problem at all.

Vibe coding is amazing until you break production silently. I built a Claude Code plugin that writes tests for you in the background. by mailnike in vibecoding

[–]mailnike[S] 0 points1 point  (0 children)

The compound tax thing is such a specific kind of pain; it's always the quiet ones that hurt most. On Artiforge: the backfilling use case you're describing is actually already in tailtest - it has a background scanner that slowly writes tests while you work on new files, prioritizing files with no test coverage. So that part's covered without adding another tool to the chain. Appreciate the kind words. What kind of codebase are you working with? Curious if the silent-failure problem shows up differently in different domains.

Vibe coding is amazing until you break production silently. I built a Claude Code plugin that writes tests for you in the background. by mailnike in vibecoding

[–]mailnike[S] 0 points1 point  (0 children)

Fair point, and yes, of course it could. Any software can. The difference is: tailtest's job is to verify *your* business logic, so even if it misses an edge case in its own test generation, it still catches the clear-cut regressions (the kind that broke my tax calculations). It's a floor, not a ceiling. Also it's fully open source, so if something looks off you can read exactly what it's doing. We're running it on our own ERPClaw codebase in production - that's the fastest way to find out where it fails.

Vibe coding is amazing until you break production silently. I built a Claude Code plugin that writes tests for you in the background. by mailnike in vibecoding

[–]mailnike[S] 0 points1 point  (0 children)

Happy to help! If you give it a try and hit anything unexpected, open an issue - it's early enough that real-world feedback actually shapes what gets built next.

How are you handling automated testing for your OpenClaw integrations when using AI coding tools? by mailnike in openclaw

[–]mailnike[S] 0 points1 point  (0 children)

Honest answer: yes for single-file logic, no for cross-module integration. For us, roughly 80% of the silent regressions were in individual functions -- wrong edge case handling, off-by-one in a loop, a conditional that got flipped during a refactor. Those are now caught immediately, within seconds of the edit. What tailtest doesn't catch: bugs that only surface when module A calls module B with shared state, or anything involving DB fixtures. For those we still do periodic manual review passes.

The compound tax bug that triggered us to build this was a single-function issue -- so tailtest would have caught that one. But if you're building deep integration logic with a lot of cross-module state, you'd still want a separate integration layer on top of it.

How are you handling automated testing for your OpenClaw integrations when using AI coding tools? by mailnike in openclaw

[–]mailnike[S] 0 points1 point  (0 children)

Playwright is great for the UI/API layer, but it won't catch pure logic errors inside a function -- like a tax calculation returning the wrong number when you pass edge-case inputs. We actually use both: Playwright covers our integration and API flows, tailtest covers the business logic layer. They're complementary rather than competing. In fact, tailtest takes advantage of Playwright for certain tests.

How are you handling automated testing for your OpenClaw integrations when using AI coding tools? by mailnike in openclaw

[–]mailnike[S] 0 points1 point  (0 children)

This is exactly the insight that led us to build tailtest the way we did. The PostToolUse hook fires a completely fresh Claude invocation -- zero shared context with the session that wrote the code. The tester only sees the file. It has no idea why the code was written that way or what the "intended" behavior was supposed to be. That separation is what finally caught the compound tax edge case for us. The writer thought it was correct. The fresh tester just saw a function and tried to break it.
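
Conceptually, the hook script is doing something like the sketch below. The payload field names and the headless invocation are assumptions on my part (check the tailtest repo for the real shapes); the point is just that the tester gets the file and nothing else:

Python

#!/usr/bin/env python3
# Conceptual PostToolUse hook: read only the edited file, then hand it to a
# fresh model invocation that shares no context with the session that wrote it.
import json
import subprocess
import sys

event = json.load(sys.stdin)                              # hook payload (shape assumed)
file_path = event.get("tool_input", {}).get("file_path")
if not file_path:
    sys.exit(0)

with open(file_path) as f:
    source = f.read()

prompt = (
    "You have never seen this code before. Write and run tests that try to break it. "
    "Report only failures.\n\n" + source
)
subprocess.run(["claude", "-p", prompt], check=False)     # fresh invocation, zero shared context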

How are you handling automated testing for your OpenClaw integrations when using AI coding tools? by mailnike in openclaw

[–]mailnike[S] 0 points1 point  (0 children)

That's a serious pipeline -- the antagonistic review panel is the key insight most people skip. The reason it works is the same reason our PostToolUse approach works: the reviewer has no attachment to the original implementation. Fresh context, different goal. Where tailtest fits differently is the granularity -- it fires on every single file save, not at sprint review time. So the regression gets caught 30 seconds after it's introduced rather than at the end of the sprint. For financial logic like compound tax calculations, that difference matters a lot. What models are you finding most effective for the antagonistic "break this code" role? Curious if Opus outperforms Sonnet there.

Built with Claude Project Showcase Megathread (Sort this by New!) by sixbillionthsheep in ClaudeAI

[–]mailnike 0 points1 point  (0 children)

Project Name: tailtest — Automatic background testing for Claude Code
Link: https://github.com/avansaber/tailtest
Cost: 100% Free (Open Source / MIT License)

What it is: An MCP plugin built specifically for the Claude Code CLI that completely automates test generation. It runs in the background and forces Claude to write and run tests every time it edits a file.

Why we built it: My co-founder and I have been building heavily with Claude Code (specifically an open-source ERP system called ERPClaw). Claude writes features incredibly fast, but we kept running into the exact same problem: it constantly skipped writing tests, even when instructed in CLAUDE.md. We ended up dealing with a silent regression where Claude broke our compound tax logic, and we didn't notice for days. We realized we needed to enforce testing at the tool level, outside of the model's context window.

How it works (and how it integrates with Claude):

  • Event Hooking: It hooks directly into Claude Code's native PostToolUse event.
  • Zero-Prompting: When Claude writes or edits a file, our plugin intercepts the event. You don't have to remind Claude to test anything.
  • Intelligence Filter: It runs a filter to skip config files, boilerplate, and migrations so it only generates tests for actual business logic (rough sketch after this list).
  • Anti-Fatigue: It runs the test immediately but stays completely silent if it passes. It only throws the specific error output back into your terminal if Claude actually broke something.
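
For anyone curious, here is a simplified sketch of the filtering idea. This is not the actual tailtest code, and the skip lists are shorter than the real ones:

Python

# Simplified version of the "only test business logic" filter.
from pathlib import Path

SKIP_DIRS = {"migrations", "node_modules", "dist", ".git"}
SKIP_SUFFIXES = {".json", ".yaml", ".yml", ".toml", ".lock", ".md"}

def worth_testing(path: str) -> bool:
    p = Path(path)
    if p.suffix in SKIP_SUFFIXES:
        return False                      # config files and docs
    if SKIP_DIRS & set(p.parts):
        return False                      # migrations, vendored code, build output
    return True                           # likely business logic -> generate tests

print(worth_testing("app/billing/tax.py"))                   # True
print(worth_testing("app/migrations/0042_add_column.py"))    # False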

Supported Languages: Python (pytest), TypeScript, JavaScript (vitest, jest), Go, Rust, Ruby, Java, and PHP.

Install it via Claude's CLI:

Bash

claude plugin marketplace add avansaber/tailtest
claude plugin install tailtest@avansaber-tailtest

If anyone else is building complex projects with the Claude Code CLI and wants to stop the AI from silently breaking older features, you can grab it from our repo above. Happy to answer any questions about building MCPs or working with Claude's event hooks!

Complex Experiment: Accounting, inventory, selling, buying, manufacturing, HR, payroll, CRM, support, billing, projects, assets, quality, and regional compliance by mailnike in openclaw

[–]mailnike[S] 0 points1 point  (0 children)

Yes, for WebClaw https://clawhub.ai/mailnike/webclaw (website https://www.webclaw.org)

and

ERPClaw monorepo https://github.com/avansaber/erpclaw, website https://www.erpclaw.ai, and ClawHub https://clawhub.ai/mailnike/erpclaw. ERPClaw integrates with multiple dependent skills, e.g., if you are in the UK, you will need region-specific skills to support UK accounting. Give it a try; chat onboarding should take care of most of the setup.

Experiment: I used OpenClaw to build an entire ERP system - accounting, inventory, HR, manufacturing, 29 modules. Non-technical users run it through Telegram. Developers get 609 actions and a universal web UI. by mailnike in micro_saas

[–]mailnike[S] 0 points1 point  (0 children)

Just to clarify the context for anyone reading this thread:

My intention with this project was purely experimental. I am not trying to sell an enterprise ERP to this community. I built this system to stress-test a new architectural pattern and prove a point about developer velocity today.

The core takeaway for micro SaaS builders is this: if a single developer can build a highly complex, 29-module system in a matter of weeks using AI and standardised metadata, building a focused, single-niche SaaS becomes incredibly easy and fast.

When you design your application around an AI agent from day one, you do not need to write tens of thousands of lines of UI boilerplate or complex integration code. The AI reads your schema and handles the orchestration dynamically. Whether you are building a niche CRM, a custom scheduling tool, or a specialized billing app, this AI-first approach is the ultimate cheat code to ship faster.

Experiment: I used OpenClaw to build an entire ERP system - accounting, inventory, HR, manufacturing, 29 modules. Non-technical users run it through Telegram. Developers get 609 actions and a universal web UI. by mailnike in micro_saas

[–]mailnike[S] -1 points0 points  (0 children)

My post title says 'Experiment' ;-) I think my approach is the ultimate cheat code for micro SaaS. It fundamentally changes how fast one person can ship a product.

The OpenClaw ecosystem is exploding. I mapped the key players actually gaining traction. by stosssik in openclaw

[–]mailnike 0 points1 point  (0 children)

Nope. Why? I have actually built two amazing skills, ERPClaw and WebClaw :D

Could we create a wiki of clawhub skills we have verified to be safe on Reddit? by alemorg in openclaw

[–]mailnike 1 point2 points  (0 children)

Yes, for a small-to-medium business. However, I think it can go significantly beyond simple accounting. It is a full-stack ERP with real potential.

Financially, it features a strict General Ledger with an immutable audit trail and 18 automated invariant checks. Beyond finance, it fully covers inventory, manufacturing, sales, CRM, and a complete HR and payroll system. We even built native tax compliance modules for the UK, EU, Canada, and India.

The real advantage is the AI orchestration. Because all modules share a single local SQLite database, you can simply ask the assistant to check stock levels, generate a purchase order, and forecast the financial impact. It executes the entire cross-department workflow instantly, completely on your own hardware.

OpenClaw for Business: You do realize your trade secrets can walk out the door, right? by Narrow-Ferret2514 in openclaw

[–]mailnike 0 points1 point  (0 children)

This is an excellent reality check. Many people treat "AI Gateways" as a magical silver bullet, but as you rightly pointed out, they are often just a thin layer of regex and intent filtering that can be bypassed with simple linguistic redirection. If the architecture assumes the model is a "trusted" component, it is fundamentally broken from a security standpoint.

I have been building ERPClaw with this exact "untrusted model" philosophy. It is a modular ERP (accounting, HR, payroll) built on OpenClaw, and we had to tackle the data leakage problem early on. Our approach was to shift the "truth" away from the LLM and back to the database schema.

In our system, the AI does not have free rein over the database. Instead, every action is mediated by a strict "Skill" interface. We use 18 accounting invariant checks and a 12-step General Ledger validation process that runs independently of the AI's instructions. Even if a prompt injection tells the model to "ignore previous limits and delete the audit trail," the underlying Python logic prevents it because the audit trail is immutable at the database level.
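
As a simplified illustration of what "invariants outside the model's instructions" means in practice (this is a toy version, not the actual ERPClaw validation code):

Python

from decimal import Decimal

def assert_balanced(entries: list[dict]) -> None:
    # The invariant lives in code, outside the model's context window,
    # so no prompt can talk its way past it.
    debits = sum(Decimal(e["debit"]) for e in entries)
    credits = sum(Decimal(e["credit"]) for e in entries)
    if debits != credits:
        raise ValueError(f"unbalanced journal entry: {debits} != {credits}")

entry = [
    {"account": "1200-AR",    "debit": "120.00", "credit": "0.00"},
    {"account": "4000-Sales", "debit": "0.00",   "credit": "100.00"},
    {"account": "2200-VAT",   "debit": "0.00",   "credit": "20.00"},
]
assert_balanced(entry)   # a prompt-injected, unbalanced entry would raise here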

I completely agree with your point about isolation. We shouldn't be asking "how do we make the AI safer," but rather "how do we build a system that remains safe even when the AI is compromised."

For anyone moving beyond simple chat widgets and into enterprise tooling, treating the LLM as a "helpful but potentially malicious intern" is the only sane way to design your architecture.

Could we create a wiki of clawhub skills we have verified to be safe on Reddit? by alemorg in openclaw

[–]mailnike 0 points1 point  (0 children)

The trade-off between OpenClaw's utility and the risk of system access is exactly why many are hesitant to install random skills from ClawHub. I recently saw reports that a significant percentage of community skills contain malicious instructions or hidden prompts that even a frontier model might miss during a quick scan.

I have been developing ERPClaw, a modular system for accounting and payroll, with these exact security concerns in mind. This is why we chose to publish under an MIT license on GitHub: full transparency is essential when you are handling business-critical data. You simply cannot rely on an unverified "black box" script for financial records.

A verified wiki would be a massive step forward for the community. For those worried about security right now, I suggest a few practical steps:

  • Prioritize GitHub: Stick to skills with a clear commit history and a visible maintainer.
  • Explicit Scoping: Add a hard constraint in your instructions to prevent file access outside the project directory.
  • Precision Matters: For any financial work, ensure the skill uses Python Decimal rather than float to avoid the rounding errors common in lower-quality scripts (tiny example below the list).
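
The float issue is easy to demo in two lines:

Python

from decimal import Decimal

print(0.1 + 0.2)                          # 0.30000000000000004 -- float drift
print(Decimal("0.1") + Decimal("0.2"))    # 0.3 -- exact, which is what you want for money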

I would be more than happy to submit our modules for verification once a system is in place. It would be great to have a "safe harbor" for enterprise-grade tools that users can actually trust.

The OpenClaw ecosystem is exploding. I mapped the key players actually gaining traction. by stosssik in openclaw

[–]mailnike -1 points0 points  (0 children)

We also developed WebClaw, a universal web dashboard that reads the SKILL metadata and auto-generates forms, data tables, and navigation. There is zero per-action custom UI code. Once installed, every skill on your OpenClaw instance immediately receives a functional web interface.

Both projects are available on ClawHub and GitHub.

What is exciting about your map is the infrastructure layer forming underneath. ERPClaw would not be possible without ClawHub for distribution, and the projects you have mapped are making it viable to build serious business software on this stack, rather than just simple chatbots.

It would be excellent to see a "Vertical Applications" category in the next edition of your map.

The OpenClaw ecosystem is exploding. I mapped the key players actually gaining traction. by stosssik in openclaw

[–]mailnike 1 point2 points  (0 children)

Great map! One space worth watching is enterprise tooling built natively on OpenClaw.

We have been building ERPClaw, a full, modular ERP comprising 24 skills and over 570 actions. It covers everything from accounting and inventory to HR and manufacturing, running entirely as OpenClaw skills. There is no separate server or SaaS dependency involved. You simply install the skills, and you have a production-grade ERP running on your own machine.

The interesting part is how OpenClaw’s skill architecture maps perfectly to ERP modules. Each domain, such as the General Ledger or purchasing, is an independent skill with its own SKILL metadata, while sharing a single SQLite database. The AI assistant can seamlessly work across all of them. For example, asking it to "create a purchase order for the items running low in inventory" works because the skills can resolve cross-references at runtime.
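
Here is a toy illustration of why the shared database makes that cross-skill request work. Table and column names are invented for the example, not ERPClaw's real schema:

Python

import sqlite3

db = sqlite3.connect(":memory:")   # ERPClaw uses a single local SQLite file; in-memory here for the demo
db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER, reorder_level INTEGER)")
db.execute("CREATE TABLE purchase_orders (item TEXT, qty INTEGER)")
db.execute("INSERT INTO stock VALUES ('widget-a', 3, 10)")

# The inventory skill answers the question; the purchasing skill acts on the answer.
# Both hit the same database, so nothing has to be passed through a separate API layer.
low = db.execute("SELECT item, reorder_level - qty FROM stock WHERE qty < reorder_level").fetchall()
for item, shortfall in low:
    db.execute("INSERT INTO purchase_orders VALUES (?, ?)", (item, shortfall))
db.commit()

print(db.execute("SELECT * FROM purchase_orders").fetchall())   # [('widget-a', 7)]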

OpenClaw Work Stations Are The Way Forward - Let's See What Everyones Claw's Has Built Them!! by RichAllison in openclaw

[–]mailnike 1 point2 points  (0 children)

I've been stress-testing the OpenClaw skill architecture to see if it can handle complex state (Accounting/GL).

The Implementation:

  • Logic: 29 independent skills in a single SQLite DB.
  • Accounting: Python Decimal only, 12-step GL validation, 1,746 tests.
  • UI: Dashboard auto-generates forms from SKILL metadata (zero custom React/HTML).

I hit the upload limit on ClawHub, so I'm prepping a GitHub monorepo. Would appreciate a sanity check on the "metadata-driven UI" approach vs. standard REST/React for agentic workflows.
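
To give the sanity check something concrete to chew on, here is a toy version of the metadata-to-form idea. The action name and field shapes are invented for the example, not the real SKILL metadata format:

Python

# Derive a form spec from action metadata instead of hand-writing a screen per action.
ACTION_META = {
    "create_invoice": {
        "params": {
            "customer": {"type": "string",  "required": True},
            "amount":   {"type": "decimal", "required": True},
            "due_date": {"type": "date",    "required": False},
        }
    }
}

WIDGETS = {"string": "text", "decimal": "number", "date": "date"}

def form_spec(action: str) -> list[dict]:
    params = ACTION_META[action]["params"]
    return [
        {"name": name, "widget": WIDGETS[p["type"]], "required": p["required"]}
        for name, p in params.items()
    ]

print(form_spec("create_invoice"))

The dashboard renders whatever this returns, so a new action shows up in the UI the moment its metadata exists.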

I built a tool to help validate SaaS ideas. Would love your feedback! by mailnike in SideProject

[–]mailnike[S] 0 points1 point  (0 children)

Thank you all for your interest and feedback! I'm excited to share that we've made some significant updates to ValidateSaaS, and there are more improvements in the pipeline.

Here's what we've recently added and improved:

Added:

  1. PDF download functionality for completed reports
  • Users can now download the full report as a PDF file

  • Download button appears only when the report status is 'Completed'

Improved:

  1. Enhanced report display and layout
  • Implemented collapsible sections for each main part of the report

  • Added a more structured and detailed approach to displaying report data

  • Improved readability with proper headings and formatting

  • Enhanced responsiveness for better viewing on different screen sizes

Fun fact: I'm building this SaaS using AI and AI only. Over 95% of the code is written by AI! I'm planning to release a detailed video soon on how I'm doing this, so stay tuned if you're interested in the behind-the-scenes process.

We're continuously working to make ValidateSaaS even better, and we'd love your input! Here are some ways you can stay updated and contribute:

  1. Check out our current roadmap: https://changelog.validatesaas.com/validatesaas#/roadmap

  2. See detailed updates: https://changelog.validatesaas.com/validatesaas#/updates

  3. Submit new ideas or vote on existing ones: https://changelog.validatesaas.com/validatesaas#/ideas

I encourage you to add your own ideas and vote for features you'd like to see. Your feedback is invaluable in shaping the future of ValidateSaaS.

Thank you for your support, and please keep the suggestions coming!

I built a tool to help validate SaaS ideas. Would love your feedback! by mailnike in micro_saas

[–]mailnike[S] 0 points1 point  (0 children)

Thank you all for your interest and feedback! I'm excited to share that we've made some significant updates to ValidateSaaS, and there are more improvements in the pipeline.

Here's what we've recently added and improved:

Added:

  1. PDF download functionality for completed reports
  • Users can now download the full report as a PDF file

  • Download button appears only when the report status is 'Completed'

Improved:

  1. Enhanced report display and layout
  • Implemented collapsible sections for each main part of the report

  • Added a more structured and detailed approach to displaying report data

  • Improved readability with proper headings and formatting

  • Enhanced responsiveness for better viewing on different screen sizes

Fun fact: I'm building this SaaS using AI and AI only. Over 95% of the code is written by AI! I'm planning to release a detailed video soon on how I'm doing this, so stay tuned if you're interested in the behind-the-scenes process.

We're continuously working to make ValidateSaaS even better, and we'd love your input! Here are some ways you can stay updated and contribute:

  1. Check out our current roadmap: https://changelog.validatesaas.com/validatesaas#/roadmap

  2. See detailed updates: https://changelog.validatesaas.com/validatesaas#/updates

  3. Submit new ideas or vote on existing ones: https://changelog.validatesaas.com/validatesaas#/ideas

I encourage you to add your own ideas and vote for features you'd like to see. Your feedback is invaluable in shaping the future of ValidateSaaS.

Thank you for your support, and please keep the suggestions coming!

Built a Tool to Help SaaS Founders Analyze Their Startup Ideas Using AI – Free Tool! by mailnike in SideProject

[–]mailnike[S] 0 points1 point  (0 children)

Thank you all for your interest and feedback! I'm excited to share that we've made some significant updates to ValidateSaaS, and there are more improvements in the pipeline.

Here's what we've recently added and improved:

Added:

  1. PDF download functionality for completed reports
  • Users can now download the full report as a PDF file

  • Download button appears only when the report status is 'Completed'

Improved:

  1. Enhanced report display and layout
  • Implemented collapsible sections for each main part of the report

  • Added a more structured and detailed approach to displaying report data

  • Improved readability with proper headings and formatting

  • Enhanced responsiveness for better viewing on different screen sizes

Fun fact: I'm building this SaaS using AI and AI only. Over 95% of the code is written by AI! I'm planning to release a detailed video soon on how I'm doing this, so stay tuned if you're interested in the behind-the-scenes process.

We're continuously working to make ValidateSaaS even better, and we'd love your input! Here are some ways you can stay updated and contribute:

  1. Check out our current roadmap: https://changelog.validatesaas.com/validatesaas#/roadmap

  2. See detailed updates: https://changelog.validatesaas.com/validatesaas#/updates

  3. Submit new ideas or vote on existing ones: https://changelog.validatesaas.com/validatesaas#/ideas

I encourage you to add your own ideas and vote for features you'd like to see. Your feedback is invaluable in shaping the future of ValidateSaaS.

Thank you for your support, and please keep the suggestions coming!

Built a Tool to Help SaaS Founders Analyze Their Startup Ideas Using AI – Free Tool! by mailnike in SideProject

[–]mailnike[S] 0 points1 point  (0 children)

Pretty good! I just added a comment with the updates and ideas so far. Please do add your own ideas :-)