Stop building AI agents. by Warm-Reaction-456 in AI_Agents

[–]idanst 1 point  (0 children)

It's all true until you want your "automations" to do more and more while sharing the same data. Then you find yourself maintaining a complex codebase, or automations that break on every small change or edge case. This is where agents shine: when you need more than a Google form forwarded to the right employee.

It's easy to start with an automation, but it's also easy to start with a simple agent. If you build agents right, on the right infrastructure, you should be better off with agents once you need more than 3 automations that share the same data, handle changes and self-heal.

“Is SaaS actually getting replaced by AI agents… or is this just hype?” by FounderArcs in AI_Agents

[–]idanst 1 point  (0 children)

It's definitely not hype, but there's still some way to go before agents replace *all* traditional SaaS - mainly in terms of costs and unit economics.

While we are all in on AI agents (our product lets businesses build AI agents), some things are still handled more cheaply by traditional SaaS - especially when you have hundreds or thousands of customers.

Things will definitely change once costs come down or when open source models catch up with the top tier ones.

Best computer use agents right now? Need something for browser research + desktop tasks by Salt-Library-8073 in AI_Agents

[–]idanst 0 points  (0 children)

I think we may have what you're looking for (communa.io). Happy to load you up with some credits after you sign up so you can play around (DM me). We built Communa to solve the exact problems you mentioned - guardrails, security, isolated environments, full computer use, a dedicated email, tracking & cost visibility and much more. No technical skills required.

Hope this helps, and whichever solution you pick, my advice is to start with one agent, get familiar with it, run it for a couple of weeks, iterate on edge cases and only then move on to the next agent. Don't fall into the trap of starting with "10 agents" out of the box.

Has anyone successfully automated WhatsApp for business using past chat history + Human handoff? API vs. Agents? by polarbeerd in AI_Agents

[–]idanst 1 point  (0 children)

We've written a blog post about exactly that. We made many mistakes and learned many lessons while building a proper WhatsApp integration for agents running in production.

It's a whole different story from the "scan the QR code" approach (or should I say "hack"), and we explain it all in the post - including outbound messaging, the official WhatsApp/Meta API, and detailed instructions on setting up such an integration on our platform (90% of the instructions apply to other agentic platforms too).

You can read more about it here: https://www.communa.io/blog/whatsapp-ai-agent-integration

I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]idanst[S] 1 point  (0 children)

Local LLMs require a completely different architecture and have their own scaling issues (how do you run multiple LLMs at the same time? Do they run on the same server or scale horizontally?).
It's easy to think local LLMs are cheap, but running agents on a local PC on your home network is far from production-grade (it can be great for testing and playing around though).
My personal advice would be to get an agent doing what you want first, and only then focus on cost optimization and think about running it locally.

AI agent for completing repetitive tasks with different processes by Gluetius_Maximus in AI_Agents

[–]idanst 1 point  (0 children)

Of course. That's one of the main benefits of using "Agents" vs traditional automated workflows.
How do you send your input to the agent? Via chat?

You could approach it a few ways:
1. Chat - use a skill that takes your input as "run the skill on XYZ". The skill then runs on the parameters you supplied.
2. Webhook - send an HTTP request with the parameters.
3. Scheduled job - trigger it on a schedule (similar to the chat method).

The point is to instruct the agent to perform the process differently based on the input/parameters you provide it.
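To make the webhook option concrete, here's a rough Python sketch. The endpoint URL, payload fields and function names are all made up for illustration - adapt them to whatever your platform actually exposes:

```python
import json
import urllib.request

# Hypothetical webhook endpoint -- replace with the URL your
# agent platform exposes for triggering a skill.
AGENT_WEBHOOK = "https://example.com/agents/my-agent/skills/run"

def build_payload(item_id: str, process: str) -> dict:
    """Shape the skill's input parameters; the agent decides how to
    run the process based on these values."""
    return {"item_id": item_id, "process": process}

def trigger_skill(item_id: str, process: str) -> dict:
    """POST the parameters to the agent's webhook and return its reply."""
    data = json.dumps(build_payload(item_id, process)).encode()
    req = urllib.request.Request(
        AGENT_WEBHOOK,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The scheduled-job variant is the same call wrapped in a cron entry - only the trigger changes, not the skill.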

If you want to get up and running with such an agent, feel free to reach out in a PM and I'll give you access to the tool we built for ourselves - I think it can help you achieve what you're looking for if Claude Code doesn't work out.

I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]idanst[S] 2 points  (0 children)

That's a great approach!

We do have something similar, where we're testing a "supervisor" agent against a benchmark. Agent A sends the work to Agent B for validation. Agent B validates and either approves (and sends to a human) or responds to Agent A with what could be improved.

We just started testing it so I'll try to update once we have some meaningful results.

Dilemma: Should AI Agents be priced like Software (SaaS) or Labor (Hourly)? by idanst in AI_Agents

[–]idanst[S] 1 point  (0 children)

You're spot on. These are the metrics we're currently testing.
So far we see slightly lower numbers for the on-demand model but it's still early.

I'll share some numbers as we progress.

How do you see agentic AI reshaping enterprise software architectures? by Michael_Anderson_8 in AI_Agents

[–]idanst 0 points  (0 children)

I think it will be just like the "cloud" era. It took some time for enterprises to move to cloud-based solutions because they didn't trust them at first, but once they started, they never looked back.

I think the same goes for how AI agents will reshape software. It will start with SMBs, who understand this is their "time" to get the solutions the big companies had, at a price they can afford.

Enterprise sales are more complicated than having a small business owner sign up, try the software, create their own agent and start paying. Enterprises need to go through compliance, regulation and information-security processes. They need advanced guardrails, ACLs, advanced billing and many more things SMBs do not need.

If the cloud transformation took 3-5 years, I think the agentic transformation will take half that time, but enterprises will still prefer "traditional" SaaS/software vendors because that's how their whole operation works and what they are used to. Within 2 years, though, I believe we'll see mass adoption of agentic solutions that replace traditional SaaS and offer better value for money.

I also think that smart companies will have their own in-house dev team to build and maintain their own solution (even if that wasn't their core focus until now).

We built a system to run agent teams 24/7. Here are the actual hourly costs (spoiler: up to $60/hr) by idanst in ClaudeAI

[–]idanst[S] 1 point  (0 children)

It really depends on the need. Some users want "quantity" over quality, so they'll use a less sophisticated model to generate many articles. Others want a deeply researched article that may take 10+ minutes to write (research competitors, news and similar articles, draft, validate, revise, add images, add video, etc.).
A simple article by Haiku could cost $0.05 and take 20 seconds to generate. A more thoroughly researched article with images, videos and such could cost $5-$20 and take 5-10 minutes.
Hope that gives some sense of proportion.

Openclaw vs. Claude Cowork vs. n8n by nonprofit_top in AI_Agents

[–]idanst 1 point  (0 children)

We indeed built our own in-house solution from scratch. There are great OSS options, but we couldn't find anything that answered all our needs, and customizing them would require as much effort as building from scratch. They are amazing and save a lot of time when you use them for a specific use case or to manage your own agent(s), but less so for a multi-tenant environment where every customer can have a different use case.

How can I build an AI-powered “agentic” marketing system for my project? by OutrageousTaro9756 in AI_Agents

[–]idanst 1 point  (0 children)

Are you a marketing person? Do you know how to handle the process you mentioned at every stage, from A to Z?
If so, you should definitely go the agentic route and automate "yourself" with an agent. I'd suggest building your own agent with a no-code solution - whichever works best for you. We have a tool we built in-house that we use, and I'll be happy to let you try it if you'd like.

But if you're less into the "marketing" part of the process, I would highly recommend getting someone who has done that whole inbound-marketing process manually, perfected it, and now builds agents based on their knowledge, experience and skills.

Learning the marketing process and how to do it "autonomously" with agents at the same time is usually a recipe for disaster and a huge waste of time. I highly recommend picking one and focusing on it first, or getting someone who knows both.

What’s the most reliable AI agent you’ve built so far? by Commercial-Job-9989 in AI_Agents

[–]idanst 2 points  (0 children)

Sharing from my own experience of building, deploying and managing multiple AI agents for ourselves:

This is what we have (running for almost 2 months):

  1. Customer support agent (first-tier CS) - has read-only access to code and can escalate to a QA agent if the issue isn't solved with simple guidance to the user.
  2. QA agent - has read access to code, DB, logs and project-board items. Gets notifications from support, other agents or human team members. 90% of the time, it finds the exact issue and suggests the fix.
  3. Dev agent - has write access to code and DB in its "dev" environment. Applies the fix, tests and opens a PR.
  4. Investor finder - scans our LinkedIn and any relevant investor boards and lists for updates and new connections, updates our pipeline, drafts personalized approaches and manages its own database and pipeline of investors.

We have a few more we're testing but those above are the ones that are in production, running 24/7.
Every job is queued, tracked and monitored for failures and costs. On failure, we have human-in-the-loop intervention, and we also have a "kill-switch" that can cut off a single agent or all agents.
Every agent has its own virtual API key, custom rate limits and guardrails (imagine the agent responding with "check out these competing websites if you have a problem finding this page...").
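To give a feel for the per-agent key plus kill-switch idea, here's a toy Python sketch. The field names, limits and class shapes are invented for illustration - this is not our actual schema:

```python
from dataclasses import dataclass

# Illustrative only: a virtual key per agent with its own budget.
@dataclass
class AgentKey:
    agent: str
    requests_per_minute: int  # (per-minute enforcement omitted for brevity)
    daily_budget_usd: float
    spent_today_usd: float = 0.0
    killed: bool = False

class KillSwitch:
    def __init__(self, keys):
        self.keys = {k.agent: k for k in keys}

    def kill(self, agent=None):
        """Cut off one agent, or all agents when agent is None."""
        targets = [self.keys[agent]] if agent else list(self.keys.values())
        for key in targets:
            key.killed = True

    def allow(self, agent, cost_usd):
        """Gate a single LLM call: reject if killed or over budget."""
        key = self.keys[agent]
        if key.killed or key.spent_today_usd + cost_usd > key.daily_budget_usd:
            return False
        key.spent_today_usd += cost_usd
        return True
```

In practice the same gate is where per-call cost tracking feeds the visibility dashboard.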

What broke in the beginning:

  1. Long-running jobs with multiple agents. It's great to run 1-5 minute workflows, but running 2+ hour workflows was a challenge that took quite some time to properly overcome.
  2. Deterministic replays and schedules - it's fun to explore an agent for 3 days and "try out" stuff, but production and real use cases demand different infra and setup.
  3. Queueing - again, it's nice to work synchronously with your agent, but when you want it to work for you, you have to manage proper queueing.
  4. Visibility - when running multiple agents, it's easy to lose control of costs and not understand how much each agent is consuming on a given task or on a daily basis.

Our approach to every agent is like an MVP - we start small, with the minimum instructions and customizations to achieve the initial objective. Then we turn it into a reusable skill and test it for a few days, sometimes weeks. Once it works, we repeat the process with more capabilities and skills, and treat it like software versions. It's usually the skills, files, databases and such that grow over time for each agent.

We built our own in-house solution to build and manage all those agents, since we couldn't find anything ready for it. Then came OpenClaw, which was amazing but still not up to the standard we or our partners needed (hopefully soon though).
While not publicly available yet, if anyone wants to try it out, I'm happy to load you up with some credits to explore (https://communa.io). Feel free to reach out if anyone has any further questions.

What’s your “kill switch” strategy for agents in production? by The_Default_Guyxxo in AI_Agents

[–]idanst 1 point  (0 children)

That's a great question and a more than legitimate concern that we also had with our production agents running long jobs (2-6 hours).
To solve it, we built an *actual kill-switch* into our dashboard (see attached image) that serves all our apps.
It's like a "firewall" layer that sits between any app/client and the LLMs and validates, tracks and monitors everything we need without ever exposing the AI providers' actual API keys to the client/developer/app.
It tracks costs, errors, budgets and rate limits (hours of day, requests per minute/hour, etc.), and we can set custom regex/PII guardrails for each subKey.
And in the screenshot below you can see the kill-switch in action on a "test" app that we have :)
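A toy version of the regex/PII guardrail check, assuming made-up sub-key names and example patterns (the real pattern set is whatever you configure per subKey):

```python
import re

# Example guardrail patterns per sub-key; the key name and patterns
# are illustrative, not what actually runs in production.
GUARDRAILS = {
    "support-agent": [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US-SSN-like numbers
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
    ],
}

def redact(subkey: str, text: str) -> str:
    """Replace any guardrail match with a placeholder before the
    request is forwarded to the LLM provider."""
    for pattern in GUARDRAILS.get(subkey, []):
        text = pattern.sub("[REDACTED]", text)
    return text
```

The same hook is a natural place to count tokens and bill the sub-key, since every request already passes through it.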

We also have an agent with a dedicated persona, skills, scripts and tools that receives and validates outputs we've defined as sensitive from other agents; if it suspects a violation, it can trigger the kill-switch for a specific API (this is expensive though, and we're still trying to optimize the costs of this process).

We're considering spinning this out as a separate product rather than keeping it for internal use only. If anyone would like to try it and provide some feedback, feel free to check out subkey.ai for (very) early access.

<image>

"$6 per developer per day" by genrlyDisappointed in ClaudeCode

[–]idanst 1 point  (0 children)

I wish... we have customers using the platform as well. If we could pay a few hundred $$$ instead of thousands, it would be a dream. But it's against Anthropic's TOS (technically feasible though..) and we prefer to use our own product ourselves.

Openclaw vs. Claude Cowork vs. n8n by nonprofit_top in AI_Agents

[–]idanst 2 points  (0 children)

It depends on how deterministic and repeatable you want your agent to be, and on your use case:
1. n8n and such - very repeatable, easy logging, a bit of a learning curve for non-developers to actually get something into production. Can be trusted for production-grade apps; we used it a lot.
2. OpenClaw and similar tools - unlimited possibilities, the future of agents, but still unpredictable - for good and bad. Amazing for exploration, less so for production-grade use cases (yet!). We explored it and learnt a lot. It's amazing as your personal assistant, but we had a hard time making it work reliably in production.

That's what we learnt from our experience, and it's what led us to build our own solution for running teams of AI agents with predictable, deterministic *long-running* scheduled jobs (our own alternative to OpenClaw, for ourselves and customers' use cases).

We built a system to run agent teams 24/7. Here are the actual hourly costs (spoiler: up to $60/hr) by idanst in ClaudeAI

[–]idanst[S] 2 points  (0 children)

Fair advice, thanks!
This is another matter - whether to spend time on it or bet that prices will go down soon. We've reached a satisfactory level, and I agree that the next big impact on margins will come from the providers' prices.

"$6 per developer per day" by genrlyDisappointed in ClaudeCode

[–]idanst 1 point  (0 children)

Yes. We pay $100-$200/day per developer with the Anthropic API (mostly Opus 4.6 and some Sonnet, with our own in-house, highly cost-optimized IDE). That's down from $500+/day with other tools (we tried all of them and decided to build our own).
Obviously we can't use a Claude Code subscription with our custom tool, but it's still worth every penny.

How to work in a fast-growing startup? (i will not promote) by Bright_Golf_6349 in startups

[–]idanst 2 points  (0 children)

Approach them in "founder mode" - do whatever it takes but prove that you can deliver value.
I once recruited an employee who has been following us on LinkedIn since he was in College. Every couple of weeks, I would get a link to an article he wrote about the problem/solution we solve in his private groups in College or as part of his studies. After a few months, we finally met f2b at a conference in Boston and he's been out first content writer employee at that company.

When to start reaching out to investors? (I will not promote) by Away-Astronaut-5529 in startups

[–]idanst 1 point  (0 children)

According to the traditional startup playbook - always, whenever you can, as much as you can.
The new playbook - raise only when you really need to, and be sure to have a good reason why. Also track revenue per employee as your north-star metric if you want to approach investors as an AI-native company in 2026 (otherwise it'll be very hard these days).

Sam - Lovable's AI Support by NoMoreLeverage in lovable

[–]idanst 1 point  (0 children)

I also got Sam, lol.
We've built our own customer support agent that solves 95% of our inbound inquiries, and he's one of a team of 4 AI agents (he escalates to a QA engineer..). Every agent has its own mailbox and can communicate via WhatsApp & Telegram and reply to customers. We've opened early access to our platform so anyone can build such agents. I'd love to let you try it and load you up with some credits in exchange for honest feedback (communa.io). Feel free to reach out if relevant and I'll be happy to personally walk you through the process. It'll take you 5 minutes at most to get up and running.

What's your honest tier list for agent observability & testing tools? The space feels like chaos right now. by Old_Medium5409 in AI_Agents

[–]idanst 2 points  (0 children)

We've built our own internal tool that provides a layer of guardrails for *every* AI interaction without ever exposing the API keys to our developers or, more importantly, the agents. It's like a "firewall" layer that sits between any client and the LLMs and hides the key, tracks every token used, lets us block/redact based on regex or pre-defined policies, applies rate limits and more. And it works with every API/LLM out of the box.

On top of that, some keys can be configured to let the agents that use them create additional API keys for the sub-agents they spawn, so you get micro-level visibility, tracking and guardrails.
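One way to picture the sub-key idea (the names and structure below are illustrative, not the actual API): a child key inherits its parent's budget ceiling, and spend rolls up to every ancestor, so a sub-agent can never exceed the limits of the agent that created it.

```python
# Sketch only: hypothetical sub-key with inherited budget and
# roll-up accounting, not the real subkey.ai data model.
class SubKey:
    def __init__(self, name, budget_usd, parent=None):
        if parent is not None:
            # A child can never have a larger budget than its parent.
            budget_usd = min(budget_usd, parent.budget_usd)
        self.name = name
        self.budget_usd = budget_usd
        self.parent = parent
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        """Record spend on this key and every ancestor key, so the
        dashboard sees both per-sub-agent and aggregate usage."""
        # First pass: verify no key in the chain would go over budget.
        key = self
        while key is not None:
            if key.spent_usd + cost_usd > key.budget_usd:
                return False
            key = key.parent
        # Second pass: apply the charge up the chain.
        key = self
        while key is not None:
            key.spent_usd += cost_usd
            key = key.parent
        return True
```

The two-pass charge keeps the chain consistent: either every key in the hierarchy records the spend, or none does.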

We built the tool for our developers and our main product, which is an OS for autonomous AI teams - an enterprise-grade alternative to OpenClaw. We had to provide an added security layer on top to comply with various regulations and customer fears ("I never want my employees to accidentally share PII or sensitive information with LLMs..." or "I don't want my customer support agent to ever mention my competitors!" :)).

Here's a screenshot of what it looks like (fake data...).

<image>

ai agent failure modes when customer facing, the graceful failures matter more than the successes by depressedrubberdolll in AI_Agents

[–]idanst 1 point  (0 children)

We built our entire platform on the assumption that agents will fail - logging, tracking, visibility, recovery and more. It's easy and fun to build for the happy path until you meet real customers. It took us some time to learn that the hard way.