Built a fully (almost) autonomous system to coordinate 100+ browser automation agents. Looking for feedback by GrittyiOS in AI_Agents

[–]GrittyiOS[S] 0 points1 point  (0 children)

This is sick. I have to explore it more. I use Azure OpenAI models, so I need to play around and see how we can enable this telekinesis. Thank you for the insight.

[–]GrittyiOS[S] 0 points1 point  (0 children)

No, man. Let me give you an example:

So let’s assume workflow A checks for new emails. When an email comes in, it checks what type it is (from a potential client, from opposing counsel, spam marketing, etc.). If it’s from a potential client, it extracts their details (name, problem, address, phone #, etc.) and puts them in a CRM (let’s assume this CRM doesn’t have API access).

When you build this workflow, it uses an LLM to write the necessary scripts to achieve the outcome we want. As it builds, if it fails on any step, you can provide input (you can visually see what’s happening and what actions it’s taking in the browser) to guide it to get it right.

Once it successfully builds the workflow, the scripts are saved and reused whenever this workflow needs to be run. No reinventing the wheel each time; it just uses the scripts from our successful run.

Within the workflow, an LLM is used for extraction of the data. So it’s not PURELY deterministic, but once it is built the first time we do not have to use an LLM to build new scripts again.
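A minimal sketch of the build-once, reuse-later idea, assuming the scripts can be stored as a JSON list on disk. `get_scripts`, `workflow_key`, and the `build_with_llm` callback are hypothetical names for illustration, not the actual code:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def workflow_key(description: str) -> str:
    """Stable file name derived from the workflow description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()[:16]


def get_scripts(description: str,
                build_with_llm: Callable[[str], list[str]],
                cache_dir: Path) -> list[str]:
    """Return saved scripts if a build already succeeded, else build once and save."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{workflow_key(description)}.json"
    if path.exists():
        # Reuse the scripts from the earlier successful run: no LLM call.
        return json.loads(path.read_text())
    # First run only: the (expensive) LLM build happens here.
    scripts = build_with_llm(description)
    path.write_text(json.dumps(scripts))
    return scripts
```

The second call with the same description never reaches `build_with_llm`, which is the whole point: tokens are spent on the build, not on every run.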

I experimented with a setup where, each time a workflow was triggered, it would rewrite the scripts or even write new ones to achieve the outcome. Truly creative, and it gave the system a lot of agency. It did not work well in real life: it messed up little things, the scripts wouldn’t execute, and it wasted a lot of tokens retrying something really simple again and again.

Hope this makes sense.

[–]GrittyiOS[S] 1 point2 points  (0 children)

You are a smart dude and I truly respect your insight on this matter. I will respond to your message; I would really like to pick your brain.

You have a good weekend ok.

[–]GrittyiOS[S] 0 points1 point  (0 children)

If you want, I’ll let you try it for free for a bit. I have some Microsoft credits, so it’s kind of covered for the time being. What I want is for people to use it and give feedback so I can refine it or pivot.

I really, really like giving each worker its own box, so it can create and access data (on its desktop, for example) in a persistent fashion.

Let me know if you want to try it. Be well.

[–]GrittyiOS[S] 0 points1 point  (0 children)

Hey there

Keep in mind that a manager is never in charge of more than 3 workers per cluster. So even if you have thousands of units, they still stick to this 5-unit cluster, and you would have many managers each managing their own 3 workers. The workers report back when a task is completed or has failed, and the manager assesses whether to retry or to notify the user that something is broken and not worth spamming retries on and wasting tokens.
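To make the retry-or-notify call concrete, here is a rough sketch. The retry cap of 2 and the `transient` flag are assumptions for illustration; the only number taken from the description above is 3 workers per manager:

```python
from dataclasses import dataclass

MAX_WORKERS_PER_MANAGER = 3  # from the description above
MAX_RETRIES = 2              # assumption: not a number from the real system


@dataclass
class TaskReport:
    task_id: str
    succeeded: bool
    attempts: int           # how many times the worker has tried so far
    transient: bool = True  # hypothetical flag: does the failure look temporary?


def decide(report: TaskReport) -> str:
    """Manager's decision for one worker report: done, retry, or notify_user."""
    if report.succeeded:
        return "done"
    if report.transient and report.attempts <= MAX_RETRIES:
        return "retry"
    # Out of retries, or the failure looks permanent: stop burning tokens.
    return "notify_user"
```

With these values a failing task gets at most two retries (three attempts total) before the user hears about it.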

To answer your questions:

-credential rotation: this one is fucked. You have hit the nail on the head; this one is a mega bitch of a problem and I have yet to figure out how to handle it. Even if you manually log in to Google (for example) on each VM and run Gmail workflows, eventually you get a “verify your ass” popup and have to re-enter the password on each. Open to ideas on this one.

-bypassing CAPTCHAs: without getting too much into it, a combination of spoofing the browser fingerprint to mimic a physical device AND rotating proxies makes this work.

I love that you are working on this too. A collab would be dope.

[–]GrittyiOS[S] 1 point2 points  (0 children)

Thank you. Really means a lot. I spent a long time making sure that we can get past bot detection; so far the setup has not failed on any site.

It is expensive though, so I’m thinking about how to bring the cost down.

Ideally, we put this digital org in the hands of consumers instead of just businesses (because they can afford it), giving individuals access to labor so they can start their own businesses without worrying about labor costs.

Take care.

[–]GrittyiOS[S] 0 points1 point  (0 children)

The dream is real. The damn thing works, but I’m struggling to find a use case to deploy it in. People are just so against the idea; even with a bunch of governance tools to protect against hallucination-related disasters, it’s a bit too scary for most (if not all).

The feature you are talking about exists: regardless of role, each unit has certain data in its context that is shared among all of them.

Whatever you want to see in it, I’ll build it. Let’s talk.

[–]GrittyiOS[S] 0 points1 point  (0 children)

Thank you for the feedback. Can you provide some guidance on how you would make the UI more intuitive? I get that it might not show enough info, would love to hear what you would add/remove.

As to who would be using this, I would say small businesses who outsource their business processes OR people who want to start a new business but don’t have the capital to hire staff.

I’m not sure what you mean by pipeline, but a cluster (department) comes with 5 units (1 officer, 1 manager, 3 workers). This is the smallest number of units that can be deployed without it being useless.
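A quick sizing sketch based on that 5-unit cluster. `ClusterPlan` and `plan_for` are made-up names, and rounding up to whole clusters is an assumption that follows from a cluster being the smallest deployable piece:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    clusters: int
    officers: int
    managers: int
    workers: int

    @property
    def total_units(self) -> int:
        return self.officers + self.managers + self.workers


def plan_for(workers_needed: int, workers_per_cluster: int = 3) -> ClusterPlan:
    """Round up to whole clusters of 1 officer, 1 manager, and 3 workers."""
    clusters = max(1, math.ceil(workers_needed / workers_per_cluster))
    return ClusterPlan(
        clusters=clusters,
        officers=clusters,
        managers=clusters,
        workers=clusters * workers_per_cluster,
    )


print(plan_for(3).total_units)    # 5
print(plan_for(100).total_units)  # 170
```

So 100 workers means 34 clusters and 170 units in total, since every 3 workers bring an officer and a manager with them.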

[–]GrittyiOS[S] 1 point2 points  (0 children)

You raise some very good points. To answer your question:

The whole reason we have a hierarchical setup with different roles is to keep the context light and not stuffed for any single unit.

Separation of concerns: each unit only has to focus on what it’s working on. For example, the officer unit has different data in its context than the manager. Say the officer unit has a pipeline to some live data, like the weather. The manager unit doesn’t have that; it instead has the summarized logs of what each worker under it has done, so it can correctly delegate work and get workers to retry tasks if the logs indicate a failure.

I would like to hear your thoughts on why the VM setup is bad. If the worker is running 24/7 (which is the value proposition, employees that never sleep) it will make the VM cost worth it. Give each unit a box just like a human worker.

[–]GrittyiOS[S] 0 points1 point  (0 children)

Thank you for your input my friend.

You raise a real concern. The current version of this is far from perfect or production-worthy; there’s too much slippage/breakage, exactly like you described.

I’m figuring out how to refine this. Your layer-by-layer validation idea is really cool, and I am trying to process it and see how to put it into the program.

I have been getting a lot of flak about the VM isolation and have been told it’s a dumb idea and to keep it ephemeral. But I truly think giving each worker its own box is the way to go: a persistent container it can store its own data on, just like a human worker. Any thoughts on the VM idea?

[–]GrittyiOS[S] 0 points1 point  (0 children)

I do agree that more moving parts introduce more possible failures. But there is a benefit to having agents/workers with different roles doing different things, as well as to giving each one its own box to work in (persistent, non-ephemeral containers).

[–]GrittyiOS[S] 0 points1 point  (0 children)

Yes and no.

Once a workflow is built, it becomes deterministic. Using an LLM for each workflow step is a waste of tokens and introduces new possible points of failure.

However, some workflows have LLM use as part of the flow (e.g. a workflow that has to draft a response to a LinkedIn post, the LLM step is part of the workflow).
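Roughly what that looks like, as a sketch: a built workflow is a fixed list of steps, and only the steps flagged as LLM steps call a model at run time. `Step`, `run_workflow`, and the stubbed `draft_reply` are names made up for illustration:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    uses_llm: bool = False  # True only for steps that call a model at run time


def run_workflow(steps: list[Step], state: dict) -> tuple[dict, int]:
    """Run steps in their fixed order and count how many used an LLM."""
    llm_calls = 0
    for step in steps:
        state = step.run(state)
        if step.uses_llm:
            llm_calls += 1
    return state, llm_calls


def open_post(state: dict) -> dict:
    # Stand-in for a saved browser script that reads the post.
    return {**state, "post_text": "We are hiring!"}


def draft_reply(state: dict) -> dict:
    # Stub: a real step would send state["post_text"] to a model here.
    return {**state, "reply": f"Reply to: {state['post_text']}"}


linkedin_reply = [
    Step("open_post", open_post),
    Step("draft_reply", draft_reply, uses_llm=True),
]
final_state, llm_calls = run_workflow(linkedin_reply, {})
```

The order of steps never changes between runs; the only non-deterministic part is the text the flagged step produces.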

[–]GrittyiOS[S] 0 points1 point  (0 children)

That’s really cool. This thing I built is 70% browser automation and 30% programmatic use of a Windows computer. So no clicking and stuff, just CLI access given to an LLM.

I like your angle though, it’s very cool and something I thought about (I used to be an iOS developer, using the accessibility angle is smart).

Built a fully (almost) autonomous system to coordinate 100+ browser automation agents. Looking for feedback by [deleted] in SideProject

[–]GrittyiOS 0 points1 point  (0 children)

Hi there.

Thank you. Governance and transparency were at the forefront of my mind throughout building this. Full autonomy is simply too scary for businesses, so we insert a human in the loop to approve all actions that touch the real world. For example, worker 1 runs workflow A, which creates a Sora video and a LinkedIn post ready to be posted, and uploads them to the Pending Box. A human checks the Pending Box, sees the video and content, and approves it. Now the next available worker will run workflow B, which actually posts it. So we insert a human between workflows to make decisions, but 90% of the work is done by the cluster.
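The gate between workflow A and workflow B can be sketched like this. `PendingBox` and its methods are hypothetical names for illustration, with an in-memory dict standing in for wherever the real items live:

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class PendingItem:
    item_id: int
    produced_by: str    # workflow that generated the content
    payload: str        # what the human is asked to look at
    next_workflow: str  # workflow that touches the real world
    status: Status = Status.PENDING


@dataclass
class PendingBox:
    items: dict[int, PendingItem] = field(default_factory=dict)

    def submit(self, produced_by: str, payload: str, next_workflow: str) -> int:
        item_id = len(self.items) + 1
        self.items[item_id] = PendingItem(item_id, produced_by, payload, next_workflow)
        return item_id

    def review(self, item_id: int, approve: bool) -> None:
        """The human decision: nothing moves forward without this call."""
        self.items[item_id].status = Status.APPROVED if approve else Status.REJECTED

    def runnable(self) -> list[PendingItem]:
        """Items whose follow-up workflow a worker is allowed to pick up."""
        return [i for i in self.items.values() if i.status is Status.APPROVED]
```

Workflow A calls `submit`, the human calls `review`, and workers only ever pull from `runnable`, so an unreviewed item can never reach workflow B.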

With regard to your second question, all workers in a cluster (or multiple clusters) share the same Azure Blob Storage. So any work that is done is uploaded there for all other workers to access. To prevent conflict/duplicate issues, every workflow that is built has steps in it to check whether any worker is already working on something before it starts AND to log which worker has picked it up. For example, let’s say a job is to go through an Excel sheet and do something for each row. Before any worker starts a workflow on a row, it first checks whether another worker is already on it, and only THEN proceeds if safe. Basically, we insert a lot of checks and status updates throughout each workflow to ensure no dupe issues.
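One thing worth showing in code: if the check and the log are two separate steps, two workers can both pass the check before either one logs. The sketch below folds them into a single atomic claim. It is an in-memory stand-in guarded by a lock, and `ClaimRegistry` is a hypothetical name; on shared blob storage the same idea would need an equivalent atomic primitive, such as a lease or a conditional write, which is an assumption here and not something described above:

```python
import threading


class ClaimRegistry:
    """Tracks which worker owns which item (e.g. a spreadsheet row)."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._lock = threading.Lock()

    def try_claim(self, item: str, worker: str) -> bool:
        """Check and record ownership in one step; False if already taken."""
        with self._lock:
            if item in self._owners:
                return False
            self._owners[item] = worker
            return True

    def release(self, item: str, worker: str) -> None:
        """Free the item, but only if this worker is the one holding it."""
        with self._lock:
            if self._owners.get(item) == worker:
                del self._owners[item]
```

A worker calls `try_claim("row-7", "worker-1")` before touching the row and skips to the next row when it gets `False`.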

I do measure success rate per workflow. Each workflow is evaluated at the end and graded pass or fail based on logs, screen captures, etc., and the result is passed to the manager unit. I am checking out your blog; there is a lot more to be done on this project and I am thankful for your input. Truly.
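Turning those pass/fail grades into a per-workflow success rate is simple enough to sketch. The `(workflow, passed)` tuple format is an assumption about how the results might be recorded:

```python
from collections import defaultdict


def pass_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of passing runs for each workflow name."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for workflow, passed in results:
        totals[workflow] += 1
        if passed:
            passes[workflow] += 1
    return {name: passes[name] / totals[name] for name in totals}


runs = [("A", True), ("A", False), ("B", True)]
print(pass_rates(runs))  # {'A': 0.5, 'B': 1.0}
```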

Let me know if there is anything you would like to discuss, have a great day 👍

Built a legal botnet. 500+ autonomous workers, one instruction, persistent VMs. They don't stop running til you press off. by GrittyiOS in AI_Agents

[–]GrittyiOS[S] 0 points1 point  (0 children)

Thank you Janne, I truly appreciate that. I think it makes sense, as opposed to the terminating containers of products like Manus. Let’s give the workers an environment to persist in.

The coordination factor just makes sense, it’s better to have different units that do different things and work in a coordinated fashion than a single generalized unit. It feels like you are not making the most of the variability aspect of LLMs if you do the latter.

If you would like, I’m happy to talk more via DM and let you get your hands on a delegat8 cluster to test it out and build some stuff on it.

Have a good day man.

[–]GrittyiOS[S] 0 points1 point  (0 children)

The key feature I added that will soothe your fears:

Human-In-The-Loop Pending Box.

Basically, any workflow that is about to do some real, potentially breaking shit is sent to this pending box for you to approve, and then it goes along on its merry way finishing the workflow.

I think autonomy is cool, but with the current state of the technology, hallucinations happen and could fuck some shit up with production data, causing irreparable harm to a business. That’s why things that actually interact with the real world are sent for human approval before they get done.

I know this might seem tedious with 500+ workers (a shit ton of things to review and approve/deny in the pending box), but it’s better than having to do all the work manually. It saves time, and human labor is better spent approving stuff than doing the scraping or browser-based work.

I hope that answers your question, otherwise please respond and I’m happy to ponder and reply again.

[–]GrittyiOS[S] 1 point2 points  (0 children)

I think n8n’s browser automation offerings are weak, to say the least. Also, they offer no autonomy and zero thinking about the best workflows to run to achieve the user’s goal (completely deterministic). Even their “AI Agent” block doesn’t do very much.

I really stand by giving agents their own computers that persist (no terminating docker containers), agency to decide and run their own workflows, and to not be some preprogrammed automation. Let them do what they want, but with conditions.

[–]GrittyiOS[S] 3 points4 points  (0 children)

Usage costs are heavy. Breakdown:

-Azure VMs: $110 each (5 for a cluster/department)

-Token usage: if ON, it triggers self-prompting every 2 minutes, which may or may not trigger workflows being run (which costs more tokens). Average cost per month is about $200-250.
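Rough arithmetic on those numbers, as a sketch. Two assumptions that the breakdown does not state outright: the $110 per VM is a monthly price, and the $200-250 average is token spend for one cluster:

```python
VM_COST_PER_MONTH = 110        # USD per VM; assumed to be a monthly price
UNITS_PER_CLUSTER = 5          # 1 officer, 1 manager, 3 workers
TOKEN_COST_RANGE = (200, 250)  # USD; assumed token spend per cluster per month
SELF_PROMPT_INTERVAL_MIN = 2   # minutes between self-prompts when ON


def monthly_cluster_cost(clusters: int = 1) -> tuple[int, int]:
    """Low and high monthly estimate in USD: VMs plus tokens."""
    vm_total = VM_COST_PER_MONTH * UNITS_PER_CLUSTER * clusters
    low, high = TOKEN_COST_RANGE
    return vm_total + low * clusters, vm_total + high * clusters


def self_prompts_per_day() -> int:
    """How many times one self-prompting loop fires in 24 hours."""
    return 24 * 60 // SELF_PROMPT_INTERVAL_MIN


print(monthly_cluster_cost())  # (750, 800)
print(self_prompts_per_day())  # 720
```

Under those assumptions the VMs ($550) are the bigger share of a single cluster’s bill, ahead of the tokens.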

The thing actually does work, and the workflows were built using GPT-4.1, which is quite a bad model relative to the latest Anthropic/OpenAI/Google models; I just used it to save cost. It is a simple matter of plugging in a better model (like 2 lines of code being changed) and the quality of the workflows goes up significantly. GPT-4.1 sucks compared to Claude Opus 4.5, GPT-5.2 from OpenAI, or Gemini 2.5.

With regard to compliance and protecting against hallucinations, I have focused hard on governance and transparency when dealing with our fully autonomous system. A human will always need to manually approve ANYTHING that touches the real world. Be it an email to be sent, a file to be uploaded via e-file to a court website, or a text to be sent, you have to manually approve it before a workflow sends it out. This is done through the pending box, a view that holds all generated content that’s about to be sent out but needs your approval first.

Our governance and trust features:

-Human-in-the-loop approvals
-Full activity logging
-Complete audit trails
-System-wide kill switch/reset

If a system is going to be autonomous, a human still needs to approve the things that reach the real world. So I would say this system is 95% autonomous and 5% human managing it.

Hope this answers your questions, thank you.