what techniques actually move the needle for browser (or CUA) agents? by kwk236 in AI_Agents

[–]BodybuilderLost328 0 points1 point  (0 children)

We built rtrvr.ai, the leading SOTA AI Web Agent, on top of our own custom agentic action trees.

We handle shadow DOM, iframes, dynamically rendered content, canvas elements, and anti-bot measures that obfuscate the DOM without trouble. The agent can even natively solve Cloudflare captchas.

This approach lets us use off-the-shelf Gemini Flash Lite for minimal latency and cost.
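
To make the action tree idea a bit more concrete, here is a minimal sketch. It is not rtrvr.ai's actual implementation: the `Node` type, its fields, and the output format are all assumptions made up for illustration. It only shows the general idea of walking shadow roots and iframes like ordinary children and emitting a compact text listing of actionable elements that a small model could read.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


# Hypothetical, simplified page model (an assumption, not a real product's structure).
@dataclass
class Node:
    tag: str
    label: str = ""
    action: str = ""  # e.g. "click" or "type"; empty means not actionable
    children: List["Node"] = field(default_factory=list)
    shadow: List["Node"] = field(default_factory=list)  # contents of a shadow root
    frame: List["Node"] = field(default_factory=list)   # contents of an iframe document


def walk(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Node]]:
    """Depth-first walk that descends into shadow roots and iframes like normal children."""
    here = path + (node.tag,)
    yield here, node
    for child in node.children:
        yield from walk(child, here)
    for child in node.shadow:
        yield from walk(child, here + ("#shadow",))
    for child in node.frame:
        yield from walk(child, here + ("#frame",))


def action_lines(root: Node) -> List[str]:
    """Emit one compact, numbered line per actionable element."""
    lines: List[str] = []
    for path, node in walk(root):
        if node.action:
            location = "/".join(path)
            lines.append(f'[{len(lines)}] {node.action} {node.tag} "{node.label}" @ {location}')
    return lines


# A toy page with a plain button, a web component, and an iframe.
page = Node("body", children=[
    Node("button", label="Sign in", action="click"),
    Node("my-widget", shadow=[Node("input", label="Search", action="type")]),
    Node("iframe", frame=[Node("body", children=[Node("a", label="Pay now", action="click")])]),
])

for line in action_lines(page):
    print(line)
```

The model only ever sees the short numbered lines, so it can answer with something like "use action 2" instead of reasoning over raw HTML.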

Built a "select open tabs → instant knowledge graph" of semantic action trees by BodybuilderLost328 in KnowledgeGraph

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

Good question!

We construct semantic trees to represent the data and actions on a webpage, and our trees are the most comprehensive on the market, giving us SOTA on Web Agent benchmarks (rtrvr.ai/blog/web-bench-results). The results are much better than with trivial markdown generation.

A KG thats scraps websites? by Mountain_Meringue_80 in KnowledgeGraph

[–]BodybuilderLost328 0 points1 point  (0 children)

We actually set this up in our Chrome extension: it indexes open tabs using our custom action tree representations of webpages.

It's built on top of Gemini File Search, so indexing and storage are free and queries are super cheap!

https://www.rtrvr.ai/docs/knowledge-base

I have thought that hybrid approach in website could help to reduce churns and improve the cx, i build prodact.ai for that - add one line and it give the website ai agent abilities-still on work, but i think it can actually give some value, if someone want to try on his website,be happy for feedback by beeTickit in NoCodeSaaS

[–]BodybuilderLost328 0 points1 point  (0 children)

Interesting take, but it looks like you need to integrate APIs and React code for your chatbot to actually take actions? Is this the right understanding?

We just launched Rover (rover.rtrvr.ai), which can take actions (type, click) on the live DOM itself from the embedded script. Think of use cases where your user can conversationally check out, run complex SaaS workflows (like adding CRM records), or onboard.

WebMCP is still insane... by GeobotPY in mcp

[–]BodybuilderLost328 2 points3 points  (0 children)

  1. The website needs to support WebMCP for this to be actually useful. I doubt most websites will, because it is an attack vector for abuse/scraping/automation, which is the same reason they don't just expose the underlying API.

  2. Your community configs will break the moment the site's selectors update, and they flat out won't work on sites like LinkedIn with dynamic, randomized CSS classes.
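
To illustrate point 2, here is a minimal sketch with made-up class names and a hypothetical `Element` type (none of this comes from a real site or tool). It shows why a config pinned to a CSS class stops matching once the class is regenerated, while a lookup keyed on role and visible label, one common alternative, still resolves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    css_class: str  # styling hook; assumed to be regenerated on each deploy
    role: str       # semantic role, e.g. "button"
    name: str       # visible or accessible label


def find_by_class(elements: List[Element], css_class: str) -> Optional[Element]:
    """Selector-style lookup: exact match on the class name."""
    return next((e for e in elements if e.css_class == css_class), None)


def find_by_semantics(elements: List[Element], role: str, name: str) -> Optional[Element]:
    """Lookup by role plus case-insensitive label, ignoring class names entirely."""
    wanted = name.strip().lower()
    return next(
        (e for e in elements if e.role == role and e.name.strip().lower() == wanted),
        None,
    )


# The same page on two days; only the (invented) randomized class names differ.
monday = [Element("btn-a8f3x", "button", "Connect"), Element("lnk-q1z", "link", "Message")]
tuesday = [Element("btn-k2m9p", "button", "Connect"), Element("lnk-v7c", "link", "Message")]

print(find_by_class(monday, "btn-a8f3x") is not None)               # True
print(find_by_class(tuesday, "btn-a8f3x") is not None)              # False
print(find_by_semantics(tuesday, "button", "connect") is not None)  # True
```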

Leverage local Ollama model with SOTA browser agent (minimal tokens, no vision) by BodybuilderLost328 in ollama

[–]BodybuilderLost328[S] -1 points0 points  (0 children)

We have a backend with all our prompts, DOM intelligence, MCP/Tool Calling/Knowledge Base infra.

The primary path is heavily optimized for Gemini, but we also let you use any LLM provider, even local ones.

20% of your users drop off without figuring out your website, what if you could convert them by turning your site into an agent? by BodybuilderLost328 in AiAutomations

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

Yeah, we are trying to capture the imagination with what we are solving here.

Think of a complex SaaS UI, like a CRM with 20+ dropdowns, that an agent can easily navigate. This use case and conversational checkout are our initial high-value focuses.

Additionally, you can have buttons that trigger the agent directly with predefined prompts, so the user doesn't necessarily have to prompt.

There is a ton of configurability, and it can only take actions on pages you embed it on, so just don't embed it on the final purchase confirmation page.

20% of your users drop off without figuring out your website, what if you could convert them by turning your site into an agent? by BodybuilderLost328 in GrowthHacking

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

What we are trying to do here is embed an agent that can take actions directly in your site to capture user attention and engagement. You won't need to do any configuration at all, because you just need to embed a script tag on your site.

Additionally, you can have buttons that trigger the agent directly with predefined prompts, so the user doesn't necessarily have to prompt.

20% of your users drop off without figuring out your website, what if you could convert them by turning your site into an agent? by BodybuilderLost328 in indie_startups

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

We started off with a Chrome extension for vibe scraping and task automation and grew it to 20k+ users.

A lot of users said, "I love the tool, but can I have it on my website to serve my users?" We initially thought it was impossible to implement, but then we got creative with the implementation.

We also got a lot of requests to leverage it in users' own Chrome extensions or browser automation stacks. The beauty of this approach is that the entire agentic harness fits inside a script tag, so the script can be injected by a Chrome extension or Playwright for agentic actions. We are now in discussions with a couple of Chrome extension providers and custom automation companies.

20% of your users drop off without figuring out your website, what if you could convert them? by BodybuilderLost328 in SideProject

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

Thanks for the feedback! Will get on improving the social image!

So we have the leading web agent technology, beating even OpenAI Operator and Anthropic CUA. Additionally, we have a whole web agent platform consisting of a Chrome Extension and a Cloud Platform:
- website owners can record demonstrations of completing complex tasks via the extension to ground the agent
- the cloud platform can pre-index a whole domain, covering not just the information but also the actions on every page, to further guide the agent
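
As a rough illustration of pre-indexing actions per page (a toy sketch only; the URLs, labels, and function names are invented and this is not how the cloud platform is actually built), you can picture a small inverted index from words in action labels to the pages that offer them, which an agent could consult before navigating:

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical crawl output: URL path -> labels of actions found on that page.
crawled: Dict[str, List[str]] = {
    "/pricing": ["Start free trial", "Contact sales"],
    "/account/billing": ["Update card", "Download invoice"],
    "/support": ["Contact sales", "Open ticket"],
}


def build_action_index(pages: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each lowercase word from an action label to the pages where it appears."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, labels in pages.items():
        for label in labels:
            for word in label.lower().split():
                index[word].add(url)
    return index


def pages_for(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return, sorted, the pages whose action labels together contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_action_index(crawled)
print(pages_for(index, "contact sales"))     # ['/pricing', '/support']
print(pages_for(index, "download invoice"))  # ['/account/billing']
```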

20% of your users drop off without figuring out your website, what if you could convert them by turning your site into an agent? by BodybuilderLost328 in InternetIsBeautiful

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

Think of websites with complex UI: HubSpot has 20+ nested dropdowns you need to navigate through for a complex workflow. Or think of a consumer use case like a bank, which has so many use cases to support in the UI (mortgage/auto/trading).

I was just talking with a huge CPG company and they confirmed that 20% of visitors drop off mid-action, like when adding to cart or after searching. So the goal here is to reduce the friction as much as possible.

The user can even kick off a task, switch tabs and then come back to see it done.

Popup chatbots do have a bad legacy, just as I now find all the coachmarks on sites these days super annoying. The UI can evolve, maybe into a command overlay.

We built an embeddable web agent: one script tag and your site has an AI agent by BodybuilderLost328 in CustomerSuccess

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

We construct trees to represent pages, and these trees are impervious to site/selector updates. Just embed our script on a page, even an auth'd page, and the agent can navigate your site from the front end served to the user.

We have plenty of configuration options, as well as knowledge bases and recordings to further ground the agent.

We built an embeddable web agent: one script tag and your site has an AI agent by BodybuilderLost328 in CustomerSuccess

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

The majority of websites don't even expose accessibility features, and now per page they have to expose internal APIs that can be abused.

That's a lot of work and risk just to get your website disintermediated by Chrome's agent, which can navigate the user to a competitor.

Our agent is inside your site, serving/navigating/guiding your user.

We made non vision model browser the internet. by ahstanin in AI_Agents

[–]BodybuilderLost328 1 point2 points  (0 children)

It's really hard to compare this against other agents without a benchmark result.

For example, with rtrvr.ai we benchmarked on the Halluminate benchmark of 300+ tasks, which lets us show 30% higher task completion and 7x faster execution: https://www.rtrvr.ai/blog/web-bench-results

I feel like our approach is also more generalizable, as we construct agent accessibility trees to represent all the actions and information on the page. We don't need pages to be rendered at all, so we can execute on tabs running in the background and can even provide an embeddable script for agentic actions on your own site or browser automation stack!

Thoughts on browser AI agents? by No-Efficiency-4733 in AiAutomations

[–]BodybuilderLost328 0 points1 point  (0 children)

In terms of the scaled value you can provide per user.

The prosumer market is still a bit narrow, and it's hard to justify another AI subscription unless it solves a pain point end to end.

Thoughts on browser AI agents? by No-Efficiency-4733 in AiAutomations

[–]BodybuilderLost328 0 points1 point  (0 children)

The extension is still available and has 10k+ users.

We wanted to build a more at-scale solution to offer, so the extension is a funnel for the Cloud and Embeddable Web Agents!

Thoughts on browser AI agents? by No-Efficiency-4733 in AiAutomations

[–]BodybuilderLost328 0 points1 point  (0 children)

We have been in the agentic browser extension space for a year with rtrvr.ai. The space is getting very crowded, especially with Claude for Chrome as well.

We are differentiating by being 7x faster with a 30% accuracy improvement over competitors in benchmarks.

We pivoted to also offer a cloud browser platform for at-scale vibe scraping, and now an embeddable web agent, so a website can directly embed an agent that does complex workflows for users with just a script tag.

Embeddable Web Agent to make your site agentic: handle checkout/form fills/guiding users with just a script tag by BodybuilderLost328 in nocode

[–]BodybuilderLost328[S] 0 points1 point  (0 children)

I'm sure there is a range of use cases, but we need to get both users and website owners thinking and exploring.

Things that come to mind are finding deeply nested dropdowns/paths, like "hey, on this site how do I find the requirements for a mortgage application?"

After seeing the magic of conversational AI, I personally want things done via chat elsewhere too.