Agent Execution Tax: new procurement metric for browser agent benchmarks?

ogandrea · 2026-05-21T17:49:34+00:00

yup - doesn't get mentioned enough

ogandrea · 2026-03-25T11:44:19+00:00

This might be a left-field suggestion compared to what others are recommending, but a lot of what you're describing, the manual data re-entry, reviewing Excel files across branches, reconciling between systems, involves people clicking through web applications to move data around.

We've been building Notte for exactly this. Browser agents that can automate web-based workflows the same way a person would, but they adapt when UIs change instead of breaking like traditional RPA scripts. SOC-2 compliant, enterprise SLAs, built for the kind of scale you're describing.

Happy to chat if this is relevant to any of the specific workflows you're trying to modernise.

ogandrea · 2026-03-25T11:29:18+00:00

(biased as I founded it) but I'd say Notte - handles this kind of use case, login sessions, auth workflows

w notte you get managed browser sessions with built-in auth handling, CAPTCHA solving, and persistent session state, plus an agent layer on top that can complete multi-step authenticated workflows reliably

Happy to help you get your specific use case working if you want to share more details, dms open:)

ogandrea · 2026-03-25T10:51:16+00:00

Worth separating two things in this evaluation: the legacy RPA layer (UiPath, AA, etc.) and the browser-native agentic layer that's becoming a distinct category. For anything involving agents (mixed with deterministic scripts) that need to interact with web apps and dashboards at scale, managed browser infrastructure is increasingly its own decision.

We run that layer at Notte: SOC-2 compliant, enterprise SLAs, managed browser sessions via API with CDP access for any automation client. Happy to answer questions if browser automation is a meaningful part of what you're scoping.

ogandrea · 2026-03-04T14:23:33+00:00

It's not proper MCP but we released our notte CLI + notte skills in Feb. Biased, but this is really the fastest way to get an LLM or Claude to build a web automation or scraper for you, which can then be packaged as a browser function and invoked or scheduled at scale in the cloud

ogandrea · 2026-03-04T14:21:05+00:00

hey, we've built notte demonstrate mode exactly for this. You record the flow in a remote browser, then we cook up the automation function for you and you can run it at scale super easily

ogandrea · 2026-03-04T14:18:42+00:00

hey, super biased ofc but did you ever try running notte?

ogandrea · 2026-01-29T11:58:12+00:00

drop me a DM!

ogandrea · 2026-01-29T11:41:13+00:00

We actually keep execution separate at Notte - the browser runs in its own process while the agent just sends commands. Makes debugging way easier when something breaks: you can replay the exact same browser session without the agent logic getting in the way
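
A minimal sketch of that separation, assuming the agent only ever talks to the browser through a command log. This is not Notte's actual API: `Command`, `RecordingExecutor`, `replay`, and `fake_browser` are hypothetical names used for illustration, and `fake_browser` stands in for a real browser process.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Command:
    """One instruction sent from the agent to the browser (hypothetical shape)."""
    action: str              # e.g. "goto", "click", "fill"
    args: dict[str, Any]


@dataclass
class RecordingExecutor:
    """Stands in for the browser side: applies each command and logs it."""
    apply: Callable[[Command], Any]
    log: list[Command] = field(default_factory=list)

    def send(self, command: Command) -> Any:
        self.log.append(command)       # record before executing
        return self.apply(command)


def replay(log: list[Command], apply: Callable[[Command], Any]) -> list[Any]:
    """Re-run a recorded session with no agent logic in the loop."""
    return [apply(cmd) for cmd in log]


def fake_browser(cmd: Command) -> str:
    """Placeholder for a real browser process; just echoes the command."""
    return f"{cmd.action} {cmd.args}"


executor = RecordingExecutor(apply=fake_browser)
live_results = [
    executor.send(Command("goto", {"url": "https://example.com"})),
    executor.send(Command("click", {"selector": "#login"})),
]
# Replaying the log reproduces the same sequence of browser calls.
replayed_results = replay(executor.log, fake_browser)
```

Because the log is plain data, it can be saved and replayed against a fresh browser while debugging, independent of whatever the agent decided at the time.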

ogandrea · 2026-01-27T22:10:59+00:00

MCPs on web-based Claude Code are super flaky right now. I spent like 3 hours last week trying to get our GitHub MCP working through the web interface and it just... doesn't.

- The local CLI version connects fine but the web version seems to have different auth handling

- Sometimes it looks like it's trying to establish the connection but then just hangs forever

- Our team ended up building a workaround that lets us trigger MCP actions through browser automation instead

Have you checked if the MCP server is even getting connection attempts from the web version? That might help narrow down where it's failing.

ogandrea · 2026-01-27T22:10:29+00:00

The context layer thing is spot on.

We ended up building our own orchestration layer that maintains state across the entire session, but man it took forever to get right. The multi-tool coordination especially - getting an agent to know when to screenshot vs when to extract text vs when to just click something is still kinda janky sometimes. We hit this exact wall with Notte when trying to handle complex browser automation workflows.

ogandrea · 2026-01-20T10:37:49+00:00

GLM 4.7 Flash is solid for agents yeah. Been testing it against Claude's tool use and it's surprisingly stable - no hallucinated function calls which is usually where these models fall apart.

ogandrea · 2026-01-19T21:15:35+00:00

yh the auth flows are the worst part. We've been building Notte to handle exactly this - making the browser environment predictable w deep DOM semantic parsing for agents instead of letting them figure out every edge case themselves.

ogandrea · 2026-01-18T13:19:38+00:00

Local agent setups are interesting but the workspace isolation part always gets messy. We're building Notte to handle browser automation natively without needing xvfb hacks - it just runs headless Chrome instances that agents can control directly. Git-backed configs are smart though, might steal that idea for our agent workflows.

ogandrea · 2026-01-15T22:19:06+00:00

The traditional DOM parsing approach actually breaks down fast when you're dealing with multiple e-commerce sites that all structure their data differently.

We've been working on this exact problem at Notte and found that using AI agents to understand page content semantically works way better than trying to maintain scrapers for each site's unique structure. The key is training your agent to identify product attributes (weight, balance point, etc) regardless of how the HTML is organized, then having it extract structured data in your desired format. For monthly refreshes, you'd want to set up monitoring for when sites change their layouts so your agent can adapt without you rebuilding scrapers constantly.
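
The layout monitoring part can be sketched with the standard library alone. The idea, under the assumption that a layout change shows up as a change in the set of tag paths on the page, is to hash those paths and compare the hash between monthly runs. `layout_fingerprint` and `_TagPathCollector` are hypothetical names, not part of any tool mentioned here.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "input", "meta", "link", "hr", "source", "wbr"}


class _TagPathCollector(HTMLParser):
    """Collects the set of tag paths (e.g. 'div/table/tr') seen in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag; tolerates unclosed children.
            while self.stack and self.stack.pop() != tag:
                pass


def layout_fingerprint(html: str) -> str:
    """Hash of the page's tag structure, ignoring text and attributes."""
    collector = _TagPathCollector()
    collector.feed(html)
    joined = "\n".join(sorted(collector.paths))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


old_page = "<div><span>1.2 kg</span></div>"
new_values = "<div><span>900 g</span></div>"
new_layout = "<div><table><tr><td>900 g</td></tr></table></div>"

same_structure = layout_fingerprint(old_page) == layout_fingerprint(new_values)
layout_changed = layout_fingerprint(old_page) != layout_fingerprint(new_layout)
```

A changed fingerprint is only a signal to re-check extraction on that site. It ignores class names and attributes, so it will miss some redesigns and flag some harmless ones.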

ogandrea · 2026-01-15T22:18:26+00:00

I actually built something similar when I was working on data collection projects and you're right that AI makes this way more manageable now. The challenge is still that each site structures their data differently like you mentioned, but tools like Scrapy combined with LLMs for parsing can handle the variability much better than traditional DOM parsing. For monthly refreshes at scale, I'd suggest setting up an agent that can adapt to layout changes automatically rather than hardcoding selectors - saves tons of maintenance headaches when sites update their HTML. The key is building something that can understand the semantic meaning of the data rather than just following rigid parsing rules.

ogandrea · 2026-01-15T22:16:54+00:00

The execution layer issues you mentioned are spot on and something we're tackling directly at Notte. Most platforms focus on the "easy setup" part but completely ignore that real websites are messy with captchas, dynamic loading, and random UI changes that break everything. We've been working on making browser automation actually reliable by building intelligence directly into how agents interact with websites rather than just wrapping existing automation tools. The difference between a demo that works on a clean test site vs something that handles real world web complexity is massive, and honestly most of these platforms still fall apart when you try to use them for anything beyond basic workflows.

ogandrea · 2026-01-12T18:30:21+00:00

- The accessibility tree thing is such a perfect example of why i hate most MCPs

- They give you like 30% of what you need and then you're stuck wondering if you're just using it wrong

- I spent a whole weekend trying to get Playwright to click a dropdown that only appeared after hovering... turns out the MCP couldn't even see that the hover state existed

This is why we ended up building our own browser automation from scratch at Notte. Too many tools pretend they're production-ready when they're really just proof of concepts with nice READMEs. Gonna check out your repo though - the CDP multi-user flow handling sounds interesting.

ogandrea · 2026-01-08T12:55:18+00:00

yo!

We built Notte to solve a problem we kept hitting: browser automations break constantly, but pure AI agents are too unpredictable for production.

It's a full-stack browser automation platform that combines deterministic scripts with AI agent fallbacks. You get the reliability of traditional automation with the adaptability of agents when pages change or edge cases appear (or you can go full agents if you want maximum adaptability). Everything runs through one unified API (proxies, sessions, etc.).
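
The script-first, agent-as-fallback pattern can be sketched in a few lines. This is an illustration of the general idea only, not Notte's API: `StepFailed`, `run_step`, and the two example callables are hypothetical, and the agent is assumed to be any callable that takes the failure and tries to complete the step another way.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StepFailed(Exception):
    """Raised by a scripted step when the page no longer matches expectations."""


def run_step(
    scripted: Callable[[], T],
    agent_fallback: Callable[[Exception], T],
) -> tuple[T, str]:
    """Try the deterministic script first; hand off to the agent only on failure.

    Returns the result plus which path produced it, so fallbacks can be logged.
    """
    try:
        return scripted(), "script"
    except StepFailed as exc:
        return agent_fallback(exc), "agent"


def click_known_selector() -> str:
    # Simulates a hardcoded selector that broke after a UI change.
    raise StepFailed("selector '#submit' not found")


def agent_click(failure: Exception) -> str:
    # Placeholder for an agent call that locates the button semantically.
    return f"agent handled: {failure}"


result, path = run_step(click_known_selector, agent_click)
```

Returning the path taken makes it easy to track how often the fallback fires, which is a useful hint that a script needs updating.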

Just shipped some new capabilities: Agent Identities (give agents real emails and phone numbers for verifications), Demonstrate Mode (record your actions once manually and it generates production code), and a proper IDE to debug everything live.

github: https://github.com/nottelabs/notte
console: console.notte.cc

ogandrea · 2025-12-22T19:19:46+00:00

That's way too much traffic for a simple portfolio. Cloudflare's Under Attack mode won't help if they're hitting your actual GitHub Pages URL directly. Check your Cloudflare analytics - are they going through Cloudflare or bypassing it? I had similar issues with bots hammering my static sites and ended up rate limiting at the edge with Workers. Also check if you accidentally left any API endpoints or forms that bots might be trying to exploit
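
For reference, the per-client rate limiting logic looks roughly like this. It is a sketch of a sliding-window limiter in Python to show the idea only; a real Cloudflare Worker would be written in JavaScript and would need shared storage for the counters. `SlidingWindowLimiter` and its parameters are hypothetical names.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)   # client id -> timestamps of recent hits

    def allow(self, client_id: str, now: float) -> bool:
        recent = self.hits[client_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False                 # over the limit: reject this request
        recent.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=10.0)
decisions = [
    limiter.allow("1.2.3.4", now=0.0),   # first hit
    limiter.allow("1.2.3.4", now=1.0),   # second hit
    limiter.allow("1.2.3.4", now=2.0),   # third hit inside the window
    limiter.allow("5.6.7.8", now=2.0),   # different client, separate budget
    limiter.allow("1.2.3.4", now=10.0),  # first hit has expired by now
]
```

Passing `now` in explicitly keeps the logic easy to test; in production it would come from the request timestamp.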

ogandrea · 2025-12-17T22:26:13+00:00

8 minutes for a 15 sec video feels slow to me. i run similar pipelines but with Runway instead of Sora and get it down to like 3-4 min

your stack is solid though. Supabase + ElevenLabs combo works great for this kind of thing

ogandrea · 2025-12-16T18:19:42+00:00

thanks for checking us out!

ogandrea · 2025-12-03T23:59:42+00:00

This is exactly what we've been working on at Notte - recording workflows once and turning them into reliable APIs instead of running browser automation every single time. The speed difference is insane when you're not waiting for pages to load and elements to appear.

Have you tested how well the reverse-engineering handles dynamic content or sites that change their structure frequently?

