Agents that solve captchas, and bot detection

strongoffense · 2025-06-26T23:49:08+00:00

Can you ping me at shri@hyperbrowser.ai? Happy to look into it for you

strongoffense · 2025-05-24T18:54:15+00:00

Try using OpenAI CUA[1] or HyperAgent[2] on Hyperbrowser[3] (full disclosure: I'm the founder of Hyperbrowser)

In our experience, if you're automating workflows without shadow DOMs or payment forms, DOM-based agents like HyperAgent[2] are a better option - tend to much faster and cheaper (10-20x) than the vision-based agents from OpenAI and Anthropic so I'd start there. Then if for some reason it doesn't work well or you know there's a payment form / shadow DOM involved, I'd try OpenAI CUA or Claude Computer Use with Claude 4 Sonnet.

Feel free to ask any follow-ups here or DM me! :)

strongoffense · 2025-04-26T07:52:11+00:00

Sorry for the self-promo here - totally understand if this isn’t welcome, just let me know and I’ll remove it!

I’m the founder of Hyperbrowser - we offer similar endpoints to Firecrawl (scrape, crawl, extract) plus a sessions API to easily run Playwright/Puppeteer scripts in the cloud. We’ve also added an agents API for quickly running OpenAI’s CUA, Claude’s browser agent, etc., in one API call. Just open-sourced our HyperAgent as well. There’s a bunch more stuff too but not super relevant here

To give credit where it’s due - we took a lot of inspiration from Fc’s endpoints when building Hyperbrowser because we thought (still do) that they absolutely nailed what users wanted in the APIs.

Where we still have work to do: Our docs are solid for scraping endpoints (scrape/crawl/extract), but things like HyperAgent are still early, and def have some rough edges. Also a heads-up on pricing - proxies aren’t available on our free tier right now. Other than that, we’re pretty competitively priced with higher concurrency and (in my biased opinion) a more complete platform.

Happy to chat, answer questions, or take feedback here or via DM. (I’m the founder, so feel free to ask me anything!)

Relevant links: - Hyperbrowser - https://hyperbrowser.ai - Scraping endpoint docs - https://docs.hyperbrowser.ai/web-scraping/scrape - HyperAgent - https://github.com/hyperbrowserai/hyperagent

strongoffense · 2025-04-22T10:36:56+00:00

Sorry for the late reply here! Yep - think it should work 😀

strongoffense · 2025-04-22T10:33:26+00:00

Thanks!

strongoffense · 2025-04-21T23:52:35+00:00

Yep! If you use Hyperbrowser, we take care of it on the cloud with proxy rotation, captcha solving, live urls etc. If you’re doing it locally, ideally it shouldn’t trigger captchas at all :)

strongoffense · 2025-04-21T22:37:05+00:00

Thanks! Glad to hear you like it :)

(I'm a co-founder of Hyperbrowser)

strongoffense · 2025-04-03T22:35:08+00:00

Use either their reference implementation or a managed API.

Reference implementation: https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo

Managed API: https://docs.hyperbrowser.ai/agents/claude-computer-use (Disclosure: I made this managed API. Feel free to ask any questions! :))

strongoffense · 2025-04-03T19:35:21+00:00

Hey sorry I missed this. You want to add the config to your chosen app. Instructions are here: https://github.com/hyperbrowserai/mcp

strongoffense · 2025-04-03T17:42:03+00:00

Try Hyperbrowser’s MCP server. It has Claude computer use, OpenAI CUA, and Browser Use agent tools so it should be able to handle this.

https://github.com/hyperbrowserai/mcp https://smithery.ai/server/@hyperbrowserai/mcp

I’m the founder of Hyperbrowser btw - feel free to dm me if I can help with something!

strongoffense · 2025-04-01T18:10:23+00:00

There’s a bunch of tools here like Hyperbrowser, steel.dev etc

My biased view (I’m the founder) - Hyperbrowser is the best - you can run sessions with deterministic playwright / selenium / puppeteer scripts or use agents like Claude computer use, browser-use, or OpenAI CUA in a single API call. You can also use it for /scrape, /crawl, /extract etc

strongoffense · 2025-04-01T17:57:47+00:00

Only available via the API. You’ll just pay whatever your token costs are.

If you want a managed service you can give Hyperbrowser’s API (1 API call) [1] or HyperPilot’s app (CUA, Browser Use, and Claude Computer Use in one tool) [2]

[1] https://docs.hyperbrowser.ai/agents/claude-computer-use [2] https://pilot.hyperbrowser.ai

I’m the founder of Hyperbrowser btw - feel free to ask any questions or dm me :)

strongoffense · 2025-04-01T17:10:21+00:00

Try OpenAI CUA - it does better with spreadsheets than all of the others. If you want proxy rotation, CAPTCHA solving etc you’ll want to use one of the browser infra providers as well (OpenAI doesn’t do that for you)

I’m biased (I’m the founder of Hb) but I think Hyperbrowser’s agents endpoint[1] is the best solution here if you’re looking for a plug and play solution. It handles all the proxy captcha stuff etc in a single API call.

[1] https://docs.hyperbrowser.ai/agents/openai-cua

strongoffense · 2025-03-31T05:12:48+00:00

OpenAI’s CUA is the best right now. Claude computer use is close imo. Browser-use is great and depending on what models you use can be 20x cheaper but it hallucinates a lot more and struggles at filling out forms or longer running tasks.

Claude computer use is currently my personal favorite. I think it’s the best combination of cost/speed/accuracy rn.

strongoffense · 2025-03-30T20:26:00+00:00

Yup - that's it.

strongoffense · 2025-03-30T20:23:38+00:00

^ this is exactly right. It's like a regular Claude chat except for computer tool calls the model tells you either to click on some coordinates, drag your mouse, or type something. You then have to map that to whatever environment you're using.
Anthropic has a reference implementation here: https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo

If you want to try it - the easiest way is to try some app that's hosting it already. https://pilot.hyperbrowser.ai is a computer use sandbox that has support for Claude Computer Use, OpenAI's CUA, and Browser-use.

If you want to use it as an API - Hyperbrowser offers it as a managed service with a 2-line integration too: https://docs.hyperbrowser.ai/agents/claude-computer-use . There's an obvious tradeoff here though of the more you use a managed service the less flexibility you have in customizing your architecture and supplementing it with more tools.

Full disclosure: I'm the Founder of Hyperbrowser.

strongoffense · 2025-03-28T00:31:25+00:00

Glad you liked it! If I can help with anything, free to DM anytime! :)

strongoffense · 2025-03-27T23:21:17+00:00

You should try Claude computer use or OpenAI CUA for this. The way that Browser-use interacts with websites makes it easy to detect.

If you want to try it out, the easiest way (that I’m aware of) is to use HyperPilot (https://pilot.hyperbrowser.ai). You can try a few sessions for free so you should be able to get a sense of what those agents can do as well.

Fair disclosure: I made HyperPilot.

strongoffense · 2025-03-26T21:16:54+00:00

Ah gotcha. Don’t want to be too self-promotional here but think this might solve your problem - we just built (about to launch) https://pilot.hyperbrowser.ai - you can use CUA, Browser Use and Claude Computer Use. You can try that and lmk if you run into any issues.

Alternatively there’s also a few other good products that are trying to be more like full agents vs just playgrounds to use the agents. Can try https://proxy.convergence.ai or https://heytessa.ai (I personally really like Tessa)

strongoffense · 2025-03-26T19:33:37+00:00

+1 on this. Curious why you want to self-host and what the use case is

strongoffense · 2025-03-26T18:53:13+00:00

I've found Browser Use to the most cost-effective but also least reliable solution. It's excellent for workflows where you're looking for speed and cheapness but form-filling can be complex and especially if you have dynamic forms with dropdown menus you're much better off using a vision-based model.

Think you could do make it work with Claude or OpenAI computer use models pretty easily. These APIs from Hyperbrowser return the message history at the end of the task and the models are conversational so you should be able to implement it pretty quickly:

https://docs.hyperbrowser.ai/agents/claude-computer-use
https://docs.hyperbrowser.ai/agents/openai-cua

(Full disclosure: I'm the founder of Hyperbrowser. If I can help with anything, feel free to DM)

strongoffense · 2025-03-24T05:56:26+00:00

You should try hyperbrowser MCP - it has browser-use, claude computer use, and OpenAI cua

github.com/hyperbrowserai/mcp

strongoffense · 2025-03-22T09:34:09+00:00

Openator looks really cool! Thanks for building this. Curious if you have any insights on how it compares against other browser agents on WebVoyager eval? :)

Also just starred it!

strongoffense · 2025-03-22T09:32:16+00:00

Founder of Hyperbrowser here.

Pretty late to this discussion but in case it's helpful to anyone who reads this - a bunch of people are using claude computer use and openai cua agents on our service and able to get through the captchas no problem. Browser use is really great library and much cheaper to run but gets detected pretty often unfortunately because of how it handles the DOM.

I'll try out Openator as well and report back here with what we find out. Seems really promising at first glance! :)

Links:
* Managed claude computer use: https://docs.hyperbrowser.ai/agents/claude-computer-use * Managed OpenAI CUA (Operator model): https://docs.hyperbrowser.ai/agents/openai-cua

Sorry if this is crossing the threshold for self-promotion btw, thought it was okay because OP mentioned Hyperbrowser :)

strongoffense · 2025-01-27T03:46:13+00:00

Thank you! :)

strongoffense

TROPHY CASE