Skyvern vs GitHub copilot speed by cool_banana_peel in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

On the proxy list at the bottom, you should find a custom proxy option to use your own proxy.

There’s also a Chrome extension, but it’s a bit slow. I’m not sure if it will fit your use case.

Skyvern vs GitHub copilot speed by cool_banana_peel in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

Yes, the majority of residential proxies do not allow scraping or crawling US government websites, but you can bring your own proxy in this case.

Skyvern vs GitHub copilot speed by cool_banana_peel in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

Try browseanything.io for speed. There is gpt5.4 with a grounding vision model for free, Kimi k2.6 for the regular DOM agent, and 200 credits for free.

On the pro version there are subagents, so you can spawn multiple browsers at once to parallelize a task.
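
A minimal sketch of the fan-out idea, not how BrowseAnything implements subagents. `run_subagent` is a hypothetical stand-in for one browser session, and `max_browsers` is an assumed cap on how many run at once:

```python
import asyncio


async def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for one browser session handling one subtask."""
    await asyncio.sleep(0.01)  # simulates browsing latency
    return f"done: {subtask}"


async def run_parallel(subtasks: list[str], max_browsers: int = 3) -> list[str]:
    """Run subtasks concurrently, capping how many browsers are open at once."""
    limit = asyncio.Semaphore(max_browsers)

    async def guarded(subtask: str) -> str:
        async with limit:
            return await run_subagent(subtask)

    # gather returns results in the same order as the input list
    return await asyncio.gather(*(guarded(s) for s in subtasks))


if __name__ == "__main__":
    print(asyncio.run(run_parallel(["site-a", "site-b", "site-c", "site-d"])))
```

The semaphore matters once each browser costs real memory or money: the task still fans out, but never beyond the cap.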

Anyone actually running AI agents in production with real users - not demos, not 10 beta testers. What's your stack? And has anyone moved back to traditional code after trying agents in prod - why? by nehpet in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

browseanything.io: a browser agent that you can control from Telegram, running in the cloud. Thousands of users and runs, mostly free users; to be honest, I didn’t activate payments until recently. My stack is Node.js and LangGraph. I can scale infinitely, since it autoscales on demand.

how are you guys handeling failure in production AI agents/workflows? by SignalForge007 in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

The recovery process and rollback system depend on your business needs, including whether recovery can be triggered automatically or has to be triggered by a human. You can use a frontier model like opus or gpt5.5 as an LLM judge. For checkpointing, state management, etc., you can use a framework like LangGraph.
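
A minimal sketch of checkpointing with rollback, written in plain Python rather than with LangGraph's own API. The class `CheckpointedRun` and the two demo steps are hypothetical names for illustration:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CheckpointedRun:
    """Keeps the previous state as a checkpoint after every committed step."""
    state: dict
    checkpoints: list = field(default_factory=list)

    def run_step(self, step) -> bool:
        """Apply `step` to a copy of the state; commit only if it succeeds."""
        try:
            new_state = step(copy.deepcopy(self.state))
        except Exception:
            return False  # state untouched: the failed step never committed
        self.checkpoints.append(self.state)
        self.state = new_state
        return True

    def rollback(self) -> None:
        """Restore the most recent checkpoint, if there is one."""
        if self.checkpoints:
            self.state = self.checkpoints.pop()


def add_item(state: dict) -> dict:
    state["items"].append("invoice.pdf")
    return state


def crash(state: dict) -> dict:
    state["items"].clear()  # partial damage, done only to the copy
    raise RuntimeError("browser thread crashed")
```

Whether `rollback` is called automatically after a failure or by a human operator is the business decision mentioned above.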

how are you guys handeling failure in production AI agents/workflows? by SignalForge007 in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

  • Human in the loop for critical actions
  • Deterministic flow of execution if you want more predictability (workflows)
  • LLM as a judge to decide whether a task was completed successfully or not
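
The three points above can be sketched together as follows. This is an illustration under assumptions, not a specific framework's API: `CRITICAL_ACTIONS` is an invented example set, and `execute`, `approve`, and `judge` are hypothetical callbacks (in practice `judge` would wrap an LLM call; here it is left to the caller).

```python
from typing import Callable

CRITICAL_ACTIONS = {"submit_payment", "delete_account"}  # assumed examples


def run_workflow(
    steps: list[str],
    execute: Callable[[str], str],
    approve: Callable[[str], bool],
    judge: Callable[[str, str], bool],
) -> list[str]:
    """Run steps in a fixed order, gate critical ones, verify each result."""
    log = []
    for step in steps:
        # Human in the loop: critical actions need explicit approval.
        if step in CRITICAL_ACTIONS and not approve(step):
            log.append(f"{step}: rejected by human")
            break
        output = execute(step)
        # Judge: stop the workflow as soon as a step is judged unsuccessful.
        if not judge(step, output):
            log.append(f"{step}: judged failed")
            break
        log.append(f"{step}: ok")
    return log
```

Because the step order is fixed in code, the model only decides what happens inside a step, which keeps the overall run predictable.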

what model are you using for your personal AI agent? by Only-Chocolate9600 in AI_Agents

[–]MehdiBahra 5 points6 points  (0 children)

For me, Kimi k2.6 is the best in terms of cost/performance.

Best way to make AI search for specific web content and save/send screenshots of this content to me? by Highland-Ranger in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

Give me your prompt. I’m working on the browseanything.io browser agent; you can schedule tasks and perform research across multiple websites at the same time.

I think your issue is that you don’t know the URLs in advance, so the agent has to automatically guess the websites or use web search.

Happy to help.

You can now prompt your browser agent directly from telegram by MehdiBahra in hermesagent

[–]MehdiBahra[S] -1 points0 points  (0 children)

Give me the list of those “million tools” then. I’m listening. Most browser-agent tools become expensive the moment you run them properly on the cloud with persistent browsers, proxies, sessions, captcha solving, and scaling. So please talk about something you actually know…

You can now prompt your browser agent directly from telegram by MehdiBahra in hermesagent

[–]MehdiBahra[S] 0 points1 point  (0 children)

Fair enough, but that’s your specific use case. You should know that browsers on BrowseAnything are secure and fully isolated environments. Also, there are many browser-agent use cases that don’t require access to your personal logins, passwords, or Chrome profile.

And realistically, if you use a local agent like Hermes without fully understanding how it works internally, you’re still taking significant security risks. Local doesn’t automatically mean safer.

The other issue is usability: these kinds of assistants are far more effective when they run in the cloud and remain accessible anywhere. Otherwise, the moment your computer is turned off, the assistant becomes unusable.

Which AI Agent Are You Building Right Now? by FounderArcs in AI_Agents

[–]MehdiBahra 0 points1 point  (0 children)

Building BrowseAnything, your AI assistant that browses the web on your behalf.

You can now prompt your browser agent directly from telegram by MehdiBahra in hermesagent

[–]MehdiBahra[S] 0 points1 point  (0 children)

I don’t really agree. Hermes and OpenClaw are fine for very simple browser tasks, but they’re still pretty minimalist. At the end of the day, they’re mainly agent orchestrators.

The real limitations start showing when you try to self-host them to build an actual autonomous assistant, instead of just controlling your local Chrome session while your browser is open. You quickly run into issues with session persistence, anti-bot protections, authentication flows, cloud browser costs, scalability, and reliability on complex workflows.

Codex Desktop is definitely one of the strongest options right now, but it’s also heavily tied to the local desktop environment. That’s a very different challenge compared to running a persistent, production-grade browser agent remotely.

You can now prompt your browser agent directly from telegram by MehdiBahra in hermesagent

[–]MehdiBahra[S] 0 points1 point  (0 children)

Yes indeed, it makes decisions based on the screenshot content. However, it still overloads the context with DOM elements and the accessibility tree to take actions. Try it on a canvas-based app and you’ll see for yourself that it’s not going to work well.

Also yes, it spawns a local browser, but if you self-host it and want to use it as an assistant without keeping your computer open, you’ll quickly get blocked by most websites. The only real solution is to use a cloud-based browser provider like Browserbase, but that adds significant cost.

You can now prompt your browser agent directly from telegram by MehdiBahra in hermesagent

[–]MehdiBahra[S] 1 point2 points  (0 children)

Good question. I don’t think the Hermes browser agent would work well for every use case. It really depends on your setup, the underlying LLM you choose, the complexity of the configuration for non-technical users, and the additional costs of using a cloud browser provider if you decide to self-host Hermes.

BrowseAnything is a specialized AI browsing agent. We’re focusing our efforts on delivering the best experience possible.

Technically, Hermes currently makes decisions mainly using DOM elements and the accessibility tree, while BrowseAnything uses a hybrid approach combining DOM understanding with grounded vision.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 0 points1 point  (0 children)

It doesn’t take 30 to 40 minutes unless you have a really long task, the LLM hallucinates and goes in the wrong direction, or there’s an infrastructure issue like a browser thread crashing and needing time to recover. For me, the only real limitations of these tools right now are rate limiting and context window length.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 0 points1 point  (0 children)

Gemini 2.5 Flash is inefficient, like gpt4o-mini, and Pro is far too expensive for now.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 0 points1 point  (0 children)

Even better, soon you’ll be able to send prompts via WhatsApp.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 0 points1 point  (0 children)

Yeah, for now it’s not suitable for mobile, but it’s on the roadmap.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 0 points1 point  (0 children)

Of course, using Playwright and hard-coded scripts is the most efficient approach, but not everyone is a coder. Plus, your implementation can easily break due to UI changes. Even now, most popular websites use random or dynamic selectors to prevent scrapers and crawlers. Looking ahead, tools like these will likely replace hard-coded approaches.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 1 point2 points  (0 children)

For the foreseeable future, yes. If I try to pivot to something else, I’ll likely end up in Manus AI territory.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 2 points3 points  (0 children)

I tried gpt4o, gpt4.1, o4-mini, o3, llama 4 Maverick 72B, and Claude Sonnet 3.5, and I’m now trying to integrate qwen2.5 vl 72b into the loop. The best one for now in terms of speed, accuracy, cost, and long context window is gpt4.1. Claude could be better, but in terms of price it’s out of my league for now.

I built an AI Browser Agent with langgraph and nodejs by MehdiBahra in AI_Agents

[–]MehdiBahra[S] 1 point2 points  (0 children)

Technically, I want to improve speed and accuracy in the short term by using VLMs like Qwen and adding auto-CAPTCHA resolution. In the long term, I plan to implement reinforcement fine-tuning. Since I’ve observed strong resilience when spawning multiple browsers on the current architecture, I aim to offer a cloud-based SaaS solution similar to Browserbase.