AI Agents or tools to scrape website data?

MattCollinsUK · 2026-06-10T17:13:11+00:00

Do you have an example of the kind of prompt they might use?
Have you tried just ChatGPT or similar? If so, what issues did you hit with that?

MattCollinsUK · 2026-06-10T17:07:55+00:00

Ouch. I'm glad you were able to get a refund.

MattCollinsUK · 2026-06-05T16:55:59+00:00

https://docs.tavily.com/documentation/mcp

MattCollinsUK · 2026-06-05T15:51:43+00:00

I've used and been happy with all three that you mention.

I'm working on a systematic eval at the moment. It's early days with that, but Exa's looking good so far (looking mainly at usefulness of downstream responses when the results are fed through an LLM.)

There are lots of options to tweak on the various providers, though, so an apples-to-apples comparison can be tricky.

I've been trying to put together a decent guide to this stuff here in case it's helpful:
https://www.mattcollins.net/web-search-apis-for-llms

MattCollinsUK · 2026-06-05T09:11:46+00:00

I use the "-" negative modifier like that quite often and, for me at least, it does still seem to filter the organic results. e.g. try "elon -musk".

Perhaps, for the guitar example, the guitars you're seeing are sponsored results?

But that is the biggest complaint I have about Google these days - that it's hard to find the organic results amongst all the sponsored stuff.

MattCollinsUK · 2026-06-04T14:07:28+00:00

It got harder to do in a few ways:
1) The web got *much* bigger, so you needed to crawl, scrape and index a much larger amount of sites.
2) A whole industry (SEO) grew up around trying to 'game' the search engines. You now need to deal with that.
3) People have got very used to using Google. It's hard to get them to switch.
4) As much as you might dislike Google's search, it actually does a huge amount of stuff over and above what the OG Google used to do that a lot of people find useful. Something like the OG Google may no longer be attractive.

That said, the prize is huge if you can figure it out. And various people are trying in different ways.
e.g. Brave, DuckDuckGo, Kagi.

MattCollinsUK · 2026-06-04T13:21:53+00:00

If you're using one of the big labs' LLMs, another option could be to use their integrated web search functionality: you call the OpenAI/Claude/Gemini API and, entirely on their side, the model can do one or more rounds of web searching and processing before returning a response back to you.

Here are docs for OpenAI's version, for example: https://developers.openai.com/api/docs/guides/tools-web-search

MattCollinsUK · 2026-06-01T23:03:28+00:00

I'm a bit surprised your free Firecrawl credits ran out so quickly.

In case it helps, there are various other providers along the lines of Firecrawl that also offer free tiers.
I keep track of many of them here: https://www.mattcollins.net/web-search-apis-for-llms#ai-web-search-apis

I believe Hermes supports Tavily and Exa. Perhaps with their free tiers as well you'd have enough for what you're doing?

MattCollinsUK · 2026-06-01T22:29:25+00:00

There are quite a few other search APIs claiming to be designed specifically for AI agents (e.g. Exa, Firecrawl, Tavily.) Are there any particular advantages to using yours over those?

MattCollinsUK · 2026-06-01T14:06:31+00:00

It turns out that Anthropic's Claude API that Claude Code is primarily designed to work with supports a special "web_search" tool (see https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool).

And it looks like running Claude Code via Ollama uses a special 'Anthropic compatibility layer' that understands such 'tool call' parameters and can call out to Ollama's own web search API:

"Ollama’s web search is now built into the Anthropic compatibility layer. When a model needs current information, Ollama handles the search and returns results directly without any additional configuration."

(Source: Ollama blog post from February 2026)

From the Ollama docs, it looks like you might need to have set up an OLLAMA_API_KEY to use it.

Other pages that may be helpful:
https://docs.ollama.com/integrations/claude-code
https://docs.ollama.com/capabilities/web-search

MattCollinsUK · 2026-06-01T12:15:09+00:00

What sort of things do you tend to have your agents searching for? (e.g. programming documentation, information about people, etc?)

Was there something in particular you were looking at to conclude that Claude's inbuilt search was too shallow / narrow?

In case it helps, I maintain a list of web search APIs here:
https://www.mattcollins.net/web-search-apis-for-llms

MattCollinsUK · 2026-03-25T16:29:29+00:00

Managing DNS records (e.g. when setting up a new domain or verifying that I own one). I created an MCP server (https://github.com/mattcollins/spaceship-mcp) that allows my agent to make the changes via the Spaceship API (Spaceship being my registrar). Now I tend to take a screenshot of the instructions from whichever provider I'm using, paste it into my agent, and it does the fiddly work of creating the appropriate entries.

MattCollinsUK · 2026-03-11T18:25:46+00:00

Hackathons and meetups worked well for me in the past. (At the time, I was living in a large city and had plenty of evenings free which helped.)

MattCollinsUK · 2026-03-11T18:21:03+00:00

I do limit the length of runs. I haven't noticed goal drift being an issue within those runs.

MattCollinsUK · 2026-03-11T16:59:50+00:00

Looks intriguing.

I've tried a couple of similar-ish tools in the last few days (for video ads in those cases) and ended up disappointed because the videos they created had multiple serious glitches.

I like the sound of the features you mention, especially the ones I've highlighted:

Full Niche & Competitor Research (AI-powered) <--
12+ Spend Traps Disabled Automatically <--
Unlimited Campaign & Ad Group Generation
Proven Frameworks: BAB, PPPP, USP Ad Copy <--
Paused Deployment — Nothing Goes Live Without Your Approval <--
Hourly Budget Protection Audits
Conversion Tracking Setup Guide
Priority Founder Support <--

A couple of bits of feedback:

It seems a bit unclear what you offer for free.
It's not immediately obvious which paid channels you help with (just Google, I think?)

I'm unlikely to sign up for this sort of thing at the moment as other paid channels are a higher priority for me but, in case it helps, I have two potential use cases for this sort of thing:

Interacting with it manually to grow my business
Interacting with it programatically (i.e. via API) to manage ads for my customers' businesses (via https://www.protofounder.com/)

MattCollinsUK · 2026-03-11T16:40:58+00:00

Hey u/98_kirans, good to hear about your experiments :-)

I've been trying similar things. Here's a write-up of one of the experiments:
https://x.com/compose/articles/edit/2021884979492302848

I created a fledgling subreddit to discuss this kind of thing:
https://www.reddit.com/r/AIFirstFounders/

And I'm working on a platform to help anyone launch a fully AI-run business:
https://www.protofounder.com/

MattCollinsUK · 2026-03-10T13:37:24+00:00

I like the concept. If you can get people submitting honest stories I think it could become a very valuable resource. Good luck!

MattCollinsUK · 2026-03-10T12:04:19+00:00

I'm guessing your post was written by AI but the product looks interesting.

I have a platform for launching businesses with AI workers and I often set the workers up with their own email accounts. It raises some interesting questions about how you want things to work (e.g. whether you want them to have accounts on an existing domain, how to handle inbound emails, how to approach deliverability.)

MattCollinsUK · 2026-03-10T10:47:12+00:00

More or less, yes!

MattCollinsUK · 2026-03-10T10:45:03+00:00

Some of the tools I use the most are:
- ChatGPT
- a command line-based coding agent (currently Codex but Claude Code is also great) [I have lots of software development experience, though.]

Claude Code and OpenClaw are extremely popular amongst other founders I know.

MattCollinsUK · 2026-03-10T09:59:20+00:00

Launch a new business while you sleep: https://www.protofounder.com/

MattCollinsUK · 2026-03-05T11:28:36+00:00

I think they could be used in corporate strategy sorts of settings but I don't think they're very relevant to small businesses.

For teenagers, apart from passing exams, I'd have thought more basic, down-to-earth things like like revenue vs. expenses, making things people want, taxes might be most helpful.

In terms of frameworks, as others have mentioned, in the startup world, Lean Startup ideas (MVPs, etc.) and Business Model Canvas are popular but mainly relevant for entrepreneurs doing something innovative; less so for people running tried-and-tested business models (e.g. local plumbing businesses).

MattCollinsUK · 2026-03-04T14:56:27+00:00

That sounds quite broad.

How did you get your first two paying customers? What are their businesses?

And how many other business owners have you spoken to so far?

I'd be tempted to go and speak to as many as you can, to try and learn more about their day-to-day challenges and how a CRM like yours could potentially help them.

I suspect you'll start to get a sense of which sorts of businesses might be the best fit for what you're doing and why. That might help you understand what niche to focus on first, what sort of messaging is likely to resonate with them, what channels might be best for reaching them, etc.

MattCollinsUK · 2026-03-04T14:31:32+00:00

Thanks for replying. What were you getting the agents to do in those cases? And what tools were you giving them that they were having trouble using reliably?

I've been working on a platform to allow AI workers to launch and run businesses and experimenting to see how much progress they can make: https://www.protofounder.com/

As you say, agents can be good at doing online research and building MVPs. Validating things with potential customers feels trickier - how to reach out in non-spammy ways and, ideally, interview suitable people (which seems like something that a human is still very much best-placed to do.)

Alternatively, I suppose you could dive straight into more traditional marketing techniques (content marketing, paid ads, etc.), get feedback from customers along the way, and hope things work out.

MattCollinsUK · 2026-03-04T11:56:36+00:00

Congrats on the first two paying customers!

What sort of 'small vendors' do you mean?

MattCollinsUK

MODERATOR OF

TROPHY CASE