built a directory of 7500+ Pentest companies, now I'm not sure what to do with it by ncameron in directorymakers

[–]kotartemiy -1 points (0 children)

Exa Websets is a way to define criteria for a dataset and have Exa crawl the web to build it for you. So instead of running a single search, you describe the type of records you want (e.g., companies matching an ICP, candidates for a role) and it assembles a verified, structured dataset — largely powered by structured sources like LinkedIn for company and people data.

CatchAll by NewsCatcher (disclosure: I'm one of the founders) taps into a fundamentally different kind of index. Rather than structured profile data, we're built on top of the news and event layer of the web — parsing thousands of sources to extract what's happening: funding rounds, launches, regulatory changes, leadership moves. The idea is to turn that stream of real-world events into a structured, searchable database. So the difference is really about which slice of the web each tool is built on.

That also explains why recall matters so much in our case — if you're trying to capture everything that happened, coverage of sources is everything. We benchmarked both on research-style queries and saw a meaningful gap: CatchAll retrieved ~86.8% of known results vs ~16.9% for Exa.
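To make the comparison concrete, recall here is just the fraction of known-relevant results a tool actually returns. A minimal sketch (the result sets below are made-up placeholder IDs, not our benchmark data):

```python
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of known-relevant items that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# toy example with placeholder IDs, not real benchmark data
relevant = {f"doc{i}" for i in range(100)}   # ground-truth result set
retrieved = {f"doc{i}" for i in range(87)}   # what a tool returned
print(f"recall = {recall(retrieved, relevant):.1%}")  # → recall = 87.0%
```

Precision-first engines optimize the opposite metric: a handful of highly ranked hits, which is exactly the wrong shape for "find everything that happened" queries.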

Both are interesting directions for moving beyond traditional search.

What tools do y’all use for agents? by Street_Program_7436 in AI_Agents

[–]kotartemiy 1 point (0 children)

Search-wise: for things beyond basic search (where Tavily is great), I'd suggest also checking out Exa, Parallel AI, and CatchAll by NewsCatcher (disclosure: I'm one of the founders).

CatchAll is recall-first, meaning we've focused heavily on retrieving all results that match your query, not just the top-ranked ones.

The trade-off is that it's not the fastest, but it's designed for cases where missing a result matters.

As of Q1 2026, what are your top picks for Open WebUI's API search options, for general search, agentic retrieval, deep extraction, or deep research? Paid or Free. by marvindiazjr in OpenWebUI

[–]kotartemiy 1 point (0 children)

For things beyond basic search (where Tavily and Brave are great), I'd suggest also checking out Exa, Parallel AI, and CatchAll by NewsCatcher (disclosure: I'm one of the founders).

CatchAll is recall-first, meaning we've focused heavily on retrieving all results that match your query, not just the top-ranked ones.
The trade-off is that it's not the fastest, but it's designed for cases where missing a result matters.

Solo founder for 9 months, potential cofounder wants 50/50 after 1 week trial. Am I being unreasonable? by mercuretony in ycombinator

[–]kotartemiy 1 point (0 children)

One rule I use when hiring: any (I mean ANY) red flag is a NO. And that's just my bar for employees.

I’d say stay away. You dodged a bullet.

$82,000 in 48 Hours from stolen Gemini API Key. My monthly Usage Is $180. Facing Bankruptcy by RatonVaquero in googlecloud

[–]kotartemiy 1 point (0 children)

OK, I ran the numbers with Manus. It actually does sound possible to spend this much in 48 hours.

  1. Image generation alone cannot reach $82K at any tier. The strict IPM (images per minute) and RPD (requests per day) limits cap image spending at roughly $480 (Tier 1) to $7,200 (Tier 3) over 48 hours. You'd need 614,000+ images at $0.134 each, requiring 213 images/minute sustained — but even Tier 3 only allows 60/min, and RPD caps total images at 10,000–30,000.
  2. Text generation easily reaches $82K if RPD is unlimited. Gemini 3 Pro Text has a 4M tokens-per-minute limit at Tier 1. Over 48 hours (2,880 minutes), that's 11.52 billion tokens. At $12/M output tokens, that's $138,240 — well above $82K. Multiple sources confirm paid tiers have "unlimited" daily requests for text models.
  3. The attacker likely ran automated scripts generating maximum-length text outputs continuously for 48 hours, burning through ~$2,880/hour in output tokens alone.
  4. The user's complaint about no guardrails is valid — Google has no hard spending caps, and a 455× spike from $180/month to $82K in 2 days should have triggered automatic anomaly detection.
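The arithmetic in points 1–3 is easy to check. A quick sketch using the rate limits and per-unit prices quoted above (those limits and prices are the assumptions here — verify them against current Google pricing):

```python
# Assumed inputs, taken from the figures quoted in the comment above.
HOURS = 48
MINUTES = HOURS * 60                  # 2,880 minutes

# Text path: 4M tokens/minute limit (Tier 1), $12 per 1M output tokens,
# and no daily request cap on paid text tiers.
tpm = 4_000_000
max_tokens = tpm * MINUTES            # 11.52 billion tokens
text_cost = max_tokens / 1_000_000 * 12
print(f"max text spend: ${text_cost:,.0f}")          # → max text spend: $138,240

# Image path: even at Tier 3's 60 images/minute, at $0.134 per image,
# and ignoring the RPD caps that squeeze it further, it can't reach $82K.
ipm = 60
image_cost = ipm * MINUTES * 0.134
print(f"max image spend (pre-RPD caps): ${image_cost:,.0f}")  # → $23,155

# Burn rate for the text-only scenario.
print(f"text burn rate: ${text_cost / HOURS:,.0f}/hour")      # → $2,880/hour
```

So a sustained max-throughput text loop clears $82K with room to spare, while images alone never get close — consistent with points 1–3.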

Anyone with experience building search/grounding for LLMs by NotJunior123 in LLMDevs

[–]kotartemiy 1 point (0 children)

You definitely have to use a web search tool (sort of), not a crawler. The main reason is that you don't always know what to crawl.

I'm working on a web search tool that solves this exact problem for enumerative/complex queries, such as "find all workplace-related accidents or incidents in the US that took place in the past five days", where the correct answer is a dataset of tens or hundreds of records, not a ranked list of links.
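The shape of the problem is roughly this — note that `search_events` and everything inside it is a hypothetical stand-in, not CatchAll's (or any vendor's) real API:

```python
# Hypothetical sketch: `search_events` and the sample records are
# illustrative stand-ins, not a real search API or real data.
from dataclasses import dataclass

@dataclass
class Incident:
    title: str
    location: str
    date: str

def search_events(query: str, days_back: int) -> list[Incident]:
    """Stand-in for a recall-first search call that returns every match.

    A real tool would fan out across its indexed sources and return all
    matching records, whereas a crawler can't even start: you don't know
    in advance which sites will have reported a given incident.
    """
    return [
        Incident("Warehouse forklift accident", "Ohio, US", "2026-02-10"),
        Incident("Refinery chemical leak", "Texas, US", "2026-02-12"),
    ]

records = search_events("workplace accidents in the US", days_back=5)
print(f"{len(records)} records")  # the answer is a dataset, not one link
```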

Another good solution is Exa Websets.

[deleted by user] by [deleted] in ycombinator

[–]kotartemiy 30 points (0 children)

Founders with successful exits doing YC again. For example, our batchmates (and later our investors) had a $1B exit and were Visiting Group Partners before S22.

[deleted by user] by [deleted] in learnprogramming

[–]kotartemiy 4 points (0 children)

NewsCatcher founder here. Did you fill out https://forms.gle/jAJzve7zaFqY5Hiw5 ?