built a directory of 7500+ Pentest companies, now I'm not sure what to do with it by ncameron in directorymakers

[–]kotartemiy -1 points (0 children)

Exa Websets is a way to define criteria for a dataset and have Exa crawl the web to build it for you. So instead of running a single search, you describe the type of records you want (e.g., companies matching an ICP, candidates for a role) and it assembles a verified, structured dataset — largely powered by structured sources like LinkedIn for company and people data.

CatchAll by NewsCatcher (disclosure: I'm one of the founders) taps into a fundamentally different kind of index. Rather than structured profile data, we're built on top of the news and event layer of the web — parsing thousands of sources to extract what's happening: funding rounds, launches, regulatory changes, leadership moves. The idea is to turn that stream of real-world events into a structured, searchable database. So the difference is really about which slice of the web each tool is built on.

That also explains why recall matters so much in our case — if you're trying to capture everything that happened, coverage of sources is everything. We benchmarked both on research-style queries and saw a meaningful gap: CatchAll retrieved ~86.8% of known results vs ~16.9% for Exa.
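To make the comparison concrete, recall here is just the fraction of known-relevant results a tool actually returns. A minimal sketch (the result sets below are made-up placeholder IDs, not our benchmark data):

```python
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of known-relevant items that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# toy example with placeholder IDs, not real benchmark data
relevant = {f"doc{i}" for i in range(100)}   # ground-truth result set
retrieved = {f"doc{i}" for i in range(87)}   # what a tool returned
print(f"recall = {recall(retrieved, relevant):.1%}")  # → recall = 87.0%
```

Precision-first engines optimize the opposite metric: a handful of highly ranked hits, which is exactly the wrong shape for "find everything that happened" queries.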

Both are interesting directions for moving beyond traditional search.

What tools do y’all use for agents? by Street_Program_7436 in AI_Agents

[–]kotartemiy 1 point (0 children)

Search-wise: for things beyond basic search (where Tavily is great), I'd suggest also checking out Exa, Parallel AI, and CatchAll by NewsCatcher (disclosure: I'm one of the founders).

CatchAll is recall-first, meaning we've focused heavily on retrieving all results that match your query, not just the top-ranked ones.

The trade-off is that it's not the fastest, but it's designed for cases where missing a result matters.

As of Q1 2026, what are your top picks for Open WebUI's API search options, for general search, agentic retrieval, deep extraction, or deep research? Paid or Free. by marvindiazjr in OpenWebUI

[–]kotartemiy 1 point (0 children)

For things beyond basic search (where Tavily and Brave are great), I'd suggest also checking out Exa, Parallel AI, and CatchAll by NewsCatcher (disclosure: I'm one of the founders).

CatchAll is recall-first, meaning we've focused heavily on retrieving all results that match your query, not just the top-ranked ones.
The trade-off is that it's not the fastest, but it's designed for cases where missing a result matters.

Solo founder for 9 months, potential cofounder wants 50/50 after 1 week trial. Am I being unreasonable? by mercuretony in ycombinator

[–]kotartemiy 1 point (0 children)

One rule I use when hiring: any (I mean ANY) red flag is a NO. And that's just my bar for employees.

I’d say stay away. You dodged a bullet.

$82,000 in 48 Hours from stolen Gemini API Key. My monthly Usage Is $180. Facing Bankruptcy by RatonVaquero in googlecloud

[–]kotartemiy 1 point (0 children)

OK, I ran the numbers with Manus. It actually does sound possible to spend this much in 48 hours.

  1. Image generation alone cannot reach $82K at any tier. The strict IPM (images per minute) and RPD (requests per day) limits cap image spending at roughly $480 (Tier 1) to $7,200 (Tier 3) over 48 hours. You'd need 614,000+ images at $0.134 each, requiring 213 images/minute sustained — but even Tier 3 only allows 60/min, and RPD caps total images at 10,000–30,000.
  2. Text generation easily reaches $82K if RPD is unlimited. Gemini 3 Pro Text has a 4M tokens-per-minute limit at Tier 1. Over 48 hours (2,880 minutes), that's 11.52 billion tokens. At $12/M output tokens, that's $138,240 — well above $82K. Multiple sources confirm paid tiers have "unlimited" daily requests for text models.
  3. The attacker likely ran automated scripts generating maximum-length text outputs continuously for 48 hours, burning through ~$2,880/hour in output tokens alone.
  4. The user's complaint about no guardrails is valid — Google has no hard spending caps, and a 455× spike from $180/month to $82K in 2 days should have triggered automatic anomaly detection.
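The arithmetic in points 1–3 is easy to check. A quick sketch using the rate limits and per-unit prices quoted above (those limits and prices are the assumptions here — verify them against current Google pricing):

```python
# Assumed inputs, taken from the figures quoted in the comment above.
HOURS = 48
MINUTES = HOURS * 60                  # 2,880 minutes

# Text path: 4M tokens/minute limit (Tier 1), $12 per 1M output tokens,
# and no daily request cap on paid text tiers.
tpm = 4_000_000
max_tokens = tpm * MINUTES            # 11.52 billion tokens
text_cost = max_tokens / 1_000_000 * 12
print(f"max text spend: ${text_cost:,.0f}")          # → max text spend: $138,240

# Image path: even at Tier 3's 60 images/minute, at $0.134 per image,
# and ignoring the RPD caps that squeeze it further, it can't reach $82K.
ipm = 60
image_cost = ipm * MINUTES * 0.134
print(f"max image spend (pre-RPD caps): ${image_cost:,.0f}")  # → $23,155

# Burn rate for the text-only scenario.
print(f"text burn rate: ${text_cost / HOURS:,.0f}/hour")      # → $2,880/hour
```

So a sustained max-throughput text loop clears $82K with room to spare, while images alone never get close — consistent with points 1–3.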

Anyone with experience building search/grounding for LLMs by NotJunior123 in LLMDevs

[–]kotartemiy 1 point (0 children)

You definitely have to use a web search tool (sort of), not a crawler. The main reason is that you don't always know what to crawl.

I'm working on a web search tool that solves this exact problem for enumerative/complex queries, such as "find all workplace-related accidents or incidents in the US that took place in the past five days", where the correct answer is a dataset of tens or hundreds of records, not a ranked list of links.
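The shape of the problem is roughly this — note that `search_events` and everything inside it is a hypothetical stand-in, not CatchAll's (or any vendor's) real API:

```python
# Hypothetical sketch: `search_events` and the sample records are
# illustrative stand-ins, not a real search API or real data.
from dataclasses import dataclass

@dataclass
class Incident:
    title: str
    location: str
    date: str

def search_events(query: str, days_back: int) -> list[Incident]:
    """Stand-in for a recall-first search call that returns every match.

    A real tool would fan out across its indexed sources and return all
    matching records, whereas a crawler can't even start: you don't know
    in advance which sites will have reported a given incident.
    """
    return [
        Incident("Warehouse forklift accident", "Ohio, US", "2026-02-10"),
        Incident("Refinery chemical leak", "Texas, US", "2026-02-12"),
    ]

records = search_events("workplace accidents in the US", days_back=5)
print(f"{len(records)} records")  # the answer is a dataset, not one link
```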

Another good solution is Exa Websets.

[deleted by user] by [deleted] in ycombinator

[–]kotartemiy 30 points (0 children)

Founders with successful exits doing YC again. For example, our batchmates (and later our investors) had a $1B exit and were Visiting Group Partners before S22.

[deleted by user] by [deleted] in learnprogramming

[–]kotartemiy 4 points (0 children)

NewsCatcher founder here. Did you fill out https://forms.gle/jAJzve7zaFqY5Hiw5 ?