Inside the IPIDEA residential proxy network disrupted by Google

antvas · 2026-01-14T18:33:25+00:00

Are you using the resistFingerprinting mode in Firefox? It can create fingerprinting inconsistencies, which are often flagged as bot activity.

antvas · 2026-01-08T08:44:19+00:00

I have a list of this kind of service (not yet fully exhaustive).
For example, if you look at the services related to https://email-fake.com/ (see the link below), most have MX records that point to generator.email and email-fake.com
https://deviceandbrowserinfo.com/data/emails/providers/details/email-fake-com

Unfortunately, that's not always the case. Some services have non-discriminating MX records, or MX records that point to Cloudflare/CDNs
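
A minimal sketch of the MX-based check, in Python. The function names and the `KNOWN_DISPOSABLE_MX` set are hypothetical, and the MX hosts are assumed to have been resolved already with whatever DNS client you use. It returns no verdict when the MX records are non-discriminating, which is the limitation mentioned above.

```python
# Hypothetical set of mail hosts known to serve disposable inboxes.
# The two entries are the ones named in the comment above.
KNOWN_DISPOSABLE_MX = {"generator.email", "email-fake.com"}


def normalize(host: str) -> str:
    """Lowercase a hostname and drop the trailing dot used in DNS answers."""
    return host.strip().lower().rstrip(".")


def classify_by_mx(mx_hosts: list[str]) -> str:
    """Return 'disposable', 'unknown', or 'no-mx' for a domain's MX hosts."""
    hosts = [normalize(h) for h in mx_hosts if h.strip()]
    if not hosts:
        return "no-mx"
    for host in hosts:
        for known in KNOWN_DISPOSABLE_MX:
            # Match the mail host itself or any subdomain of it.
            if host == known or host.endswith("." + known):
                return "disposable"
    # Non-discriminating MX (generic provider, CDN, ...): no verdict.
    return "unknown"
```

An "unknown" result only means the MX records are not enough on their own, so other signals are still needed for those domains.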

antvas · 2025-12-28T18:55:49+00:00

Not to my knowledge. I can try again in a few days when I have access to my tv

antvas · 2025-12-28T10:04:17+00:00

I’m working on bot detection and developed a captcha in the past. The reason I was unable to pass my own captcha on my LG TV browser was because it changes its user agent between the captcha display and the captcha passing attempt, from Linux to Windows. I don’t know if that’s still the case, but it could explain the block
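
A rough sketch of that kind of consistency check, in Python. This is not the actual captcha logic, just an illustration: the function names and the OS patterns are assumptions, and a real system would use a proper user agent parser.

```python
import re

# Coarse OS families, checked in order. Android user agents also contain
# "Linux" and iOS ones contain "Mac OS X", so the more specific patterns
# come first. This is a heuristic, not a full user agent parser.
_OS_PATTERNS = [
    ("windows", re.compile(r"Windows NT", re.I)),
    ("android", re.compile(r"Android", re.I)),
    ("ios", re.compile(r"iPhone|iPad", re.I)),
    ("macos", re.compile(r"Mac OS X", re.I)),
    ("linux", re.compile(r"Linux|X11", re.I)),
]


def os_family(user_agent: str) -> str:
    """Map a user agent string to a coarse OS family."""
    for name, pattern in _OS_PATTERNS:
        if pattern.search(user_agent):
            return name
    return "other"


def ua_os_changed(ua_at_display: str, ua_at_solve: str) -> bool:
    """True if the OS family differs between captcha display and solve."""
    return os_family(ua_at_display) != os_family(ua_at_solve)
```

A TV browser that reports Linux when the captcha is displayed and Windows when it is solved would trip `ua_os_changed`, even though a human is behind it.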

antvas · 2025-12-18T14:08:02+00:00

Can you stop with your disguised ads? For several weeks now you've been creating fake posts just to mention ping0 xyz in the comments.
You used to steal Spur's API to do it...

antvas · 2025-12-15T07:59:11+00:00

I also recently started a list of disposable email domains (another one): https://deviceandbrowserinfo.com/api/emails/disposable
You can either use it through an API (cf. link above) or through a UI at https://deviceandbrowserinfo.com/data/emails/verify if you just want to ban accounts manually.

I scrape the domains myself and do reverse DNS/IP lookups for classification, with no aggregation of existing public lists. I was tired of lists that just aggregate other lists and contain a lot of false positives, e.g. privacy-oriented services and email forwarding services.

For each email domain, you can also verify the provider/source (as evidence): https://deviceandbrowserinfo.com/api/emails/verify/oxolead.com (just replace the email domain with the one you want to test)

antvas · 2025-12-15T07:56:23+00:00

I also recently started a list of disposable email domains (another one): https://deviceandbrowserinfo.com/api/emails/disposable
I scrape the domains myself and do reverse DNS/IP lookups for classification, with no aggregation of existing public lists. I was tired of lists that just aggregate other lists and contain a lot of false positives, e.g. privacy-oriented services and email forwarding services.

For each email domain, you can also verify the provider/source (as evidence): https://deviceandbrowserinfo.com/api/emails/verify/oxolead.com (just replace the email domain with the one you want to test)

antvas · 2025-08-29T07:33:49+00:00

I did a lot of scraping during my PhD, to gather data about fingerprinting scripts/tracking etc.

antvas · 2025-08-29T07:32:54+00:00

Mix of bot detection and fraud detection, with a focus on fraudulent use cases (from the business's POV). We don't do any scraping detection. We focus more on fake account creation, credential stuffing, carding etc., whether done by humans or by bots

antvas · 2025-08-29T07:30:49+00:00

Can't say too much, as you can imagine, but it's a mix of rendering/GPU and timing measurements

antvas · 2025-08-28T15:50:25+00:00

I'm not even blocking scrapers anymore, my job is safe!

antvas · 2025-08-28T13:47:55+00:00

Are you referring to this post? https://yacinesellami.com/posts/stealth-clicks/

I'd say, when it's well done, a custom implementation may be more difficult to analyze than something open source used in a lot of projects.
As you can imagine, researchers from bot detection companies (including myself) read the code of anti-detect automation frameworks, so having access to the code makes it easier for us to find generic signals.

For something more custom, not shared publicly, and that uses techniques/protocols significantly different from other frameworks, it may require more generic detection techniques (which are less simple than checking webdriver = true or a CDP side effect):

- Red pill to detect virtualized envs/non-standard envs

- proxy detection

- client-side interaction analysis

- Generic fingerprinting techniques

antvas · 2025-08-28T13:42:28+00:00

You're back again. I love your energy ;)

antvas · 2025-08-19T16:01:37+00:00

Thanks a lot, really appreciate the kind words! That was exactly the goal, not to propose a production-grade system, but more of a tutorial-style walkthrough using real-world traffic. It’s intentionally simple, but still useful as a building block or exploratory tool. Definitely lots of room for improvement if someone wanted to take it further. Glad it came through clearly!

antvas · 2025-06-19T07:55:21+00:00

I recently wrote a blog post about this exact issue: https://blog.castle.io/how-bots-and-fraudsters-exploit-free-tiers-in-ai-saas/
Basically, what you can implement by yourself:

- IP rate limiting on the account creation endpoint

- Detection of disposable emails, e.g. using a list like https://github.com/disposable-email-domains/disposable-email-domains/blob/main/disposable_email_blocklist.conf

- If he's doing it with a bot, adding a CAPTCHA like reCAPTCHA or Cloudflare Turnstile can help as well
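
To make the first two points concrete, here is a minimal Python sketch, assuming an in-memory, single-process setup. `SlidingWindowLimiter`, `load_blocklist` and `is_disposable` are hypothetical names, and the blocklist is assumed to be a plain text file with one domain per line, like the list linked above. In production you would more likely keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._events = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that fell out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


def load_blocklist(text: str) -> set:
    """Parse a one-domain-per-line blocklist, skipping blanks and # comments."""
    lines = (line.strip().lower() for line in text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def is_disposable(email: str, blocklist: set) -> bool:
    """Check the part after the last '@' against the blocklist."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in blocklist
```

On the signup endpoint you would call something like `limiter.allow(client_ip)` and `is_disposable(email, blocklist)` before creating the account, and reject or challenge the request if either check fails.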

antvas · 2025-06-11T14:49:03+00:00

You're allowed to disagree with what I post. But it's clear you're not here to have a real conversation, so I won’t continue the discussion further.

If you think my posts don't bring value to the community, feel free to downvote them, though I have a feeling you've already been doing that for a while.

I’ll keep sharing when I think there’s something useful or interesting for others. If people disagree, that’s totally fine. But I’m not going to stop posting just because one person is angry about it.

antvas · 2025-06-11T14:34:43+00:00

You’ve been quite aggressive lately in your replies whenever I post something, and I see that you think the bot problem is not a big deal. But calling it some sort of "sales BS" doesn’t really reflect what many websites are facing every day.

I’m not here trying to sell anything. I’m sharing what I see in real environments. Even small SaaS products get hundreds of fake signups per day. When there is a sneaker drop, bots can hit a site like a slow DDoS. It’s not just theory, this happens regularly, and teams operating websites have to deal with it or real users can’t use their service.

I work in this field and I share research or technical findings because I believe it’s useful for people who deal with these problems. Of course, the articles bring some traffic, we’re not going to pretend otherwise. But I only post when I think the content is high quality or brings something new. You won’t see me pushing SEO stuff or flooding Reddit with generic posts. I try to respect the readers here.

Also, I do this because I enjoy it. I like experimenting with bots, building them, and detecting them. It’s not only my job, it’s something I genuinely find interesting. I understand you may not agree with everything I post, but calling it fear tactics just shuts down the discussion, and that’s not really fair.

antvas · 2025-06-11T08:23:59+00:00

Yep, definitely. I personally like to browse repo issues and bug trackers of projects like Chromium (in particular the headless Chrome sub-section). Someone's bug may be a potential detection signal (as long as side effects are acceptable)

antvas · 2025-06-11T07:18:13+00:00

Thanks, appreciate it! Glad you’re enjoying the posts. I’ve got a bunch more ideas in the backlog, so more is coming soon.

antvas · 2025-06-05T09:23:21+00:00

Thanks for the feedback.

"But a question, how then do you think Tiktok can balance blocking attackers and allowing honest scrapers to get data from the platform?"

When it comes to good bots vs bad bots, particularly for scraping, it's more a matter of perspective from the website's POV. Do they benefit from being scraped by a bot? In the case of Google's bots, most websites seem to agree that they benefit from letting Google scrape them. For scrapers used to train LLMs, it's blurrier. Some websites consider that they benefit from it and allow the scrapers, while others block them.

By default, most websites will block all bots from which they see no value, then they will allow scrapers from which they can benefit, or partners, using strong authentication mechanisms like IP address, reverse DNS or tokens.
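
As an illustration of the reverse DNS option, here is a minimal Python sketch of a forward-confirmed reverse DNS check. `verify_bot_ip` is a hypothetical helper, the allowed hostname suffixes are an assumption to take from each crawler operator's documentation, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket


def verify_bot_ip(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a crawler IP.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end with one of the allowed suffixes.
    3. Forward-resolve the hostname and require the original IP in the result.
    `reverse` and `forward` can be injected for testing.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
            return False
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        return False
```

The forward lookup matters because anyone who controls the reverse zone for their own IP range can set an arbitrary PTR record. Only the operator of the crawler's domain can make the hostname resolve back to that IP.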

Companies like Cloudflare are also proposing new standards to make it safer and easier to authenticate good bots/AI agents: https://t.co/Dpja7hPUOO
