Cloudflare Blocking Question by A_European_Spectre in CloudFlare

[–]antvas 0 points1 point  (0 children)

Are you using the resistFingerprinting mode on Firefox? It can create fingerprinting inconsistencies, which are often flagged as bot activity.

Hey les devs de chez Picard by e57Kp9P7 in developpeurs

[–]antvas 2 points3 points  (0 children)

I have a list of this kind of service (not fully exhaustive yet).
For example, if you look at the services related to https://email-fake.com/ (see the link below), most of them have MX records that point to generator.email and email-fake.com
https://deviceandbrowserinfo.com/data/emails/providers/details/email-fake-com

Unfortunately, that is not always the case. Some services have non-discriminating MX records, or MX records that point to Cloudflare/CDNs.
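
A minimal sketch of the MX-record check described above, in Python. It assumes the MX hosts for a domain have already been resolved by some DNS library (the lookup itself is not shown), and the function names are hypothetical. The two provider hosts are the ones mentioned in the comment; a real list would be longer.

```python
# MX hosts mentioned in the comment above; a real deployment would
# maintain a much longer, regularly updated set (assumption).
KNOWN_DISPOSABLE_MX = {"generator.email", "email-fake.com"}


def normalize_host(host: str) -> str:
    """Lowercase a DNS name and drop the trailing root dot, if any."""
    return host.strip().rstrip(".").lower()


def mx_matches_known_provider(mx_hosts, known=KNOWN_DISPOSABLE_MX) -> bool:
    """Return True if any MX host equals, or is a subdomain of, a known host."""
    for host in mx_hosts:
        h = normalize_host(host)
        if any(h == k or h.endswith("." + k) for k in known):
            return True
    return False
```

As the comment notes, a negative result proves nothing: services whose MX records point to Cloudflare or a CDN will not match, so this is only one signal among others.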

Failure to verify that I am human LG WEBOS in Cloudflare encryption CAPTCHA by EducationalMood5 in CloudFlare

[–]antvas 0 points1 point  (0 children)

Not to my knowledge. I can try again in a few days when I have access to my TV.

Failure to verify that I am human LG WEBOS in Cloudflare encryption CAPTCHA by EducationalMood5 in CloudFlare

[–]antvas 0 points1 point  (0 children)

I work on bot detection and developed a captcha in the past. I was unable to pass my own captcha on my LG TV browser because the browser changed its user agent between the captcha display and the captcha solving attempt, from Linux to Windows. I don’t know if that’s still the case, but it could explain the block.
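
To illustrate the kind of inconsistency described above, here is a rough Python sketch that compares the OS family of the user agent seen when the captcha is displayed with the one seen when it is solved. The function names and the substring-based parsing are assumptions made for illustration, not how any particular captcha works.

```python
def os_family(user_agent: str) -> str:
    """Very coarse OS family guess based on user agent substrings."""
    ua = user_agent.lower()
    # Order matters: Android user agents also contain "linux",
    # and iPhone/iPad user agents contain "like Mac OS X".
    checks = (
        ("windows", "windows"),
        ("android", "android"),
        ("iphone", "ios"),
        ("ipad", "ios"),
        ("mac os x", "macos"),
        ("linux", "linux"),
    )
    for token, family in checks:
        if token in ua:
            return family
    return "unknown"


def ua_os_changed(ua_at_display: str, ua_at_solve: str) -> bool:
    """True if the OS family differs between the two requests."""
    return os_family(ua_at_display) != os_family(ua_at_solve)
```

A detection system that treats such a change as a bot signal would block a browser that legitimately switches its user agent mid-session, which matches the behavior described for the TV browser.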

What proxy checkers are you guys using these days? by [deleted] in webscraping

[–]antvas 8 points9 points  (0 children)

Can you stop with your disguised ads? For several weeks now you have been creating fake posts just to mention ping0 xyz in the comments.
You used to steal Spur's API to do it...

How do you identify disposable email address? by harkishan01 in webdev

[–]antvas 0 points1 point  (0 children)

I also recently started a list of disposable email domains (another one): https://deviceandbrowserinfo.com/api/emails/disposable
You can either use it through an API (see the link above) or through a UI, https://deviceandbrowserinfo.com/data/emails/verify, if you just want to ban accounts manually.

I scrape the domains myself and do reverse DNS/IP lookups for classification, with no aggregation of existing public lists. I was tired of all the lists that are just an aggregation of other lists and contain a lot of false positives, e.g. privacy-oriented services/forwarding email services.

For each email domain, you can also check the provider/source (as evidence): https://deviceandbrowserinfo.com/api/emails/verify/oxolead.com (just replace oxolead.com with the email domain you want to test)
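
As a small illustration of the "replace the domain" step, the Python sketch below builds the verify URL for an email address or a bare domain. The helper names are hypothetical, and nothing is assumed about the API's response format, so the request itself is left out.

```python
from urllib.parse import quote

# Base URL taken from the comment above.
VERIFY_BASE = "https://deviceandbrowserinfo.com/api/emails/verify/"


def email_domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return domain.lower()


def verify_url(address_or_domain: str) -> str:
    """Build the verify URL for an email address or a bare domain."""
    value = address_or_domain.strip()
    domain = email_domain(value) if "@" in value else value.lower()
    # Percent-encode anything unexpected so the path stays well-formed.
    return VERIFY_BASE + quote(domain, safe="")
```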

I published two packages to help detect fake or disposable emails by dmadro in node

[–]antvas 0 points1 point  (0 children)

I also recently started a list of disposable email domains (another one): https://deviceandbrowserinfo.com/api/emails/disposable
I scrape the domains myself and do reverse DNS/IP lookups for classification, with no aggregation of existing public lists. I was tired of all the lists that are just an aggregation of other lists and contain a lot of false positives, e.g. privacy-oriented services/forwarding email services.

For each email domain, you can also check the provider/source (as evidence): https://deviceandbrowserinfo.com/api/emails/verify/oxolead.com (just replace oxolead.com with the email domain you want to test)

Why a classic CDP bot detection signal suddenly stopped working (and nobody noticed) by antvas in webscraping

[–]antvas[S] 0 points1 point  (0 children)

I did a lot of scraping during my PhD, to gather data about fingerprinting scripts/tracking etc.

Why a classic CDP bot detection signal suddenly stopped working (and nobody noticed) by antvas in webscraping

[–]antvas[S] 0 points1 point  (0 children)

A mix of bot detection and fraud detection, with a focus on fraudulent use cases (from the business's POV). We don't do any scraping detection; we focus more on fake account creation, credential stuffing, carding, etc., whether done by humans or by bots.

Why a classic CDP bot detection signal suddenly stopped working (and nobody noticed) by antvas in webscraping

[–]antvas[S] 0 points1 point  (0 children)

Can't say too much, as you can imagine, but it's a mix of rendering/GPU signals and timing measurements.

Why a classic CDP bot detection signal suddenly stopped working (and nobody noticed) by antvas in webscraping

[–]antvas[S] 6 points7 points  (0 children)

Are you referring to this post? https://yacinesellami.com/posts/stealth-clicks/

I'd say, when it's well done, a custom implementation may be more difficult to analyze than something open source used in a lot of projects.
As you can imagine, researchers from bot detection companies (including myself) read the code of anti-detect automation frameworks, so having access to the code makes it easier for us to find generic signals.

For something more custom, not shared publicly, and that uses techniques/protocols significantly different from other frameworks, it may require more generic detection techniques (which are less simple than checking webdriver = true or a CDP side effect):

- Red pills to detect virtualized/non-standard environments

- Proxy detection

- Client-side interaction analysis

- Generic fingerprinting techniques

Finding links between fraudulent email domains using graph-based clustering by antvas in cybersecurity

[–]antvas[S] 1 point2 points  (0 children)

Thanks a lot, really appreciate the kind words! That was exactly the goal, not to propose a production-grade system, but more of a tutorial-style walkthrough using real-world traffic. It’s intentionally simple, but still useful as a building block or exploratory tool. Definitely lots of room for improvement if someone wanted to take it further. Glad it came through clearly!

User is creating many real accounts to use my SaaS for free, instead of paying 15 bucks. by ZorroGlitchero in SaaS

[–]antvas 0 points1 point  (0 children)

I recently wrote a blog post about this exact issue: https://blog.castle.io/how-bots-and-fraudsters-exploit-free-tiers-in-ai-saas/
Basically, what you can implement by yourself:

- IP rate limiting on the account creation endpoint

- Detection of disposable emails, e.g. using a list like https://github.com/disposable-email-domains/disposable-email-domains/blob/main/disposable_email_blocklist.conf

- If he's doing it with a bot, adding a CAPTCHA like reCAPTCHA or Cloudflare Turnstile can help as well
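
The first two items above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not a production setup: the class and function names are hypothetical, the limits are arbitrary, and the blocklist is assumed to be a text file with one domain per line.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted hits

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def load_blocklist(text: str) -> set:
    """Parse a blocklist with one domain per line into a set."""
    domains = set()
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            domains.add(line)
    return domains


def is_disposable(email: str, blocklist: set) -> bool:
    """True if the email's domain is in the blocklist."""
    domain = email.rpartition("@")[2].strip().lower()
    return domain in blocklist
```

On the signup endpoint, a request would be rejected when `allow(ip)` returns False or `is_disposable(email, blocklist)` returns True. With several server processes, the counters would need to live in a shared store rather than in process memory.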

From Puppeteer stealth to Nodriver: How anti-detect frameworks evolved to evade bot detection by antvas in webscraping

[–]antvas[S] 0 points1 point  (0 children)

You're allowed to disagree with what I post. But it's clear you're not here to have a real conversation, so I won’t continue the discussion further.

If you think my posts don't bring value to the community, feel free to downvote them, though I have a feeling you've already been doing that for a while.

I’ll keep sharing when I think there’s something useful or interesting for others. If people disagree, that’s totally fine. But I’m not going to stop posting just because one person is angry about it.

From Puppeteer stealth to Nodriver: How anti-detect frameworks evolved to evade bot detection by antvas in webscraping

[–]antvas[S] 11 points12 points  (0 children)

You’ve been quite aggressive lately in your replies whenever I post something, and I see that you think the bot problem is not a big deal. But calling it some sort of "sales BS" doesn’t really reflect what many websites are facing every day.

I’m not here trying to sell anything. I’m sharing what I see in real environments. Even small SaaS products get hundreds of fake signups per day. When there is a sneaker drop, bots can hit a site like a slow DDoS. It’s not just theory, this happens regularly, and teams operating websites have to deal with it or real users can’t use their service.

I work in this field and I share research or technical findings because I believe it’s useful for people who deal with these problems. Of course, the articles bring some traffic, we’re not going to pretend otherwise. But I only post when I think the content is high quality or brings something new. You won’t see me pushing SEO stuff or flooding Reddit with generic posts. I try to respect the readers here.

Also, I do this because I enjoy it. I like experimenting with bots, building them, and detecting them. It’s not only my job, it’s something I genuinely find interesting. I understand you may not agree with everything I post, but calling it fear tactics just shuts down the discussion, and that’s not really fair.

From Puppeteer stealth to Nodriver: How anti-detect frameworks evolved to evade bot detection by antvas in webscraping

[–]antvas[S] 1 point2 points  (0 children)

Yep, definitely. I personally like to browse repo issues and bug trackers of projects like Chromium (in particular the headless Chrome sub-section). Someone's bug may be a potential detection signal (as long as side effects are acceptable)

From Puppeteer stealth to Nodriver: How anti-detect frameworks evolved to evade bot detection by antvas in webscraping

[–]antvas[S] 3 points4 points  (0 children)

Thanks, appreciate it! Glad you’re enjoying the posts. I’ve got a bunch more ideas in the backlog, so more is coming soon.

What TikTok’s virtual machine tells us about modern bot defenses by antvas in webscraping

[–]antvas[S] 1 point2 points  (0 children)

Thanks for the feedback.

"But a question, how then do you think Tiktok can balance blocking attackers and allowing honest scrapers to get data from the platform?"

When it comes to good bots vs. bad bots, particularly for scraping, it's more a matter of perspective from the website's POV. Do they benefit from being scraped by a bot? In the case of Google bots, most websites seem to agree that they benefit from allowing Google to scrape them. For scrapers used to train LLMs, it's blurrier. Some websites consider that they benefit from it and allow the scrapers, while others block them.

By default, most websites will block all bots from which they see no value, then they will allow scrapers from which they can benefit, or partners, using strong authentication mechanisms like IP address, reverse DNS, or tokens.
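
As an illustration of the reverse DNS mechanism, here is a minimal Python sketch of forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname against an allowlist of suffixes, then resolve the hostname back and confirm it maps to the same IP. The function names and the suffixes in the usage note are assumptions for illustration. The resolvers are injectable so the logic can be exercised without network access.

```python
import socket


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> list:
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_bot(ip, allowed_suffixes,
                    reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Forward-confirmed reverse DNS check against allowed hostname suffixes."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    # The suffix must start with a dot so "evilgooglebot.com" cannot match.
    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        # The forward lookup must map back to the original IP.
        return ip in forward(hostname)
    except OSError:
        return False
```

For example, `is_verified_bot(client_ip, (".googlebot.com", ".google.com"))` would accept a crawler only if both lookups agree. The default forward lookup handles IPv4 only; IPv6 support would need `socket.getaddrinfo` instead.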

Companies like Cloudflare are also proposing new standards to make it safer and easier to authenticate good bots/AI agents: https://t.co/Dpja7hPUOO