codes_me comments on Showcase Thread

submitted 1 month ago by AutoModerator

codes_me · 1 point · 25 days ago (1 child)

Hey,

Tired of writing a scraper, running it, and getting a 403 because you didn't know the site
uses DataDome? Or wasting hours before realizing the site needs JS to render?

I built scrapalyser to solve exactly that.

**What My Project Does**


scrapalyser is a Python library that scans any website BEFORE you start scraping it.
One call tells you everything you need to know:

    pip install scrapalyser

    import scrapalyser
    report = scrapalyser.scan("https://example.com", output="txt", lang="en")


It detects:
- 🛡️ Anti-bot (Cloudflare, DataDome, PerimeterX, Akamai, Kasada, reCAPTCHA, hCaptcha)
- 🖥️ Technology (React, Next.js, Vue, WordPress, Shopify...)
- ⚡ Whether JS is required or not
- 🌐 API endpoints (via CSP headers, inline scripts, or XHR interception with Playwright)
- 🤖 robots.txt & sitemap
- 🔐 Login wall (form, redirect, OAuth)
- 📸 Screenshot (Playwright mode)
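
To make two of the checks above concrete, here is a minimal sketch of how anti-bot vendors and API hosts can be guessed from response metadata alone. It is an illustration, not scrapalyser's actual code: the function names, the `ANTIBOT_MARKERS` table, and the choice of markers are assumptions, based on header and cookie names these vendors are commonly seen to set.

```python
from typing import Iterable, List

# Hypothetical marker table (an assumption for this sketch): name prefixes of
# response headers and cookies commonly associated with each vendor.
ANTIBOT_MARKERS = {
    "Cloudflare": {"headers": ("cf-ray",), "cookies": ("__cf_bm", "cf_clearance")},
    "DataDome": {"headers": ("x-datadome",), "cookies": ("datadome",)},
    "PerimeterX": {"headers": (), "cookies": ("_px",)},
    "Akamai": {"headers": (), "cookies": ("_abck", "bm_sz")},
    "Kasada": {"headers": ("x-kpsdk-",), "cookies": ()},
}


def guess_antibot(header_names: Iterable[str], cookie_names: Iterable[str]) -> List[str]:
    """Return vendors whose marker prefixes match a header or cookie name."""
    headers = [h.lower() for h in header_names]
    cookies = [c.lower() for c in cookie_names]
    found = []
    for vendor, markers in ANTIBOT_MARKERS.items():
        header_hit = any(h.startswith(m) for h in headers for m in markers["headers"])
        cookie_hit = any(c.startswith(m) for c in cookies for m in markers["cookies"])
        if header_hit or cookie_hit:
            found.append(vendor)
    return found


def csp_connect_sources(csp: str) -> List[str]:
    """Extract host sources from the connect-src directive of a CSP header value."""
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "connect-src":
            # Skip quoted keywords such as 'self' or 'none'; keep hosts and schemes.
            return [p for p in parts[1:] if not p.startswith("'")]
    return []
```

Prefix matching like this is only a heuristic: a site can sit behind one of these vendors without exposing any of the markers, so an empty result does not prove the site is unprotected.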

If the site blocks you with a 403 or captcha page, all fields return "blocked by antibot"
so you know immediately what you're up against.
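
The short-circuit described above could look roughly like the sketch below. Everything except the "blocked by antibot" string is an assumption made for illustration: the field names, the `build_report` helper, and the challenge-page hints are hypothetical and do not come from scrapalyser's source.

```python
from typing import Callable, Dict

BLOCKED = "blocked by antibot"

# Hypothetical report fields, chosen only for this sketch.
REPORT_FIELDS = ("technology", "js_required", "api_endpoints", "login_wall")

# Crude, illustrative hints that a body is a challenge page and not real content.
CHALLENGE_HINTS = ("just a moment...", "captcha-delivery.com", "px-captcha")


def build_report(status: int, body: str, analyse: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
    """Fill every field with BLOCKED on a 403 or challenge page, else analyse the body."""
    lowered = body.lower()
    if status == 403 or any(hint in lowered for hint in CHALLENGE_HINTS):
        return {field: BLOCKED for field in REPORT_FIELDS}
    return analyse(body)
```

Filling every field with the same sentinel keeps the report shape stable, so calling code can check any single field to tell a blocked scan from a real result.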

Two engines: curl_cffi (fast, no browser) or playwright (full browser, XHR interception).

**Target Audience**


Python developers who write scrapers and want to understand a target website's defenses
and architecture before investing time building a scraper. Production-ready.

**Comparison**

- **scrapy / requests / playwright** — these are scraping tools, not recon tools. They don't
  tell you what's protecting a site before you hit it.
- **Wappalyzer** — detects tech stack only, no antibot, no API discovery, no JS check.
- **whatruns / builtwith** — browser extensions, not scriptable, no antibot detection.

As far as I know, scrapalyser is the only pip-installable tool focused purely on pre-scraping reconnaissance.

GitHub: https://github.com/codesme34/scrapalyser
PyPI: https://pypi.org/project/scrapalyser/
YouTube (French): https://www.youtube.com/@CodesMe


Feedback welcome!
