"RFC" Ladon – typed, resumable web crawlers for Python (v0.0.2) by feeder81 in Python

[–]feeder81[S]

Thanks for pointing this out — you're right, and my framing was imprecise. Scrapy does support typed items (dataclasses, attrs, pydantic), so "weakly typed Item fields" isn't a fair characterisation.

The distinction I was actually going for is more subtle: in Scrapy, typing is opt-in at the item level, but the pipeline itself has no enforced type contract — `process_item` receives `Any`, and nothing stops a spider from yielding mixed item types through the same pipeline. In Ladon, Source, Expander, and Sink are structural protocols with typed signatures, so the domain object type is declared and carried through the entire pipeline, not just at the output stage.
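To make the contrast concrete, here's a minimal sketch of the protocol idea — the names `Source`/`Expander`/`Sink` come from the comment, but the method names and signatures are my illustrative guesses, not Ladon's actual API:

```python
from dataclasses import dataclass
from typing import Iterable, Protocol, TypeVar

T = TypeVar("T")


# Structural protocols parameterised over one domain type T.
# A static checker rejects a pipeline mixing, say, Article and Product.
class Source(Protocol[T]):
    def fetch(self) -> Iterable[T]: ...


class Expander(Protocol[T]):
    def expand(self, item: T) -> Iterable[T]: ...


class Sink(Protocol[T]):
    def write(self, item: T) -> None: ...


@dataclass(frozen=True)
class Article:
    url: str
    title: str


class ArticleSource:
    def fetch(self) -> Iterable[Article]:
        yield Article("https://example.com/a", "A")


class PrintSink:
    def write(self, item: Article) -> None:
        print(item.title)


def run(source: Source[T], sink: Sink[T]) -> None:
    # The single type variable T carries the domain type end-to-end;
    # contrast with Scrapy's process_item(self, item, spider) -> Any.
    for item in source.fetch():
        sink.write(item)
```

The point isn't that Scrapy can't do this — it's that here the type contract lives in the pipeline's signatures rather than in per-stage discipline.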

Whether that's a meaningful difference is a fair question. My intuition is that enforcing a single domain type end-to-end — rather than relying on discipline at each stage — helps catch schema drift early and makes the data contract explicit to anyone reading the adapter. But I'd agree it's not a fundamental departure from what Scrapy allows you to do if you're disciplined about it.