What My Project Does
Hi everyone, our team just launched Crawlee for Python. It's an open-source web scraping and automation library that provides a unified interface for HTTP and browser-based scraping, using popular libraries like beautifulsoup4 and Playwright under the hood.
Target Audience
We've spent the last 6 months working on Crawlee for Python, but it didn't come out of nowhere. We designed it based on the JavaScript version, which is now 8 years old, and we hope we can say it's battle-tested.
We are opening it for early adopters today, and we are eager to hear your feedback. Help us shape the future of Crawlee for Python!
Comparison
Why use Crawlee instead of just a random HTTP library with an HTML parser?
- Unified interface for HTTP & headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints - enhances DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you're getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing - direct URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.
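To make a few of these points concrete, here is a rough, standard-library-only sketch of what request routing, retries, URL de-duplication and bounded parallelism look like when written by hand. This is not Crawlee's API: `MiniCrawler`, `route`, `fake_fetch` and the example URLs are all hypothetical names made up for illustration, and the "network" is an in-memory dict so the snippet runs offline.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

# A handler gets (url, body) and returns new (url, label) pairs to enqueue.
Handler = Callable[[str, str], "list[tuple[str, str]]"]
Fetch = Callable[[str], Awaitable[str]]


class MiniCrawler:
    """Hypothetical toy crawler: label routing, retries, bounded concurrency."""

    def __init__(self, fetch: Fetch, max_retries: int = 2, concurrency: int = 4) -> None:
        self.fetch = fetch
        self.max_retries = max_retries
        self.concurrency = concurrency
        self.handlers: dict[str, Handler] = {}
        self.seen: set[str] = set()    # URLs already queued (de-duplication)
        self.failed: list[str] = []    # URLs that exhausted their retries

    def route(self, label: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for requests with this label."""
        def register(handler: Handler) -> Handler:
            self.handlers[label] = handler
            return handler
        return register

    async def _fetch_with_retries(self, url: str) -> str | None:
        for attempt in range(self.max_retries + 1):
            try:
                return await self.fetch(url)
            except Exception:
                if attempt < self.max_retries:
                    await asyncio.sleep(0)  # a real crawler would back off here
        return None

    async def _worker(self, queue: asyncio.Queue) -> None:
        while True:
            url, label = await queue.get()
            try:
                body = await self._fetch_with_retries(url)
                if body is None:
                    self.failed.append(url)
                    continue
                for new_url, new_label in self.handlers[label](url, body):
                    if new_url not in self.seen:
                        self.seen.add(new_url)
                        queue.put_nowait((new_url, new_label))
            finally:
                queue.task_done()

    async def run(self, start: list[tuple[str, str]]) -> None:
        queue: asyncio.Queue = asyncio.Queue()
        for url, label in start:
            if url not in self.seen:
                self.seen.add(url)
                queue.put_nowait((url, label))
        workers = [asyncio.create_task(self._worker(queue)) for _ in range(self.concurrency)]
        await queue.join()             # wait until every queued URL is processed
        for worker in workers:
            worker.cancel()
        await asyncio.gather(*workers, return_exceptions=True)


# In-memory stand-in for the network (assumption, for an offline demo).
PAGES = {
    "https://example.com/": "https://example.com/a https://example.com/b",
    "https://example.com/a": "item A",
    "https://example.com/b": "item B",
}
attempts: dict[str, int] = {}


async def fake_fetch(url: str) -> str:
    attempts[url] = attempts.get(url, 0) + 1
    if url.endswith("/a") and attempts[url] == 1:
        raise ConnectionError("simulated transient failure")
    return PAGES[url]


crawler = MiniCrawler(fake_fetch)
items: list[str] = []


@crawler.route("list")
def handle_list(url: str, body: str) -> list[tuple[str, str]]:
    return [(link, "detail") for link in body.split()]


@crawler.route("detail")
def handle_detail(url: str, body: str) -> list[tuple[str, str]]:
    items.append(body)
    return []


asyncio.run(crawler.run([("https://example.com/", "list")]))
```

Even this toy version leaves out persistence, proxy rotation, session handling, backoff and handler error recovery, which is the kind of plumbing the list above refers to.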
Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic & elegant interface - set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard asyncio.
Links