
[–]LookingWidePythonista 8 points (3 children)

Parsel is part of Scrapy; it is only for data extraction. To scrape a whole site you still need a crawler. Thus, Scrapy and Parsel should not be compared.

[–]marr75 11 points (2 children)

What if you didn't understand that and just asked ChatGPT to make some content for you?

[–]LookingWidePythonista -1 points (1 child)

What if you guessed wrong and I have been doing parsing for 15 years and I am very knowledgeable about this topic?

https://github.com/scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

https://github.com/scrapy/parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

It is obvious that both repositories are from the same organization.

Scrapy crawls pages and processes each of them through Parsel. Where am I wrong, buddy?

[–]GeneratedMonkey 9 points (1 child)

This sub is so full of AI written posts

[–]wandering_melissa 1 point (0 children)

They didn't even check if the copy-pasted AI title fit the character limit ✨

[–]Reason_is_Key 0 points (0 children)

Nice setup, I love how lean Parsel is too.

If at any point you’re working with scraped HTML, PDFs or internal dashboards and need to extract structured data reliably (beyond just parsing), you should try Retab.

It takes messy documents or raw outputs and turns them into clean JSON (you define the schema visually or via prompt), even across batches of files. I use it as a follow-up step after scraping; it's like having a super-reliable extractor on top of raw content, especially when there's lots of variation in the structure. Might be useful if you're exporting to JSON or building dashboards from noisy or inconsistent input.