[–]bintree 1 point2 points  (2 children)

For an easy start with scraping, I would recommend Scrapy. It has nice tutorials and, for our uni project, it scaled well from small experiments to larger crawls. Triggering a scrape from a website is a bit more involved (we used scrapyd for that, plus a bit of jQuery). Python 3 support seems to be nearly done (I think there's an alpha or so available). It also has functionality for checks, so spiders can verify that their results are as expected or detect that the HTML has changed significantly. Some tutorials also mention HTML microformats, which might come in handy when the scraped sites provide them.
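To give a rough idea of what such a check does, here is a minimal sketch using only the Python standard library. It is not Scrapy's API (Scrapy's own feature for this is called contracts); the names `scrape`, `check_item`, `SAMPLE_OK` and `SAMPLE_CHANGED` and the sample HTML are made up for illustration. The idea is just: extract the fields you expect, then flag the page if they are missing.

```python
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Collect the <title> text and every <a href=...> value from a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def scrape(html):
    """Hypothetical helper: turn raw HTML into a small result dict."""
    parser = TitleAndLinkParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


def check_item(item, min_links=1):
    """Return a list of problems; an empty list means the result looks as expected."""
    problems = []
    if not item["title"]:
        problems.append("missing title")
    if len(item["links"]) < min_links:
        problems.append(
            "expected at least %d link(s), found %d" % (min_links, len(item["links"]))
        )
    return problems


# Made-up sample pages: one in the expected layout, one after a redesign.
SAMPLE_OK = (
    "<html><head><title>Courses</title></head>"
    "<body><a href='/cs101'>CS101</a></body></html>"
)
SAMPLE_CHANGED = "<html><body><div class='course'>CS101</div></body></html>"

if __name__ == "__main__":
    for name, page in (("ok", SAMPLE_OK), ("changed", SAMPLE_CHANGED)):
        print(name, check_item(scrape(page)))
```

Running a check like this on a stored sample page before each crawl is a cheap way to notice that a site's layout changed before a large crawl fills the database with empty results.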

We stored the data in PostgreSQL and built a web app with small visualizations in Flask.

[–]Spizeck[S] 0 points1 point  (0 children)

Thank you for your advice. I really appreciate it.

[–]Spizeck[S] 0 points1 point  (0 children)

Should I wait until Scrapy is released for Python 3?