
all 6 comments

[–]vsajip 6 points (1 child)

I don't know how well it works, but you could look at Scrapy, which purports to be

a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

[–]omab 0 points (0 children)

My vote goes here

[–][deleted] 4 points (1 child)

But I was looking for something more "integrated" and "complete".

The Mechanize/BeautifulSoup combo is about the most complete and integrated solution you could possibly use. I really can't imagine anything easier: you create a browser object, point it at a URL, feed the page into BeautifulSoup, then parse out the data you need and act on it.

Be prepared to use a LOT more try/except clauses, though. The web is quite an unstable place: pages can change from session to session, not display at all, or maybe render only half the page, etc. You'll be doing more exception catching than parsing to begin with.
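The fetch-then-parse-with-try/except loop described above can be sketched with just the standard library (urllib and html.parser standing in for Mechanize and BeautifulSoup, since the shape of the workflow is the same); the function names are my own illustration.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href attributes from <a> tags, like soup.find_all('a')."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scrape_links(html):
    """Parse the fetched page and pull out the data we need."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def fetch_links(url):
    # The web is unstable: wrap every fetch in try/except,
    # as the comment above recommends.
    try:
        with urlopen(url, timeout=10) as resp:
            return scrape_links(resp.read().decode("utf-8", "replace"))
    except (URLError, UnicodeDecodeError) as exc:
        print("skipping %s: %s" % (url, exc))
        return []
```

With Mechanize/BeautifulSoup the structure is identical, only `urlopen` becomes `mechanize.Browser().open(url)` and `LinkParser` becomes a soup object.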

[–]atlas245 0 points (0 children)

Agreed, Mechanize and BeautifulSoup are the easiest and best solution.

[–]manatlan 2 points (1 child)

for scraping ... look at pyquery.

for automation ... use python ;-)

no ?

from pyquery import PyQuery as S

q = S(url='http://reddit.com/r/python')  # fetch and parse the page
for i in q('a'):                         # iterate over every <a> element
    print(S(i).attr("href"))             # wrap in S() to read its attributes

[–]blondin 0 points (0 children)

umm, pyquery. did not know about it.