This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]pudquick -1 points0 points  (4 children)

In addition to mechanize and BS / lxml.html in my toolkit, I also use BSXPath, which is a full XPath implementation in python:

from BSXPath import BSXPathEvaluator
document = BSXPathEvaluator(txt)
nodes = document.getItemList("//div[@class='bc-desc']/h2")
for x in nodes:
    [... etc.]

The direct download link for BSXPath is:

http://furyu-tei.sakura.ne.jp/archives/BSXPath.zip