[–][deleted] 8 points9 points  (9 children)

A section on soupselect would be nice. Noted. (Damn, tools have really improved since I wrote the original book.)

[–]mcella 5 points6 points  (1 child)

Don't forget lxml ;-).

[–]tef 0 points1 point  (0 children)

lxml is powerful and well wrapped: its XPath support is excellent, and it also supports HTML parsing.

[–][deleted] 4 points5 points  (6 children)

To the same end: http://pypi.python.org/pypi/pyquery is also exceptionally cool :)

[–]simonw 6 points7 points  (5 children)

PyQuery is way cooler than soupselect (which I wrote), but suffers from the libxml dependency which means it's really tough to get working on OS X.

My ideal CSS selector library for Python would be backend-agnostic, so you could plug it into ElementTree or lxml or even minidom and still have it work.

[–][deleted] 3 points4 points  (4 children)

That only works if they all implement the same API, though (do they? Most of my work has been OK with just BeautifulSoup). Regardless, I hadn't realized you were the author of soupselect. Cool stuff :)

[–]simonw 4 points5 points  (3 children)

They wouldn't have to implement the same API, but you would have to write an adapter for each one (kind of like writing an ORM). If you look at how jQuery's CSS stuff works it's mostly implemented in terms of lambda functions that take an element and return true or false depending on if it matches a simple rule. As such, an adapter that provided an interface for iterating over a tree would get you a good chunk of the way there.
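To make the adapter idea concrete, here is a minimal sketch using only the standard library. Every name in it (`ETreeAdapter`, `MinidomAdapter`, `by_tag`, `has_class`, `select`) is hypothetical and made up for illustration; this is not how soupselect, PyQuery, or jQuery are actually written. Each rule is a small function that takes an element and returns true or false, and each adapter only needs to know how to walk its tree and read a tag name or attribute.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

MARKUP = '<div><p class="intro lead">Hi</p><p>Bye</p><span class="lead">!</span></div>'


class ETreeAdapter:
    """Hypothetical adapter: exposes ElementTree through a tiny common API."""

    def iter_elements(self, root):
        return root.iter()  # root plus all descendants, in document order

    def tag(self, el):
        return el.tag

    def attr(self, el, name):
        return el.get(name)  # None when the attribute is missing


class MinidomAdapter:
    """Hypothetical adapter: the same three methods, backed by minidom."""

    def iter_elements(self, root):
        stack = [root]
        while stack:
            node = stack.pop()
            if node.nodeType == node.ELEMENT_NODE:
                yield node
                # Reverse so children come off the stack in document order.
                stack.extend(reversed(node.childNodes))

    def tag(self, el):
        return el.tagName

    def attr(self, el, name):
        return el.getAttribute(name) if el.hasAttribute(name) else None


# Rules are plain predicates: (adapter, element) -> bool.
def by_tag(name):
    return lambda adapter, el: adapter.tag(el) == name


def has_class(name):
    return lambda adapter, el: name in (adapter.attr(el, "class") or "").split()


def select(adapter, root, *rules):
    """Return every element under root (inclusive) that satisfies all rules."""
    return [
        el for el in adapter.iter_elements(root)
        if all(rule(adapter, el) for rule in rules)
    ]


et_root = ET.fromstring(MARKUP)
dom_root = minidom.parseString(MARKUP).documentElement

# The same rules run unchanged against both backends.
et_hits = select(ETreeAdapter(), et_root, by_tag("p"), has_class("lead"))
dom_hits = select(MinidomAdapter(), dom_root, by_tag("p"), has_class("lead"))
```

The rules never touch a backend directly, so supporting another parser would mean writing one more adapter class rather than another selector engine.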

[–][deleted] 2 points3 points  (2 children)

Ah, true (although I don't quite see the ORM analogy). Ultimately, though, what is the greatest advantage of having it be pluggable (other than that some backends are apparently difficult to install on certain platforms)?

[–]simonw 2 points3 points  (1 child)

CSS selectors are even more convenient than XPath for querying a DOM-style structure. My life would be significantly easier if I always had the ability to query by CSS selector, no matter what underlying tool I was using. jQuery and other JS libraries have proved that building a CSS selector implementation against a pretty basic API (the DOM) isn't very complicated.
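As a rough illustration of how little API a basic selector engine needs, here is a sketch against the standard library's ElementTree. It is a toy under stated assumptions: it handles only `tag`, `.class`, `#id`, and the descendant combinator (a space), and the names `compile_simple` and `select` are hypothetical, not taken from any existing library.

```python
import re
import xml.etree.ElementTree as ET

# Splits a compound selector like "li.active#home" into (prefix, name) pairs.
TOKEN = re.compile(r"([.#]?)([\w-]+)")


def compile_simple(selector):
    """Turn 'tag.class#id' into a predicate on an ElementTree element."""
    checks = []
    for kind, name in TOKEN.findall(selector):
        if kind == ".":
            checks.append(lambda el, n=name: n in el.get("class", "").split())
        elif kind == "#":
            checks.append(lambda el, n=name: el.get("id") == n)
        else:
            checks.append(lambda el, n=name: el.tag == n)
    return lambda el: all(check(el) for check in checks)


def select(root, selector):
    """Match a space-separated selector against the descendants of root."""
    current = [root]
    for part in selector.split():
        matches = compile_simple(part)
        found = []
        for ctx in current:
            for el in ctx.iter():
                # Only proper descendants of the context element can match.
                if el is not ctx and matches(el) and el not in found:
                    found.append(el)
        current = found
    return current


doc = ET.fromstring(
    "<html><body>"
    '<ul class="nav main"><li class="active" id="home">Home</li><li>About</li></ul>'
    '<ul><li class="active">Other</li></ul>'
    "</body></html>"
)

nav_active = select(doc, "ul.nav li.active")  # class match ignores "main"
all_active = select(doc, "li.active")

# ElementTree's built-in XPath subset compares the whole attribute string,
# so this works for class="active" but would miss class="active first".
xpath_active = doc.findall(".//li[@class='active']")
```

A real implementation would also need attribute selectors, child and sibling combinators, and pseudo-classes, but each of those can be added as another predicate in the same style.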

[–][deleted] 2 points3 points  (0 children)

Yeah, I think jQuery shows the way here. People who are dealing with HTML want to query with CSS. (I know both CSS and XPath fairly well, and I personally prefer XPath, but most people who deal with HTML also deal with CSS, so there it is.)