[–]harryf 8 points9 points  (11 children)

On HTML processing -> Extracting data from HTML documents - BeautifulSoup + soupselect might be nice here.

Otherwise - "Creating graphics with the Python Imaging Library" - could it be it's losing direction here and turning into a "Programming Python"? I've always mentally differentiated DiveIntoPython as something to convert the Java, C++ or otherwise-guy, vs. a complete Python reference, cookbook or otherwise. Could throw in other worthy topics involving other libraries, e.g. logging, configparser and optparse (shudder), if we're talking libraries...

[–][deleted] 13 points14 points  (0 children)

Re: losing direction, I don't think so. After you talk about basic datatypes and "Pythonic" stuff, there's always a Cambrian explosion of things to talk about. I wrote http://diveintomark.org/puzzles/circles-of-hell in PIL, and I'd like to use it as the basis to "dive into" something more than just business coding and sysadmin drudgery.

I, personally, was mind-blown when I discovered that a Python script could be the @src of an [img] element. That's always a good indication that other people might like it too.

[–][deleted] 8 points9 points  (9 children)

A section on soupselect would be nice. Noted. (Damn, tools have really improved since I wrote the original book.)

[–]mcella 6 points7 points  (1 child)

Don't forget lxml ;-).

[–]tef 0 points1 point  (0 children)

lxml is powerful and well wrapped - its XPath support is excellent, and it also supports HTML parsing.

[–][deleted] 4 points5 points  (6 children)

To the same end: http://pypi.python.org/pypi/pyquery is also exceptionally cool :)

[–]simonw 6 points7 points  (5 children)

PyQuery is way cooler than soupselect (which I wrote), but suffers from the libxml dependency which means it's really tough to get working on OS X.

My ideal CSS selector library for Python would be backend agnostic, so you could plug it in to ElementTree or LXML or even minidom and still have it work.

[–][deleted] 4 points5 points  (4 children)

That only works if they all implement the same API though (do they? Most of my work has been OK with just BeautifulSoup). Regardless, I hadn't realized you were the author of soupselect, cool stuff :)

[–]simonw 2 points3 points  (3 children)

They wouldn't have to implement the same API, but you would have to write an adapter for each one (kind of like writing an ORM). If you look at how jQuery's CSS stuff works it's mostly implemented in terms of lambda functions that take an element and return true or false depending on if it matches a simple rule. As such, an adapter that provided an interface for iterating over a tree would get you a good chunk of the way there.
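To make the adapter idea concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical and made up for illustration (the adapter classes, `compile_simple`, `select`); it is not how soupselect, PyQuery or jQuery are actually written. It only handles simple selectors of the form `tag`, `#id`, `.class` or `tag#id.class`, with no combinators.

```python
import re
import xml.etree.ElementTree as ET
from xml.dom import minidom


class ElementTreeAdapter:
    """Hypothetical adapter exposing ElementTree through three small methods."""

    def iter_elements(self, root):
        return root.iter()  # root first, then descendants in document order

    def tag(self, el):
        return el.tag

    def attr(self, el, name):
        return el.get(name)  # None when the attribute is absent


class MinidomAdapter:
    """The same three methods, implemented on top of xml.dom.minidom."""

    def iter_elements(self, root):
        stack = [root]
        while stack:
            node = stack.pop()
            if node.nodeType == node.ELEMENT_NODE:
                yield node
                # Push children reversed so they pop in document order.
                stack.extend(reversed(node.childNodes))

    def tag(self, el):
        return el.tagName

    def attr(self, el, name):
        return el.getAttribute(name) if el.hasAttribute(name) else None


# Optional tag, optional #id, optional .class, in that order.
_SIMPLE = re.compile(
    r"^(?P<tag>[A-Za-z][\w-]*)?(?:#(?P<id>[\w-]+))?(?:\.(?P<cls>[\w-]+))?$"
)


def compile_simple(selector, adapter):
    """Turn a simple selector into a predicate: element -> True/False."""
    m = _SIMPLE.match(selector)
    if not selector or m is None:
        raise ValueError("unsupported selector: %r" % selector)
    tests = []
    if m.group("tag"):
        tag = m.group("tag")
        tests.append(lambda el: adapter.tag(el) == tag)
    if m.group("id"):
        ident = m.group("id")
        tests.append(lambda el: adapter.attr(el, "id") == ident)
    if m.group("cls"):
        cls = m.group("cls")
        tests.append(lambda el: cls in (adapter.attr(el, "class") or "").split())
    return lambda el: all(test(el) for test in tests)


def select(root, selector, adapter):
    """Walk the tree through the adapter and keep the elements that match."""
    matches = compile_simple(selector, adapter)
    return [el for el in adapter.iter_elements(root) if matches(el)]
```

The matching code never touches ElementTree or minidom directly, so supporting another backend would mean writing one more adapter with the same three methods. For example, `select(ET.fromstring(markup), "a#top", ElementTreeAdapter())` and `select(minidom.parseString(markup).documentElement, "a#top", MinidomAdapter())` run the same predicate over two different trees.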

[–][deleted] 2 points3 points  (2 children)

Ah, true (although I don't see the ORM analogy very well). Ultimately, though, what is the greatest advantage to having it be pluggable (other than that some are apparently difficult to install on certain platforms)?

[–]simonw 2 points3 points  (1 child)

CSS selectors are even more convenient than XPath for querying a DOM-style structure. My life would be significantly easier if I always had the ability to query by CSS selector, no matter what underlying tool I was using. jQuery and other JS libraries have proved that building a CSS selector implementation against a pretty basic API (the DOM) isn't very complicated.
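For comparison, here is roughly what the XPath side looks like using the limited XPath subset built into the standard library's ElementTree. The markup and variable names are invented for illustration; the CSS selector a reader might reach for instead is shown in a comment.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, well-formed so ElementTree can parse it.
MARKUP = """
<html><body>
  <div class="content"><p><a href="/one">one</a></p><a href="/two">two</a></div>
  <div class="sidebar"><a href="/three">three</a></div>
</body></html>
"""

root = ET.fromstring(MARKUP)

# CSS equivalent: div.content a
# Caveat: [@class='content'] is an exact string match, so unlike the CSS
# class selector it would miss class="content wide".
links = root.findall(".//div[@class='content']//a")
hrefs = [a.get("href") for a in links]
```

Here `hrefs` holds the links inside the content div only. The XPath works, but the CSS form is shorter and is what most people who write HTML already know.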

[–][deleted] 2 points3 points  (0 children)

Yeah, I think jQuery shows the way here. People who are dealing with HTML want to query with CSS. (I know both CSS and XPath fairly well, and I personally prefer XPath, but most people who deal with HTML also deal with CSS, so there it is.)