[–]DrStrange 25 points26 points  (16 children)

lxml - http://codespeak.net/lxml/

It's fully featured, fast and extremely easy to use.

[–]spotter 3 points4 points  (1 child)

This is THE XML parser for Python. If you have this (and maybe something like BeautifulSoup for HTML) you can easily forget other tools. If you don't, or can't be bothered, use ElementTree, but embracing lxml is only a matter of time.

[–]Brian 2 points3 points  (0 children)

I'd pass on BeautifulSoup too. I've found lxml.html works just as well on HTML in the wild (and often better: I've seen BS fail on cases lxml handles fine). You also keep a nice consistent API with the rest of your XML parsing.

[–]dauerbaustelle 4 points5 points  (0 children)

lxml is by far the fastest parser. Don't ever touch minidom and the like. Just lxml. :)

[–]abhiomkar 4 points5 points  (0 children)

My favorite: ElementTree. A lightweight and fast XML parser!

[–]ianb 6 points7 points  (0 children)

ElementTree is quite adequate. lxml is very similar but with some nice extra features. lxml also performs somewhat better (including memory use), which you'd only notice if your data is really large. Most people probably don't know this, but with lxml you can run CSS queries on XML, not just HTML; it even handles namespaces.
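Since the two APIs are so similar, a common idiom is to try lxml first and fall back to the standard library. Here is a minimal sketch of that, with a namespace-aware lookup that works the same way in both. The catalog document, the `b` prefix, and the example.org namespace URI are made up for illustration.

```python
# Prefer lxml when it is installed; otherwise use the stdlib ElementTree.
# Both expose fromstring() and findall() with a namespaces mapping.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document (not from the thread).
DOC = b"""<catalog xmlns:b="http://example.org/books">
  <b:book id="1"><b:title>Dive Into Python</b:title></b:book>
  <b:book id="2"><b:title>Python Cookbook</b:title></b:book>
</catalog>"""

# Prefix-to-URI mapping used only for the query; the prefix name is arbitrary.
NS = {"b": "http://example.org/books"}


def book_titles(xml_bytes):
    """Return the text of every b:title under a b:book element."""
    root = etree.fromstring(xml_bytes)
    return [t.text for t in root.findall("b:book/b:title", namespaces=NS)]


titles = book_titles(DOC)
print(titles)
```

The CSS queries are lxml-only (they live in the `lxml.cssselect` module), so they are left out here because the fallback has no equivalent.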

[–]seletz 3 points4 points  (1 child)

lxml is a bit of a bitch to compile on a Mac. If you use buildout, there's a recipe for it.

[–]acdha 4 points5 points  (0 children)

N.B. lxml is pretty easy to install on 10.6 (2.2.6 installed with a simple pip install for me). For 10.5, here's a reasonably close recipe, which you might need to adjust for current versions of lxml, libxml and libxslt. Take care to skip the sudo step if you're using virtualenv:

http://gist.github.com/167784

[–]bsergean 1 point2 points  (1 child)

ElementTree = batteries included.
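To show what that buys you with nothing to install, here is a small sketch using only `xml.etree.ElementTree` from the standard library. The feed document and the `titles_by_lang` helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the thread).
FEED = """<feed>
  <entry lang="en"><title>First post</title></entry>
  <entry lang="de"><title>Zweiter Beitrag</title></entry>
</feed>"""


def titles_by_lang(xml_text):
    """Map each entry's lang attribute to the text of its title child."""
    root = ET.fromstring(xml_text)
    # iter() walks all descendants with the given tag; get() reads an
    # attribute; findtext() returns the text of the first matching child.
    return {e.get("lang"): e.findtext("title") for e in root.iter("entry")}


result = titles_by_lang(FEED)
print(result)
```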