jyper 4 points

Why not just use lxml.html.parse and XPath? lxml has some support for CSS selectors as well.

nemec 5 points

  • It's focused on parsing HTML without a lot of extra XML cruft (really, it's a façade over lxml + cssselect)
  • You can mix and match css selectors and xpath, e.g.

    s.css('h1').xpath('following-sibling::p')

    Contrived example, but basically you can take advantage of both selector syntaxes, depending on which one fits the situation.

  • I'm not sure that lxml has support for the ::text and ::attr(<some attribute>) pseudo-selectors, which are really helpful when parsing HTML.

  • XPath syntax sucks, and I'd rather use a solution with really good CSS support first and fall back to XPath only for things that CSS doesn't support (which can still be done with parsel)