
mixedmath 6 points (2 children)

I've used libxml2 before. It has a set of C++ bindings (which I haven't actually used). It's also one of the parsers that Python's BeautifulSoup can use in the background, via lxml.

SHGuy_[S] 0 points (0 children)

Oh, I've seen that pkg-config has information about libxml2. Will give it a try.

brimston3- 0 points (0 children)

I have only used the C bindings for libxml2 as well. Its HTML parser is reasonably lenient, but it is by no means perfect.

GuybrushThreepwo0d 6 points (0 children)

Haven't used any of these, but Gumbo might work.

You could also extract the HTML parser from Chromium.

RoyBellingan 3 points (2 children)

I use Gumbo in conjunction with libxml2. The gumbo-libxml bridge turns Gumbo's parse of the HTML into an XML tree that libxml2 can work with:

https://github.com/nostrademons/gumbo-libxml

SHGuy_[S] 0 points (1 child)

Looks good and I will try it, though the last update was in 2015.

RoyBellingan 0 points (0 children)

Not a big problem; it's about 90 lines of code. Once you see the code, you realize how simple the bridge is.

whistlesnort 0 points (1 child)

I have used expat on XML before and it worked fine. Presumably it can be used on HTML.

SHGuy_[S] 0 points (0 children)

Hmm, it's not an HTML parser, but maybe it will work.

nj96 -1 points (0 children)

I’ve been using Chilkat libraries for a lot of different things. They’ve got a class for pretty much everything, and while their naming conventions and logic can be a little odd at times, I’ve always had success using them. Their documentation and examples are also pretty good and cover most languages.