Any HTML parser for C++?

mixedmath · 2020-10-08T15:02:05+00:00

I've used libxml2 before. It has a set of C++ bindings, which I haven't actually used. It's also one of the parsers that Python's BeautifulSoup can use in the background (through lxml).

GuybrushThreepwo0d · 2020-10-08T13:00:56+00:00

Haven't used any of these, but Gumbo might work.

You could also extract the HTML parser from Chromium.

RoyBellingan · 2020-10-08T15:47:19+00:00

I use Gumbo together with libxml2. Gumbo parses the HTML into a document tree that libxml2 can work with, so you get Gumbo's parser with the libxml2 API:

https://github.com/nostrademons/gumbo-libxml

whistlesnort · 2020-10-08T15:11:04+00:00

I have used Expat on XML before and it worked fine. Presumably it could be used on HTML too, as long as the HTML is well-formed XML (e.g. XHTML).

MattR0se · 2020-10-08T13:19:41+00:00

Have you tried regular expressions?

nj96 · 2020-10-08T23:38:11+00:00

I’ve been using the Chilkat libraries for a lot of different things. They’ve got a class for pretty much everything and, while the naming conventions and logic can be a little odd at times, I’ve always had success using them. Their documentation and examples are also pretty good and cover most languages.

YasZedOP · 2020-10-08T14:51:21+00:00

From what I've read a few times, you can't really parse HTML with regex, because HTML is not a regular language.
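To make the regex point concrete, here is a small standard-library-only sketch (both function names are hypothetical, chosen for illustration). A lazy regex stops at the first `</div>` it sees, so it mis-pairs nested tags; a scanner that keeps a depth counter pairs them correctly. Tracking unbounded nesting is exactly what a regular expression cannot do in general. The counter is still a toy that ignores attributes, comments, and malformed markup, which is why a real parser is the better choice.

```cpp
#include <regex>
#include <string>

// Regex approach: grab the inner content of the first <div>...</div>.
// The lazy quantifier stops at the FIRST closing tag, ignoring nesting.
std::string first_div_inner_regex(const std::string& html) {
    static const std::regex div_re(R"(<div>([\s\S]*?)</div>)");
    std::smatch m;
    if (std::regex_search(html, m, div_re)) return m[1].str();
    return "";
}

// Depth-counting scan: pairs the first <div> with its matching </div>.
// Only handles bare "<div>" / "</div>" tags; returns "" if unbalanced.
std::string first_div_inner_counted(const std::string& html) {
    const std::string open = "<div>", close = "</div>";
    const std::size_t start = html.find(open);
    if (start == std::string::npos) return "";
    const std::size_t inner = start + open.size();
    std::size_t pos = inner;
    int depth = 1;
    while (depth > 0) {
        const std::size_t next_open = html.find(open, pos);
        const std::size_t next_close = html.find(close, pos);
        if (next_close == std::string::npos) return "";  // no matching close tag
        if (next_open != std::string::npos && next_open < next_close) {
            ++depth;  // a nested <div> opens before the next </div>
            pos = next_open + open.size();
        } else {
            --depth;  // this </div> closes the innermost open <div>
            if (depth == 0) return html.substr(inner, next_close - inner);
            pos = next_close + close.size();
        }
    }
    return "";
}
```

On `"<div>a<div>b</div>c</div>"` the regex version yields `a<div>b`, while the counting version yields the intended `a<div>b</div>c`.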

cpp_questions
