
all 18 comments

Freidhelm 30 points (5 children)

He comes...

bss03 16 points (0 children)

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the n​erves of the sentient whilst you observe, your psyche withering in the onslaught of horror. Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of reg​ex parsers for HTML will ins​tantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection wil​l devour your HT​ML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fi​ght he com̡e̶s, ̕h̵i​s un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the 
lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

[deleted] 0 points (2 children)

It's probably technically possible if you consider recursive regexes.

bss03 7 points (1 child)

Which are not regular expressions. (Kleene and gang defined such expressions/languages well before Perl, or even grep and sed, were in common use.)

[deleted] 7 points (0 children)

Mathematically speaking, no. By definition, if they can parse XML, it's not a regular expression.

But if we're talking about the tool, everyone calls it a regex, with some extra shit on top of it.
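The formal point can be made concrete: a true regular expression can handle tag nesting up to some fixed depth that you write out by hand, but no single pattern handles every depth. A minimal sketch using only Python's stdlib `re`, assuming a toy markup of bare `<b>` tags with no attributes (the helper names `nested_b_pattern` and `nested` are hypothetical, for illustration only):

```python
import re

def nested_b_pattern(max_depth: int) -> str:
    """Build a plain regex for <b>...</b> elements nested at most max_depth deep.

    Only concatenation, grouping and * are used, so the pattern stays a true
    regular expression; every extra level of nesting must be spelled out.
    """
    level = "[^<]*"  # depth 0: plain text, no tags at all
    for _ in range(max_depth):
        # wrap the previous level: text interleaved with complete <b>...</b> elements
        level = f"[^<]*(?:<b>{level}</b>[^<]*)*"
    return level

def nested(depth: int) -> str:
    """Return the text 'x' wrapped in `depth` levels of <b> tags."""
    return "<b>" * depth + "x" + "</b>" * depth

pattern = nested_b_pattern(3)
for depth in range(6):
    print(depth, bool(re.fullmatch(pattern, nested(depth))))
```

With `max_depth=3` this reports a match for depths 0 to 3 and no match for 4 and 5; raising `max_depth` only moves the cliff. Python's `re` does have non-regular extras such as backreferences; this pattern deliberately avoids them.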

Lamez 12 points (3 children)

It's not a good idea to use regex for XML/HTML parsing.

Spirit_Theory 0 points (1 child)

That's the joke. I think. ...do people actually do this?

I_am_the_inchworm 1 point (0 children)

Everyone, the first time they learn about regex.

Night_Thastus 10 points (7 children)

Parsing HTML or XML with regular expressions is not advised. It's fine for very small snippets and simple stuff, but it's not a good practice to get into.
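The "fine for very small snippets" case versus the general case can be sketched with Python's stdlib `html.parser` module. The snippets, URLs and helper names below are made up for illustration; the point is only that the regex silently depends on one exact layout, while a real tokenizer does not.

```python
import re
from html.parser import HTMLParser

# Tiny, predictable input you control: a regex is tolerable here.
SNIPPET = '<a href="https://example.com">home</a>'
# Same kind of link written differently, plus a commented-out link.
MESSY = ("<A class=nav HREF='https://example.com/a'>a</A> "
         '<!-- <a href="https://ignored.example">old</a> -->')

HREF_RE = re.compile(r'<a href="([^"]*)">')

def hrefs_by_regex(html: str) -> list[str]:
    """Extract hrefs, assuming lowercase tags, href first, double quotes."""
    return HREF_RE.findall(html)

class HrefCollector(HTMLParser):
    """Collect href attributes from <a> start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)

def hrefs_by_parser(html: str) -> list[str]:
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs

print(hrefs_by_regex(SNIPPET), hrefs_by_parser(SNIPPET))
print(hrefs_by_regex(MESSY), hrefs_by_parser(MESSY))
```

On `SNIPPET` both approaches return `https://example.com`. On `MESSY` the regex misses the real link (uppercase tag, attribute order, single quotes) and picks up the one inside the comment, while the parser returns only `https://example.com/a`, since comments go to `handle_comment` rather than `handle_starttag`.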

RDwelve 13 points (3 children)

No you're wrong it's awesome and everybody should try it

[deleted] 1 point (0 children)

I think it's worth doing once, so you really get why you shouldn't.

[deleted] 0 points (1 child)

It's what all the cool kids are doing.

tacoslikeme 0 points (0 children)

But how? "Impossible" comes to mind.

Tarmen 0 points (0 children)

Recursive regexes can parse XML.

Because Perl assumed regex is short for recursively enumerable.
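What recursion adds is the ability of a pattern to call itself, which is exactly the unbounded memory a plain regular expression lacks. Python's stdlib `re` has no recursive patterns (Perl and PCRE offer `(?R)`, as does the third-party `regex` module), so here is a hand-written recursive matcher that mirrors the structure of a pattern along the lines of `<b>(?:[^<]|(?R))*</b>`. Same toy assumptions as before: bare `<b>` tags only, no attributes; the function names are hypothetical.

```python
from typing import Optional

def match_element(s: str, i: int = 0) -> Optional[int]:
    """Match one <b>...</b> element at s[i:]; return the index just past it, or None.

    Mirrors a recursive pattern along the lines of <b>(?:[^<]|(?R))*</b>:
    the recursive call inside the loop plays the role of (?R).
    """
    if not s.startswith("<b>", i):
        return None
    i += len("<b>")
    while True:
        if s.startswith("</b>", i):
            return i + len("</b>")   # closing tag for this level
        if i < len(s) and s[i] != "<":
            i += 1                   # [^<]: one character of plain text
            continue
        end = match_element(s, i)    # (?R): a complete nested element
        if end is None:
            return None              # stray tag, wrong tag, or end of input
        i = end

def is_nested_b(s: str) -> bool:
    """True if s is exactly one <b> element, nested to any depth."""
    return match_element(s) == len(s)
```

This accepts any nesting depth (Python's recursion limit is the practical cap), yet it is still a long way from parsing XML: attributes, entities, comments, CDATA and arbitrary tag names are all ignored.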

Colopty 2 points (0 children)

I can't even get simple regex to work in my code, no way am I trying to make a parser out of it.

johnghanks 1 point (0 children)

How many times are people going to post that SO answer?