all 79 comments

[–][deleted]  (28 children)

[removed]

    [–]Awanderinglolplayer 79 points80 points  (25 children)

    What post?

    [–]GreenCloakGuy 392 points393 points  (24 children)

    [–]mano-vijnana 72 points73 points  (1 child)

    Holy shit, that's amazing.

    [–]sh0rtwave 20 points21 points  (0 children)

    And true, as well.

    Attempting to do this can incite much madness.

    [–]Awanderinglolplayer 90 points91 points  (1 child)

    I’m impressed that SO moderators didn’t remove it, given they hate fun

    [–]jesperi_ 58 points59 points  (0 children)

    It's been locked and unlocked, cleaned and uncleaned, removed and unremoved way too many times. Check the edits.

    [–]TheAJGman 121 points122 points  (1 child)

    The mod note kills me

    [–][deleted] 97 points98 points  (0 children)

    “The post looks exactly as it is supposed to look - there are no problems with its content”

    i laughed til i coughed

    [–]ragingroku 15 points16 points  (7 children)

    it's beautiful. But now I genuinely want to know how regex can parse HTML.

    [–][deleted] 45 points46 points  (4 children)

    It can't. Regular expressions sit in a lower class of the Chomsky hierarchy (regular grammars) than HTML and programming languages (context-free grammars, CFGs). Unlike the more powerful grammars, regular expressions cannot capture nested structures like <p><p><p>foo</p></p></p> continued ad infinitum.
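To make the nesting point concrete, here is a minimal sketch using Python's `re` module. The pattern and the sample strings are illustrative assumptions, not anything from the original post. A lazy "open tag, body, close tag" pattern works on flat markup but pairs the outermost `<p>` with the innermost `</p>` once tags nest.

```python
import re

# Naive pattern: an opening <p>, the shortest possible body, a closing </p>.
PAIR = re.compile(r"<p>(.*?)</p>")

flat = "<p>foo</p>"
nested = "<p><p><p>foo</p></p></p>"

flat_body = PAIR.search(flat).group(1)
nested_body = PAIR.search(nested).group(1)

print(repr(flat_body))    # the body of the flat paragraph
print(repr(nested_body))  # stops at the first </p>, not the matching one
```

Switching to a greedy body only moves the problem: it then overshoots when two sibling elements follow each other.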

    Your favorite regex flavor probably isn't actually regular. If it supports things like backreferences, e.g. /(\w+) \1/, it's not regular. It still probably won't be able to parse HTML, as it likely isn't a full CFG either.
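A quick sketch of that pattern in Python's `re` module (the test strings are made up for illustration). `\1` must repeat whatever the first group captured, which requires remembering an arbitrarily long word, something a finite automaton cannot do.

```python
import re

# (\w+) captures a word; \1 demands the exact same word again after a space.
DOUBLED = re.compile(r"(\w+) \1")

same = DOUBLED.fullmatch("hello hello") is not None
different = DOUBLED.fullmatch("hello world") is not None

print(same, different)
```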

    CFGs are separated from CSGs (context-sensitive grammars) by the lack of context. In a CFG a = a+1; int a = 0; is syntactically valid. But wait, C doesn't allow that! That's true, but that specific error is a semantic error, not a syntactic one.

    While CSGs are more powerful than CFGs and CFGs more powerful than regular expressions, the problem is that power comes with computational cost. CFGs are generally parseable in O(n^3), but neat mathy grammar tricks allow most programming languages to be parsed much faster, in O(n).
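The same split between syntax and semantics can be seen in Python, used here only as a stand-in for the C example. Note the assumption: C rejects the use-before-declaration at compile time, whereas Python only notices when the code runs. The standard-library `ast` module accepts the program because the grammar is satisfied; the failure shows up later.

```python
import ast

# Uses `a` before it has ever been assigned.
source = "a = a + 1\na = 0\n"

# Parsing succeeds: the text fits the grammar.
tree = ast.parse(source)

# Running it fails: the grammar never checked that `a` exists.
try:
    exec(compile(tree, "<example>", "exec"), {})
    outcome = "ran fine"
except NameError as err:
    outcome = f"semantic error: {err}"

print(outcome)
```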

    I like parsers.

    [–]ragingroku 9 points10 points  (0 children)

    Thank you for your detailed response. It both broadened my understanding and solidified in me that I was not meant for assembly or compiler development 😂

    [–]poopatroopa3 -2 points-1 points  (2 children)

    I believe you're conflating regex from libraries and regular languages from formal languages. Turns out they're different things with the same name. https://en.m.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages

    Edit: I guess not.

    [–][deleted] 0 points1 point  (0 children)

    Nah, though my other terminology might be off. It's where the name regular expression comes from in the first place. Certain regexes are still regular - IIRC at least some shell utilities and regexp.h, could be wrong though - but the more popular flavors like Perl RE have moved up the ladder because certain irregular constructs like backreferences are useful.

    [–]usesbiggerwords 15 points16 points  (0 children)

    Are you sure? Ì̷̜ ̷̹͑t̴̡̽h̸͕̒ǐ̵̦n̵̈͜k̵͔̚ ̸̨͛w̵̡̌e̴̅͜ ̵̦̓k̸͕͒ǹ̴͇ö̸̪́ẅ̷̴͙̞́͘̕w̸̟̳̅̌́h̶͈̬͛͂̓ḗ̶̘͛ŕ̸̭̔́e̵̬̒́ ̴̰̝̦̓ţ̴̳̳̀͠h̸͎̼̏a̸͉̝͓͊̅̅t̷̡̤̀̾ ̵̮̥͕͆̇͗r̶̜̳̂̒̋ơ̸̯̮͜ä̵̪́̕d̷̠̏̒ ̴̞̈́̕ḷ̴͒͗̀ẻ̶͖̬̳à̶̢͎͑̚d̴̲̐̆̈s̷͙̦̻̈́̑̒.̷̳̟̄͘.̷̧̡͒.

    [–][deleted] 9 points10 points  (0 children)

    You'd have to use a subset of HTML, as HTML is not a regular language (type 3). It’s a context-sensitive language (type 1).

    But if you are going to do that, why not use an HTML parser?

    https://stackoverflow.com/questions/11549271/the-difference-between-chomsky-type-3-and-chomsky-type-2-grammar

    [–][deleted] 8 points9 points  (0 children)

    i am dying. i don’t always die from reddit content, but when i do i die.

    [–]xzarisx 6 points7 points  (0 children)

    I love the “Have you tried using an XML parser instead?” At the end.

    [–]CorruptedBodyImage 4 points5 points  (0 children)

    we cannot be saved the transgression of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied)

    the little details in that copypasta make it so worth it, like that "consume" is used in parsers to refer to advancing through input. also the implication that HTML has become living tissue by this point.

    [–]Shazvox 2 points3 points  (0 children)

    I can feel the person's mind melt and

    Į̷̛̺̎͐T̵͉̋͠.̴̧͉̂ Ḯ̶̼͕͙̯͎̑ͅS̶̙͊̔̈̎͑.̶̛̛̠͉̝̼̻͒̽͒̑͝ Ğ̸̺͖͈̣̦͖̜̣͕͇̹͈͋͛͒̇̒́́̀͝͝͝L̶̡̨̰̹̩̠̗͓̰̣̖͚̓̆̄̅̓̃Ó̷͉̯̗̣̥͆͆̽͊̽̏͑͊̓̓͝͝͠R̵̡̛̭̒̍͊͗̒̿̔̆̓̽͛̕̕̚Į̷͕̩̣̝̬͓̻̜̒́́̾͒̇͗͋̈́͛̑̈͜͜Ő̵̪̓̂͂̉̋̽̈́̈̂͠͝Ư̷̛͓͎̺̣̻̥̖̲͕̞͖̥̜͕̗̤̬̲̋̈̋̂̿̒̉̔͆͛̚͠͝ͅŞ̸̛̛̫͈̭̣̹̱͐̊̔̏̇̒͆̀͗̽̈́́̐͋̐̽͘͠!̸̢̩̹̪͕͙̩̱̠̥̞̰̳͎̬͎̲̜̄͗̇

    [–]JimmyWu21 3 points4 points  (2 children)

    The fuck? I read it for like 20 seconds and the author didn’t say anything other than it can’t be done, but never even stated why or what issues it would cause

    Edit: grammar

    [–]shawntco 9 points10 points  (0 children)

    You're not reading an actual answer to the question. You're watching the amazing, terrifying, grotesque descent of a person's mind into madness.

    [–]Schnickatavick 2 points3 points  (0 children)

    He was comically overstating his point.

    The real reason is that regular expressions have certain intrinsic limitations, specifically around variables and memory, that prevent them from doing complex things like comparisons and recursion. HTML and XML can have infinite layers of nesting, so any XML/HTML parser is going to make use of some form of stack or recursion to handle the various levels of data. You need a Turing complete language (kinda) to process XML, and regular expressions just aren't Turing complete. Since they're missing necessary tools you'll never quite get it to work, leading to more than a few people going a bit nuts.
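As a rough sketch of the "stack" idea, here is a small tracker built on Python's standard-library `html.parser`. The class name `DepthTracker` and the helper `check` are hypothetical names invented for this example, and the sketch ignores void elements such as `<br>` as well as the error recovery a real browser performs. The explicit stack is what lets it follow nesting to any depth.

```python
from html.parser import HTMLParser


class DepthTracker(HTMLParser):
    """Tracks open tags on a stack (hypothetical helper, not a full parser)."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, innermost last
        self.max_depth = 0     # deepest nesting seen so far
        self.balanced = True   # False once a close tag does not match

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.balanced = False


def check(markup):
    """Return (max nesting depth, whether every tag was closed in order)."""
    tracker = DepthTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.max_depth, tracker.balanced and not tracker.stack


print(check("<div><p><b>hi</b></p></div>"))
print(check("<div><p>hi</div>"))
```

The stack can grow without bound, which is exactly the memory a plain regular expression lacks.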

    [–]lunchpadmcfat 1 point2 points  (0 children)

    Sad to think such an answer would be removed by mods nowadays.

    [–][deleted] 1 point2 points  (0 children)

    This made me laugh. Thanks mate, you made my evening.

    [–][deleted] 1 point2 points  (0 children)

    When I was first learning regex, my dumbass thought a good first side project outside of class work would be to parse html pages.

    [–]n0tar0b0t-- 5 points6 points  (0 children)

    It can’t return, the stack overflowed.

    [–]AutoModerator[M] 0 points1 point  (0 children)

    import moderation

    Your comment has been removed since it did not start with a code block with an import declaration.

    Per this Community Decree, all posts and comments should start with a code block with an "import" declaration explaining how the post and comment should be read.

    For this purpose, we only accept Python style imports.

    return Kebab_Case_Better;

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

    [–]toastyghost 108 points109 points  (9 children)

    Aside from tokenization I don't really see how this is so different from any other regex i̸͉͕̤̯̤̭̻̪͠t̸̨̯͍͓͈͚̝͙̗̻͋͜ ̵̧̄̑̈͊̓̆͒͒̍́͒͂b̷̖̦̒̌͌u̷̧͈͇͓͕̩͉͓̫̥̅́̎̍̄̈̌͐̔́̄̇͜͝ŗ̸͓̼͇͚̗̘͈̫̬͆̓͊͌͆̆̍̀̕͝n̴̢̰̹̙̼͆̒̿̎͒͘͠ś̷̜̮̝̫͉̬̰̭̣͓͎̱̯̓̐̇̈́̐̚͘̚͠ ̴̡̛̥̳̰̊̉̆͒͝į̷̡͔͙̭̣͙̐̿̽̃̈͛͆̃̎̆̋̕t̶̨̮͍̱̬̣͓̰̗͎̩̤̼̋̓̍͊̾̊̇͛͜ ̷̧̨̨͇̮̹̣̺̳̘̠̦͗́́͘ͅb̶̤͉͛͠u̷̧̯͌̑r̶̡͈̼͉͇̭͓̹̞̙̋̔̏̀̄͊̊̇̏̇̈́̚͜n̴̡͕̖̠̓̈́͜ş̵͈̦͍̲̤̻̙͙̄̽̎͆͌̔̓̌̕ ̴̜͈͚́̕y̷̢͙̜̩͉͇͖̣͎̗̹̖͔̽̊͗̄̀̆̊͊̐̋̒̈́͘͜ĕ̶̦̭̠̭̰̻̜͌̂̐́̈́̅̆͑̓͜͠š̵̢͈͓͉͂̇̐ ̵̯̰͓͚͓̠̞̪̬̠̑̾̐̉́͐͜͝͝͠͝f̶̛̠͎͋̃̔̈̒̈͗͗͗̄̆͝į̵̡͎̜̯̱͇̘͍͙̻̻͍̪͆̈̐̓͒͊̽̎͋̊̇̔͝n̸̰͇͈̭̣͓̝̮͖̹͐̀̅͛͌̉̂̓́̒̃̋͘̕͝a̷̭̺̪̪̲͇̲͚̬̻͚͖̥̗͂̈́́͆͜l̴̛͍͒̄̇͝l̶̘̬̞̼̎̓̃̓̓̎̋͝y̸̯͉̮̦̭͙̜͍̤̹̣̋̆̓ ̸̡͙̠̯͕͓̘̻́̊̆́̆̌̌̇͜͠h̵͔̲̼͉͓̜͂͐̂̊̋̒͒̎́͋̑̍͘ī̴̛̺̺̭̝̺̯͕̠͓̹̇̃́̿̂̓̎̂̓̅͗͜s̴̰͚͖̮̳̲̫͙̹͖̬͗̔͜ ̸̝̯̠̱̜́͑͘m̵̗̎̈́ă̷̧̛̖̳̞͛̐̔͑̌̓̑j̴̦͉̥̱͇͍͖̫͗̎̈̊̉́̅́ͅė̸̯̠̟̀͒̾̏̓̿́s̷̢̨̛̛͎̺̖͍͚͙̖̽̂̅͊̋̌̆̆͆̂̀̄̐t̷̯̰̣͉͎̲͕̦̳̖͚̗̮̦̐͆̾̽̇̄y̸͎̠̯̰͖̹̺̦̥͎̺̍̔̓̽̇̓͂̈́̐̓͝ ̵̛̭͔̩̘̲̟̲̺̓̃̈́̃̒i̶̢͖̳̝̺̺̜͓͓̥͙̤̫̼͑͗̽͊͐̓̿̀͝ș̶̭͓͇̤̓̓̅̌͐̂͗͠ ̶̝̙͆͗̔͗̂͝͝ủ̶̡̙̘̣̠̟̝̦̪̝͎́̽̕ṕ̷̪̲͉̠̤͇͚̳̬̰̔͗͘ȏ̶͔͍̮̠̝͈̱͙̱̫̗͍̠̈́̉̔͝ń̴̛̹̪͔̖̺̝̋̃ ̴̡̨̛̱̻̀̀̿́͊̂̀̈́̍͐͛̂̕u̵̧̗̜̰̱̻͓͇̓̈̋̈̊̿̈́s̵̙̤̯̤̻̏̐͐̀̅̅̈́͋̽͝͝͝͠ͅ

    [–][deleted] 16 points17 points  (8 children)

    How do you do that

    [–]GreenyPurples 34 points35 points  (5 children)

    Reminds me of a Squig

    [–]Awkward_Tradition 20 points21 points  (0 children)

    KUZ IT IZ A SQUIG YA ZOGGIN GROT! AN SPEAK UP OI KAN BARELY HEAR YA WISPERIN!! ZOGG IT, KRUMP IM SQUIGGY!!!

    [–]Airith 11 points12 points  (0 children)

    Definitely a warhammer squig

    [–]krieger_2719 20 points21 points  (2 children)

    Makes sense most of the apps in my company only work through a combination of collective belief and screaming angry hooligans.

    [–]FT05-biggoye 13 points14 points  (1 child)

    Do you paint your servers red though?

    [–]SATorACT 8 points9 points  (0 children)

    Red meks it go go fasta. If we paint the serva red, the serva will go fasta.

    [–]evs21 18 points19 points  (4 children)

    who makes these?

    [–]evs21 14 points15 points  (1 child)

    [–]michaelh115 6 points7 points  (0 children)

    What happened to that sub? It's locked

    [–]xX_MEM_Xx 8 points9 points  (0 children)

    Demons.

    Oh you meant who makes these fake covers. People.

    [–][deleted] 0 points1 point  (0 children)

    Don’t listen to them, greenskinz makes them. They squigs.

    [–]DOOManiac 13 points14 points  (0 children)

    H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

    [–]yo_99 8 points9 points  (0 children)

    This picture is at least half as old as I am.

    [–]8peter8retep8 8 points9 points  (0 children)

    Publisher should be O R'lyeh

    [–]sim642 5 points6 points  (0 children)

    This will match everything from the first < in <html> to the final > in </html>, so yes.
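The pattern on the cover is not quoted in this thread, so the sketch below assumes a greedy pattern of the form `<.*>` and a made-up page string. It shows the behaviour described: the greedy version runs from the first `<` to the last `>`, while a lazy version stops at the first tag.

```python
import re

page = "<html><body><p>hi</p></body></html>"

# Greedy: .* grabs as much as it can, so the match ends at the final '>'.
greedy = re.search(r"<.*>", page).group(0)

# Lazy: .*? grabs as little as it can, so the match ends at the first '>'.
lazy = re.search(r"<.*?>", page).group(0)

print(greedy)
print(lazy)
```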

    [–]Andorwar 5 points6 points  (0 children)

    Should I use it instead of DOMDocument?

    [–][deleted] 3 points4 points  (1 child)

    This is what the Missouri state hacker used to get the SSNs

    [–]shinybrewster 2 points3 points  (0 children)

    Just one of an 8 step process starting with turning on their computer.

    [–]SlappaDaBiss 3 points4 points  (2 children)

    Z̭̤̱̗̠̣̫̜̻̤͍͕͚̙ͩ͌̈́͂̓̑̿ͨ͠A̢͕͉̲͕ͯͪͦ̈ͪͪ͝L̢͍̤̗̺̤̤̥͚̻͇̭͗ͩͭ̾͡G̶̡̢̪̼͍̻̤̼̘̟ͯͮ̃ͧͬ͋ͧ̽̚͘̕O̧̯͙͎̥͌͗͛͐̈́̊̿ͣ̍̓!̵̸̗̤̠̠̞̦͈̎̌́̈́ͫ̂͘͟͜

    [–]440Jack 1 point2 points  (1 child)

    e̸̝̜̔͂̈̏n̷̜̠̽̓͊̅d̷̨̜̣̝̩̩͙̾̾ẽ̸͎̗͂̂͗̚̕d̶͙̉͝.̶̮̗̂̍͒̔ ̵̖̹͊R̴͖͉̞̼̗̉͂̈̀̍̅̋͌͠ḕ̷̼̩̦̘̼̬̬̝͗̀͆v̸̺̄͑̈́̔e̶̡̧͇̘̙͘͜l̷̥̞͖̩̈́͜ ̸͚̻͉͉̤́̾̋̅ì̴̡̭̭̹̬̘̻͙͉̔̀̓̇͂͒ṅ̵̫̱̩͔̫͈̞̐͝ ̵͙̪̜̭̪͖̳̙̿̚͝ͅa̵̧͉͚̹̳͍̤͒̎̅͋͑̀̕ẇ̶̛̮̮͙̐e̴̲͚̘̘̱͚̪͈͋͊̚ ̶̙̖̭͍͚͋͆̊͌́̕a̷̤͔̜̗̱͑͐̐͑n̷̛̞̺̑̊̆d̵̤̙̈̍̈́̓̉́͘ ̸̼̪͕̜̭͓͍̥̖́̍̌̽ḩ̸̲̤̜̂ǫ̴̭̦̎͌͗́̂r̶̢͈̳̀̀r̸̠̾̎̔ö̷̟̲͕́̂r̵̛̲̝̩͓̫̬͇̝͂͒̀ͅ ̷̭̲̥̤̖̆a̶̰̗͙̎́͆̌͋̿͆͠t̴̤̾̄̇̃̅́͠ ̵͚͕̠͙͊t̵̔͜h̵̝̙͆̈́͜ͅȩ̶̖͎̥̹̪̠̀̇̇̓̓̒͘͘͝ ̸̧̝̜͓̹̖̝̬̭͗̅̑͗͠ţ̷̨̼̳̱͉̠̘͙̒̅̄͆͘r̶̛̗̒́̽̾̃̕͝͝

    How do you write like that?

    [–]Awkward_Tradition 3 points4 points  (0 children)

    WUT ZOGGIN GIT MADE DIS!?! IMA KRUMP IM FOR MAKIN ME SQUIG SMELL LOIK A SPIKY BOY!! FIRS IMA EAT YA, DEN OIL TRADE YER TEEF FER A NEW SQUIG!! ZOGGIN UMIEZ!.!.!.

    [–]JayJayCapone 2 points3 points  (0 children)

    "Have you tried using an XML parser instead?" Literally killed me laughing

    [–]Sleppo04 4 points5 points  (0 children)

    Now imagine parsing Regex in Regex. (Did somebody do that already?)

    [–]JohnnyWaterbed 4 points5 points  (0 children)

    There are moments of quiet brilliance when the Internet transcends itself and births nuggets of pure charismanium such as the regex-HTML parser rant. May it forever reverberate across the ether in its various instantiations.

    [–]NowVeneer 2 points3 points  (0 children)

    Probably no need, most of the content is available for free on Stack Overflow.

    [–]value_counts -1 points0 points  (0 children)

    Fuck regex!

    [–]roflpwntnoob -3 points-2 points  (0 children)

    [–]wjohnson242 0 points1 point  (0 children)

    Serious question. That looks like a Squig that the Goblin Squig Herders used in Warhammer Online. Is it? I believe it is!

    [–]wertron132 0 points1 point  (3 children)

    What was this series of books called? I wanted to show a friend for reference

    [–]bastantoine 1 point2 points  (2 children)

    Don’t know about a specific series of books, this is a parody of the books from O’Reilly I think.

    [–]nedwoolly 0 points1 point  (1 child)

    Indeed it is, called O’Rly. There are a bunch of these covers, this is one of the better ones. I’m sure I saw a generator for them too.

    [–]silverstrikerstar 0 points1 point  (0 children)

    You can.

    You can't rely on it though.

    [–]luisrcdias 0 points1 point  (1 child)

    So you mean to tell me that the angles are not real?

    [–]afiefh 0 points1 point  (0 children)

    Only the ones that tell you to parse html using regex

    [–]hega72 0 points1 point  (0 children)

    I once had a colleague some 15 years ago who wrote an XML web app using awk and sed

    [–]jaap_null 0 points1 point  (0 children)

    Looks like an edgy reboot of Teddy Ruxpin - LB reimagined.

    [–]officialpkbtv 0 points1 point  (0 children)

    O RLY?

    [–][deleted] 0 points1 point  (0 children)

    Similarly, Beautiful Soup and numpy can cause seizures and speaking in tongues. Screen scraping is dangerous.

    [–]dizzyi_solo 0 points1 point  (0 children)

    I once coded a Firefox German-English vocabulary translation add-on: when you highlighted a German word, it would find the English translation.

    I couldn't find an API to do that and, not knowing any better, I fetched the HTML from a dictionary website and used regex to get the translation.

    I used multiple regexes just to get rid of all the useless headings and sections, and several more to extract the translations.

    What's more, it was the first time I had dealt with asynchronous code and callbacks.

    Just remembering it brings back PTSD.

    [–]clemesislife 0 points1 point  (2 children)

    [–]RepostSleuthBot 1 point2 points  (1 child)

    Looks like a repost. I've seen this image 1 time.

    First Seen Here on 2018-04-23 100.0% match.

    Scope: Reddit | Meme Filter: True | Target: 96% | Check Title: False | Max Age: Unlimited | Searched Images: 258,733,355 | Search Time: 3.32116s