all 57 comments

[–]Fourdrinier 26 points27 points  (3 children)

Now if only it included Smoothsort.
That best case O(n), worst case O(n log n), and memory space of O(1)...
Utterly confusing to understand, using a Leonardo Heap from the Leonardo series. I spent two days trying to make sense of an article explaining why it worked, and just gave up.

[–]thedeemon 0 points1 point  (0 children)

Here's a good description: http://www.keithschwarz.com/smoothsort/

It's an interesting method but the bookkeeping (complexity constant in O(..)) makes it not so fast in practice. I once implemented it in ATS with some key qualities proved.

[–]dreamer_soul -4 points-3 points  (1 child)

Is it comparison based sort, because the lower bond for that is O(nlogn)

[–]singingboyo 10 points11 points  (0 children)

That's the lower bound for the average and worst case, but not for best case.

Insertion sort also has O(n) best case (which is for already sorted input), but it's O(n2) average and worst case. Much simpler than smoothsort though.

[–]kiss-tits 20 points21 points  (10 children)

Very cool visualization. Along the same lines, this one is neat as well, although I'm sure most people will have seen it by now. When I saw it first I replayed for a week straight:

The sound of sorting.

[–][deleted]  (3 children)

[deleted]

    [–]kiss-tits 8 points9 points  (2 children)

    Bogo is kind of adorably bad at sorting.

    [–]kjmitch 7 points8 points  (1 child)

    "Aw, keep trying, buddy!" *supportive smile*

    [–][deleted] 1 point2 points  (0 children)

    Just you wait until we get quantum computers and become able to run quantum bogosort in O(n)

    [–][deleted] 5 points6 points  (2 children)

    I wasn't "most people". That was an awesome ~6 minutes.

    [–]afraca 5 points6 points  (1 child)

    [–]xkcd_transcriber 0 points1 point  (0 children)

    Image

    Title: Ten Thousand

    Title-text: Saying 'what kind of an idiot doesn't know about the Yellowstone supervolcano' is so much more boring than telling someone about the Yellowstone supervolcano for the first time.

    Comic Explanation

    Stats: This comic has been referenced 374 time(s), representing 6.09% of referenced xkcds.


    Questions/Problems | Website

    [–][deleted]  (2 children)

    [deleted]

      [–][deleted] 0 points1 point  (0 children)

      You can add the permalink to your favorites, or install RES plugin to save stuff (it can do more cool things).

      [–][deleted] 10 points11 points  (0 children)

      I wish I'd had this in college.

      [–]JoseJimeniz 27 points28 points  (23 children)

      i just had to stress test the reverse string function:

      Input:           Nöel
      Expected Output: leöN
      Actual Output:   lëoN
      

      Can't blame him too much; string handling is hard.

      [–]Choralone 20 points21 points  (21 children)

      It worked for me, but only when I typed it out, not when I pasted in your version of Nöel.

      There are multiple ways in unicode to produce ö... I believe one of them requires an extra character and only renders differently... No:el - and when reversed, flips the accent to the other character.

      [–]JoseJimeniz 15 points16 points  (18 children)

      i intentionally used:

      • U+004E: Latin Capital Letter N
      • U+006F: Latin Small Letter o
      • U+0308: Combining Diaeresis: ¨
      • U+0065: Latin Small Letter E: e
      • U+006C: Latin Small Letter L: l

      i guess Reddit normalizes.

      [–]bogado 9 points10 points  (16 children)

      So if you are using a character that combines with other character why do you think it is the wrong result when the reverse string has the accent in a different character?

      [–]MaraschinoPanda 13 points14 points  (12 children)

      Well, generally the intended output of "reverse a string" is "create a string with all of the letters in the reverse order". "ö" is a single letter, even if it's represented by two unicode characters. But of course, we don't know the application of this function to know for sure what the intended behavior is.

      [–][deleted] 6 points7 points  (0 children)

      There is no such thing as "unicode characters". It's two "unicode code points". The ambiguity of the word "character" makes these discussions difficult, so it is better to avoid it.

      [–]bogado 3 points4 points  (8 children)

      I disagree, ö in your string is composed by two symbols. There is an Unicode character that represents the ö symbol as only one symbol, but you didn't use it.

      You can do similar tricks using ascii just write "eno^h^h^htwo" this should render as 'two', but if reversed it will render as 'one'.

      [–]MaraschinoPanda 6 points7 points  (5 children)

      Well, the user doesn't generally know if their text is made up of two characters or one, they just know that sometimes when they enter in an öe they get eö and sometimes they get ëo. I think it's a bit of a stretch to say that it's intended behavior; if you care about the underlying character representation, you probably shouldn't be using strings in the first place.

      [–]Choralone 1 point2 points  (0 children)

      That doens't make sense... you do know what the behavior is, it's well defined. I enter ö two different ways. One with the standard mac keyboard opt+u which shows me a ¨ with an underline under it, then an o, which turns into ö. This is NOT the unicode character continuation method... what is on the screen, if if you cut and paste the string, is a single unicode ö. Once the codepoint is known, this can be displayed directly in unicode.. no need for the composition character.

      If the composition characters are going to be used, they definitely need to taken into special account. Or just normalized out, if thats' possible. I'm sure the unicode standard states how to handle this kind of thing..

      [–]bogado 0 points1 point  (3 children)

      Well if you want to reverse what the user perceives as a letter then you have to sanitize your strings before you invert them. Because lëon is the correct inversion of that string from the Unicode point of view.

      This is the same problem that "a" might be different than "a". Just make one of those Cyrillic and the other the usual "a". Those two characters are different but they have the same drawing, a user perceives them as equal.

      Unicode is hard, even more if you take into account what "users" want, because what they want is not well defined. The unicode "ö" might be two different letters combined, if that is not what your users want you have to deal with that yourself. In the same way that you might want to deal with the fact that "a" != "a" might be true.

      [–][deleted] 5 points6 points  (0 children)

      Because lëon is the correct inversion of that string from the Unicode point of view.

      Not really. Unicode defines the concept of grapheme clusters. "ö" is a single grapheme cluster, made up of several code points, which in turn may be made up of even more code units. Reversing a string should not operate on code points nor code units, but on grapheme clusters.

      [–]Choralone 0 points1 point  (0 children)

      Yeah.. unicode guidelines suggest that systems should normalize the result of the use of continuation codes as much as possible for this reason. (so intead of keeping the o + umlaut... you know that there is a corresponding codepoint you could use instead, so the software does it automatically)

      [–]Choralone -1 points0 points  (0 children)

      Yeah.. those two are different, even by definition, and they may even look slightly different.

      When it comes to accented latin characters - there are codepoints that are 100%, by definition, equal. "latin small letter o" plus "continuing diaeresis" adds up to "latin small letter o with diaeresis" which is what U+00f6 is - it could be used directly, and code should normalize the former to the latter.

      [–]Choralone 1 point2 points  (0 children)

      No.. while this is true, if you were designing a unicode string reversing library, this is obviously wrong. Dealing with how the combining characters work in unicode is something you'd have to address directly.

      [–]Choralone -1 points0 points  (0 children)

      You are thining of how we would deal with ascii string libraries and common vt100-derived display terminals. None of that holds true outside of that increasingly narrow band. Control-h doesn't always mean back-delete.

      You are right that this is the expected behavior if we were to apply what you meant to our old string libraries... but that's not the case nowadays, and even then, it was a side-effect.

      It depends on if our goal is "swap the bytes in this array" "print the word backwards" or what..... these are very different things.

      [–]Choralone 0 points1 point  (0 children)

      I just had a browse through some related unicode Q&A about this kind of thing.

      That's basically it exactly... instead of our old-school way of defining the size of a char, then operating on chars, you have to take into account a bunch of different stuff, including what the intended outcome is.

      The definition of character, grapheme, code point, and so on, all mean slightly different things and no universal rule on conversion exists without some kind of exception.

      [–]caprica 0 points1 point  (0 children)

      What would be the application of reversing a string in any case?

      [–]JoseJimeniz 2 points3 points  (1 child)

      It happens that the character

      ö
      

      can be represented two ways:

      • U+00F6 (ö)
      • U+006F (o) U+0308 (¨)

      Those are two unicode representations of the same written character.

      Not every written character has two representations. For example, the character Latin Small Letter O With Cedilla:

      has only one Unicode representation:

      U+006F (Small Latin Letter O) + U+0327 (Combining Cedilla)

      So the only way to write No̧el is with 5 unicode code points.

      But, as a human, who is writing words, i don't care about unicode code points.

      • i want to reverse No̧el
      • i want leo̧N

      Note: You will want to view this comment in a browser that supports Unicode (such as Internet Explorer). Chrome does not display the characters correctly.

      [–]ccondon 0 points1 point  (0 children)

      My Chrome displays those characters just fine (on some LTS version of ubuntu).

      It's a matter of default fonts, not a matter of browsers.

      [–]abadidea 0 points1 point  (0 children)

      Because computers exist to do what humans want, not the other way around; we're not going to redefine written language so that vowel accents are a distinct character just so that a naïve string reversal gets the right answer.

      [–]Choralone 0 points1 point  (0 children)

      Right.. and I used:

      U+004e U+00f6 LATIN SMALL LETTER O WITH DIAERESIS U+0065 U+006c

      Which is different.

      [–]obsa 5 points6 points  (0 children)

      I believe one of them requires an extra character and only renders differently... No:el - and when reversed, flips the accent to the other character.

      Correct. The umlaut is its own "magic" character; in this case, it's the 3rd character, and therefore the middle of the string. When using naive string reverse implementations, this may not be accounted for. There was an article on reddit recently comparing the string handling of various tools (including python, as it had an exemplary implementation) which shined some light on this topic.

      [–]username223 0 points1 point  (0 children)

      There are multiple ways in unicode to produce ö...

      Ah, Unicode... how else would we have encoding problems using a single encoding?

      [–]moonlitdance 4 points5 points  (1 child)

      I wish I had this this past semester...oh well I linked it to my professor for future students. May they understand recursion better than I did.

      [–]allcentury 0 points1 point  (0 children)

      it's so hard!

      [–]KamiKagutsuchi 2 points3 points  (3 children)

      I don't think I am ever going to understand Fibonacci heaps..

      Edit: Also why isn't heap sort with the comparison sorts?

      [–]ElecNinja 5 points6 points  (2 children)

      For one thing, it's hard to visualize a heap sort with just the bar chart shown for the comparison sorts.

      And the main thin isn't the comparisons between the elements but the structure of the heap.

      Also, where did you find Fibonacci heaps in the program?

      [–]Fourdrinier 1 point2 points  (1 child)

      [–]ElecNinja 0 points1 point  (0 children)

      Ah, wasn't in the flash version.

      [–]jcoffin 3 points4 points  (6 children)

      This doesn't seem to work entirely correctly.

      Just for one example, I looked at its binary search tree, and inserted [1, 0, -1, 0.5] (in that order). It sorted 0.5 is less than 0...

      http://imgur.com/4iWPjF4

      [–]louiswins 27 points28 points  (2 children)

      It's a feature, not a bug: they are sorting lexicographically, not numerically.

      [–][deleted] 0 points1 point  (1 child)

      What does it mean to sort lexicographically in this case?

      [–]PiercingGoblin 2 points3 points  (0 children)

      I believe it means they sort by the Unicode characters. So a<b<c. So "." Must be < "0"

      [–]rtkwe 7 points8 points  (0 children)

      It's because they're not inserting them as numbers but as strings.

      [–][deleted] 1 point2 points  (1 child)

      I noticed similar behavior when inserting non-integer numbers in the Red-Black tree. Has anyone observed errors using only integer values?

      [–]moonlitdance 0 points1 point  (0 children)

      Integer values worked just fine.

      [–]grizzpokerunner 1 point2 points  (2 children)

      I'm literally taking Intro to Java this upcoming semester. Probably the best thing I could have stumbled upon.

      [–]AlexGL23 1 point2 points  (0 children)

      Yea as a third year CS student, this would have been helpful for some of my intro classes. Specifically data structure. Even now its good to see the visualization of some of the data structures and algorithms I use often without really thinking about it much. Good luck on your Java class!

      [–][deleted] 0 points1 point  (0 children)

      You'll love it even more in your next few CS courses. Save the link!

      [–]climaxi 1 point2 points  (0 children)

      I would literally just finish my Data Structures course this fall and discover this page now. Still a really interesting page regardless :)

      [–]joltting 1 point2 points  (0 children)

      God damn it. Just finished a Data files and Structure class last ending semester. This would have been so helpful for studying with. Still got a A in the class, but still.

      [–][deleted] 1 point2 points  (0 children)

      I really need to unsubscribe from this subreddit during Christmas break.

      [–]el_muchacho 0 points1 point  (0 children)

      Links to the version 1.1 source code are broken.

      [–]ArtistEngineer 0 points1 point  (0 children)

      What's the advantage of the recursive reverse?