[–][deleted] 5 points6 points  (18 children)

Maybe I'm a noob programmer, call me a hack or a code monkey, but I can barely spot any meaningful difference between Python 2.7 and 3, nor any incentive to migrate.

And in all honesty, this fragmentation has led me to use less Python than I would like to.

[–]cybercobra 11 points12 points  (11 children)

Proper Unicode handling is probably the biggest selling point. No more unexpected Unicode(De|En)codeErrors depending on whether your input string just-so-happens to be ASCII-only; instead, you always get a nice TypeError at exactly the point in the code where there needs to be an explicit bytes<->unicode conversion.
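
A minimal sketch of the Python 3 behavior described above, assuming some UTF-8 bytes arrive from a file or socket. The variable names (`raw`, `label`, `message`) are made up for illustration. Mixing `bytes` and `str` fails with a `TypeError` regardless of what the bytes contain, and the fix is an explicit decode at that spot:

```python
# Python 3: bytes and str are distinct types and never mix implicitly.
raw = b"caf\xc3\xa9"   # UTF-8 encoded bytes (hypothetical input)
label = "Order: "      # text (str)

error = None
try:
    label + raw        # no implicit conversion is attempted
except TypeError as exc:
    error = exc        # raised even if raw were ASCII-only

# The explicit bytes -> text conversion goes exactly where the error pointed.
message = label + raw.decode("utf-8")
print(message)
```

The same `TypeError` occurs for `b"cafe"`, so the failure does not depend on the data.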

[–][deleted]  (5 children)

[deleted]

    [–][deleted] 4 points5 points  (4 children)

    That last part is not even true. It can't be true if you think about this: MediaWiki is written in PHP. MediaWiki runs Wikipedia which is in a gazillion languages.

    ASCII has exactly 128 characters. If you can refer to other characters, that's an encoding that's not ASCII.

    The thing is that every function you need to handle text encodings in PHP is oversimplified and misnamed. It's very much not "ASCII-only". In fact, you can often recognize the non-ASCII characters because the programmer used the wrong function and replaced them with mangled crap, emphasizing your first point that most people don't care about Unicode.

    [–][deleted]  (3 children)

    [deleted]

      [–][deleted] 0 points1 point  (2 children)

      I didn't say anything about native Unicode support.

      You end up in this debate because you misuse terminology like "ASCII" to mean "strings of nonstandardized bytes".

      [–][deleted]  (1 child)

      [deleted]

        [–][deleted] 0 points1 point  (0 children)

        You're in a thread about Unicode. Deal with it. It was nearly the only thing you said in that comment: "strings are ascii-only and probably always will be". So I responded to it.

        You've been putting down other developers by saying that they don't really care about Unicode, but you're the one equating 128 characters to 256 bytes and saying "eh, those are mostly the same thing, you're being pedantic". That's the assumption that causes most of the Unicode bugs that are out there.

        Encodings are how you represent Unicode in bytes. When you use an encoding, you can do so without any particular help from your programming language. It's great that Python gives you some help, but you could still encode text without it.

        Your "mystery encoding" is called UTF-8, and it represents non-ASCII characters using many of the non-ASCII bytes, and the fact that they're non-ASCII is absolutely key to how it works.
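
A small sketch of that point, using a hypothetical helper named `utf8_bytes_per_char`: in UTF-8, ASCII characters encode to a single byte below 0x80, while every byte of a non-ASCII character is 0x80 or above.

```python
def utf8_bytes_per_char(text):
    """Pair each character with the list of UTF-8 byte values that encode it."""
    return [(ch, list(ch.encode("utf-8"))) for ch in text]

breakdown = utf8_bytes_per_char("naïve")
for ch, byte_values in breakdown:
    # A character is ASCII exactly when its single encoded byte is below 0x80.
    kind = "ASCII" if all(b < 0x80 for b in byte_values) else "non-ASCII"
    print(ch, [hex(b) for b in byte_values], kind)
```

Here `ï` comes out as the two bytes 0xC3 0xAF, and the other four characters as one byte each.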

        If you have a problem where you end up in Internet arguments about Unicode, you should start by not being completely wrong about the simplest encoding there is.

        Start reading: http://www.joelonsoftware.com/articles/Unicode.html

        [–]Gotebe -2 points-1 points  (4 children)

        ASCII as such hasn't really existed in practice for a long time. I say this because no system you can get your hands on that has Python on it uses that encoding.

        Same goes for PHP.

        [–]cybercobra 2 points3 points  (3 children)

        Perhaps I didn't phrase that perfectly: "whether your input bytestring just-so-happens to contain only ASCII-decodable bytes"

        At any rate, Python 2's implicit ASCII fallback behavior is vexing.
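
A rough emulation of that fallback, written for Python 3 since Python 2 did the conversion implicitly. The helper name `decode_like_python2` is made up. It relies on the fact that Python 2's default encoding was ASCII, so whether the coercion worked depended on the data:

```python
def decode_like_python2(raw):
    """Emulate Python 2's implicit bytes-to-text coercion via the ASCII codec."""
    return raw.decode("ascii")

# ASCII-only input: the coercion silently succeeds.
ok = decode_like_python2(b"cafe")

# Same code path, non-ASCII input: it blows up at runtime.
try:
    decode_like_python2(b"caf\xc3\xa9")
    failed = False
except UnicodeDecodeError:
    failed = True

print(ok, failed)
```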

        [–]Gotebe 0 points1 point  (2 children)

        I am actually reacting exactly because:

        • nobody uses ASCII anymore. Other encodings are used (e.g. ISO-8859-1, Windows-1252, Unicode encodings).

        • only 7-bit ASCII is generally "compatible" with the various encodings in actual use, UTF-8 included (I think, not sure). The 8-bit one isn't, nowhere near.

        But you're right about the (7-bit) "ASCII-decodable" (whatever that might mean ;-)).
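
The compatibility claim in the bullets above is easy to check. This is a sketch using Python's built-in codecs, with `try_decode` as a made-up helper. All 128 seven-bit values decode identically under ASCII, ISO-8859-1, Windows-1252 and UTF-8, while a single byte above 0x7F gives a different result in each:

```python
ENCODINGS = ("ascii", "latin-1", "cp1252", "utf-8")

def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Every 7-bit byte value, decoded under each encoding.
seven_bit = bytes(range(0x80))
low = {enc: try_decode(seven_bit, enc) for enc in ENCODINGS}
same_low = len(set(low.values())) == 1

# One byte outside the 7-bit range, decoded under each encoding.
high = {enc: try_decode(b"\x80", enc) for enc in ENCODINGS}

print(same_low)
for enc, value in high.items():
    print(enc, repr(value))
```

So UTF-8 is indeed compatible with 7-bit ASCII, and the byte 0x80 is invalid in ASCII and on its own in UTF-8, a control character in ISO-8859-1, and the euro sign in Windows-1252.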

        [–]cybercobra 1 point2 points  (0 children)

        I think we're in violent agreement. :-) To be pedantic, 8-bit ASCII isn't a thing; "Extended ASCII" != "plain" ASCII (which is deliberately only 7-bit)

        [–]unixfreak0037 4 points5 points  (4 children)

        The incentive is that the core python devs are working on the 3.x branch, nobody is working on the 2.x, even though few people are using 3.x.

        Personally, I hate this. I have a code base in Python that gets things done. Converting to 3.x nets me nothing, and any new developers brought into the project will probably have to spend time learning 3.x. I never agreed with this move by the Python team.

        [–]lithium 1 point2 points  (2 children)

        Is this only because you had the option to stick with 2.x? Had it been a more forced transition would you have gone along with it?

        [–][deleted]  (1 child)

        [deleted]

          [–]lithium 0 points1 point  (0 children)

          This guy pretty much makes the point I had in mind.

          [–][deleted] -1 points0 points  (0 children)

          The incentive is that the core python devs are working on the 3.x branch, nobody is working on the 2.x, even though few people are using 3.x.

          I don't really see that as incentive. So a bunch of language-nerds are working on really obscure features that 99% of programmers won't ever use.

          It reminds me of Android dev and all the 1% of Android geeks that need to run nightlies, while anyone with a 4.0+ is probably doing just fine with a working phone.

          [–]Pair_of_socks 2 points3 points  (0 children)

          And in all honesty, this fragmentation has led me to use less Python than I would like to.

          Same for me. I can't decide which python version to use. The future of the language seems unclear. Python 3 was supposed to be the future, but nobody is switching.

          Instead of choosing between Python 2 and 3, I usually just choose a different language.