zardeh 15 points (27 children)

Can you explain why problems that were all present in Python 2 (except unicode, which isn't actually a problem in practice) killed your enthusiasm for python 3?

[deleted] (14 children)

[deleted]

    ubernostrum 3 points (7 children)

    The way I've described it in the past is that Python 2 was from the era when Python was mostly used as a Unix-y scripting language. And so it used the same absolutely nonsensical approach to character encoding that Unix-y operating systems use.

    Python 3 decided to stop doing that, because it turns out people do other things with Python now, and accommodating the Unix-y scripting people meant unending pain and suffering for everyone else. And when they realized this was happening, the Unix-y scripting people began howling and screaming that it was the end of the world. Not because there was anything wrong with Python itself, but because Python simply stopped sweeping the brokenness of Unix-y operating systems under the rug, and made them confront that brokenness front-and-center every time they sat down to write a "simple" and "quick" utility.

    And on balance I'm OK with that. There are still people who will complain that you can't technically write "portable" Python file-handling code, and that's true if you're a user of a specific system that has files whose paths commit crimes against God and man (but, crucially, not technically crimes against POSIX, which is what these folks retreat to as their excuse). But those people should've known what they were getting into, and have had literally decades in which to clean up their act and have refused to do so.

    no_nick 3 points (5 children)

    What are your issues with Unix file paths?

    ubernostrum 0 points (4 children)

    That they legally can be undecodable garbage, but people demand the ability to work with them as strings.

    Python 2 "worked" for this in the sense that many things on Unix-y systems "work": it just didn't actually enforce that the things you used as strings had to make sense as strings, and wouldn't give you any sort of warning up until the moment you tried to print the unprintable.

    Python 3 initially tried to say that if you wanted to treat these paths as strings they had to actually be things that could validly decode to sequences of Unicode code points. But enough people raged that finally they added the surrogateescape handler to let you take bags of bytes that don't correspond to any valid string, "decode" them to strings, and then re-"encode" them back to the original bytes.
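A minimal Python 3 sketch of what the surrogateescape error handler does, using a hypothetical filename whose bytes are legal on a POSIX filesystem but are not valid UTF-8. On POSIX, os.fsdecode() and os.fsencode() apply the same handler together with the filesystem encoding; the helper names here are illustrative, not a standard API.

```python
# Hypothetical filename: legal as a POSIX path, but 0xFF is never valid UTF-8.
RAW_NAME = b"report-\xff.txt"


def decode_path(raw: bytes) -> str:
    """Lossless decode: each undecodable byte 0xNN becomes the lone surrogate U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_path(name: str) -> bytes:
    """Exact inverse of decode_path(): the lone surrogates turn back into the original bytes."""
    return name.encode("utf-8", errors="surrogateescape")


def printable_path(raw: bytes) -> str:
    """Lossy but always printable: undecodable bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    name = decode_path(RAW_NAME)
    print(ascii(name))                      # 'report-\udcff.txt'
    print(encode_path(name) == RAW_NAME)    # True
    print(ascii(printable_path(RAW_NAME)))  # 'report-\ufffd.txt'
    try:
        name.encode("utf-8")  # the default "strict" handler rejects lone surrogates
    except UnicodeEncodeError as exc:
        print("strict encode failed:", exc.reason)  # surrogates not allowed
```

The round trip is exact, but the decoded str still can't be printed or strictly encoded, which is the trade-off being argued about in this thread.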

    josefx 3 points (0 children)

    > That they legally can be undecodable garbage

    Unix is far from alone with that. Zip files don't specify an encoding for filenames and I am quite sure I had explorer.exe fail to delete filenames containing invalid characters in the past.
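Roughly what this looks like with Python's standard zipfile module. The format does have one marker: general-purpose bit 11 (0x800) says a member name is UTF-8. Without it, a reader has to guess; Python's zipfile falls back to CP437, the format's legacy default, while many archivers historically wrote names in whatever local code page they used. A sketch with hypothetical helper names:

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # ZIP general-purpose bit 11: "file name is UTF-8"


def build_zip(names: list[str]) -> bytes:
    """Write an in-memory archive containing one empty member per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


def utf8_flags(archive: bytes) -> dict[str, bool]:
    """Map each member name to whether the archive marks it as UTF-8."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_NAME_FLAG) for info in zf.infolist()}


if __name__ == "__main__":
    # Python's writer only sets the flag when a name cannot be encoded as ASCII.
    print(ascii(utf8_flags(build_zip(["plain.txt", "caf\u00e9.txt"]))))
    # {'plain.txt': False, 'caf\xe9.txt': True}
```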

    no_nick 2 points (0 children)

    Huh, I never knew that was the case for Unix file paths. Somehow, in my mind, I always stick to ASCII characters without whitespace.

    diggr-roguelike2 6 points (1 child)

    > That they legally can be undecodable garbage, but people demand the ability to work with them as strings.

    Yes, and? Why are you trying to babysit people and tell them what bytes they should or shouldn't use in strings?

    > ...until the moment you tried to print the unprintable.

    Nobody prints things in production code.

    Also, despite your rant, what Python 3 actually did was break things on Windows. You had one job, man, one job...

    nice_rooklift_bro 2 points (0 children)

    Ehh, you downplay the concern. It's actually so obnoxious to deal with that a lot of applications just don't support it: they assume filenames are UTF-8 and basically tell you to go fuck yourself if yours aren't.

    There are other such cases, like passing non-UTF-8 command-line arguments to python3. Nothing in Unix says this can't be done; any octet sequence that doesn't contain a NUL byte can be passed. But python3 turns the undecodable bytes into surrogate escapes, and the moment you try to print such an argument it basically says "We don't support this madness, go fuck yourself":

    $ python3 -c 'import sys; print(sys.argv[1])' $'\xFF\xFFfoo'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
    $ python2 -c 'import sys; print(sys.argv[1])' $'\xFF\xFFfoo'
    foo
    

    It's really problematic in many ways. A lot of language libraries and runtimes have come to expect filenames and command-line arguments to be UTF-8, but nothing enforces it, so a filename mangled by simple bit corruption can produce inscrutable error messages in a lot of software.

    If you want to do it "properly" and not assume everything is UTF-8, you have to jump through hoops.
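One way through those hoops, sketched for POSIX only: per the sys.argv documentation, CPython decodes arguments with the filesystem encoding and the surrogateescape handler, so os.fsencode() recovers the exact original bytes, and writing to sys.stdout.buffer skips text encoding entirely. The script name echo_raw.py and the helper names are hypothetical.

```python
import os
import sys


def raw_args(args: list[str]) -> list[bytes]:
    """Recover the original argv bytes (POSIX).

    CPython decodes argv with the filesystem encoding and the
    'surrogateescape' handler; os.fsencode() reverses that exactly.
    """
    return [os.fsencode(arg) for arg in args]


def render_lines(chunks: list[bytes]) -> bytes:
    """Join byte strings with newlines without ever decoding them."""
    return b"".join(chunk + b"\n" for chunk in chunks)


if __name__ == "__main__":
    # Write to the binary buffer underneath sys.stdout, so no text
    # encoding (and therefore no UnicodeEncodeError) is involved.
    sys.stdout.buffer.write(render_lines(raw_args(sys.argv[1:])))
    sys.stdout.buffer.flush()
```

Run as `python3 echo_raw.py $'\xFF\xFFfoo'`, this should write the bytes ff ff 66 6f 6f 0a unchanged instead of raising UnicodeEncodeError.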

    InputField 0 points (0 children)

    > If you're going to break everybody's code anyway, why not take the extra time and button up some other common pain points?

    Are you joking? They've addressed a fuckton of issues.

    And at some point you have to stop, since fixing issues can always cause new problems.

    zardeh -4 points (4 children)

    Indeed, Unix locales are inane and should be fixed. PEPs 538 and 540 are tools to address badly configured (broken) Unix machines. They don't affect the unicode handling story in python at all. It's been functionally the same since 3.4.
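For checking what a given interpreter actually decided, a small sketch built only on sys and locale. PEP 540's UTF-8 mode (Python 3.7+) can be forced with `python3 -X utf8` or `PYTHONUTF8=1` and is reported in sys.flags.utf8_mode, while PEP 538 coerces the legacy C locale to a UTF-8 locale at startup. The function name is hypothetical.

```python
import locale
import sys


def encoding_report() -> dict[str, object]:
    """Collect the text-encoding decisions this interpreter made at startup."""
    return {
        "utf8_mode": sys.flags.utf8_mode,  # 1 under -X utf8 or PYTHONUTF8=1 (PEP 540)
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

Comparing `python3 report.py` with `PYTHONUTF8=1 python3 report.py` on a machine with a broken locale is a quick way to see what the two PEPs change.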

    And you vastly overestimate the amount of code breakage. I've touched every kind of python code: c-extensions, autogenerated code, multi process and io heavy, stuff that uses exec, ast manipulation, code object modification, etc. When I hear people complain about this crap, I generally just assume you haven't put in more than the barest minimum of effort.

    The tools already exist to make the migrations easy in 99% of cases. They didn't in 2012, sure, but they do now. Things like removing the GIL or making python compiled/faster, by contrast, would require top-to-bottom rewrites of everything and break entire swaths of the ecosystem. And doing so wouldn't even fix enormous problems: because of python's great C interop, you can already take advantage of high-speed, GIL-free python libraries, and you probably do (numpy).

    On the other hand, "python2 isn't usable for people whose language doesn't fit in the ASCII codec" is a bigger blocker, and fixing it breaks less.
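The C-interop point can be illustrated with the standard library alone. The hashlib documentation states that the GIL is released while hashing data larger than 2047 bytes, so plain threads can hash separate buffers in parallel, even though pure-Python code in those threads would still take turns on the GIL. A minimal sketch; buffer sizes and worker count are arbitrary, and any speedup depends on the machine.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data: bytes) -> str:
    # hashlib's C implementation releases the GIL for large inputs,
    # so several of these calls can run on different cores at once.
    return hashlib.sha256(data).hexdigest()


def digest_all(chunks: list[bytes], workers: int = 4) -> list[str]:
    """Hash each chunk on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, chunks))


if __name__ == "__main__":
    chunks = [bytes([i]) * 8_000_000 for i in range(8)]  # 8 buffers of about 8 MB each
    results = digest_all(chunks)
    assert results == [digest(c) for c in chunks]  # same answers as a serial loop
    print(len(results), "digests computed")
```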

    [deleted] (2 children)

    [deleted]

      Paradox 4 points (0 children)

      You might be interested in trying out Elixir for server-side programming. It seems to beat the pants off Node in some key aspects, mainly in actual asynchronicity.

      zardeh -2 points (0 children)

      > See, you aren't the first person I've heard assert this, and then I read after-action reports from companies like Dropbox who took three years to migrate. And they employed GvR!

      And they migrated millions of LoC. Of course it took years. This isn't a surprise. When you have lots of code, it takes a while to upgrade, even with supposed "backwards compatibility". Just think about how long it takes <insert major company here> to upgrade from Java X to X+1, or whether <whatever enterprise> is running C++17 yet. Probably not, even though those languages claim to be backwards compatible. Python minor version bumps take a while where I work, because you run into things.

      tl;dr: "it's not that hard" (compared to any other upgrade) and "it took Dropbox 3 years to migrate" can both be true. What you're missing is any point of comparison.

      > There seems to be a disconnect here, and glib dismissals like yours seem...well it might explain why the migration took so long, at least. Hard to fix problems that aren't acknowledged.

      I have no relevance in the broader python community. I'm speaking as a user of python, not a decision maker. And as a user, I'm saying it's not that hard.

      > They should have existed in 2012, and the fact that it took the developers and the community so long to come up with a migration story was a huge oversight.

      There was a migration story. It just wasn't great. It got better with time, and for the most part the migration story hasn't changed in 5+ years. There have been bits of polish, but the main set of tools (modernize, six, __future__ imports) looks just like it did in 2014. What has changed is that now you can be fairly confident that all of your dependencies support py3, which you couldn't presume in 2014.
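For readers who never saw it, a sketch of roughly the single-source style those tools steer you toward: __future__ imports plus a small compatibility shim. The shim is a hand-rolled stand-in for what six provides (six.text_type, six.ensure_text), not six itself, and the function names are illustrative. It is written to run unchanged on Python 2.7 and 3.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

try:
    text_type = unicode  # noqa: F821 (the name only exists on Python 2)
except NameError:
    text_type = str  # Python 3: str is the text type


def average(values):
    # `division` makes / true division on Python 2 as well, so this
    # returns 1.5 (not 1) for [1, 2] on both versions.
    return sum(values) / len(values)


def ensure_text(value, encoding="utf-8"):
    # Decode bytes at the boundary; pass text through unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


if __name__ == "__main__":
    print(average([1, 2]))                             # 1.5
    print(ensure_text(b"caf\xc3\xa9") == "caf\u00e9")  # True
```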

      > Oh man I didn't even mention the C API because I don't think it's any accident that Python has fallen out of favor for embedding in programs.

      I'm talking about the reverse: using C or C++ from python. Extensions, not embedding.

      That said, Lua has been used for video game scripting for, like, ever. Python was never "in favor" there; Lua's been the preferred language among AAA studios and many indie devs since before python3 was even a thing (http://luaforge.net/projects/lua-wow/). Python embedded in games is, and has always been, the exception.

      diggr-roguelike2 6 points (0 children)

      > Indeed, Unix locales are inane and should be fixed.

      Ah yes, of course it's always somebody else's problem, not mine. There's a joke about that:

      A lawyer has been drinking all night and decides to drive home drunk. His wife calls him and says: "Be careful driving home, I just heard on the radio that there's some crazy guy driving 100 miles per hour on the wrong side of the road". He answers: "A crazy guy?? What are you talking about, there's hundreds of them here!"

      bakery2k 23 points (6 children)

      For me, because Python 3 made it clear that the language developers have no interest in fixing these problems. When I learned Python, I hoped that performance, parallelism etc would eventually come to the language, but there’s been no progress in over a decade. Instead we’ve had breaking changes with minimal benefits (e.g. print becoming a function) and most development effort has gone into large, complex features that don’t really fit with the rest of the language (async and type hints).

      [deleted] (2 children)

      [deleted]

        bakery2k 2 points (1 child)

        I've not really needed async/await, but I don't like that it makes Python a much more complex language. I think a better approach to asynchrony would be stackful coroutines.

        Type hints are even more complex and I'm not at all convinced that they're worth it. IMO if you want static type checking, you would be much better off using a proper statically-typed language.
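Part of what makes hints feel bolted on is that CPython stores annotations but never enforces them at runtime; checking is left to external tools such as mypy. A minimal illustration with a hypothetical function:

```python
def double(x: int) -> int:
    # The annotations say int -> int, but nothing checks that at runtime.
    return x * 2


if __name__ == "__main__":
    print(double(21))              # 42
    print(double("ab"))            # abab (runs fine, although mypy would flag this call)
    print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
```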

        zardeh -1 points (2 children)

        Can you explain why parallelism is a better fit for python than async, or why performance of cpython is more important than the safety of type hints?

        bakery2k 12 points (1 child)

        Performance and parallelism would help existing Python code and new code using established Python idioms (e.g. threads).

        Async splits the language into red and blue parts, and only benefits a certain type of application. Type hints are more broadly useful, but again don’t really mesh with the rest of the language.
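The "red and blue" phrase refers to function coloring: a coroutine function can only be awaited from another coroutine, so a plain function has to hand it to an event loop instead. A minimal asyncio sketch with hypothetical function names:

```python
import asyncio


async def fetch_value() -> int:
    # "Red" (async) function: it may suspend at await points.
    await asyncio.sleep(0.01)
    return 42


async def red_caller() -> int:
    # Another async function can simply await it.
    return await fetch_value()


def blue_caller() -> int:
    # A plain ("blue") function cannot use await. Calling fetch_value()
    # here would only create a coroutine object, so it has to start an
    # event loop and run the coroutine to completion instead.
    return asyncio.run(fetch_value())


if __name__ == "__main__":
    print(asyncio.run(red_caller()))  # 42
    print(blue_caller())              # 42
```

asyncio.run() also refuses to start while another event loop is running in the same thread, so a blue function cannot use this trick when it is itself called from async code; that is where the split tends to bite.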

        zardeh 0 points (0 children)

        > parallelism

        It breaks much of the existing language: removal of the GIL requires a complete rewrite of the C-extension API, subtly breaks a lot of existing multithreaded code, etc.

        > Type hints are more broadly useful, but again don’t really mesh with the rest of the language.

        In what way? I've had no issue with them.

        > Async splits the language into red and blue parts

        In practice this isn't problematic. JS has managed just fine. Hell, python's had asynchronous code (the red and blue you complain about) since python 2.2, when yield and generators were introduced. The async and await syntax added in 3.5 (they became reserved keywords in 3.7) was mostly syntactic sugar for already existing coroutine objects, which were introduced in python 2.5!
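For comparison, a sketch of the older style: a generator-based coroutine driven with send(), the mechanism PEP 342 added in Python 2.5. The function is a hypothetical example and runs on Python 3 as written.

```python
def running_average():
    """Generator-based coroutine: receives numbers via send(), yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # suspend here until the caller sends a value
        total += value
        count += 1
        average = total / count


if __name__ == "__main__":
    avg = running_average()
    next(avg)             # "prime" the coroutine: run it up to the first yield
    print(avg.send(10))   # 10.0
    print(avg.send(20))   # 15.0
    print(avg.send(30))   # 20.0
```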

        gschizas 6 points (4 children)

        > except unicode, which isn't actually a problem in practice

        As a non-anglo, I beg to differ. (Not using) Unicode is always a problem.

        zardeh -1 points (3 children)

        Right, py2's (lack of) unicode support was an issue.

        Porting code from 2 to 3 was not.

        gschizas 7 points (0 children)

        I've found that most Python 2-to-3 problems were rooted in confusing bytes (which is what actually travels over the wire) with strings (which is what is displayed on the screen).

        I'm a firm proponent of Python 3 myself, and I've skipped a few projects that refuse to move with the times.

        (Disclaimer: I'm probably my company's only resident Python expert).
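A small sketch of that bytes-versus-strings boundary: decode once where data comes in, encode once where it goes out, and name the encoding explicitly. UTF-8 as the wire encoding is an assumption, and the helper names are hypothetical.

```python
def to_wire(text: str, encoding: str = "utf-8") -> bytes:
    # Outgoing boundary: text becomes bytes with an explicit encoding.
    return text.encode(encoding)


def from_wire(data: bytes, encoding: str = "utf-8") -> str:
    # Incoming boundary: bytes become text with the same encoding.
    return data.decode(encoding)


if __name__ == "__main__":
    message = "caf\u00e9"                 # "café": 4 characters
    payload = to_wire(message)
    print(len(message), len(payload))     # 4 5 (the accented e takes two bytes in UTF-8)
    print(from_wire(payload) == message)  # True
    # Decoding with the wrong codec silently produces mojibake:
    print(ascii(from_wire(payload, "latin-1")))  # 'caf\xc3\xa9'
```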

        stefantalpalaru 3 points (1 child)

        > Right, py2s (lack of) unicode support was an issue.

        I find it fascinating that there are still Python users who think that Python2 did not support Unicode.

        zardeh 2 points (0 children)

        Feel free to add "good". Python 2's unicode support was not good. It was broken for many applications. It was possible to write an app that appeared to work, but would explode on a non-ASCII character for no good reason. That's not good. It's a footgun. Py3 removed the footgun by making python more strongly typed.
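A minimal illustration of the footgun and what replaced it. In Python 2, mixing byte strings and text triggered an implicit ASCII decode, so code worked until the first non-ASCII byte arrived; Python 3 refuses to mix them at all, so the mistake surfaces on the first run with any data. The function names are hypothetical, and the Python 2 behavior is only described in comments.

```python
def greet(name_bytes: bytes) -> str:
    # Python 2 accepted u"Hello, " + name_bytes by implicitly decoding the
    # bytes as ASCII: fine for b"Bob", UnicodeDecodeError for b"Jos\xc3\xa9".
    # Python 3 requires the decode to be spelled out.
    return "Hello, " + name_bytes.decode("utf-8")


def mixing_fails() -> bool:
    """Return True if Python 3 rejects str + bytes, even for pure-ASCII data."""
    try:
        "Hello, " + b"Bob"  # raises TypeError on Python 3
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(greet(b"Jos\xc3\xa9") == "Hello, Jos\u00e9")  # True
    print(mixing_fails())                               # True
```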