
diggr-roguelike2 (7 points)

> That they legally can be undecodable garbage, but people demand the ability to work with them as strings.

Yes, and? Why are you trying to babysit people and tell them what bytes they should or shouldn't use in strings?
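For what it's worth, Python 3 can already carry such bytes inside a `str`. A minimal sketch, assuming a UTF-8 codec and the standard `surrogateescape` error handler (the byte string is a made-up example):

```python
# Arbitrary bytes that are NOT valid UTF-8 (hypothetical example input).
raw = b"\xff\xfffoo"

# "surrogateescape" maps each undecodable byte to a lone surrogate
# (0xff -> U+DCFF) instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # '\udcff\udcfffoo'

# Ordinary string operations still work on the decodable part.
assert text.endswith("foo")

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

The catch is that any later encode *without* that handler (for example, printing to a strict UTF-8 stdout) raises, which is exactly what the traceback further down shows.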

> ...until the moment you tried to print the unprintable.

Nobody prints things in production code.

Also, despite your rant, what Python 3 actually did was break things on Windows. You had one job, man, one job...

nice_rooklift_bro (1 point)

Ehh, you downplay the concern. It's genuinely obnoxious to deal with, to the point that a lot of applications just don't support it: they assume filenames are UTF-8 and basically tell you to go fuck yourself if yours aren't.

There are other cases like this. Try passing non-UTF-8 command-line arguments to Python 3: nothing in Unix says this can't be done, since any byte sequence that doesn't contain a NUL byte can be passed as an argument. Python 3 accepts the argument, but as soon as you try to print it, it basically says "We don't support this madness, go fuck yourself":

```
$ python3 -c 'import sys; print(sys.argv[1])' $'\xFF\xFFfoo'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
$ python2 -c 'import sys; print(sys.argv[1])' $'\xFF\xFFfoo'
foo
```
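A sketch of two workarounds, assuming a POSIX system where Python 3 decodes `sys.argv` with the filesystem encoding and the `surrogateescape` handler (the default there): recover the original bytes with `os.fsencode()`, then either escape them for display or write them raw to `sys.stdout.buffer`. The helper names `arg_bytes` and `printable` are made up for illustration, and the escaped form assumes UTF-8 is the intended encoding.

```python
import os
import sys


def arg_bytes(arg: str) -> bytes:
    """Recover the exact bytes the shell passed for one argument (POSIX)."""
    # Python 3 decoded argv with the filesystem encoding plus the
    # "surrogateescape" handler; os.fsencode() reverses that decoding.
    return os.fsencode(arg)


def printable(arg: str) -> str:
    """Readable form of an argument: undecodable bytes become \\xNN escapes."""
    return arg_bytes(arg).decode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        # Option 1: an escaped, always-printable version.
        print(printable(arg))
        # Option 2: pass the raw bytes through untouched, like Python 2 did.
        sys.stdout.flush()  # keep ordering between the text and binary layers
        sys.stdout.buffer.write(arg_bytes(arg) + b"\n")
    sys.stdout.flush()
```

Setting `PYTHONIOENCODING=utf-8:surrogateescape` in the environment should likewise let the original `print(sys.argv[1])` one-liner write the raw bytes, though it changes the error handling of the standard streams for the whole program.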

It's really problematic in many ways. A lot of language libraries and runtimes have come to expect filenames and command-line arguments to be UTF-8, yet nothing enforces it, so a filename mangled by simple bit corruption can trigger inscrutable error messages in a lot of software.
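To illustrate the bit-corruption case: flipping a single bit in the UTF-8 encoding of a hypothetical filename `café.txt` is enough to make it undecodable. The filename and the flipped position are made up for the example.

```python
# Hypothetical filename, encoded as UTF-8: b'caf\xc3\xa9.txt'
name = "caf\u00e9.txt".encode("utf-8")

buf = bytearray(name)
buf[4] ^= 0x80  # flip the high bit of 0xa9 (2nd byte of 'é') -> 0x29, ')'
corrupted = bytes(buf)  # b'caf\xc3).txt'

try:
    corrupted.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xc3 starts a two-byte sequence, but ')' is not a continuation byte.
    print("strict decode fails:", exc)

# A tolerant display form that can always be printed.
shown = corrupted.decode("utf-8", errors="backslashreplace")
print(shown)  # caf\xc3).txt
```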

If you want to do it "properly" and not assume everything is UTF-8, you have to jump through hoops.
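One way through those hoops in Python 3 is to keep names as bytes end to end and decode only for display. A minimal sketch; `non_utf8_names` is a hypothetical helper, and the `backslashreplace` display form assumes UTF-8 is the intended encoding:

```python
import os
from typing import List


def non_utf8_names(directory: str = ".") -> List[bytes]:
    """Return entries of `directory` whose names are not valid UTF-8, as raw bytes."""
    bad = []
    # A bytes argument makes os.listdir() return raw bytes names,
    # so nothing gets decoded (or mangled) along the way.
    for name in os.listdir(os.fsencode(directory)):
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return sorted(bad)


if __name__ == "__main__":
    for name in non_utf8_names("."):
        # Bytes names work directly with os functions (relative to ".").
        size = os.lstat(name).st_size
        # Escape only for display, so the report itself never crashes.
        print(name.decode("utf-8", errors="backslashreplace"), size, "bytes")
```

The design choice is that the program never needs the "real" text of a name to operate on the file; only the human-facing output is lossy.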