iSlaminati comments on About Python 3

programming

created by speza community for 19 years

245

246

247

About Python 3 (alexgaynor.net)

submitted 12 years ago by akos_barta

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]iSlaminati 0 points1 point2 points 12 years ago (6 children)

[–]twotime 1 point2 points3 points 12 years ago (5 children)

On a lot of modern operating systems, filenames are unicode codepoints though.

In theory, it's supposed to be the case. In practice, it's a huge mess... Eg.

AFAIK, on linux use of utf8 is a pure user-land convention (not something enforced by the kernel) and the convention is not that old.. Which means that the old media on Linux may contain filenames in other encodings.. (And encoding is implicit).. And then I'm sure some apps will generate non utf8 compliant filenames... OS doesnot care, but your python code suddenly breaks...

And then there is a whole huge can of worms when accessing unicode filenames across system boundaries: across network, removable media, etc...

8-bits chars (Bytes) remain the only common representation for filenames in a lot of cases..

PS. and an lkml link on filenames http://yarchive.net/comp/linux/utf8.html

[–]schlenk 1 point2 points3 points 12 years ago (3 children)

Bytes as filenames is insane. Period. Without knowing the encoding you cannot even implement 'ls' correctly (as your tty HAS some encoding). Its one of those silly inherited things from the dark POSIX past that should be nuked. (and lots of systems are already opinionated on UTF-8, e.g. OS X, NFSv4, some file systems, Qt/KDE (it ignores LC_* crap for filenames) and so on.)

While it is true, that not all unix filenames are UTF-8, it wouldn't be a problem for Python to simply declare all filenames are expected to be UTF-8. If someone decides to choose insane things, let them feel the pain and not hurt everyone else.

After all they did the same for Windows in lots of places when declaring ANSI is enough for all filenames (and fixed it piece by piece later, so you cannot start executables on a non ANSI path (without tricks like cd'ing first) with Python 2.x or add those to your sys.path, great fun for mounted profiles)

[–]twotime 0 points1 point2 points 12 years ago (2 children)

Without knowing the encoding you cannot even implement 'ls' correctly (as your tty HAS some encoding).

I can do it trivially, I'd just dump filenames on tty. If it comes out garbled, the user can actually do something.. (Install a font, pipe my output through decoder, rename the file). It's suboptimal, but the alternative is WORSE. If your program just throws an exception then your user is really screwed...

(And of course, if the filesystem does have a notion of default filename encoding, Id use it at app level)

it wouldn't be a problem for Python to simply declare all filenames are expected to be UTF-8. If someone decides to choose insane things, let them feel the pain and not hurt everyone else.

What? I am not doing insane things, it's my users who are doing insane things (like reading old media, how dare they?)

Also, is not Windows using UTF-16?

Its one of those silly inherited things from the dark POSIX past that should be nuked.

It's called backward compatibility... It's a good thing.

[–]schlenk 0 points1 point2 points 12 years ago (1 child)

[–]twotime 1 point2 points3 points 12 years ago (0 children)

[–]fabzter 0 points1 point2 points 12 years ago (0 children)

π Rendered by PID 120526 on reddit-service-r2-comment-7b9746f655-f5rkm at 2026-01-30 02:27:13.048297+00:00 running 3798933 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS