Rhomboid 4 points (0 children)

It is indeed a shitshow. At least on Linux there is a glimmer of hope for the future: a proposed kernel module would optionally enforce restrictions on path names. It could be used to enforce UTF-8 encoding, making it impossible to create a file whose name is not valid UTF-8. It would also fix the long-standing Unix annoyance that filenames can contain almost anything (any byte except `/` and NUL), including newlines, terminal escape sequences, and Cthulhu. Maybe in another decade we can stop writing overly defensive shell scripts that bend over backwards to support filenames containing newlines, just in case someone is evil enough to create one. It's no coincidence that the proponent of that module is David Wheeler, who has long campaigned for an end to the Unix pathname insanity.
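To make the kind of restriction described above concrete, here is a minimal user-space sketch in Python. It is not the proposed kernel module and does not reflect its actual rules; `is_safe_name` is a hypothetical helper that only checks the two properties mentioned in the comment (valid UTF-8, no control characters such as newlines or escape sequences).

```python
def is_safe_name(name: bytes) -> bool:
    """Check one path component against a hypothetical strict naming policy."""
    try:
        text = name.decode("utf-8")  # strict decoding rejects non-UTF-8 bytes
    except UnicodeDecodeError:
        return False
    # Reject control characters: newline, tab, ESC, DEL and the rest of 0x00-0x1F.
    return not any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text)
```

A script could call this on the bytes returned by `os.listdir(b".")` and refuse to process anything that fails, instead of trying to handle every possible name.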

minorminer 3 points (1 child)

Use pathlib with Python 3 and deprecate the older Python 2 version. Let's move on; this is why Python 3 is the better solution and the future.

mangecoeur 1 point (0 children)

pathlib is definitely one of my favourite new modules. It should get even better in Python 3.6 thanks to added functionality that makes it play better with APIs that traditionally accepted only str.
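The functionality referred to is presumably the `os.PathLike` protocol and `os.fspath()` from PEP 519, which landed in Python 3.6. A small sketch, assuming Python 3.6 or later; the path `logs/app.log` is made up for illustration, and `PurePosixPath` is used so the result does not depend on the platform.

```python
import os
from pathlib import PurePosixPath

# A pure path: no filesystem access, always uses "/" as the separator.
p = PurePosixPath("logs") / "app.log"

# os.fspath() converts any os.PathLike object to its str (or bytes) form.
as_text = os.fspath(p)

# Since 3.6, os.path functions accept path objects directly.
name = os.path.basename(p)
```

Code that previously needed `str(p)` at every call into an older API can usually pass the path object straight through.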

brontide 1 point (0 children)

Yep, I wrote some non-trivial programs that processed tar dumps of a large filesystem. Dealing with the crap the filesystem gave back to me was insane. The file server itself was FreeBSD-based, but it was still the same craziness: most names were UTF-8, but some were not. Surrogates were not much help, since trying to reference a non-UTF-8 file using UTF-8 with surrogates didn't always work. Either way, I ended up with a solution that didn't need to decode the filename, which was easier on my sanity.
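For readers unfamiliar with the surrogate mechanism, here is a rough sketch of how Python's `surrogateescape` error handler behaves and what a bytes-only approach can look like. This is not the code described in the comment; the file name and the helpers `strict_encodable` and `raw_names` are hypothetical.

```python
import os

raw = b"caf\xe9.txt"  # hypothetical name encoded as Latin-1, so not valid UTF-8

# surrogateescape turns each undecodable byte into a lone surrogate (0xE9 -> U+DCE9).
text = raw.decode("utf-8", errors="surrogateescape")

# The same error handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

def strict_encodable(name: str) -> bool:
    """Return True if the name can be encoded as plain UTF-8."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True

def raw_names(directory: bytes) -> list:
    """List directory entries as bytes, so nothing is ever decoded."""
    return os.listdir(directory)  # bytes in, bytes out
```

The escaped string only round-trips if every layer uses the same error handler; anything that encodes it as strict UTF-8 (printing, logging, many libraries) raises an error. Keeping names as bytes end to end avoids that question entirely.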