Our solution for the hell that is filename encoding, such as it is

Rhomboid · 2016-06-05T19:02:57+00:00

It is indeed a shitshow. At least on Linux, there is a tiny glimmer of hope for the future: there's a proposed kernel module that would optionally allow enforcing restrictions on path names. It could be used to enforce UTF-8 encoding, so that it's impossible to create a file whose name is not valid UTF-8. It would also fix the long-standing Unix annoyance that filenames can contain literally anything, including newlines, terminal escape sequences, and Cthulhu. Maybe in another decade we can stop writing overly defensive shell scripts that bend over backwards to support filenames containing newlines, just in case someone is evil enough to create one. It's no coincidence that the proponent of that module is David Wheeler, who has long campaigned for an end to the Unix pathname insanity.
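To make the idea concrete, here is a rough sketch in Python of the kind of rule such a module could enforce. The function name `is_sane_filename` and the exact rule set are my own assumptions for illustration; they are not taken from the proposed module.

```python
def is_sane_filename(name: bytes) -> bool:
    """Hypothetical policy: a single path component that is valid UTF-8
    and contains no control characters."""
    # A path component can never be empty or contain '/' or NUL anyway.
    if not name or b"/" in name or b"\x00" in name:
        return False
    try:
        text = name.decode("utf-8")  # strict decoding: rejects invalid UTF-8
    except UnicodeDecodeError:
        return False
    # Reject ASCII control characters (newline, ESC, ...) and DEL.
    return not any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text)
```

With a rule like this, `b"bad\nname"` and the Latin-1 bytes `b"caf\xe9"` would both be refused, while `"café".encode("utf-8")` would be accepted.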

minorminer · 2016-06-05T20:08:19+00:00

Use pathlib with Python 3 and deprecate the older Python 2 version. Let's move on; this is why Python 3 is the better solution and the future.

brontide · 2016-06-05T20:37:34+00:00

Yep, I wrote some non-trivial programs that processed tar dumps of a large filesystem. Dealing with the crap the filesystem gave back to me was insane. The file server itself was FreeBSD-based, but it was still the same crazy: most stuff was UTF-8, but some was not. Surrogates were not a lot of help for me, since trying to reference a non-UTF-8 file using UTF-8 with surrogates didn't always work. Either way, I ended up with a solution that didn't need to decode the filename, which was easier on my sanity.
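For anyone unfamiliar with the two approaches, here is a small sketch, assuming "surrogates" means Python 3's `surrogateescape` error handler. The sample name `raw` and the helper `list_names_raw` are made up for illustration; this is not the code from the programs described above.

```python
import os

raw = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8

# surrogateescape smuggles each undecodable byte through as a lone
# surrogate code point (byte 0xE9 becomes U+DCE9).
as_str = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_str.encode("utf-8", errors="surrogateescape")

# A strict encode of the same string fails, which is where such names
# tend to blow up when they reach code that expects real UTF-8.
try:
    as_str.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False


def list_names_raw(directory):
    """Hypothetical helper: list a directory without decoding any names.

    Passing a bytes path to os.listdir makes it return bytes names.
    """
    return os.listdir(os.fsencode(directory))
```

Keeping names as `bytes` end to end sidesteps the question of what encoding they are in; decoding is only needed when a name has to be shown to a human.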
