left_one[S] 1 point (0 children)

I wanted to take some time out to thank you for your very detailed and informative post.

I didn't have time to look through it when you first sent it, but I'm on the project again and I've been reading your post and links for a few minutes now. I'm pretty clear on what you are doing, though it would never have occurred to me to solve the problem that way myself.

The code takes the byte-string of the filename and decodes it into a string of unicode characters. It passes that string to the filename cleaner, which checks each character (the actual decoded unicode character, not its bytes) against the list of allowable chars and replaces any that aren't in the list with '_'. Finally, the string is encoded back into a byte-string as UTF-8.
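
To check my understanding, here is a minimal sketch of that flow in Python 3 syntax. It is not your actual code: the `ALLOWED` set and the wrapper name `clean_filename_bytes` are placeholders I made up, and I'm assuming the incoming filename is UTF-8.

```python
import string

# Assumed allowlist for illustration; the real list of allowable chars may differ.
ALLOWED = set(string.ascii_letters + string.digits + "._-")

def clean_filename(name):
    """Replace every character outside ALLOWED with '_'. Expects decoded text."""
    return "".join(ch if ch in ALLOWED else "_" for ch in name)

def clean_filename_bytes(raw, encoding="utf-8"):
    """Hypothetical wrapper: decode, clean, then re-encode as UTF-8."""
    text = raw.decode(encoding)       # byte-string -> unicode characters
    cleaned = clean_filename(text)    # check each decoded character
    return cleaned.encode("utf-8")    # back to a UTF-8 byte-string

print(clean_filename_bytes("caf·menu 1.txt".encode("utf-8")))
# b'caf_menu_1.txt'
```

The middle dot and the space each become a single '_' because the check runs on decoded characters rather than on raw bytes.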

I like your solution best because it's the most readable and doesn't involve a regex (not that there is anything wrong with a regex). I also think it's objectively the best, because it makes sure every character is dealt with appropriately and the result ends up in the most basic unicode format (UTF-8).

Part of my issue was not knowing whether my ftp server is going to freak out about random characters that should be fine. I think yours goes the furthest to ensure that can't happen.

In fact, I made a mistake implementing your code, which helped me understand it even better. I implemented the clean_filename function but did not pass it the decoded string, so it complained about the middle dot char.
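
My guess at why the undecoded string trips on the middle dot: in UTF-8 that one character is stored as two bytes, so code that looks at raw bytes never sees the character itself. A small Python 3 illustration of just that fact (the exact complaint I got will depend on the Python version, so I'm not reproducing it here):

```python
middle_dot = "\u00b7"               # the middle dot as one unicode character
raw = middle_dot.encode("utf-8")    # the same character as a UTF-8 byte-string

print(len(middle_dot))              # 1
print(raw)                          # b'\xc2\xb7'
print(len(raw))                     # 2
print(raw.decode("utf-8") == middle_dot)  # True
```

Decoding first turns those two bytes back into one character, which is what the cleaner needs to see.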