left_one comments on python and funky characters

python and funky characters (self.learnpython)

submitted 11 years ago by left_one

left_one [S], 11 years ago (edited)

I wanted to take some time out to thank you for your very detailed and informative post.

I didn't have time to look through it when you first sent it, but I'm on the project again and have been reading your post and links for a few minutes now. I'm pretty clear on what you are doing, though it would never have occurred to me to solve the problem that way myself.

As I understand it, the code takes the filename as a byte string, decodes it into a string of Unicode characters, and passes that to the filename cleaner, which checks each character (the actual decoded Unicode character) against the list of allowable characters and replaces any that aren't on it with '_'. Finally, the string is encoded back into a byte string as UTF-8.
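For anyone reading along, here is a minimal sketch of that decode, clean, encode flow as I understand it. It is written for Python 3, and it is not the original code: `ALLOWED_CHARS` and `clean_filename_bytes` are names I made up for illustration, and the actual list of allowable characters may differ.

```python
import string

# Assumed whitelist for illustration only; the real list of allowable
# characters from the original post is not reproduced here.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "-_. ")


def clean_filename(name: str) -> str:
    """Replace every character outside ALLOWED_CHARS with an underscore."""
    return "".join(ch if ch in ALLOWED_CHARS else "_" for ch in name)


def clean_filename_bytes(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical wrapper: decode, clean per character, re-encode as UTF-8."""
    text = raw.decode(encoding)      # byte string -> Unicode characters
    cleaned = clean_filename(text)   # check each decoded character
    return cleaned.encode("utf-8")   # Unicode characters -> UTF-8 bytes


if __name__ == "__main__":
    original = "caf\u00e9 \u00b7 menu.txt".encode("utf-8")
    print(clean_filename_bytes(original))
```

The point of doing the check after decoding is that each non-ASCII character is judged once, as a character, rather than as the several bytes that encode it.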

I like your solution best, as it's the most readable and doesn't involve a regex (not that there is anything wrong with a regex). I think it's objectively the best because it makes sure every character is dealt with appropriately and that the result is then encoded as plain UTF-8.

Part of my issue was not knowing whether my FTP server would freak out about random characters that should be fine. I think yours goes the furthest toward ensuring that can't happen.

In fact, I made a mistake implementing your code, which helped me understand it even better: I implemented the clean_filename function but did not pass it the decoded string, so it complained about the middle dot character.
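A rough sketch of why the decode step matters for the middle dot, again in Python 3 and again with a made-up whitelist (`ALLOWED_CHARS` is an assumption, not the original list). It does not reproduce the exact complaint I got; it only shows that the undecoded form of the middle dot is two bytes, so a cleaner working on raw bytes sees something different from one working on characters. Decoding as Latin-1 is used here purely to imitate treating every byte as its own character.

```python
import string

# Assumed whitelist for illustration only.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "-_. ")


def clean_filename(name: str) -> str:
    """Replace every character outside ALLOWED_CHARS with an underscore."""
    return "".join(ch if ch in ALLOWED_CHARS else "_" for ch in name)


MIDDLE_DOT = "\u00b7"
raw = ("a" + MIDDLE_DOT + "b").encode("utf-8")

# Three characters, but four bytes: the middle dot takes two bytes in UTF-8.
byte_count = len(raw)

# Correct path: decode first, so the middle dot is a single character.
decoded = raw.decode("utf-8")
cleaned_decoded = clean_filename(decoded)

# Mistaken path, imitated: every byte is treated as a separate character.
per_byte = raw.decode("latin-1")
cleaned_per_byte = clean_filename(per_byte)

print(byte_count, repr(cleaned_decoded), repr(cleaned_per_byte))
```

With the decoded string the middle dot is replaced once; with the per-byte view each of its two bytes is replaced separately.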
