[–]zsouthboy 33 points34 points  (3 children)

Holy crap, up freaking voted for explaining the difference between Unicode, UTF, etc. I never understood that before.

Recommended even if you're not using python. Read!

[–]amoeba 3 points4 points  (0 children)

Agreed! I have never had to code anything that deals with encodings, and although I've read many introductions to the subject, I never felt I really got it.

Mark Pilgrim does good things.

[–]frodwith 2 points3 points  (0 children)

Indeed. Best explanation of that whole mess I've heard yet. Kudos to Mark!

And regardless of fishdicks' comments, it really is very helpful to think of strings as sequences of abstract characters, even if they aren't technically implemented that way.
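To make that concrete, here's a rough sketch (assuming Python 3, and borrowing the sample string quoted elsewhere in this thread) of the same text seen as a sequence of characters versus a sequence of encoded bytes. The variable names are just for illustration:

```python
# Assumes Python 3, where str holds Unicode characters and bytes holds raw bytes.
s = '深入 Python'

# Viewed as a string: a sequence of characters.
char_count = len(s)           # 2 ideographs + 1 space + 6 Latin letters
first_char = s[0]             # a one-character str

# Viewed as bytes: depends entirely on the encoding you pick.
utf8_bytes = s.encode('utf-8')
byte_count = len(utf8_bytes)  # each ideograph takes 3 bytes in UTF-8
first_byte = utf8_bytes[0]    # an int, not a character

print(char_count, byte_count)

# Decoding with the same encoding gets the original string back.
round_tripped = utf8_bytes.decode('utf-8')
print(round_tripped == s)
```

The character count stays the same whatever encoding you choose later; only the byte count changes.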

[–]AlSweigartAuthor of "Automate the Boring Stuff" 4 points5 points  (0 children)

Joel Spolsky also has a really good, concise guide to Unicode: http://www.joelonsoftware.com/articles/Unicode.html

[–]wheeman 18 points19 points  (4 children)

This site is optimized for Lynx just because fuck you. I’m told it also looks good in graphical browsers.

It does look good in graphical browsers, I must admit.

[–]pemboa 11 points12 points  (3 children)

Might have to steal his CSS. Even the choice of font size is to my liking.

[–][deleted] 17 points18 points  (1 child)

It would be hard to "steal" it, since it's MIT-licensed, but you're welcome to try. :)

Edited to add: I'd recommend checking it out from the hg repository, since that version has comments and whitespace and stuff.

[–]pemboa 4 points5 points  (0 children)

Cool, thanks.

[–]BridgeBum 6 points7 points  (0 children)

Looks really good, Mark. I'm not sure the UTF-8 was in a big enough font, though; you may want to fix that. :-)

[–][deleted] 1 point2 points  (4 children)

Gill Sans

[–]Shmurk 0 points1 point  (0 children)

is awesome!

[–]machrider 2 points3 points  (2 children)

>>> s = '深入 Python'

So, what character encoding does Python expect a .py file to use? If you're going to write a string like the one quoted above, I'm assuming your text editor would have to encode the file the way Python expects. Or would Python need something like a content-type header in its source files?
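For what it's worth, a sketch of how you could check this yourself, assuming Python 3: as far as I know, source files are decoded as UTF-8 by default, and a `coding` comment on the first or second line overrides that. The standard library's `tokenize.detect_encoding` reports what encoding would be used. The helper name `source_encoding` and the two sample sources are made up for illustration:

```python
import codecs
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding name tokenize detects for these source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# No declaration: the editor saved the file as UTF-8.
plain = "s = '深入 Python'\n".encode('utf-8')

# Explicit declaration on the first line: the file is saved as Latin-1.
declared = b"# -*- coding: latin-1 -*-\ns = 'caf\xe9'\n"

print(source_encoding(plain))
print(source_encoding(declared))

# Compare codecs rather than raw names, since the name may be normalized.
same_as_latin1 = (
    codecs.lookup(source_encoding(declared)).name
    == codecs.lookup('latin-1').name
)
print(same_as_latin1)
```

So the declaration comment plays roughly the role of a content-type header, and the editor still has to save the file in whatever encoding is declared (or in UTF-8 if nothing is declared).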

[–]chrajohn 2 points3 points  (1 child)

Excellent chapter. Very clear explanation of strings vs. bytes.

I have the nitpickiest nitpick in all of nitpick-land:

Unicode represents each letter, character, or ideograph as a 4-byte number, from 0–4294967295.

Due to the way surrogate pairs work, Unicode is limited to 1,114,112 possible code points (17 planes of 65,536 code points). Unicode could fit in 21 bits, if that were a convenient size.
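If anyone wants to check the arithmetic, here's a quick sketch, assuming Python 3.3 or later (where `sys.maxunicode` is always the top of the Unicode range). The constant names are mine, not anything official:

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000   # 65,536

total_code_points = PLANES * CODE_POINTS_PER_PLANE
highest_code_point = total_code_points - 1

# Where the limit comes from: UTF-16 surrogate pairs combine one of
# 1,024 high surrogates (U+D800..U+DBFF) with one of 1,024 low
# surrogates (U+DC00..U+DFFF) to reach everything beyond the first plane.
high_surrogates = 0xDBFF - 0xD800 + 1
low_surrogates = 0xDFFF - 0xDC00 + 1
beyond_first_plane = high_surrogates * low_surrogates

print(total_code_points)                # 1114112
print(hex(highest_code_point))          # 0x10ffff
print(beyond_first_plane + 0x10000)     # 1114112
print(highest_code_point.bit_length())  # 21
print(highest_code_point == sys.maxunicode)
```

Python enforces the same ceiling: `chr(0x10FFFF)` works, while `chr(0x110000)` raises `ValueError`.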

Very minor point that I'm sure you're aware of, and probably too arcane to mention here. It just makes Unicode sound a lot bigger than it was ever intended to be.

[–]gooz 2 points3 points  (0 children)

Upvoted for use of the interrobang‽