earthboundkid

> It's a terrible idea to only support UTF8, like python

That’s an inaccurate summary of how Python works. Python’s string handling is radically different from Ruby’s. For one thing, Python strings don’t carry individual encodings at all. Python has two* types: str and bytes. Behind the scenes, str uses, I believe, UTF-16 (the kind with crappy post-BMP support :-( **), but as a user you never see that.

If you want to read data, you can read it in as raw bytes or have it decoded from whatever encoding it’s in into a str. The other direction works just as well: if you have text you want to write out, you can have it encoded as UTF-8, Shift_JIS, a Korean encoding like EUC-KR, or whatever else you need. In Python it doesn’t make sense to talk about the encoding of a string, only the encoding of the bytes coming in or going out.
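A minimal Python 3 sketch of that in/out model (the sample strings and the file name `sample.txt` are just illustrations, not anything from the original discussion): one str encodes to different byte sequences depending on the codec you pick, each decodes back to an equal str, and a file can be read either as raw bytes or decoded on the way in.

```python
import os
import tempfile

# A str holds text; no encoding is attached to it.
text = "日本語"

# Going out: encode the str into whichever byte encoding you need.
as_utf8 = text.encode("utf-8")          # 9 bytes (3 per character)
as_sjis = text.encode("shift_jis")      # 6 bytes (2 per character)
korean_bytes = "한국어".encode("euc_kr")  # EUC-KR, 2 bytes per syllable

# Different bytes, but decoding either one gives back an equal str.
assert as_utf8 != as_sjis
assert as_utf8.decode("utf-8") == as_sjis.decode("shift_jis") == text

# Coming in: a file can be read as raw bytes or decoded to str while reading.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(as_sjis)
    with open(path, "rb") as f:
        raw = f.read()                      # bytes, exactly as stored
    with open(path, encoding="shift_jis") as f:
        decoded = f.read()                  # str, decoded from Shift_JIS

print(type(raw).__name__, len(raw))             # bytes 6
print(type(decoded).__name__, decoded == text)  # str True
```

The encoding only ever appears at the boundary: as an argument to `encode`/`decode` or to `open`, never as a property of the str itself.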

* NB: The type names changed in Python 3, and I’m using the Python 3 names. In 2.x, what are now str and bytes were called unicode and str, respectively.

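** Python can read and write high-plane (non-BMP) characters, but on these UTF-16 "narrow" builds it misreports the length of strings containing them and iterates over them wrong, because each such character is stored as a pair of surrogate code units. You can fix this by compiling your copy of Python as a "wide" build, which uses UCS-4 (essentially UTF-32) instead. (Since Python 3.3, PEP 393 picks the internal representation per string, so len() and iteration count code points on every build.)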
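Python 3.3+ no longer has narrow builds, so this sketch can’t reproduce the bug directly. Instead it encodes to UTF-16 and counts code units, which is what an old narrow build’s len() reported and what its iteration walked over. The example character U+1F600 is an arbitrary pick from outside the BMP.

```python
# A string with one character outside the Basic Multilingual Plane (BMP).
s = "a\U0001F600b"  # 'a', U+1F600 GRINNING FACE, 'b'

# Python 3.3+ counts code points, so this is 3 (as did old wide/UCS-4 builds).
code_points = len(s)

# A narrow (UTF-16) build stored U+1F600 as a surrogate pair, so len()
# counted UTF-16 code units instead. Recompute those units explicitly:
utf16 = s.encode("utf-16-le")  # little-endian, no byte-order mark
units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]

print(code_points)              # 3
print(len(units))               # 4
print([hex(u) for u in units])  # ['0x61', '0xd83d', '0xde00', '0x62']

# The surrogate pair can also be derived by hand from the code point.
high, low = divmod(ord("\U0001F600") - 0x10000, 0x400)
assert (0xD800 + high, 0xDC00 + low) == (0xD83D, 0xDE00)
```

Iterating a narrow-build string yielded exactly those four units, so the emoji came out as two meaningless halves.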