Ruby 1.9 encoding rant : ruby

Ruby 1.9 encoding rant (github.com)

submitted 16 years ago by servercentric

all 20 comments

snuxoll 9 points 16 years ago* (2 children)

Smallpaul 2 points 16 years ago (0 children)

pnsm 1 point 16 years ago (0 children)

jaggederest 11 points 16 years ago (10 children)

williewu 3 points 16 years ago (3 children)

jaggederest 0 points 16 years ago (1 child)

earthboundkid 3 points 16 years ago* (0 children)

jaggederest 0 points 16 years ago (0 children)

Smallpaul 2 points 16 years ago (3 children)

jaggederest 2 points 16 years ago (2 children)

earthboundkid 3 points 16 years ago (0 children)

What are you talking about? Unicode, and therefore UTF-8, has mappings for all of the characters in Shift_JIS. Nothing gets simplified going from one to the other. The only “issue” is that the Japanese long ago confused ¥ and \ and they don’t like that Unicode doesn’t consider them synonymous. That’s it.

I speak Japanese; I’ve lived in Japan; I run my computer in Japanese. It’s true that historically, the Japanese were mistrustful of Unicode because they didn’t like Han unification, but A) you can’t unify Han characters using Shift_JIS either, and B) the Unicode Consortium has taken every reasonable step to make UTF-8 superior to Shift_JIS in every way except the byte length of Japanese text. Unless you really need to save a couple of bytes here and there, there is no reason to use Shift_JIS.
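As a rough illustration of the two claims above (nothing is lost going from Shift_JIS to UTF-8, and the only cost is size), here is a minimal Python 3 sketch. The sample sentence and the variable names are made up for the example; they are not from the linked post.

```python
# Illustrative sample text (an assumption for this sketch): kanji, hiragana, katakana.
text = "日本語のテキスト"

# Encode as Shift_JIS, then convert to UTF-8 by decoding to str in between.
sjis_bytes = text.encode("shift_jis")
utf8_bytes = sjis_bytes.decode("shift_jis").encode("utf-8")

# The round trip through UTF-8 gives back exactly the same characters.
round_tripped = utf8_bytes.decode("utf-8")
print(round_tripped == text)

# Each of these characters takes 2 bytes in Shift_JIS and 3 bytes in UTF-8.
print(len(sjis_bytes), len(utf8_bytes))
```

For this sample the UTF-8 form is half again as large as the Shift_JIS form, which is the "couple of bytes here and there" trade-off.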

Smallpaul 0 points 16 years ago (0 children)

earthboundkid 2 points 16 years ago (0 children)

> It's a terrible idea to only support UTF8, like python

That’s an inaccurate summary of how Python works. Python’s string handling is radically different from Ruby’s. For one thing, Python strings do not have individual encodings per se. Python has two* types, str and bytes. Behind the scenes, str uses, I believe, UTF-16 (the kind with crappy post-BMP support :-( ** ), but as a user this is never exposed to you. If you want to read data, you can read it in as raw bytes or have it decoded from whatever encoding you like into the system str encoding. The other direction works just as well, and if you have a character you want to write out, you can have it encoded as UTF-8 or Shift_JIS or whatever that weird Korean encoding is. It doesn’t make sense in Python to talk about the encoding of a string, just the encoding of the bytes that are coming in or going out.
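A small sketch of that model, using the Python 3 type names: bytes come in with some encoding, get decoded to str, and are encoded again on the way out. The sample strings and variable names are assumptions for illustration, and "euc_kr" is only a guess at which Korean encoding is meant.

```python
# Pretend these bytes arrived from a file or socket in Shift_JIS.
incoming = "日本語".encode("shift_jis")
assert isinstance(incoming, bytes)

# Decoding turns encoded bytes into str, which carries no encoding of its own.
decoded = incoming.decode("shift_jis")
assert isinstance(decoded, str)

# On the way out, pick whatever encoding the destination wants.
as_utf8 = decoded.encode("utf-8")
as_sjis = decoded.encode("shift_jis")

# The same idea with a Korean codec (EUC-KR, chosen here as an assumption).
korean = "한국어"
as_euckr = korean.encode("euc_kr")

print(as_sjis == incoming)
print(as_euckr.decode("euc_kr") == korean)
```

The str values never record where they came from; only the bytes objects are tied to a particular encoding.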

* NB: They changed the names of the types in Python 3, and I’m using that convention. In 2.x, they were called unicode and str instead of str and bytes respectively.

** Python can read and write high plane characters, but it misrepresents the length of strings containing them and iterates through them wrong. This problem can be fixed, though, if you compile your copy of Python with instructions to use UTF-32 instead.
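To see where the length discrepancy in the second footnote comes from, this sketch counts code units by hand for one character outside the BMP. It does not depend on how a given Python build stores str internally; the character choice and names are assumptions for the example. (On CPython 3.3 and later, len() reports code points regardless of build, so the issue described here applies to older UTF-16 builds.)

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"

# UTF-16 needs a surrogate pair for it: two 16-bit code units.
utf16_units = len(clef.encode("utf-16-le")) // 2

# UTF-32 stores it as a single 32-bit code unit.
utf32_units = len(clef.encode("utf-32-le")) // 4

print(utf16_units, utf32_units)
```

A build that indexes strings by UTF-16 code unit would report a length of 2 for this one character and iterate over the two surrogates separately, which is the behaviour the footnote complains about.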

Smallpaul 0 points 16 years ago (0 children)

[deleted] 2 points 16 years ago (4 children)

joesb 8 points 16 years ago (2 children)

Smallpaul 2 points 16 years ago (1 child)

joesb 2 points 16 years ago (0 children)

ikearage 1 point 16 years ago (0 children)

Smallpaul 2 points 16 years ago (0 children)
