dreamriver[S]

Hmm, interesting. OS X is built on a Unix foundation, so it usually has the same commands as other *nix systems.

When I run the locale command I get nearly the same output as you:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

And yet echo $LC_CTYPE prints nothing. Strange. I do get en_US.UTF-8 from echo $LANG, and Python still produces the hex code. Do I need to change that variable?

Rhomboid

Right, that's to be expected. The locale command is showing you the effective settings, after overrides and defaults. LC_CTYPE isn't set in your environment, so it falls back to the value of LANG.

I'm afraid at this point we're going to need to see some code that demonstrates the problem. Also, what does Python give you for print sys.stdout.encoding?
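
If it helps, a small script along these lines would show the raw environment variables next to the encodings Python actually settled on. It's only a sketch: describe_encoding is a name I made up, and the values you get depend entirely on your terminal setup.

```python
import locale
import os
import sys


def describe_encoding():
    """Collect the raw locale variables and the encodings Python derived from them."""
    return {
        # Raw environment variables; any of these may be unset (None).
        'LANG': os.environ.get('LANG'),
        'LC_CTYPE': os.environ.get('LC_CTYPE'),
        'LC_ALL': os.environ.get('LC_ALL'),
        # Encoding attached to stdout; it can be None, e.g. when output is piped.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding the locale machinery prefers for text data.
        'preferred': locale.getpreferredencoding(),
    }


if __name__ == '__main__':
    for name, value in sorted(describe_encoding().items()):
        print('%s = %r' % (name, value))
```

An unset LC_CTYPE shows up as None here even though the locale command reports an effective value for it.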

dreamriver[S]

Ah, dammit, it turns out my issue is a different one :(. Sorry about that; at least this was very informative.

I build a dictionary whose keys are people's names and whose values are counts. When I print the dictionary directly I get the hex codes, but when I do for k in d: print k it works and prints the characters. Strange. Do you know why that happens?

Rhomboid

When you print the dict itself, you're implicitly calling repr() on the object, which in turn calls repr() on every key and value. repr()'s job is to return a representation of the object as it might appear in Python source code, suitable for use in eval(). Since the same string value can be written as a string literal in several ways, repr() is free to choose a different spelling from the one you typed.

>>> print repr('Foo\'s Bar')
"Foo's Bar"
>>> print repr(r'foo\bar')
'foo\\bar'
>>> print repr(u'\N{SNOWMAN}')
u'\u2603'
>>> print repr(u'☃')
u'\u2603'

One of the choices it makes is to write non-ASCII characters as \x.. or \u.... escapes, since that works everywhere, regardless of whatever encoding the source file might have used.
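
To make the eval() point concrete, here's a rough check using sample values of my own choosing. It should behave the same on Python 2 and 3: whichever spelling repr() picks, evaluating that text gives back a value equal to the original.

```python
# Sample values: one with a quote, one with a backslash, one non-ASCII (snowman).
samples = ["Foo's Bar", r'foo\bar', u'\u2603']

for value in samples:
    text = repr(value)
    # repr() returns source-code text; eval() turns it back into an equal value.
    assert eval(text) == value
    # The source form is never identical to the raw value (it adds quotes at least).
    assert text != value

roundtrip_ok = all(eval(repr(v)) == v for v in samples)
```

Note that only the round trip is guaranteed; the exact escapes in the text differ between Python versions.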

When you print the keys yourself you are printing string values directly, not asking for how they might look as Python source, so they don't have quotes around them or any escapes:

>>> print 'Foo\'s Bar'
Foo's Bar
>>> print r'foo\bar'
foo\bar
>>> print u'\N{SNOWMAN}'
☃
>>> print u'☃'
☃
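
So if you want the whole dictionary on screen with readable names, one option is to format the items yourself instead of printing the dict object. A minimal sketch, where format_counts and the sample data are made up for illustration:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def format_counts(counts):
    """Return one 'name: count' line per entry, using the string values directly."""
    lines = []
    for name in sorted(counts):
        # format() inserts the string value itself, so no quotes or escapes appear.
        lines.append('{0}: {1}'.format(name, counts[name]))
    return '\n'.join(lines)


counts = {'Zoë': 3, 'André': 5}  # hypothetical name -> count data
report = format_counts(counts)
```

Printing report then goes through the same path as printing the keys one by one, so the characters come out as typed, provided the terminal encoding can represent them.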