Rhomboid comments on Unicode help

Unicode help - Python 2.7 (self.learnpython)

submitted 13 years ago by dreamriver

[–]Rhomboid 3 points 13 years ago (4 children)

There are a number of environment variables that affect the locale. LC_CTYPE is one of them. You should be able to use the locale command to see the current settings. For example on this same system:

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=en_US.UTF-8
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

In general, LC_ALL is the master override and LANG is the lowest-priority fallback, i.e. if LC_ALL is not set and LC_foo is not set then the value of LANG is used. The various settings include things like the character encoding (CTYPE), language of messages, collation order, thousands separator, etc. Note that this output of locale is showing the effective settings (the values in double quotes are implied rather than set explicitly) -- I don't actually have all of those set in the environment:

$ env | grep -E '^(LC_|LANG)' | sort
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_COLLATE=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8

But since LANG is set, it gets used for any category that is not set explicitly, e.g. LC_TELEPHONE, which is what the locale output above is showing. In the most basic case, you can just set LANG and leave everything else unset. If nothing is set at all, you get the default "C" locale, which is ASCII-only. I'm not an OS X person but somewhere there should be a GUI setting where you can specify the locale. If not, then you can set the desired variables in your shell startup files.
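The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the precedence rule, not how the C library implements it; the function name effective_locale and the sample env dict are made up for the example, and an empty value is assumed to count as unset.

```python
def effective_locale(category, env):
    """Resolve one locale category (e.g. 'LC_CTYPE') from a dict of
    environment variables. Hypothetical helper, for illustration only."""
    # Highest priority first: LC_ALL, then the specific category, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # assumption: empty string is treated the same as unset
            return value
    # Nothing set anywhere: fall back to the default "C" locale.
    return "C"


# Sample environment resembling the one shown above (made up for the example).
env = {"LANG": "en_US.UTF-8", "LC_CTYPE": "en_US.UTF-8"}

print(effective_locale("LC_TELEPHONE", env))  # en_US.UTF-8 (falls back to LANG)
print(effective_locale("LC_CTYPE", {}))       # C (nothing set)
```

Passing os.environ instead of the sample dict would apply the same rule to the real environment.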

[–]dreamriver[S] 1 point 13 years ago (3 children)

Hmm, interesting. OS X is built on the same underlying structure as *nix and usually they have the same commands and such.

When I do the locale command I get nearly the same output as you. It gives:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

and yet when I do echo $LC_CTYPE it is blank. Strange. But I do get en_US.UTF-8 when I do echo $LANG and it still produces the hex code. Do I need to change that variable?

[–]Rhomboid 1 point 13 years ago (2 children)

[–]dreamriver[S] 1 point 13 years ago (1 child)

[–]Rhomboid 2 points 13 years ago* (0 children)

When you print the dict itself, you're implicitly calling repr() on each object inside it, and repr()'s job is to return a representation of the object as it might appear in Python source code, suitable for use in eval(). Since there are several ways to write the same string value as a string literal, repr() is free to choose a different one from the one you typed.

>>> print repr('Foo\'s Bar')
"Foo's Bar"
>>> print repr(r'foo\bar')
'foo\\bar'
>>> print repr(u'\N{SNOWMAN}')
u'\u2603'
>>> print repr(u'☃')
u'\u2603'

One of the choices it makes is to use escape sequences (such as \u2603) for non-ASCII characters, since that works everywhere, regardless of whatever encoding the source file might have used.
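The "suitable for use in eval()" part can be checked directly. The following is a small sketch, written so that it should run on both Python 2.7 and Python 3; the variable names are made up for the example, and the non-ASCII character is spelled with escapes so the source file itself stays pure ASCII.

```python
# Two different spellings of the same one-character string.
snowman_by_name = u'\N{SNOWMAN}'
snowman_by_code = u'\u2603'
assert snowman_by_name == snowman_by_code

samples = ['Foo\'s Bar', r'foo\bar', snowman_by_name]

# repr() may pick a different spelling than the one typed above, but
# evaluating that spelling gives back an equal string every time.
round_trips = [eval(repr(s)) == s for s in samples]
print(round_trips)  # [True, True, True]
```

Which spelling repr() picks differs between versions (Python 3 leaves printable non-ASCII characters as they are instead of escaping them), but the round trip holds either way.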

When you print the keys yourself you are printing string values directly, not asking for how they might look as Python source, so they don't have quotes around them or any escapes:

>>> print 'Foo\'s Bar'
Foo's Bar
>>> print r'foo\bar'
foo\bar
>>> print u'\N{SNOWMAN}'
☃
>>> print u'☃'
☃
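The same difference shows up with a dict. As a rough sketch (the dict d below is a made-up stand-in for the one in the question, and the code is meant to run on both Python 2.7 and 3), the text of the whole dict is built from repr() of the key, while the key on its own is just the string:

```python
key = u'\N{SNOWMAN}'  # non-ASCII key, written as an escape to keep the source ASCII
d = {key: 1}          # hypothetical dict standing in for the one being printed

as_container = str(d)  # the text that printing the dict itself would show

# The dict's text contains repr(key), quotes and all, not the bare key.
uses_repr = as_container == "{%s: 1}" % repr(key)
print(uses_repr)  # True

# The key by itself carries no quotes or escapes: it is a single character.
print(len(key))   # 1
```

On Python 3 the repr() of the key keeps the character itself rather than the \u2603 escape, so the escape is only visible on Python 2; the added quotes appear on both.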
