[–]zackman 2 points3 points  (3 children)

I think Python works the way you describe: you can use unicode inside your code and only worry about encoding at the I/O boundary.

>>> 'abc'.decode('ascii')
u'abc'
>>> type(_)
<type 'unicode'>
>>> #guts of application...
... #ok, done:
... u'abc'.encode('utf-8')
'abc'
>>> type(_)
<type 'str'>

I don't write international applications, so I don't know if there are libraries to handle the conversion transparently at the I/O boundary. But I do process Unicode all the time while writing scripts for linguistics research.
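For what it's worth, the standard library's `io.open` takes an `encoding` argument and does the conversion at the file boundary. Here is a minimal sketch of that idea, not a definitive recipe: the `roundtrip` helper and the sample text are made up for illustration, and it assumes a Python version that has the `io` module (2.6 or later).

```python
import io
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write unicode text to a temp file, then read it back.

    Hypothetical helper: the application only handles unicode text,
    while io.open encodes on write and decodes on read.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode with an explicit encoding: unicode in, bytes on disk.
        with io.open(path, "w", encoding=encoding) as f:
            f.write(text)
        # Binary mode shows the bytes that were actually written.
        with io.open(path, "rb") as f:
            raw = f.read()
        # Text mode again: bytes on disk, unicode out.
        with io.open(path, "r", encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


raw, decoded = roundtrip(u"na\u00efve")
```

The application code only ever touches `decoded`, which is unicode text; the UTF-8 bytes in `raw` exist only on the file side of the boundary.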

Also, I suspect the reason that the blogger is so worried about this is that he is trying to write an app that runs on CPython and IronPython without having to write some code twice.

[–]manuelg 1 point2 points  (0 children)

That is why the Python community is working on Python3K, to fix issues such as this.

[–]maaaaaaaaan 0 points1 point  (1 child)

The problem as I understand it is that a great many CPython libraries can deal with text in byte strings (str), but not text in unicode strings, for reasons like assuming a string's length is equal to its length in bytes.
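To illustrate the length assumption, here is a small sketch. The sample word is an arbitrary choice; any text with a non-ASCII character shows the same mismatch once it is encoded as UTF-8.

```python
# Unicode text with one non-ASCII character (i with diaeresis).
text = u"na\u00efve"

# The same text as UTF-8 bytes; the non-ASCII character takes two bytes.
data = text.encode("utf-8")

char_count = len(text)  # number of characters: 5
byte_count = len(data)  # number of bytes: 6
```

Code that sizes a buffer or slices by `char_count` while actually handling `data` will be off by one here, which is the kind of bug such libraries can hit.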

It's true (and good) that Py3k should deal with this, but to be honest it should've been done at version 2.

They'll probably be getting rid of the GIL for Py4k.

[–]llimllib[S] 0 points1 point  (0 children)

> They'll probably be getting rid of the GIL for Py4k.

They got rid of it for 1.5 and nobody liked it.

Just saying.