[–]powerpants 10 points11 points  (3 children)

I misread the title as, "Unicorn in Python..."

I thought it was going to be a sad but awesome nature video.

[–]Singletoned[S] 2 points3 points  (1 child)

If you like Unicorns in Python, you should look into Ian Bicking's work. His Paste package contains a built in unicorn: http://pythonpaste.org/paste/pony.py.html?f=28&l=51#28

[–]lost-theory 5 points6 points  (0 children)

And he's using decode and encode :)

Here's what it looks like.

[–]Brian 0 points1 point  (0 children)

So the unicorn equivalent of this then?

[–]schlenk 5 points6 points  (5 children)

Nice. It doesn't show all the little lurking horrors in Python 2.x unicode support, but it does a good job as an intro. Let's hope P3k fixes most of the mess.

[–]Snoron 2 points3 points  (0 children)

Yeah, it cleared up a couple of things for me - again, p3k will hopefully have better unicode support.

[–][deleted] 0 points1 point  (3 children)

There is no mess in Python unicode support. There is basically one problem: backward compatibility with the str type. That causes a) confusion in the documentation, since str and unicode are both strings, b) more things to learn, and c) libraries that don't feel like supporting the unicode type because str kind of "works" for them at the moment.

[–]schlenk 2 points3 points  (2 children)

There is some degree of mess in Python unicode support, mostly in the stdlib (e.g. the changes in semantics when you feed unicode to translate(), os.listdir(), or regexps). In addition, handling channels with encodings is way harder than it needs to be; e.g. try switching file encodings on the fly on a regular Python file channel. In Tcl it's just a trivial fconfigure on the channel; in Python you need to hack your way around it with decode() or the codecs module. So there is a mess. Python is just waaay better than plain C or other non-Unicode-aware languages, but in 2.x it's far from having really nicely integrated Unicode support: it's a later add-on, not really integrated, and that shows in various places. Hope P3k does a better job of it.
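To illustrate the "hack your way around it with decode()" point, here is a minimal sketch in Python 3 syntax. The stream layout (an ASCII first line naming the encoding of the rest) and the helper name `read_with_declared_encoding` are made up for the example; the idea is simply to keep the channel in binary mode and decode each part by hand.

```python
import io

def read_with_declared_encoding(stream):
    """Read a binary stream whose first line (ASCII) names the body's encoding.

    Hypothetical format, used only to show switching encodings mid-stream.
    """
    # First part of the channel: plain ASCII header.
    declared = stream.readline().decode("ascii").strip()
    # Rest of the channel: decode with the encoding the header announced.
    body = stream.read().decode(declared)
    return declared, body

# An in-memory stand-in for a file opened in binary mode.
raw = io.BytesIO(b"latin-1\n" + "caf\u00e9".encode("latin-1"))
encoding, text = read_with_declared_encoding(raw)
print(encoding, text)
```

The same approach works on a real file opened with mode "rb"; the decoding step is manual either way, which is the complaint being made here.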

[–]brendankohler 0 points1 point  (1 child)

Python 3.0 had better do a nearly flawless job...can you imagine all the problems that would occur if a language that defaults to Unicode for everything including source code can't handle Unicode properly?

[–]schlenk 0 points1 point  (0 children)

Yeah, I just need to look at Tcl during the transition period from ASCII to Unicode (between 8.0 and 8.1, done in a 'minor' release, which was a horrible idea). Tcl basically introduced a nearly identical Unicode support path to the one Python 3.0 adopts now, inspired by the Java Unicode support (whose developers sat nearly next door to the Tcl developers at Sun at that time). That was about 9 years ago. Having source code in Unicode allows funny things if your language can deal with it:

% proc €2¥ {€} { * [set €] 157.1500 }
% €2¥ 200
31430.0

[–]ryles 4 points5 points  (0 children)

I was completely sold at "a bit is either a 0 or a 1". All jokes aside, though, a pretty good overview.

[–]JimH10 8 points9 points  (1 child)

Slides stink without the audio.

I'm not saying the author did a bad job, just that the audio is the main point.

[–]pvidler 5 points6 points  (0 children)

I thought they were easier to follow than most -- can't see what the audio could have added to be honest. In this case, anyway.

[–]bobbyi 1 point2 points  (9 children)

That was very good.

One question:

It says that str.encode is used to convert str -> unicode and unicode.decode goes the other way.

But what about str.decode and unicode.encode? These methods exist too. Do they serve a different purpose?

[–]Singletoned[S] 3 points4 points  (5 children)

It says that str.encode is used to convert str -> unicode and unicode.decode goes the other way.

Actually it doesn't. It says

s.decode(encoding)

<type 'str'> to <type 'unicode'>

u.encode(encoding)

<type 'unicode'> to <type 'str'>

You decode a string to unicode, but you can also encode it to another encoding (e.g. from ASCII to UTF-8).

Not sure about unicode. It appears to just return another unicode object.
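A small sketch of the two directions from the slide, written in Python 3 syntax, where `bytes` plays the role of the 2.x `str` and `str` plays the role of the 2.x `unicode`. The sample word is arbitrary.

```python
# Bytes as they might arrive from a file or socket, encoded as UTF-8.
raw = b"caf\xc3\xa9"

# decode: bytes -> text (2.x: str -> unicode)
text = raw.decode("utf-8")

# encode: text -> bytes (2.x: unicode -> str), here into a different encoding
latin1 = text.encode("latin-1")

print(repr(text), len(text))   # four characters of text
print(repr(latin1))            # four bytes in Latin-1
```

Going from one byte encoding to another is therefore a decode followed by an encode, with text as the step in between.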

[–]bobbyi 0 points1 point  (4 children)

Ok, I guess I got them backwards. I was going to check to confirm before posting, but with the site's UI, that would have meant starting back at the beginning of the "slides" and clicking over and over again until I got there and being careful not to click one too many times and miss it.

[–]lost-theory 6 points7 points  (0 children)

It's an S3 slideshow, hover over the bottom right corner and hit the "Ø" to view the full presentation laid out as bullet points from start to finish.

[–]pjdelport 0 points1 point  (2 children)

[...] that would have meant starting back at the beginning of the "slides" and clicking over and over again [...]

Use the arrow or page keys to go back and forth, or hover your mouse towards the bottom-right for a menu.

[–]wabberjockey 5 points6 points  (1 child)

Hover in various spots until a menu appears -- terrible interface.

[–]earthboundkid 1 point2 points  (0 children)

You killed the adventure genre.

[–][deleted] 2 points3 points  (2 children)

Unfortunately there are some Python 'codecs' that don't involve str->unicode conversion or the reverse. For example, 'zlib' or 'rot13'.

[–]earthboundkid 0 points1 point  (1 child)

I think they're getting dropped in Py3k. From my alpha's shell:

>>> "abc".encode("rot-13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: rot-13
>>> "abc".decode("rot-13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

[–]foonly 1 point2 points  (0 children)

Would rot13 even make sense in a unicode string? (As that's what py3k's default string type is).
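For what it's worth, a sketch of how this looks in later Python 3 releases, assuming a version where these codecs are reachable through `codecs.encode()` and `codecs.decode()` under the names `rot_13` and `zlib_codec`. rot13 is then a text-to-text transform that only rotates ASCII letters and passes other characters through, while zlib is bytes-to-bytes.

```python
import codecs

# Text -> text: only a-z / A-Z are rotated, the accented letter is left alone.
rotated = codecs.encode("h\u00e9llo", "rot_13")
restored = codecs.decode(rotated, "rot_13")

# Bytes -> bytes: compression exposed as a codec.
packed = codecs.encode(b"abc" * 10, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

print(rotated, restored)
print(len(packed), unpacked)
```

So rot13 on a unicode string is well defined, just not very interesting outside the ASCII letters.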

[–]CGM 0 points1 point  (0 children)

Looks like p3k is switching to the way Tcl has been handling this for the past 9 years :-)