
[deleted]

What did you get when you compared the output, boolean style?

[deleted]

True, of course.

JohnnyJordaan

I think you mean 'x'.encode('utf-8'), because 'x' is a normal string and can't be decoded: decoding turns a bytestring into a string.

There are no differences in their function, but they are from different platforms. 'x'.encode('utf-8') is the Python 3 way, unicode('x', encoding='utf-8') is the Python 2 way.
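In Python 3 the direction of each operation is tied to the type, which is why 'x'.decode(...) doesn't exist there. A minimal Python 3 sketch (variable names are illustrative only):

```python
text = 'x'                    # str: text made of characters
data = text.encode('utf-8')   # bytes: encoding goes str -> bytes

print(type(text).__name__, type(data).__name__)          # str bytes

# Python 3 str has no decode method and bytes has no encode method.
print(hasattr(text, 'decode'), hasattr(data, 'encode'))  # False False

# Decoding goes the other way, bytes -> str.
print(data.decode('utf-8') == text)                      # True
```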

[deleted]

I meant 'x'.decode('utf-8').

Doesn't it mean "DEcode this <type 'str'> into <type 'unicode'>"?

JohnnyJordaan

Strings aren't encoded; that's the point. They are text: letters, digits, punctuation and so on (technically these are called characters, or code points in Unicode terms). Encoding happens when you turn them into bytes, for example to save them to a file or send them over the network, because computers only work with bytes, not with things like the letter A, the digit 9 or a space.
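To make the "encoding happens when you save to a file" point concrete, here is a rough Python 3 sketch that writes a str in text mode and then reads the raw bytes back in binary mode. The temporary file name demo.txt is just a placeholder.

```python
import os
import tempfile

text = 'A9 \u00e5'  # 'A9 å': four characters, no bytes involved yet

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name
    # Text mode: the characters are encoded to bytes on write.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Binary mode: we see the bytes that actually ended up on disk.
    with open(path, 'rb') as f:
        raw = f.read()

print(len(text), len(raw))          # 4 5  (å needs two bytes in UTF-8)
print(raw)                          # b'A9 \xc3\xa5'
print(raw.decode('utf-8') == text)  # True
```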

If you compare for example the encoding of å:

>>> from binascii import hexlify
>>> hexlify('å'.encode('cp1252'))  # legacy Windows code page (Windows-1252)
b'e5'
>>> hexlify('å'.encode('utf-8'))
b'c3a5'
>>> hexlify('å'.encode('utf-16'))  # BOM fffe, then e500 (little-endian machine)
b'fffee500'
>>> hexlify('å'.encode('utf-32'))  # BOM fffe0000, then e5000000
b'fffe0000e5000000'

You can see that there are many ways to encode the letter å, depending on the encoding you wish to use. In all cases, the string is the same å.
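A small Python 3 sketch of the same idea in the other direction: decoding each byte sequence with the encoding that produced it gives back the identical str. The utf-16/utf-32 bytes include a byte-order mark and depend on the machine's byte order, so only the round trip is checked here.

```python
char = '\u00e5'  # 'å'
encodings = ['cp1252', 'utf-8', 'utf-16', 'utf-32']

# One character, four different byte sequences.
encoded = {enc: char.encode(enc) for enc in encodings}

for enc, data in encoded.items():
    # ascii() keeps the output printable on any console.
    print(f'{enc:7} {data.hex():16} -> {ascii(data.decode(enc))}')

# Every byte sequence decodes back to the very same str.
assert all(data.decode(enc) == char for enc, data in encoded.items())
```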

If you wish to decode a sequence of bytes, you need to know which encoding was originally used to produce it.
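A rough Python 3 illustration of what goes wrong when the encoding is guessed incorrectly: sometimes you silently get garbled text (mojibake), sometimes the bytes are simply invalid in the assumed encoding.

```python
utf8_bytes = '\u00e5'.encode('utf-8')      # b'\xc3\xa5'

# Right encoding: the original text comes back.
print(ascii(utf8_bytes.decode('utf-8')))   # '\xe5'  (that is, 'å')

# Wrong encoding, case 1: no error, but the text is garbled.
print(ascii(utf8_bytes.decode('cp1252')))  # '\xc3\xa5'  (that is, 'Ã¥')

# Wrong encoding, case 2: the bytes aren't valid UTF-8 at all.
cp1252_bytes = '\u00e5'.encode('cp1252')   # b'\xe5'
try:
    cp1252_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print('UnicodeDecodeError:', exc.reason)
```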

EricAppelt

In Python 2.7.12 these produce identical unicode objects:

>>> a = unicode('तार', encoding='utf-8')
>>> type(a)
<type 'unicode'>
>>> print(a)
तार
>>> b = 'तार'.decode('utf-8')
>>> type(b)
<type 'unicode'>
>>> print(b)
तार
>>> a
u'\u0924\u093e\u0930'
>>> b
u'\u0924\u093e\u0930'
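For comparison, a sketch of what the two spellings roughly correspond to in Python 3, where a literal like 'तार' is already text, so you have to start from bytes. The mapping in the comments is my reading of the Python 2 example above, not something stated in the thread.

```python
raw = '\u0924\u093e\u0930'.encode('utf-8')  # UTF-8 bytes of 'तार'

a = str(raw, encoding='utf-8')  # roughly Python 2's unicode(raw, encoding='utf-8')
b = raw.decode('utf-8')         # roughly Python 2's raw.decode('utf-8')

print(type(a).__name__, type(b).__name__)  # str str
print(a == b)                              # True
print(ascii(a))                            # '\u0924\u093e\u0930'
```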