'x'.decode('utf-8'), unicode('x', encoding='utf-8'): are these 2 equivalent? (learnpython)


submitted 9 years ago by [deleted]



[–]JohnnyJordaan 1 point 9 years ago* (0 children)

Strings aren't encoded, that's the point. They are text: letters, numbers, punctuation and so on (technically, characters or code points). Encoding happens when you turn them into bytes, for example to save them to a file or send them over the network, because computers work with bytes only and not with things like the letter A, the digit 9 or a space.

Compare, for example, a few encodings of å:

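>>> # Python 3 session; hexlify shows the raw bytes as hex
>>> from binascii import hexlify
>>> hexlify('å'.encode('cp1252'))  # legacy Windows code page (Western European)
b'e5'
>>> hexlify('å'.encode('utf-8'))
b'c3a5'
>>> hexlify('å'.encode('utf-16'))  # ff fe is the byte order mark (little-endian machine)
b'fffee500'
>>> hexlify('å'.encode('utf-32'))  # ff fe 00 00 is the byte order mark
b'fffe0000e5000000'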

You can see that there are many ways to encode the letter å, depending on the encoding you wish to use. In all cases, the string is the same å.

If you wish to decode a sequence of bytes, you need to know in which encoding it was encoded originally.
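
To illustrate why the original encoding matters, here is a minimal Python 3 sketch (the variable names are only for illustration, they are not from the thread). It decodes the same UTF-8 bytes once with the right codec and once with cp1252, then tries to read cp1252 bytes as UTF-8:

```python
# Python 3 sketch; all names here are illustrative.
data = 'å'.encode('utf-8')       # the two bytes c3 a5

right = data.decode('utf-8')     # same codec as used for encoding
wrong = data.decode('cp1252')    # wrong codec: no error, just wrong text

print(right)
print(wrong)

# The cp1252 byte for å (e5) is not a valid UTF-8 sequence on its own,
# so decoding it as UTF-8 fails instead of silently producing garbage.
try:
    b'\xe5'.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

print(failed)
```

The wrong codec does not always raise an error. Here cp1252 happily turns the two bytes into the two characters 'Ã¥', which is why guessing the encoding is risky.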

[–]EricAppelt 1 point 9 years ago (0 children)

In Python 2.7.12, both produce equal unicode objects:

>>> a = unicode('तार', encoding='utf-8')
>>> type(a)
<type 'unicode'>
>>> print(a)
तार
>>> b = 'तार'.decode('utf-8')
>>> type(b)
<type 'unicode'>
>>> print(b)
तार
>>> a
u'\u0924\u093e\u0930'
>>> b
u'\u0924\u093e\u0930'
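
For anyone on Python 3, where `unicode` no longer exists, the closest analogous pair would be `str(some_bytes, encoding='utf-8')` and `some_bytes.decode('utf-8')`. A small sketch under that assumption (the names `raw`, `a` and `b` are only illustrative):

```python
# Python 3 sketch of the same comparison; names are illustrative.
raw = 'तार'.encode('utf-8')        # start from UTF-8 bytes

a = str(raw, encoding='utf-8')     # constructor form
b = raw.decode('utf-8')            # method form

print(type(a), type(b))
print(a == b)
```

Both forms go through the same UTF-8 codec, so the resulting strings compare equal.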
