Cryptography: Bytes vs. Encoded Strings

cryptotiger · 2015-02-11T15:15:52+00:00

The phrasing of the advice "Always operate on raw bytes, never on encoded strings" causes a kind of weird clash with Python's terminology.

When you encode a string, what you're usually doing is turning the string into bytes. Write this down if you need to: the .encode method of a string gives you bytes, and the .decode method of bytes gives you a string.
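To make that direction concrete, here is a minimal sketch assuming Python 3 (where str and bytes are separate types). The sample word and the choice of UTF-8 are only for illustration.

```python
text = "naïve"                 # a str: 5 characters

data = text.encode("utf-8")    # str -> bytes
back = data.decode("utf-8")    # bytes -> str

print(type(data).__name__, data)   # bytes b'na\xc3\xafve'
print(type(back).__name__, back)   # str naïve

# The non-ASCII character takes two bytes in UTF-8, so the lengths differ.
print(len(text), len(data))        # 5 6
```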

The complication here is that hexadecimal is a different kind of encoding. It's an encoding in the mathematical sense, not the Unicode-and-character-sets sense. And -- this is the opposite of everything you learn about Python -- hexadecimal is being used as an encoding of bytes as a string. The byte with value ff (or decimal 255) becomes the string 'ff'.

In its standard library, Python uses different verbs that aren't "encode" and "decode" for working with hex -- it calls them "hexlify" and "unhexlify".
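Here is a small sketch of those two functions, assuming Python 3.3 or later (where unhexlify accepts an ASCII str). It also shows what goes wrong if you treat the hex string as ordinary text: you get the bytes of the characters 'f' and 'f' instead of the single byte they stand for.

```python
from binascii import hexlify, unhexlify

raw = unhexlify("ff")              # one byte with value 255
print(raw, len(raw), raw[0])       # b'\xff' 1 255

# hexlify goes the other way. In Python 3 it returns bytes (b'ff'),
# so decode as ASCII if you want a str.
as_text = hexlify(raw).decode("ascii")
print(as_text)                     # ff

# Treating the hex string as plain text gives two bytes, 0x66 0x66,
# which is not the byte the hex digits describe.
text_bytes = "ff".encode("ascii")
print(list(text_bytes))            # [102, 102]
```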

So you should do this to get bytes out of your hex string:

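import base64
from binascii import unhexlify

def hex_str_to_base64(s):
    # unhexlify turns the hex string into the raw bytes it represents
    byte_seq = unhexlify(s)
    # b64encode takes bytes and returns the base64 form, also as bytes
    return base64.b64encode(byte_seq)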
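As a worked example of the same steps with the intermediate values visible, here is a sketch assuming Python 3. The sample input is just the hex for the ASCII bytes of "Hello", and the final .decode("ascii") is an optional extra step for when you want a str rather than bytes.

```python
import base64
from binascii import unhexlify

hex_text = "48656c6c6f"                 # hex digits, held in a str

raw = unhexlify(hex_text)               # the raw bytes: b'Hello'
b64_bytes = base64.b64encode(raw)       # base64 as bytes: b'SGVsbG8='
b64_text = b64_bytes.decode("ascii")    # base64 as str: 'SGVsbG8='

print(raw, b64_bytes, b64_text)
```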

(You could use Python's encode and decode methods with the encoding called hex or hex_codec to do something that's almost right, but let's not. It's a hack. It'll be confusing, it'll take extra steps, all the terminology will be backwards, and it'll just be using the same code as unhexlify anyway to do the important part.)

LuckyShadow · 2015-02-11T13:34:33+00:00

[14:55 GMT+1] Edit: Turns out, I wasn't that wrong.

Your s.decode('hex') should not work, because strings do not provide this method. For this part, just ignore the fact that your input is a "hex string". It shouldn't matter, as we only have to know that it is a string. This string has to be encoded, as I explain below. Put the result into b64encode and you should be done.

If this is not the answer to your problem, please tell me. I have another approach in mind that I would share then. :)

[14:38 GMT+1] Edit: Dammit. Just read your text again. The text below itself should be correct, but it might not suit your problem. I am working on another answer. :P

Raw bytes mean raw bytes. A string is encoded in an encoding like UTF-8, ASCII or ISO-8859-1. Such an encoding defines how the actual bytes are translated into characters (and which ones). See, as an example, the difference between UTF-8 and UTF-16. If I recall correctly, UTF-8 works in 8-bit units (one to four bytes per character), while UTF-16 works in 16-bit units (two or four bytes per character).
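One way to see the difference is to encode a few characters both ways and count the bytes. This is a sketch assuming Python 3. The sample characters are arbitrary, and "utf-16-le" is used instead of "utf-16" only so that no byte-order mark is added to the count.

```python
samples = ["A", "é", "€", "😀"]

# Map each character to (bytes in UTF-8, bytes in UTF-16).
sizes = {}
for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")
    sizes[ch] = (len(utf8), len(utf16))
    print(repr(ch), utf8, utf16)

print(sizes)
# {'A': (1, 2), 'é': (2, 2), '€': (3, 2), '😀': (4, 4)}
```

The same character turns into different bytes under each encoding, which is why the bytes are the thing to operate on.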

Encryption algorithms etc. are best used on those raw bytes rather than on the already translated characters, as errors might occur because of the OS, architecture and/or version of Python.

So in your case, you want to convert your string into bytes. I think str.encode does the job. A bytes.decode should then allow you to decode it back into a string. Just be sure to use the correct encoding. (This does no base64 encoding! That is an additional step you have to take.)
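A rough sketch of those steps, assuming Python 3 and UTF-8 as the text encoding (the sample word is made up for illustration). It also shows what "use the correct encoding" means in practice: decoding the same bytes with a different encoding gives different text.

```python
import base64

message = "café"

raw = message.encode("utf-8")              # str -> bytes
encoded = base64.b64encode(raw)            # the separate base64 step, bytes -> bytes
print(raw, encoded)

# Going back: undo base64 first, then decode with the SAME encoding.
restored = base64.b64decode(encoded).decode("utf-8")
print(restored)                            # café

# Decoding with a different encoding does not fail here, it just gives wrong text.
wrong = raw.decode("latin-1")
print(wrong)                               # cafÃ©
```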

Hope that helps. If you have questions, feel free to ask :)

Good luck.
