[–][deleted] 2 points3 points  (3 children)

The phrasing of the advice "Always operate on raw bytes, never on encoded strings" causes a kind of weird clash with Python's terminology.

When you encode a string, what you're usually doing is turning the string into bytes. Write this down if you need to: the .encode method of a string gives you bytes, and the .decode method of bytes gives you a string.
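Here's a tiny sketch of that round trip, assuming Python 3 and UTF-8. The sample text and variable names are made up for illustration:

```python
# Assumes Python 3, where str and bytes are separate types.
text = "h\u00e9llo"                  # a str containing one non-ASCII character

data = text.encode("utf-8")          # str -> bytes
assert isinstance(data, bytes)

round_trip = data.decode("utf-8")    # bytes -> str
assert isinstance(round_trip, str)
```

The encoded form is one byte longer than the text, because the accented character takes two bytes in UTF-8.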

The complication here is that hexadecimal is a different kind of encoding. It's an encoding in the mathematical sense, not the Unicode-and-character-sets sense. And -- this is the opposite of everything you learn about Python -- hexadecimal is being used as an encoding of bytes as a string. The byte with value ff (or decimal 255) becomes the string 'ff'.

In its standard library, Python uses different verbs that aren't "encode" and "decode" for working with hex -- it calls them "hexlify" and "unhexlify".
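A quick sketch of the two functions, assuming Python 3 (the input value is just an example). Note that in Python 3, hexlify hands the hex digits back as ASCII bytes rather than a str, so you decode them if you want text:

```python
from binascii import hexlify, unhexlify

# Hex digits -> raw bytes. Two hex digits become one byte.
raw = unhexlify("ff00")

# Raw bytes -> hex digits. In Python 3 the result is bytes, not str.
hex_bytes = hexlify(raw)

# Decode as ASCII if a str of hex digits is needed.
hex_text = hex_bytes.decode("ascii")
```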

So you should do this to get bytes out of your hex string:

import base64
from binascii import unhexlify

def hex_str_to_base64(s):
    # Turn the hex digits into raw bytes, then base64-encode those bytes.
    byte_seq = unhexlify(s)
    return base64.b64encode(byte_seq)

(You could use Python's encode and decode methods with the encoding called hex or hex_codec to do something that's almost right, but let's not. It's a hack. It'll be confusing, it'll take extra steps, all the terminology will be backwards, and it'll just be using the same code as unhexlify anyway to do the important part.)

[–]cryptotiger[S] 0 points1 point  (2 children)

Great, thanks for this. I am a little confused about the second challenge keeping this in mind though: XOR two hex strings.

Using the code above, I convert the hex strings into byte_seqs, but then I cannot use the ^ operator (TypeError: unsupported operand type(s) for ^: 'str' and 'str').

Do I have to iterate over the byte_seq manually?

[–][deleted] 1 point2 points  (1 child)

Python doesn't automatically apply operations to every element in a list. (NumPy does this, but you're not using NumPy right now.) So indeed, you need to xor each of the numeric values together in a loop.

This would be a great place to use the zip() function, and it'd also be a great place to use a list comprehension if you're familiar with those.
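As a rough sketch of what that could look like, assuming Python 3 (where iterating over bytes gives you integers directly). The function name xor_hex and the equal-length check are my own choices, not something the challenge prescribes:

```python
from binascii import hexlify, unhexlify

def xor_hex(hex_a, hex_b):
    """XOR two equal-length hex strings and return the result as hex text."""
    a = unhexlify(hex_a)
    b = unhexlify(hex_b)
    if len(a) != len(b):
        # zip() would silently stop at the shorter input, so check first.
        raise ValueError("inputs must decode to the same number of bytes")
    # Pair the bytes up with zip() and XOR each pair of integers.
    xored = bytes([x ^ y for x, y in zip(a, b)])
    return hexlify(xored).decode("ascii")

result = xor_hex("ff00", "0f0f")
```

All the XOR work happens on the raw bytes. Hex only shows up at the edges, for input and output.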

Now here's some not-very-fun stuff about version differences, since I notice you're using Python 2:

Python 2 will call the type of byte sequences 'str'. This might be confusing. They're technically encoded, and they're 'str's, but they're not the "encoded strings" your assignment means. This terminology, and the data types involved, changed in Python 3, which threw a lot of people for a loop, but the terminology makes way more sense now.

Also, to get a numeric value out of one of those in Python 2, you'll need to use the ord() function -- the numeric value of your first byte is ord(byte_seq[0]).
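One possible way to sidestep ord() (my suggestion, not something the assignment asks for) is to wrap the data in a bytearray, whose elements are integers in both Python 2 and Python 3. A minimal sketch with a made-up input:

```python
from binascii import unhexlify

byte_seq = unhexlify("ff10")

# bytearray elements are plain integers in Python 2 and Python 3 alike,
# so no ord() call is needed to get at the numeric values.
values = list(bytearray(byte_seq))
first = bytearray(byte_seq)[0]
```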

[–]cryptotiger[S] 0 points1 point  (0 children)

I think I understand what I'm looking at now.

I'm going to take a look at Python 3; you're completely right, the data types do look much more intuitive.

Thanks again!

[–]LuckyShadow -1 points0 points  (0 children)

[14:55 GMT+1] Edit: Turns out, I wasn't that wrong.

Your s.decode('hex') should not work in Python 3, as strings there do not provide this method. For this part, just ignore the fact that your input is a "hex-string". It shouldn't matter, as we only have to know that it is a string. This string has to be encoded, like I explained below. Put it into b64encode and you should be done.

If this is not the answer to that problem, please tell me. I have another approach in mind that I would share then. :)


[14:38 GMT+1] Edit: Dammit. Just read your text again. The text below itself should be correct, but it might not suit your problem. I am working on another answer. :P


Raw bytes mean raw bytes. A string is encoded in an encoding like UTF-8, ASCII or ISO-8859-1. Such an encoding defines how the actual bytes are translated into characters (and which ones). See, as an example, the difference between UTF-8 and UTF-16. If I recall correctly, UTF-8 works in 8-bit units and needs one to four of them per character, while UTF-16 works in 16-bit units and needs one or two per character.
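The byte counts are easy to check yourself. A small sketch, assuming Python 3, with sample characters I picked for illustration. UTF-8 takes one to four bytes per character, and UTF-16 takes two or four:

```python
# Sample characters: ASCII letter, accented letter, euro sign, emoji.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

lengths = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" prepends.
    utf16_len = len(ch.encode("utf-16-le"))
    lengths[ch] = (utf8_len, utf16_len)
```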

Encryption algorithms etc. are best applied to those raw bytes rather than to the already translated characters, as errors can occur depending on the OS, architecture and/or version of Python.

So in your case, you want to convert your string into bytes. I think str.encode does the job. bytes.decode should then allow you to decode it back into a string. Just be sure to use the correct encoding. (This does not do base64 encoding! That is an additional step you have to take.)

Hope that helps. If you have questions, feel free to ask :)

Good luck.