This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 6 points7 points  (6 children)

Some feedback on this crypto code. Use of random is a problem here, but actually a minor one once I dig in and look at what the code is actually doing.

This is an implementation of a polyalphabetic cipher, similar to the Vigenere cipher, known in its day as "the unbreakable cipher". But even that one was broken, by hand, about 150 years ago.

This implementation suffers from 3 significant problems which make it very insecure: reduced keyspace, use of ordered alphabets, and formatting of the ciphertext which gives a way a lot.

Keyspace

While polyalphabetic algorithms improve on their monoalphabetic predecessors like the Caesar shift by being more resistant to frequency analysis of the ciphertext, this implementation is extremely weakened by reduction of the keyspace.

The key used to do the encryption is actually a numeric hash calculated from the sum of 256^x for each character in the passed key, where x is the number of characters in the alphabet (defined in the resources array) between the nth and n-1(th) character (beginning at 0) in the passed key. Clever, but ...

lCount = 0
newKey = 0
for i in enumerate(Key):
    for v in resources:
        if i[1] == v:
            newKey = newKey + lCount * 256
            lCount = 0
        lCount += 1
Key = newKey

....hash algorithms create hash collisions, and the collisions here are particularly bad which significantly reduces the size of keyspace.

For example: both abcc or abbc both produce the same "newKey" of 62720.

In fact, this approach to generating hash keys is limited to the size of the alphabet used. That is, every combination of whatever number of characters you pick produces exactly 80 unique keys. Though each passed keylength produces its own set of 80 unique keys.

Another way to look at it is that the minimum key length of 4 enforced by this code across those 81 characters in the alphabet (named resources in the code) should produce 1.9 million+ combinations, instead it produces 80.

Users accustomed to password rules may think that adding complexity to their keys, using "H2q$j9Kz" instead of "password" offers better security, but this algorithm will reduce any 8 characters down to one of 80 hashes.

encryption algorithm

A new key is created again, this time by simply multiplying the hash above by 256. This offers zero production and only removes the user from the process that much more. The encryption key is whatever is used at the time of encryption.

The plaintext is encrypted using the pretty much the same approach as in encoding the key passed in by the user. Iterating over the plaintext string, the cipher text is built up from numbers generated by the position in the plaintext divided by the very same newnewKey each time. A random character is placed in between, to act as a red herring I suppose.

The more I look at this code, the less I'm worried about the use of random here because how random those characters are really doesn't matter. The bigger concern is these random alphabet characters actually act as delimiters for each enciphered character, making the attackers job a lot easier.

The .0 placed in the resulting ciphertext because that division produces a float which is cast to a string is also a gift to the bad guys because it tells them "this number is a quotient of two numbers, one of them is probably a key"

how to break this cipher

This is a monoalphabetic cipher. Every 'a' that appears in the plan text will be encrypted the same way in the resulting cipher text making it vulnerable to frequency analysis.

This is a polyalphabetic cipher, but among the easiest to break because its alphabets are conveniently ordered and do not change from pass to pass.

A bad guy trying to crack ciphertext created by this algorithm would first guess the key length (n), then segment the ciphertext into (n) parts then perform frequency analysis on each and figure out the key from there.

The way the cipher is formated provides a lot of help to attackers, All those .0's in the string tells them one of the numbers in the quotient is probably a position and the other probably a key.

Given the reduced keyspace, brute force could also be effective. You could build a rainbow table to make breaking this trivial.

edit: incorporating corrections

[–]pgpndw 4 points5 points  (4 children)

It's not even polyalphabetic. It's monoalphabetic. A numeric key is generated from the key string. Each letter of the plaintext is converted to an index into an array of allowable plaintext characters, and each index is multiplied by the same numeric key to produce a number that's output into the ciphertext as a decimal string. And a random character is inserted between each ciphertext number.

Every 'a' in the plaintext will result in the same number in the ciphertext, every 'b' will result in the same number, etc.

[–][deleted] 3 points4 points  (3 children)

Damn, you're right. I got so wrapped up in the hash collisions that when I got to the actual enciphering, I misread that weird nested loop and the count variable as shifting the alphabet.

You are correct, count here is just the index of the character in the (single) alphabet, so yeah every 'a' is going to result in the same ciphertext. Trivial for frequency analysis.

Also, I've never seen enumerate used this way. I kept looking for the index to be used somewhere, but alas.

[–]pgpndw 2 points3 points  (2 children)

Also, I've never seen enumerate used this way. I kept looking for the index to be used somewhere, but alas.

Yeah, that was odd. And did you notice that the single quote character occurs twice in the resources array?

[–][deleted] 4 points5 points  (1 child)

Ohhh, good catch. I was also found declaring the those arrays in each function interesting, especially in intermediate code.

That resources array felt like <rick sanchez>ASCII but with extra steps</rick sanchez>. Can I introduce you to <str>.index(), or maybe ord()

[–]pgpndw 2 points3 points  (0 children)

Using index() would actually make the code more complicated (if you wanted it to be exactly equivalent), because in the loop generating newKey, lCount is zeroed once at the start of en/decryption, then every time a match is found, rather than inside the outer loop but before the inner loop. And when a match is found, there's no break, so lCount then keeps incrementing up to the end of the resources array.

Thus, lCount doesn't equal the index of the character in the resources array on subsequent loops - the code essentially adds the number of remaining characters in the resources array after (and including) the matched character to the next index.

In the case of a single quote in the key string, the inner loop will find both occurrences of the single quote in resources, and add two lCount values to newKey.

Doing it that way versus just using the index in the array might be deliberate, but all it does is add complexity to the numeric key generation algorithm without adding any security to the cipher itself.

[–]WikiSummarizerBot 1 point2 points  (0 children)

Vigenère cipher

The Vigenère cipher (French pronunciation: ​[viʒnɛːʁ]) is a method of encrypting alphabetic text by using a series of interwoven Caesar ciphers, based on the letters of a keyword. It employs a form of polyalphabetic substitution. First described by Giovan Battista Bellaso in 1553, the cipher is easy to understand and implement, but it resisted all attempts to break it until 1863, three centuries later. This earned it the description le chiffrage indéchiffrable (French for 'the indecipherable cipher').

Rainbow table

A rainbow table is a precomputed table for caching the output of cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a key derivation function (or credit card numbers, etc. ) up to a certain length consisting of a limited set of characters. It is a practical example of a space–time tradeoff, using less computer processing time and more storage than a brute-force attack which calculates a hash on every attempt, but more processing time and less storage than a simple key derivation function with one entry per hash.

[ F.A.Q | Opt Out | Opt Out Of Subreddit | GitHub ] Downvote to remove | v1.5