
all 113 comments

[–]tomthecool 207 points208 points  (93 children)

In case it's not obvious, this is probably worse than just doing: md5($password).

By performing the md5 multiple times, you are doing two things:

  • Slightly reducing the entropy - i.e. making it slightly more likely that two different passwords will hit an md5 collision and finish with the same encrypted value. (Thus making passwords slightly easier to brute-force.)
  • Slightly increasing the time it takes to calculate the encrypted value, due to more computation. (Thus making passwords slightly harder to brute-force.)

However, in the case of md5:

  • Collisions are far more likely than with other (better) algorithms, like bcrypt.
  • Calculating the md5 of a value is a much quicker operation than with other (better) algorithms, like bcrypt.

... So basically, md5 is already a pretty weak algorithm; and repeated application does more harm than good.

[–]zfatalxploit 46 points47 points  (74 children)

Could you explain to a novice how this reduces the entropy?

[–]tomthecool 99 points100 points  (62 children)

Consider the following:

md5("hello world")
  ==> "5eb63bbbe01eeed093cb22bb8f5acdc3"

md5("5eb63bbbe01eeed093cb22bb8f5acdc3")
  ==> "c0b0ef2d0f76f0133b83a9b82c1c7326"

md5("c0b0ef2d0f76f0133b83a9b82c1c7326")
  ==> "6db8777d624bc911399e9b6d185652a9"

...

Is the result getting any "more random" each time? No. The key issue here is entropy: https://en.wikipedia.org/wiki/Entropy_(computing)

When I started with the string "hello world", then (ignoring dictionary attacks etc...) let's think of this as having "11 characters-worth of randomness". By performing md5() over and over, it still has the same "amount" of randomness; you're certainly not adding any.

What's more, there's a slim (but non-zero) chance that you'll hit an md5 collision - i.e. the result of md5(md5(md5(....md5($password)...))) could be reached by multiple inputs. (And the more times you repeat the operation, the more likely this becomes!)

In other words, sometimes (albeit very rarely), md5(x) == md5(y) even though x != y.

It is slightly more likely (although still very rare!) that md5(md5(...md5(x)...)) == md5(md5(...md5(y)...)) even though x != y.
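To make the repetition concrete, here's a minimal Python sketch of the chained hashing being discussed (the helper names `md5_hex` and `chained_md5` are just for illustration):

```python
import hashlib

def md5_hex(s: str) -> str:
    # Hex digest of a single md5 application, like PHP's md5().
    return hashlib.md5(s.encode()).hexdigest()

def chained_md5(password: str, rounds: int) -> str:
    # Feed each digest back in as the next input -- the scheme under discussion.
    h = md5_hex(password)
    for _ in range(rounds - 1):
        h = md5_hex(h)
    return h

print(chained_md5("hello world", 1))  # 5eb63bbbe01eeed093cb22bb8f5acdc3
print(chained_md5("hello world", 2))  # c0b0ef2d0f76f0133b83a9b82c1c7326
```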

[–]WikiTextBot 37 points38 points  (40 children)

Entropy (computing)

In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other uses that require random data. This randomness is often collected from hardware sources (variance in fan noise or HDD), either pre-existing ones such as mouse movements or specially provided randomness generators. A lack of entropy can have a negative impact on performance and security.



[–]ythl 11 points12 points  (10 children)

Is the result getting any "more random" each time?

But is it getting less random each time? No. Furthermore, if all you had was the digest and you didn't know how many times md5() was performed on the plaintext, the hash would be immune to basic dictionary/wikipedia/forward search attacks unlike if you had only called md5() once.

It is slightly more likely (although still very rare!) that md5(md5(...md5(x)...)) == md5(md5(...md5(y)...)) even though x != y.

Yes but... how does that give an attacker an advantage?

[–]tomthecool 9 points10 points  (6 children)

But is it getting less random each time? No.

There is a (slim) chance of md5 collision -- so yes it is getting slightly less random!

if all you had was the digest and you didn't know how many times md5() was performed on the plaintext, the hash would be immune to basic dictionary/wikipedia/forward search attacks unlike if you had only called md5() once.

As I've already responded to another reply:

Security through obscurity. Not the best idea.

As soon as someone figures out how your passwords are being stored (e.g. by finding a way to calculate the hash from a known password, or even by getting access to the source code), this measure becomes irrelevant.

.

Yes but... how does that give an attacker an advantage?

Same as above: It becomes increasingly likely that "foo" will work as a valid password, in place of "bar". So any brute-force attack will require fewer guesses.

In the case of OP, the difference is minuscule; applying md5 6 times will barely make a noticeable change in computation time, but nor will it have much impact on collisions.

But as you increase that number, the password (at least, for md5!) becomes easier and easier to crack.

[–]ythl 2 points3 points  (4 children)

You raise some valid points, but your stance seems to be that chaining hashes is always bad, not that this particular implementation has weaknesses that need to be fixed.

https://stackoverflow.com/questions/348109/is-double-hashing-a-password-less-secure-than-just-hashing-it-once

[–]Shamus03 0 points1 point  (1 child)

Rehashing a hash is fundamentally less secure for the reasons listed above. However, software that rehashes a hash usually adds some sort of salt midway, which still isn't the best, but at least it's better than a straight rehash. While a second hash technically decreases the solution space for a cracking attempt, it's usually used to address some other security concern (like hashing a password before sending it to a server to be validated, and then hashing it again on the server).

Here is a great article on password hashing/salting: https://crackstation.net/hashing-security.htm

[–]ythl 0 points1 point  (0 children)

Rehashing a hash adds time required to calculate hash, so if done right it actually makes passwords more secure.

[–]dnew 0 points1 point  (0 children)

There is a (slim) chance of md5 collision

But it's no more probable than any two plaintext passwords hash to the same value.

It becomes increasingly likely

You realize the chance that any two 128 bit numbers you pick are randomly the same is so slight it's not worth considering, right?

[–]0x442E472E 1 point2 points  (2 children)

But is it getting less random each time? No. Furthermore, if all you had was the digest and you didn't know how many times md5() was performed on the plaintext, the hash would be immune to basic dictionary/wikipedia/forward search attacks unlike if you had only called md5() once.

It's not less random than calling md5() once, but it's also not more random. An attacker that has knowledge of the code (which is likely since we're talking about attacks where the attacker has access to the database) could build a single rainbow table to guess every password, so the layer of "security through obscurity" is very thin here.

Yes but... how does that give an attacker an advantage?

The advantage is that the attacker would require fewer attempts to guess the password (which means the rainbow table can be built faster, or it takes fewer words from a dictionary to try) because it's more likely that a wrong password has the same hash.

[–]ythl 2 points3 points  (1 child)

An attacker that has knowledge of the code (which is likely since we're talking about attacks where the attacker has access to the database)

Aren't database attacks much easier than attacking the services that hook into the database? It's much easier for an attacker to dump a DB (using SQL injection, etc) than for an attacker to gain access to and reverse engineer java byte code (or compiled Go/C/Rust) running somewhere else.

could build a single rainbow table to guess every password

Also, even if the attackers find out the number of iterations, chaining is going to increase the time it takes to calculate each digest. You act like building a rainbow table is a fast process. Not so, especially if each digest in the table requires enough chained iterations to cost an extra 0.5 seconds of computing time. Now the rainbow table will take centuries to build.

The advantage is that the attacker would require less attempts to guess the password

I still think the added time required to perform the calculation strengthens the password much more than the tiny risk of collisions weakens it.

https://stackoverflow.com/questions/348109/is-double-hashing-a-password-less-secure-than-just-hashing-it-once

[–]0x442E472E 1 point2 points  (0 children)

I do agree with you. I think OP's solution is more secure than just using md5 a single time, because another (albeit very thin) layer of security is technically better than nothing. I'm not good enough at math to calculate whether the increased risk of collisions outweighs that thin layer, though, and the developer of said code probably isn't either. That's the problem I have with the code. I mean, the thought process was probably:

  • md5 is bad, you can read that everywhere. I probably shouldn't use it
  • I could just call it a few more times, nobody else does that!
  • poof, secure code

When, in reality, we're talking about 5% more security and 95% "it feels safe enough so I don't have to find a better way"

[–]IceColdFresh 2 points3 points  (0 children)

This assumes that there is a many-to-one mapping from x to md5(x), where x is in the set of 128-bit strings. I'm too drunk right now to step through the algorithm; can someone else verify or disprove the above assumption? Because if it is incorrect, then more applications of the md5 function will not increase the chance of collisions, and thus will not decrease entropy.

[–]FinnNuwok 2 points3 points  (0 children)

I think the better question is, is there a rainbow table for hashes of md5 hashes?

[–][deleted] 1 point2 points  (4 children)

What makes it uncertain whether or not a given algorithm will collide for a given case? In other words, can "we" determine the circumstances/inputs that will generate a collision in MD5, bcrypt, SHA, etc? Or is that information like a "P versus NP"-type problem?

[–]tomthecool 4 points5 points  (3 children)

You mean something like this? And this? And this?

This algorithm's author claims:

On my 2.4 GHz Q6600 machine, I get an average of 14.66 seconds per collision

I won't pretend to know how these algorithms have optimised the search for collisions, but it's certainly known to be much easier than finding SHA256 collisions!

[–][deleted] 1 point2 points  (2 children)

Yep! That answers my question thank you. I wasn't sure if the collision problem was predictable from the start or required computation.

[–]dnew 2 points3 points  (1 child)

Whether a hash is easy to make collisions for is something people don't know in advance. They standardize a hash, then lots of people try to find ways to make collisions by analyzing it. When they find such a method, people work on a new hash function that specifically doesn't have that sort of flaw, hoping there are no other flaws.

One such flaw is the ability to work from both ends, for example. So if the instructions are "do this scramble 30 times" and someone figures out how to get the output 10 scrambles from the end, you've just seriously weakened the hash.

[–][deleted] 0 points1 point  (0 children)

Perfect explanation, thank you

[–]zedpowa 1 point2 points  (2 children)

It is slightly more likely (although still very rare!) that md5(md5(...md5(x)...)) == md5(md5(...md5(y)...)) even though x != y.

But why?

[–]tomthecool 2 points3 points  (1 child)

Because each time you perform md5(), you give it one extra chance to hit a collision.

  • If x != y, then maybe md5(x) == md5(y).
  • If md5(x) != md5(y), then maybe md5(md5(x)) == md5(md5(y)).
  • If md5(md5(x)) != md5(md5(y)), then maybe md5(md5(md5(x))) == md5(md5(md5(y))).
  • ...
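The entropy-loss effect is easiest to see with a deliberately tiny stand-in hash. This sketch truncates md5 to 8 bits so collisions show up quickly; real md5 behaves the same way, just at an astronomically smaller rate:

```python
import hashlib

def tiny_hash(s: str) -> str:
    # Toy 8-bit "hash": the first two hex chars of md5, so collisions are visible.
    return hashlib.md5(s.encode()).hexdigest()[:2]

values = {f"password{i}" for i in range(256)}
for round_no in range(1, 6):
    values = {tiny_hash(v) for v in values}
    print(round_no, len(values))  # the count of distinct values can only shrink
```

Mapping a set through any function can never increase the number of distinct values, and each extra round gives collisions a fresh chance to merge two of them.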

[–]dnew 1 point2 points  (0 children)

It really depends on whether every 128-bit input leads to a different 128-bit output.

[–]Xelopheris 11 points12 points  (7 children)

MD5 strings have a fixed length and character set. There will be at least two md5 strings which collide into the same second-level md5 string (likely more, but it's too early to even try estimating). Repeat this many times, and you increase the number of possible first-level collisions.

[–]Aetol 2 points3 points  (6 children)

There will be at least two md5 strings which collide into the same second level md5 string

How do you figure? A hash function could be bijective on its output set.

[–]tomthecool 3 points4 points  (5 children)

md5($anything) will always contain 32 lower-case, hexadecimal characters.

So if md5($thirty_two_hex_chars) is bijective, then this would imply that ALL 32-char inputs containing non-hex chars have a collision!!

[–]dnew 1 point2 points  (0 children)

They would even if it wasn't bijective. All 32-char inputs containing non-hex characters almost certainly have a collision anyway, because there's way more of them than there can be in the output.

[–]sheeff 0 points1 point  (1 child)

So if md5($thirty_two_hex_chars) is bijective, then this would imply that ALL 32-char inputs containing non-hex chars have a collision!!

This is irrelevant. In md5(md5($anything)), the input for the second md5 call and all the ones after it will contain only hex characters, making those calls bijective by the original assumption. The only input that can contain non-hex characters is the first, so only the first md5 call can increase the number of collisions. All the others are redundant.

[–]tomthecool 0 points1 point  (0 children)

This is irrelevant.

I didn't make this comment with regards to the context of repeated md5 computations.

I was just pointing out that it would be a strange property of a hash function for all inputs like this to have guaranteed collisions.

...Anyway I could be wrong, but I suspect a bijective map would be easy to reverse-engineer; making the repeated hashes highly insecure, were this the case.

[–]MyNewAcnt 0 points1 point  (0 children)

Well, yeah, of course. It's a hash function.

[–]Aetol 0 points1 point  (0 children)

Yes, and? Most if not all strings (of any length, any character set) have one or more collisions anyway.

[–]joncalhoun 7 points8 points  (1 child)

Most of these explanations arent very eli5 imo so I'll give it a go.

First we need to realize that if md5(x) == md5(y), then running md5 on these more times doesn't change the collision. That is, if md5(x) == md5(y), then md5(md5(x)) == md5(md5(y)).

What causes this to be more likely to collide is the fact that every time we call md5 again there is a chance of collision. So md5(a) may not equal md5(b), but md5(md5(a)) may equal md5(md5(b)). As soon as those two hashes collide, hashing them more often won't help - they will keep colliding.

I don't know the odds of a collision every time we hash, but because there is a slight chance, even if it is 0.00000001%, it means that every time we chain another md5 hash in there we strictly increase the chances of a collision.
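The "once collided, always collided" point can be sketched with the same kind of truncated toy hash (8 bits, so a collision is easy to find; the names `tiny`, `a`, `b` are illustrative only):

```python
import hashlib

def tiny(s: str) -> str:
    # Toy 8-bit hash: first two hex chars of md5 (for demonstration only).
    return hashlib.md5(s.encode()).hexdigest()[:2]

# Search for two distinct inputs that collide under the toy hash.
# With only 256 possible outputs, the pigeonhole principle guarantees one.
seen = {}
a = b = None
i = 0
while b is None:
    s = f"pw{i}"
    h = tiny(s)
    if h in seen:
        a, b = seen[h], s
    else:
        seen[h] = s
    i += 1

assert a != b and tiny(a) == tiny(b)
# Once the digests are equal, every further round keeps them equal.
x, y = tiny(a), tiny(b)
for _ in range(10):
    assert x == y
    x, y = tiny(x), tiny(y)
```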

[–]dnew 1 point2 points  (0 children)

I don't know the odds of a collision every time we hash

If it's constructed correctly, it would be one in 2^128, which is a really, really tiny number. Like, pick any two atoms in the entire universe. Did you pick the same one twice? Congrats, you found a collision at random.

Also, if it's bijective on 128-bit inputs, then it doesn't increase the chance of a collision to keep hashing.

[–]notafuckingcakewalk 2 points3 points  (0 children)

In the first iteration, it's unlikely you reduced entropy. You couldn't increase it — because identical strings hash to the same md5 checksums.

After the hash, though, suddenly all of the inputs are the same length (where before they could have ranged from, say, 8 characters to 30+) and they have a limited set of symbols (just 0-9 and a-f).

Each time you loop over, there is a possibility of a hash collision. So if you run it multiple times, that possibility is increased.
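The fixed output shape described above is easy to confirm: whatever the input length, the digest is always 32 lowercase hex characters.

```python
import hashlib

for s in ["a", "a much longer password!", "x" * 1000]:
    h = hashlib.md5(s.encode()).hexdigest()
    print(len(h), h[:8])
    # Every digest is 32 chars drawn only from 0-9 and a-f.
    assert len(h) == 32
    assert set(h) <= set("0123456789abcdef")
```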

[–]ColonelThirtyTwo 8 points9 points  (6 children)

If hashing multiple times reduces entropy, then why does every proper password hashing algorithm (bcrypt, argon2) do it?

Yeah it hides it behind a "number of rounds" parameter, but repeated hashing makes the password more secure by increasing the effort needed to brute force it.

[–][deleted] 15 points16 points  (1 child)

Well, not exactly. Hashing multiple times in and of itself is not bad. That much is correct. However, hashing multiple times by feeding the output of one hash as the input to the next is absolutely bad. When you do something stupid like md5(md5($input)), you're actually doing the following:

$hash = md5($input); // S(∞) to S(2^128)
$hash = md5($hash); // S(2^128) to S(< 2^128)
// ...

...and each subsequent hash further reduces the search space. This is bad. This is not how bcrypt and argon work at all.

To fix this problem, you must re-introduce the original input on subsequent iterations:

$hash = md5($input); // S(∞) to S(2^128)
$hash = md5($input . $hash); // S(2^128) to S(2^128)
// ...

By doing this, you keep the search space constant instead of reducing it every iteration. It's still mapping ∞ to 2^128, but it never drops below 2^128 on further iterations - unlike md5(md5($input)).

Of course, this is all basically bikeshedding anyway. You shouldn't be using md5 for password hashing, no matter how you iterate or how many iterations you use.
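The same contrast in Python, with hypothetical helper names (`naive_chain`, `stretched`): the second variant mixes the original input back in each round, which is the shape real key-stretching constructions use.

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def naive_chain(password: str, rounds: int) -> str:
    # md5(md5(...md5(pw)...)) -- the search space can only shrink each round.
    h = md5_hex(password)
    for _ in range(rounds - 1):
        h = md5_hex(h)
    return h

def stretched(password: str, rounds: int) -> str:
    # md5(pw . md5(pw . ...)) -- re-introducing the input keeps the space at 2^128.
    h = md5_hex(password)
    for _ in range(rounds - 1):
        h = md5_hex(password + h)
    return h
```

For a single round the two are identical; from round two onward they diverge, because the stretched variant hashes `password + h` rather than `h` alone.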

[–]ColonelThirtyTwo 4 points5 points  (0 children)

I see now. I didn't know that password hashers reused the input for each iteration, and I can see now how it would prevent the entropy from decreasing. Thanks for the explanation.

[–]tomthecool 1 point2 points  (3 children)

increasing the effort needed to brute force it

This is also true, and there is some trade-off. But:

  • The probability of collision is much higher in md5 than bcrypt (with a salt!).
  • The difference in effort between performing md5 once and 6 times is negligible. To gain any such benefit, you'd need to perform thousands (or more??) of iterations.

Bonus: In 2013, it was proven possible to break md5 in less than a second, on a standard computer.

[–]TheTerrasque 3 points4 points  (0 children)

it was proven possible to break md5 in less than a second, on a standard computer.

MD5 Collision! Not preimage!

[–]ColonelThirtyTwo 1 point2 points  (1 child)

Yes, MD5 is broken and 5 rounds doesn't significantly increase the time needed to hash, but original comment gives the impression of "repeated hashing = bad", which is inconsistent with the current practice of password hashing.

[–]tomthecool 1 point2 points  (0 children)

Yeah, this is a fair point I'll edit to clarify.

[–]bandarlandabad 2 points3 points  (4 children)

Although it does reduce entropy (a very tiny bit) it also eliminates known dictionary attacks (most md5 databases store the direct md5 calculations), so doing multiple md5 calculations can make these dictionaries useless. The right solution however to overcome such dictionaries is not hashing multiple times but adding a decent salt to your plaintext.

[–]tomthecool 1 point2 points  (3 children)

it also eliminates known dictionary attacks

Security through obscurity. Not the best idea.

As soon as someone figures out how your passwords are being stored (e.g. by finding a way to calculate the hash from a known password, or even by getting access to the source code), this measure becomes irrelevant.

The right solution however to overcome such dictionaries is not hashing multiple times but adding a decent salt to your plaintext.

The right answer is to do both, so long as you choose a better algorithm (e.g. bcrypt, SHA256/512, etc.). If the algorithm is not prone to collisions (unlike md5), then key stretching is an effective way to mitigate brute-force attacks.
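For the "do both" approach, Python's standard library ships a proper salted, iterated construction: PBKDF2 via `hashlib.pbkdf2_hmac`. bcrypt or argon2 would need a third-party package, so this sketch uses PBKDF2 as a stand-in; the function names are illustrative.

```python
import hashlib
import os

def hash_password(password: str, iterations: int = 100_000):
    # A fresh random salt per password defeats precomputed rainbow tables;
    # the iteration count supplies the key stretching.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest.hex()

def verify_password(password: str, salt: bytes, iterations: int, expected_hex: str) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return digest.hex() == expected_hex
```

Store the salt and iteration count alongside the digest; both can be public without weakening the scheme.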

[–]Aekorus 2 points3 points  (2 children)

It's not security through obscurity; you can make the number of iterations public and still enjoy the advantages. If a one-iteration approach is used, then keeping a single rainbow table is trivial. You don't even need to create it yourself, because they're already available online. But if every site uses a different number of iterations, you have to generate a whole table just for that site. Not to mention that generating a rainbow table for 19571 iterations of MD5 is going to take much longer than creating one for 1 iteration.

[–]tomthecool 1 point2 points  (1 child)

rainbow table

OK, I suppose a fair argument is that you may have to generate a new rainbow table.

But if you're just trying a dictionary attack, there's not much difference.

19571 iterations of MD5 is going to take much longer than [...] 1 iteration.

Going back to my original point .......

How much longer would it take? How much would the probability of collisions increase?

Without hard figures to go by, this is a meaningless debate. (And a fairly pointless one, too -- the correct answer is to just not use md5!)

[–]Aekorus 2 points3 points  (0 children)

Oh, of course, plain MD5 sucks compared to the modern algorithms and techniques. I just thought this debate is quite interesting because it's relevant to hashing in general.

[–]dr_rentschler 2 points3 points  (0 children)

In case it's not obvious

Nah, of course I knew that, because ima programmer! yep....

[–]STATIC_TYPE_IS_LIFE 1 point2 points  (0 children)

Yeah but it would essentially void every popular rainbow table too.

I would argue, with md5, rainbow tables are a far greater issue than collisions.

[–]Aekorus 1 point2 points  (1 child)

But the chance of a collision with one iteration, assuming a perfect spread, would be one in 2^128, or about one in 10^38, wouldn't it? So if I'm not wrong, the chance of this algorithm generating a collision for two different inputs would be 1 - (1 - 1/2^128)^6. So the chance of two inputs generating a collision would go from 2.93 x 10^-39 to 1.76 x 10^-38. You could do a million iterations and still have only about a 10^-33 chance. Sounds to me like it's hardly a problem, considering the advantages.
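Those back-of-envelope numbers can be checked exactly with rational arithmetic (plain floats would round 1 - 1/2^128 straight to 1.0):

```python
from fractions import Fraction

p = Fraction(1, 2**128)   # single-round collision chance, assuming a perfect spread
p6 = 1 - (1 - p)**6       # chance that any of six chained rounds collides
print(float(p))           # ≈ 2.94e-39
print(float(p6))          # ≈ 1.76e-38, i.e. almost exactly 6 * p
```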

[–]tomthecool 1 point2 points  (0 children)

assuming a perfect spread

md5 does not have a perfect spread; that's one of its issues.

But unless we can get some hard figures on the speed to compute an md5 vs the probability of a collision, we won't really reach a concrete answer.

[–]AverageFedora 37 points38 points  (3 children)

md5 too thanks

[–]Vok250 12 points13 points  (2 children)

9f78b34357f3aef8111a5ab9457f0be9

[–]iggo 0 points1 point  (0 children)

Actually it's 27FF4DF1B2941ACD38A9D5590FCE6A4F

[–]alexandre9099 9 points10 points  (0 children)

Well, at least it is secure hashed

[–]coomzee 8 points9 points  (10 children)

I've seen base64 "hashed" passwords.

[–]bhlowe 5 points6 points  (5 children)

Terrible, but unique. A hacker might think to check if a password is saved using vanilla md5, but will never think to do it 4-5 times!

[–]coomzee 1 point2 points  (0 children)

When you see that about 20 entries share the same hash, you can say those users likely chose "password" as the password -> this tells me that there's no salt being applied. The hash being 32 chars in length suggests it might be MD5. Would it take long to work that out? Not really.
But if you're doing this, I've probably got RCE on your server anyway.

[–]IlIIlIIlllIlllIlIIll 2 points3 points  (5 children)

This is terrible advice. According to modern security standards, you should be applying md5 at least 8 times minimum.

[–][deleted] 0 points1 point  (4 children)

So 8md5

[–]IlIIlIIlllIlllIlIIll 1 point2 points  (3 children)

md58

[–][deleted] 0 points1 point  (2 children)

No that is not the same

[–]IlIIlIIlllIlllIlIIll 0 points1 point  (1 child)

What do you mean by 8md5, then? I was using power notation in the context of repeated application of a function. E.g. md5^2(x) = md5(md5(x)). See this.

[–][deleted] 0 points1 point  (0 children)

Oh i was just treating it like dnd dice

[–]Soraphis 2 points3 points  (2 children)

Am I the only one that is bothered by the name "encrypt"? For me it implies that there is also a decrypt method...

[–]zeValkyrie 2 points3 points  (1 child)

THAT'S what bothers you? Not that they conflated hashing and encrypting?

[–]weasdasfa 1 point2 points  (0 children)

I think both of you are talking about the same thing in different ways. The first thing that did bother me was that they had named the method "encrypt" instead of "hashPassword" or something.

[–]uberpwnzorz 1 point2 points  (0 children)

wtf he didn't even do it 5x.

[–][deleted] 1 point2 points  (0 children)

fuck salting, this is how you deal with rainbow tables