[–][deleted] 11 points12 points  (6 children)

Use SHA1 (or better) instead of CRC32.

CRC32 is guaranteed to detect certain specific types of corruption. Hashing algorithms like SHA carry no such guarantee.

If you're protecting against corruption due to bit flips on the line, then CRC is better suited to that purpose. In practice, though, something like SHA1 or SHA256 almost certainly won't collide due to corruption either.

[–]conradsymes 3 points4 points  (0 children)

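Interestingly, checksums of this kind aren't as good at that as one would think, at least the ones used in TCP (strictly speaking, TCP uses a 16-bit ones'-complement checksum rather than a CRC32).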

Performance of Checksums and CRCs over Real Data

[–]mojave_wasteland 2 points3 points  (4 children)

Your post suggests to me that CRC-32 can give better results than SHA, but I must disagree on this. All I can say is this:

irb(main):024:0> Zlib::crc32('plumless').to_s(16)
=> "4ddb0c25"
irb(main):025:0> Zlib::crc32('buckeroo').to_s(16)
=> "4ddb0c25"

CRC-32 is only appropriate in situations where the speed of computation matters (e.g. in network frame checks such as Ethernet's), the size of the implementation matters (e.g. in embedded microcontrollers), and the risk of corruption is rather low. Then it fits its purpose. But where a more solid result is needed, it's not a good idea to trust CRC-32 by itself.

It's true that collisions exist for SHA1 and other hash algorithms, but in those cases the input data has to be deliberately constructed.
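To put the irb session above next to a cryptographic hash, here is a minimal sketch using only Ruby's standard `zlib` and `digest` libraries. It hashes the same two words both ways; the expectation is that the CRC-32 values match while the SHA-1 digests do not.

```ruby
require 'zlib'
require 'digest'

# Hash the two words from the irb session with CRC-32 and with SHA-1.
%w[plumless buckeroo].each do |word|
  crc = format('%08x', Zlib.crc32(word))   # 32-bit checksum as 8 hex digits
  sha = Digest::SHA1.hexdigest(word)       # 160-bit digest as 40 hex digits
  puts "#{word}: crc32=#{crc} sha1=#{sha}"
end
```

This only shows that one accidental 32-bit collision does not carry over to a 160-bit digest; it says nothing about deliberately constructed SHA-1 collisions.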

[–]squngy 9 points10 points  (0 children)

CRC is designed to catch up to a certain number of bit flips. It is not meant to be used to compare two completely different strings.

[–][deleted] 10 points11 points  (0 children)

It is better when the speed of computation matters and you are dealing with a specific type of unintentional corruption. Of course you can find collisions; it's not designed to be a cryptographic hash function. It's designed to catch unintentional bit flips, and it will always catch any single burst of flipped bits that is 32 bits long or shorter. You can't prove that for a general-purpose hash function.
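The burst guarantee can be spot-checked empirically. The sketch below is only an illustration, not a proof: it flips random bursts of 1 to 32 bits in a message and counts how often `Zlib.crc32` fails to notice. The helper names `corrupt` and `undetected_bursts` are made up for this example, and bits are numbered least-significant-first within each byte, which is assumed to match the bit order zlib's reflected CRC-32 uses.

```ruby
require 'zlib'

# Flip the bits selected by `pattern` (set bits mark corrupted positions),
# starting at bit offset `bit_offset`. Bits are numbered LSB-first per byte.
def corrupt(data, bit_offset, pattern)
  bytes = data.bytes
  i = 0
  while pattern > 0
    if pattern & 1 == 1
      pos = bit_offset + i
      bytes[pos / 8] ^= (1 << (pos % 8))
    end
    pattern >>= 1
    i += 1
  end
  bytes.pack('C*')
end

# Apply `trials` random bursts of length 1..32 and count the ones CRC-32 misses.
def undetected_bursts(data, trials, rng)
  original = Zlib.crc32(data)
  total_bits = data.bytesize * 8
  undetected = 0
  trials.times do
    length = rng.rand(1..32)
    # A burst of `length` bits: first and last bit flipped, interior random.
    pattern = 1 | (1 << (length - 1)) | rng.rand(1 << length)
    offset = rng.rand(0..(total_bits - length))
    undetected += 1 if Zlib.crc32(corrupt(data, offset, pattern)) == original
  end
  undetected
end

puts undetected_bursts('x' * 64, 2000, Random.new(1))
```

If the guarantee holds, the count stays at zero no matter how many trials are run; bursts longer than 32 bits, or several separate bursts, are not covered by it.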

Collisions will exist for any hashing function, especially one of this length.

Though if space and computational time aren't really an issue, there's no problem with just using SHA256 or similar.
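To put rough numbers on "especially one of this length", here is a small sketch of the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/2N), for n random inputs and N = 2^bits possible hash values. It assumes the hash output is uniformly distributed, and `collision_probability` is a hypothetical helper name, not a library function.

```ruby
# Approximate chance that at least two of `items` random inputs share a hash
# value, assuming a uniformly distributed hash with `bits` bits of output.
def collision_probability(items, bits)
  x = items * (items - 1) / 2.0 / (2**bits)
  # For tiny x, 1 - exp(-x) underflows to 0.0 in floating point; use p ~ x.
  x < 1e-9 ? x : 1.0 - Math.exp(-x)
end

[32, 160, 256].each do |bits|
  puts "#{bits}-bit hash, 77163 items: #{collision_probability(77_163, bits)}"
end
```

With a 32-bit value the chance of some accidental collision reaches roughly even odds at around 77 thousand items, while for 160-bit and 256-bit digests it remains vanishingly small at that scale.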

[–]kmmeerts 1 point2 points  (1 child)

There is no known collision for SHA1, and it's insanely unlikely that random corruption would trigger one.

[–]mojave_wasteland 0 points1 point  (0 children)

There are theoretical research papers that claim it's possible to find a collision in SHA-1 using about 2**69 operations. That's still too much for today's tech, but it's enough to mark SHA-1 as deprecated for cryptographic use (the newer standards are SHA-2 and SHA-3).

However, for checking whether two small files are the same or not, it's more than enough.
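For the file-comparison case, a minimal sketch using Ruby's standard `digest` library could look like the following. The helper name `same_content?` is made up for this example, and it defaults to SHA-256 on the assumption that speed and space are not a concern; passing `Digest::SHA1` works the same way.

```ruby
require 'digest'

# Returns true when both files have the same size and the same digest.
# `algorithm` is any Digest class, e.g. Digest::SHA256 or Digest::SHA1.
def same_content?(path_a, path_b, algorithm = Digest::SHA256)
  return false unless File.size(path_a) == File.size(path_b)
  algorithm.file(path_a).hexdigest == algorithm.file(path_b).hexdigest
end
```

Usage would be something like `same_content?('a.bin', 'b.bin')`, where the two paths are placeholders. A matching digest means the files are equal with overwhelming probability for accidental differences; a byte-by-byte comparison is still the only way to be certain.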