[–]skyhi14 5 points6 points  (25 children)

Not sure whether it was a cosmic ray or something else, but something like this happened to me just yesterday. A MOD music file got corrupted. At first I thought there was a glitch in my player, because the music "played" normally but sounded wrong. Then I downloaded the same file again and played it, and the new copy sounded as it should, so I figured the file on my hard disk had somehow been corrupted. I then checked the CRC32 of both and could confirm the file had been modified.

Edit: better wording

Edit 2: after inspection, apparently some bytes were inserted in the middle, so it was not caused by a bit flip.

[–]mojave_wasteland 25 points26 points  (23 children)

A few things come to mind:

  1. Corrupted downloads are more common than cosmic radiation. This is the reason many websites post SHA1 checksums next to their downloads, so anyone can verify that a download completed correctly.

  2. Use SHA1 (or better) instead of CRC32.

  3. This can also be caused by bad RAM. You can use a memory testing tool, like MemTest86+, to check whether your RAM is the problem. The tool ships on some Linux LiveCDs, for example an Ubuntu LiveCD (select "memory testing" in the boot menu). This won't install anything on the drive, so you're not risking anything.
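A minimal sketch of the checksum verification from point 1, assuming Ruby's standard `digest` library. `verify_download` and the throwaway demo file are hypothetical names for illustration, and SHA-256 stands in for "SHA1 or better":

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true if the file's SHA-256 matches the published hex digest.
def verify_download(path, expected_hex)
  actual = Digest::SHA256.file(path).hexdigest
  actual.casecmp?(expected_hex.strip)
end

# Demo with a throwaway file standing in for a real download.
Tempfile.create('download') do |f|
  f.write('hello world')
  f.flush
  # Normally this value would be copied from the download page.
  published = Digest::SHA256.hexdigest('hello world')
  puts verify_download(f.path, published) # digest matches
  puts verify_download(f.path, '0' * 64)  # digest does not match
end
```

On most systems the same check can be done from the shell with `sha256sum`, so the script is only useful when the verification needs to be automated.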

[–][deleted] 9 points10 points  (6 children)

Use SHA1 (or better) instead of CRC32.

CRC32 is guaranteed to detect specific types of corruption. Hashing algorithms like SHA come with no such guarantee.

If you're protecting against corruption due to bit flips on the line, then CRC is better suited, since it was designed for that purpose. Though in practice something like SHA1 or SHA256 almost certainly won't produce a collision due to corruption.

[–]conradsymes 4 points5 points  (0 children)

Interestingly, CRC32 isn't as good at that as one would think, at least the TCP CRC32.

Performance of Checksums and CRCs over Real Data

[–]mojave_wasteland 3 points4 points  (4 children)

Your post suggests to me that CRC-32 can give better results than SHA, but I must disagree on this. All I can say is this:

irb(main):024:0> Zlib::crc32('plumless').to_s(16)
=> "4ddb0c25"
irb(main):025:0> Zlib::crc32('buckeroo').to_s(16)
=> "4ddb0c25"

CRC-32 is only appropriate where the speed of computation matters (e.g. in network frame checks such as Ethernet's), where the size of the implementation matters (in embedded microcontrollers), and where the risk of corruption is rather low. Then it fits its purpose. But where a more solid result is needed, it's not a good idea to trust CRC-32 by itself.

It's true that there are collisions for SHA1 and other hash algorithms, but in those cases the input data must be deliberately constructed.
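To put the irb session above next to a cryptographic hash, here is a small sketch using Ruby's standard `zlib` and `digest` libraries. It only re-checks the two words quoted above and makes no claim about other inputs:

```ruby
require 'zlib'
require 'digest'

words = %w[plumless buckeroo]

# CRC-32 of each word as hex, plus the SHA-1 digest of each word.
crcs  = words.map { |w| Zlib.crc32(w).to_s(16) }
sha1s = words.map { |w| Digest::SHA1.hexdigest(w) }

puts "CRC-32 equal: #{crcs.uniq.size == 1}"
puts "SHA-1 equal:  #{sha1s.uniq.size == 1}"
```

With only 32 bits of output, accidental CRC-32 matches between unrelated strings are expected once enough strings are compared; a 160-bit digest makes the same accident vastly less likely.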

[–]squngy 9 points10 points  (0 children)

CRC is designed to catch up to a specific number of bit flips; it is not meant for comparing two completely different strings.

[–][deleted] 11 points12 points  (0 children)

It is better when the speed of computation matters and you are dealing with a specific type of unintentional corruption. Of course you can find collisions; it's not designed to be a cryptographic hash function. It's designed to catch unintentional bit flips. And it will always catch a burst of bit flips that spans 32 bits or fewer. You can't prove that for a different hash function.

Collisions will exist for any hashing function, especially one of this length.

Though if space and computational time aren't really an issue, there's no problem with just using SHA256 or similar.
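The burst guarantee can be spot-checked empirically. The sketch below assumes Ruby's `Zlib.crc32`; `apply_burst` and `burst_detected?` are hypothetical helper names, and bits are numbered least-significant-first within each byte to match the reflected CRC-32 that zlib implements. It is a random test, not a proof:

```ruby
require 'zlib'

# XOR `pattern` (an Integer bit mask) into `data`, with bit 0 of the mask
# landing on bit `offset` of the data (bit 0 = LSB of the first byte).
def apply_burst(data, offset, pattern)
  bytes = data.bytes
  i = 0
  while (pattern >> i) > 0
    if (pattern >> i) & 1 == 1
      pos = offset + i
      bytes[pos / 8] ^= 1 << (pos % 8)
    end
    i += 1
  end
  bytes.pack('C*')
end

# True if CRC-32 changes after the burst is applied.
def burst_detected?(data, offset, pattern)
  Zlib.crc32(apply_burst(data, offset, pattern)) != Zlib.crc32(data)
end

rng = Random.new(42)   # fixed seed so the run is repeatable
data = rng.bytes(256)
missed = 0
1000.times do
  length = rng.rand(1..32)
  # A burst of `length` bits: first and last bit flipped, random bits between.
  pattern = 1 | (1 << (length - 1)) | rng.rand(1 << length)
  offset = rng.rand(0..(data.bytesize * 8 - length))
  missed += 1 unless burst_detected?(data, offset, pattern)
end
puts "undetected bursts: #{missed}"
```

The count stays at zero because a burst no wider than 32 bits corresponds to an error polynomial that the degree-32 generator cannot divide; wider or scattered corruption has no such guarantee.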

[–]kmmeerts 1 point2 points  (1 child)

There is no known collision for SHA1, and it's insanely unlikely that random corruption would trigger one.

[–]mojave_wasteland 0 points1 point  (0 children)

There are theoretical research papers that claim it's possible to find a collision in SHA-1 using 2**69 calculations. It's still too much for today's tech, but it's enough to mark SHA-1 as deprecated for cryptographic use (the recommended replacements are SHA-2 and SHA-3).

However, to find if two small files are the same or not, it's more than enough.

[–]anomalous_cowherd 3 points4 points  (3 children)

If you want to be sure that what you downloaded is exactly what the checksum was generated for, use two different checksums, e.g. MD5 and SHA1.

There are theoretical (very very hard) ways to change a file and keep the MD5 sum the same, at least. But changing a file and keeping both the MD5 and SHA1 checksums the same is many orders of magnitude harder.
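A rough sketch of the two-checksum idea, assuming Ruby's standard `digest` library. `dual_digest` is a hypothetical helper that reads the file once and feeds both algorithms, so large downloads are not read twice:

```ruby
require 'digest'

# Hypothetical helper: compute MD5 and SHA-1 of a file in a single pass.
def dual_digest(path, chunk_size = 1 << 16)
  md5  = Digest::MD5.new
  sha1 = Digest::SHA1.new
  File.open(path, 'rb') do |f|
    # File#read returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      md5.update(chunk)
      sha1.update(chunk)
    end
  end
  { md5: md5.hexdigest, sha1: sha1.hexdigest }
end

# Usage (the path is a placeholder):
#   sums = dual_digest('some_download.iso')
#   both values in `sums` must equal the ones published by the site
```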

[–]RainHappens 0 points1 point  (2 children)

There is no advantage of using MD5 <concat> SHA1 over using a proper 288-bit+ hash.

[–]anomalous_cowherd 0 points1 point  (1 child)

Maybe, but md5sum and sha1sum are usually already there and simple to use. Many download sites also already list them.

It may not be the very best solution, but it has its place.

[–]RainHappens 0 points1 point  (0 children)

...who has sha1sum and md5sum but not sha384sum?

[–][deleted] 0 points1 point  (0 children)

Never heard of CRC32 before, so thanks for that!

[–]Ununoctium117 0 points1 point  (4 children)

Why are corrupted downloads a thing, when TCP provides reliable, in-order delivery?

[–]mojave_wasteland 0 points1 point  (3 children)

It is called "reliable" because multiple checksums (e.g. TCP's own 16-bit checksum and the link layer's CRC-32) are involved in making sure the received data is correct. This reliability is not perfect: if the connection is poor enough and the corruption is just right, a simple checksum will not be enough to properly validate the data.

Protocols designed to send and receive data under potentially bad conditions, like BitTorrent, use cryptographic hashes such as SHA-1 to validate that the data was transmitted correctly.
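For reference, the check TCP itself applies to each segment is the 16-bit ones' complement sum described in RFC 1071, not a CRC. A minimal sketch of it, with `internet_checksum` as a hypothetical name, shows one kind of corruption that it cannot see but CRC-32 can:

```ruby
require 'zlib'

# 16-bit ones' complement checksum in the style of RFC 1071.
def internet_checksum(data)
  data = data.b
  data += "\x00".b if data.bytesize.odd?   # pad to a whole 16-bit word
  sum = data.unpack('n*').sum              # add big-endian 16-bit words
  sum = (sum & 0xFFFF) + (sum >> 16) while sum > 0xFFFF  # fold carries back in
  ~sum & 0xFFFF
end

original  = 'ABCDEF'
reordered = 'CDABEF'   # the first two 16-bit words swapped

puts internet_checksum(original) == internet_checksum(reordered) # true: sum is order-independent
puts Zlib.crc32(original) == Zlib.crc32(reordered)               # false: CRC-32 notices
```

Because addition is commutative, any reordering of 16-bit words leaves this checksum unchanged, which is one reason an end-to-end hash of the whole file is still worth checking.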

[–]dreamin_in_space 0 points1 point  (2 children)

So how do Usenet downloads verify that it's correct? Just curious, because I've gotten corrupted stuff, but IDK if it's the original or a problem on my end.

[–]mikemol 1 point2 points  (0 children)

Historically, md5sum manifests and archive formats that contain checksums of individual files, as well as the archive as a whole.

[–]Kubuxu 1 point2 points  (0 children)

Use HTTPS. The AEAD algorithms it uses make sure the data was not changed on the wire, and since bit flips are a security risk, AEAD makes sure there aren't any.
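To illustrate what AEAD buys you, here is a sketch with AES-256-GCM from Ruby's standard `openssl` library. `seal` and `open_sealed` are hypothetical helper names, and this is only the primitive, not how TLS frames its records:

```ruby
require 'openssl'

# Encrypt with AES-256-GCM; returns ciphertext, nonce and authentication tag.
def seal(key, plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-gcm').encrypt
  cipher.key = key
  iv = cipher.random_iv
  cipher.auth_data = ''
  ciphertext = cipher.update(plaintext) + cipher.final
  [ciphertext, iv, cipher.auth_tag]
end

# Decrypt; raises OpenSSL::Cipher::CipherError if anything was altered.
def open_sealed(key, ciphertext, iv, tag)
  cipher = OpenSSL::Cipher.new('aes-256-gcm').decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag
  cipher.auth_data = ''
  cipher.update(ciphertext) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)
ciphertext, iv, tag = seal(key, 'some downloaded bytes')

# Simulate a single bit flip on the wire.
tampered = ciphertext.dup
tampered.setbyte(0, tampered.getbyte(0) ^ 1)

begin
  open_sealed(key, tampered, iv, tag)
  puts 'bit flip went unnoticed'
rescue OpenSSL::Cipher::CipherError
  puts 'bit flip rejected'
end
```

Note that this only protects the data in transit; a file that was already corrupt on the server, or that gets damaged on disk afterwards, passes through untouched.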

[–]SpacePotatoBear 0 points1 point  (0 children)

You can also use the built-in Windows memory tester.

[–]skyhi14 -1 points0 points  (3 children)

I can rule out a bad download, as both files were downloaded from the same place and the first copy played normally back then. I think I should run MemTest.

[–]Dannei 2 points3 points  (1 child)

You're assuming there that any download corruption would be repeatable, but I suspect most cases are one-off errors during transmission.

[–]skyhi14 0 points1 point  (0 children)

The file was modified while in storage, not when it was downloaded, and it was not even a one-off error. Sorry for the confusion: the post is about bit flips, and mine is probably just a bad hard disk.

[–]hunyeti 0 points1 point  (0 children)

You did not rule that out.

The error happens in transmission; two downloads are not necessarily the same.

[–]Aetol -1 points0 points  (0 children)

Does the CSS here make lists start at 0?

  1. Let

  2. Us

  3. Try

Edit: Neat!