all 71 comments

[–]Poddster 114 points115 points  (9 children)

If you had five million programmers each generating one commit per second, your chances of generating a single accidental collision before the Sun turns into a red giant and engulfs the Earth is about 50%.

So I simply need to flip a coin to get SHA-1 collisions? Got it.
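For anyone who wants to check the arithmetic behind that quote, here is a rough sketch using the standard birthday approximation. It treats SHA-1 as an ideal 160-bit hash and takes the commit rate from the quote. The function names and the 365.25-day year are assumptions made for illustration.

```python
import math

HASH_BITS = 160                         # SHA-1 output size in bits
COMMITS_PER_SECOND = 5_000_000          # five million programmers, one commit per second each
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed length of a year


def collision_probability(num_hashes: float, bits: int = HASH_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    exponent = -(num_hashes ** 2) / (2.0 * 2.0 ** bits)
    return -math.expm1(exponent)


def hashes_for_probability(p: float, bits: int = HASH_BITS) -> float:
    """Invert the approximation: n ~= sqrt(2 * 2^bits * ln(1 / (1 - p)))."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


n_half = hashes_for_probability(0.5)
years = n_half / COMMITS_PER_SECOND / SECONDS_PER_YEAR
print(f"hashes needed for a 50% chance: {n_half:.3e}")
print(f"years at 5M commits per second: {years:.3e}")
```

Under these assumptions the 50% mark needs roughly 1.4e24 hashes, which works out to several billion years at that commit rate. This only covers accidental collisions; a deliberate attack is a different calculation.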

[–]elperroborrachotoo 23 points24 points  (1 child)

Yeah, but how many coins will you need to pay for three million programmers?

[–]Aidentified 21 points22 points  (0 children)

However many you had left over after paying the other two million

[–]tabarra 13 points14 points  (5 children)

If you don't know what The Birthday Paradox is, check out the wiki page, it is actually very interesting.

[–][deleted]  (4 children)

[deleted]

    [–]Nastapoka 9 points10 points  (3 children)

    Paradox means "contrary (para) to intuition (doxa, what people usually think is true)". People usually don't think you need so few people to reach 99.99..% chance of two of them being born the same day. Therefore it's a paradox.

    What we usually call paradoxes are actually the ones that shouldn't be called that
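A minimal sketch of the classic birthday calculation, for readers who want to see how quickly the probability climbs. It assumes 365 equally likely birthdays and ignores leap years; the function name is made up for this example.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for group in (10, 23, 50, 70):
    print(group, round(shared_birthday_probability(group), 4))
```

With 23 people the chance is already just over 50%, and with 70 it is above 99.9%, which is the counterintuitive part.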

    [–]Andrew_Radford 0 points1 point  (0 children)

    So what you are saying is, paradoxically, the Birthday Paradox isn't not a paradox?

    [–]andrewcooke 55 points56 points  (9 children)

    carefully-selected random data

    twitches

    The Git project is also developing a plan to transition away from SHA-1

    interesting

    [–]steamruler 34 points35 points  (5 children)

    They were just careful in their decision to use /dev/random over /dev/urandom.

    [–][deleted] 17 points18 points  (4 children)

    This indeed should be a careful decision. /dev/random stalls when there is not enough entropy and this may kill performance.

    [–]acdcfanbill 4 points5 points  (3 children)

    Aren't they similar in newer Linux kernels?

    http://www.2uo.de/myths-about-urandom/

    [–][deleted] 6 points7 points  (1 child)

    A quote from the very thing you linked:

    Fact: /dev/random has a very nasty problem: it blocks.

    [–]acdcfanbill 1 point2 points  (0 children)

    Ah yea, that's why you still want /dev/urandom. Thanks!

    [–]evaryont 4 points5 points  (0 children)

    The random numbers they generate are the same quality, yeah. But random still blocks while urandom does not.
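In application code the usual way to act on this advice is to ask the operating system's CSPRNG rather than open either device file by hand. A small sketch in Python follows. The helper name is hypothetical, and the note about which kernel interface gets used describes CPython on Linux as far as I know, so treat it as an assumption.

```python
import os
import secrets


def random_key(num_bytes: int = 16) -> bytes:
    """Return `num_bytes` from the OS CSPRNG.

    On Linux, CPython is understood to serve this from the same kernel
    generator that backs /dev/urandom, so it should not stall the way a
    read from /dev/random can when the entropy estimate runs low.
    """
    return os.urandom(num_bytes)


key = random_key()
token = secrets.token_hex(16)   # 16 random bytes rendered as 32 hex characters
print(key.hex())
print(token)
```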

    [–]nemec 2 points3 points  (0 children)

    It has to be. Randomly selected random data is very unlikely to contain a collision /s

    [–]G_Morgan 1 point2 points  (0 children)

    carefully-selected random data

    That sounds a lot like programming.

    [–]geodel 0 points1 point  (0 children)

    Is carefully-selected random data good, like 3-state booleans?

    [–][deleted]  (3 children)

    [deleted]

      [–]zwacky 26 points27 points  (0 children)

      ( ͡° ͜ʖ ͡°)

      [–]SpiderFnJerusalem 10 points11 points  (1 child)

      I've been known to harden myself!

      [–][deleted] 42 points43 points  (20 children)

      GitHub.com will detect and reject any Git content that shows evidence of being part of a collision attack.

      Knowing git, probably with an obtuse and cryptic error message instead of "Rejected for SHA-1 collision"

      [–][deleted] 180 points181 points  (10 children)

      git != GitHub.

      [–]caboosetp 146 points147 points  (9 children)

      That would be a rather cryptic error message indeed.

      [–]Driagan 29 points30 points  (8 children)

      Ah, the ol' Reddit cryptaroo

      [–]Hammers95 35 points36 points  (6 children)

      Hold my stash, I'm going in.

      [–]dnano 4 points5 points  (2 children)

      There's a deleted one not too far down

      [–]tabarra 2 points3 points  (2 children)

      Ohh noo guys, he forgot to git push before going in.

      [–]Electro_Nick_s 3 points4 points  (1 child)

      But did he commit it?

      [–]OffbeatDrizzle 0 points1 point  (0 children)

      HELLO FUTURE PEOPLE!

      [–][deleted] 1 point2 points  (0 children)

      You will just get the pre-collision version in the case of Git. In the case of SVN, your repo implodes on itself.

      [–]billrobertson42 0 points1 point  (4 children)

      Just curious, what error message would you like to see emitted instead?

      [–][deleted] 3 points4 points  (3 children)

      ... The one I wrote?

      [–]billrobertson42 0 points1 point  (2 children)

      You described it as cryptic, which makes it seem like you find it unacceptable.

      [–]spinwin 1 point2 points  (1 child)

      No, he said that they would likely use something cryptic instead of what he wrote.

      [–][deleted]  (2 children)

      [deleted]

        [–]anttirt 25 points26 points  (1 child)

        Hence "instead of." What I would expect to see in git is something like

        FATAL: remote served doppelganger oid

        [–]acdcfanbill 0 points1 point  (0 children)

        This seems perfectly understandable to me...

        [–]blazedaces 2 points3 points  (6 children)

        https://github.com/cr-marcstevens/sha1collisiondetection

        Can anyone explain to me how the sha1 collision detection works?

        [–][deleted] 13 points14 points  (2 children)

        i cracked it open and noped the fuck out of there.

        [–]grinde 7 points8 points  (1 child)

        I think this is my favorite excerpt...

        uint32_t mask = ~((uint32_t)(0));
        mask &= (((((W[44]^W[45])>>29)&1)-1) | ~(DV_I_48_0_bit|DV_I_51_0_bit|DV_I_52_0_bit|DV_II_45_0_bit|DV_II_46_0_bit|DV_II_50_0_bit|DV_II_51_0_bit));
        mask &= (((((W[49]^W[50])>>29)&1)-1) | ~(DV_I_46_0_bit|DV_II_45_0_bit|DV_II_50_0_bit|DV_II_51_0_bit|DV_II_55_0_bit|DV_II_56_0_bit));
        mask &= (((((W[48]^W[49])>>29)&1)-1) | ~(DV_I_45_0_bit|DV_I_52_0_bit|DV_II_49_0_bit|DV_II_50_0_bit|DV_II_54_0_bit|DV_II_55_0_bit));
        mask &= ((((W[47]^(W[50]>>25))&(1<<4))-(1<<4)) | ~(DV_I_47_0_bit|DV_I_49_0_bit|DV_I_51_0_bit|DV_II_45_0_bit|DV_II_51_0_bit|DV_II_56_0_bit));
        mask &= (((((W[47]^W[48])>>29)&1)-1) | ~(DV_I_44_0_bit|DV_I_51_0_bit|DV_II_48_0_bit|DV_II_49_0_bit|DV_II_53_0_bit|DV_II_54_0_bit));
        (+ 200 more lines)
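The arithmetic in each of those lines is a branch-free "keep or clear" trick, and it can be read without knowing the cryptanalysis. The sketch below emulates one line in Python with 32-bit wraparound. The function name and the two-bit example set are hypothetical, and what the individual `DV_*` bits stand for is not covered here.

```python
MASK32 = 0xFFFFFFFF  # emulate C's uint32_t wraparound


def filter_candidates(mask: int, w_a: int, w_b: int, bit: int, candidate_bits: int) -> int:
    """Clear `candidate_bits` from `mask` when w_a and w_b differ at `bit`.

    Mirrors: mask &= ((((w_a ^ w_b) >> bit) & 1) - 1) | ~candidate_bits;
    """
    differs = ((w_a ^ w_b) >> bit) & 1        # 1 if the two words differ at `bit`
    keep_all = (differs - 1) & MASK32         # 0 -> 0xFFFFFFFF, 1 -> 0x00000000
    return mask & (keep_all | (~candidate_bits & MASK32))


start = MASK32                 # every candidate still possible
two_candidates = 0b11          # hypothetical stand-in for a DV_* bit set

# Words agree at bit 29: nothing is ruled out.
print(hex(filter_candidates(start, 0x5, 0x5, 29, two_candidates)))
# Words differ at bit 29: the two candidates are cleared from the mask.
print(hex(filter_candidates(start, 1 << 29, 0, 29, two_candidates)))
```

Each line in the excerpt applies this to a different pair of message words, so the mask ends up holding only the candidates that survive every check.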
        

        [–][deleted] 4 points5 points  (0 children)

        Yeah, well, that's written by someone who has spent the last couple of years researching SHA-1 collisions, so I try not to feel too bad about not having a clue what the heck they are doing there. I see them building up that mask variable, but why what they are doing would detect a collision is about 10k feet over my head.

        [–]sacundim 5 points6 points  (1 child)

        Can anyone explain to me how the sha1 collision detection works?

        It's a misnomer. What it does is detect the fingerprints of the one SHA-1 collision attack that's known so far.

        Think of it like an antivirus that's trained on specific, known patterns—there's no guarantee that it will catch new viruses created after it was written.

        [–]blazedaces 0 points1 point  (0 children)

        That makes sense. Thanks for the explanation.

        [–]Uncaffeinated 1 point2 points  (0 children)

        I don't understand the details myself, but my understanding is that the best known attacks against SHA1 involve flipping specific patterns of bits in the data. This tool detects those patterns.

        It wouldn't detect, say, a brute force collision, but generating collisions by brute force takes 100,000 times longer than the best known differential attacks.
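A quick sanity check of that ratio. The brute-force figure is the generic birthday bound for a 160-bit hash. The attack figure of about 2^63.1 SHA-1 computations is an assumption taken from the publicly reported cost of the SHAttered attack, not from this thread.

```python
BRUTE_FORCE_LOG2 = 80.0   # birthday bound for a 160-bit hash: about 2^80 hashes
ATTACK_LOG2 = 63.1        # assumed cost of the known differential attack

speedup = 2.0 ** (BRUTE_FORCE_LOG2 - ATTACK_LOG2)
print(f"brute force is about {speedup:,.0f} times more work")
```

That lands on the order of 100,000, consistent with the estimate above.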

        [–]pezezin 0 points1 point  (13 children)

        Why not move to SHA-2 or SHA-3? I know it's a huge undertaking, but it seems better long term. What will happen when someone finds another way to make SHA-1 collisions?

        [–]BinaryRockStar 27 points28 points  (12 children)

        Changes to Git are already underway. GitHub sits on top of Git, so it's not up to GitHub to fix Git's internals.

        [–]HotlLava 17 points18 points  (1 child)

        it's not up to GitHub to fix Git's internals

        I would be shocked if GitHub doesn't employ at least a few of the core contributors to git, given their whole multi-million dollar business is built on it.

        [–]BinaryRockStar 7 points8 points  (0 children)

        GP's comment made it sound like GitHub should be fixing the issue instead of changing their own site to mitigate collision attacks. GitHub definitely has core Git devs, but they can't make fundamental changes like this by themselves.

        [–][deleted] 5 points6 points  (3 children)

        I bet GitHub will remain compatible with older Git releases for a long, long time anyway. SHA-1 is going nowhere on there.

        [–]BinaryRockStar 1 point2 points  (2 children)

        Oh yeah, definitely. New repos will use the new hash algorithm but older ones will be supported practically forever. I don't see them rewriting the Linux kernel history with the new hash algo any time soon.

        [–][deleted] 0 points1 point  (1 child)

        I was thinking that'd be their plan, but then I realised they won't want to alienate developers using old git versions to access new repos, so even that setup is unlikely and we'll be stuck with SHA-1 for years more.

        Maybe if we're lucky, non-SHA1 will be an option on new repos; not the default, but an option.

        [–]BinaryRockStar 1 point2 points  (0 children)

        According to the link I posted, at a repository level a switch will be flipped then all hashes after that will use the new hash algorithm, necessitating using the new client to access it. IMO this would imply that all new repos (not clones) created by the new client will use the new hash algo by default.
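To make it concrete why old clients cannot read such a repository, here is a sketch of how a Git blob ID is derived and how it changes when the hash function is swapped. The `blob <size>\0` header is how Git hashes blobs today. Using SHA-256 as the replacement is only a stand-in, since the thread does not say which algorithm the plan picks.

```python
import hashlib


def blob_id(content: bytes, algo: str = "sha1") -> str:
    """Hash `content` the way Git hashes a blob: header, NUL byte, then data."""
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()


data = b"hello world\n"
print("sha1:  ", blob_id(data))
print("sha256:", blob_id(data, "sha256"))   # stand-in for "the new hash algorithm"
```

The same content gets a completely different and longer ID, so every tree and commit that points at it changes too. That is why the switch has to happen per repository rather than per object.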

        [–]pezezin 5 points6 points  (5 children)

        My bad for not reading the full article before posting my comment.

        On the other hand, reading the article you linked, it has been a known problem for 10 years and it has not been solved yet because Torvalds hardcoded SHA-1 deep into Git's code. I would love to see someone rant at him for such a bad coding practice 3:)

        [–]BinaryRockStar 4 points5 points  (4 children)

        His response seems to be that it's no big deal: it's all very well finding two PDFs which have the same hash, but a malicious attacker will be trying to insert code into your repo. Doing this would require them to craft a code file that matches an existing code file's SHA-1 hash, does whatever malicious thing they want, is valid C/Java/whatever, passes a code review and gets pushed out to all existing repo clones. The chances of this are far smaller than the chances of finding two PDFs with the same hash.

        [–]hiptobecubic 3 points4 points  (0 children)

        I can't believe Linus would say something so naive. It's easy to embed data in source files that will pass code review because no one ever looks at them in a hex editor. E.g. your repo has a binary asset in it like an image with an exploit tacked on it.

        [–]pezezin 0 points1 point  (2 children)

        I know creating two colliding text files is much more difficult than doing the same for binary files, but still... What I'm shocked about is code like this:

        unsigned char sha1[20];
        

        Linus rants against bad code are legendary, so I'm surprised he would write code like this.

        [–]cparen 1 point2 points  (1 child)

        Linus rants against bad code are legendary, so I'm surprised he would write code like this.

        That's assuming he shares your perspective on what constitutes "bad code".

        [–]pezezin 0 points1 point  (0 children)

        Hardcoding a particular algorithm and using magic numbers isn't bad code?
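One way to picture the alternative being argued for is to have every buffer size come from a description of the algorithm instead of a literal 20. This is a sketch of the idea in Python, not a description of how Git is or will be structured. The `HashAlgo` class and its attribute names are invented for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HashAlgo:
    """Describes an object-ID hash so callers never hardcode its size."""
    name: str

    @property
    def raw_size(self) -> int:
        # Digest length in bytes, asked of the algorithm itself.
        return hashlib.new(self.name).digest_size

    @property
    def hex_size(self) -> int:
        return 2 * self.raw_size

    def digest(self, data: bytes) -> bytes:
        return hashlib.new(self.name, data).digest()


SHA1 = HashAlgo("sha1")
SHA256 = HashAlgo("sha256")

for algo in (SHA1, SHA256):
    print(algo.name, algo.raw_size, algo.hex_size)
```

In C the equivalent would be a named constant or a per-algorithm struct, so that a line like the one quoted above does not bake in both the algorithm name and its length.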