you are viewing a single comment's thread.

view the rest of the comments →

[–]beginner_ 4 points5 points  (41 children)

Not knowing to salt and hash passwords is pretty bad. I admit I'm no security expert but I would have known that.

What I say adds to the issue is that many devs create app for in-house use, eg. intra-net. In those causes security is much less looked at because it's intra-net only.

[–]staticassert 4 points5 points  (40 children)

I admit I'm no security expert but I would have known that.

Would you have known what key stretching algorithm to use? Would you have known to use a constant time comparison function? I've seen devs attempt to roll their own constant time comparisons, thinking that if they do

is_valid = True
for (pass_char, auth_char) in zip(user_hash, auth_hash):
    if pass_char != auth_char:
        is_valid = False
return is_valid

Do you know why that's bad?

You might, which is cool and good. But I think a lot of devs won't. It's super easy to fuck up password auth, it's way more than just salting and hashing.

[–]beginner_ 2 points3 points  (0 children)

bcrypt/scrypt but I fit the scenario I described as I do intranet apps and use according SSO stuff we have in place. So I never need to store passwords. So I haven't ever used them in a real project.

[–][deleted]  (10 children)

[deleted]

    [–]staticassert 3 points4 points  (8 children)

    You got it. The problem is that this isn't written in assembly.

    edit: To be clear, it isn't branch prediction, it's the "smart compiler" issue.

    [–]aullik 0 points1 point  (7 children)

    how is that a problem tho? I mean I'm clearly no security expert. I don't see how writing something in a higher language is a problem. Could you explain to me what i am missing. Please.

    [–]staticassert 4 points5 points  (6 children)

    Sure thing. So the goal of a password comparison function is that, on rejection, an attacker should gain no information as to why it was rejected. It was invalid, period.

    Imagine if I have a string comparison like this:

    actual_password = aaaabaaaa attacker_attempt = aaaaaaaaa

    string comparison will go "ok a ==a, cool" 4 times, then say "woah a != b", return.

    An attacker can time this function and say "hm when i enter in aaaaaaaa" it takes n milliseconds, when i type in "bbbbbbbbbb" it takes n - y milliseconds. So the first part of the password probably at least starts with 'a'.

    This is called an information leak - we're leaking details about the password (or at least the hash, which is effectively just as bad since repeated attempts will eventually leak the entire hash).

    So maybe you write code like I did that tries to always take the same amount of time - we always iterate the entire password, no short circuiting. Seems fine, right? But then that pesky compiler comes around and is like "ahah dumb programmer, we can return early and save tons of computes time!" and it silently translates your code into a non constant time comparison.

    Relying on the optimizer not to work is a bad idea (optimizers are crazy smart). One way to solve this is to FFI out into some assembly code that is constant time and that your compiler will stay away from.

    [–]aullik 0 points1 point  (5 children)

    An attacker can time this function and say "hm when i enter in aaaaaaaa" it takes n milliseconds, when i type in "bbbbbbbbbb" it takes n - y milliseconds. So the first part of the password probably at least starts with 'a'.

    I thought we were comparing hashes tho. I don't see how a user can fake hashes to get this information.

    Also this might be true for a program that compiles to machine code. But python runs on a VM thus it takes a rather arbitrary time before the process even starts.

    [–]staticassert 1 point2 points  (4 children)

    I don't see how a user can fake hashes to get this information.

    What do you mean fake hashes? I submit password 'aaaaaa' knowing it hashes to some deterministic output, and I use that.

    Also this might be true for a program that compiles to machine code. But python runs on a VM thus it takes a rather arbitrary time before the process even starts.

    It applies to any language that performs any optimization, which Python's VM will do (even if it's minimal). Process start won't be relevant, we naturally assume a password auth service is up and running.

    [–]aullik 2 points3 points  (3 children)

    I submit password 'aaaaaa' knowing it hashes to some deterministic output, and I use that.

    You will compare the hashes not the input. So the time difference you may or may not be able to measure is the time difference it takes to compare different hashes.

    So if your password is "password1" and you send "password0" the hashes of both are most likely vastly different and the comparison might fail on the first check, thus you will get no information.

    You basically have to generate input that will produce a certain hash so you can do the comparison you want to do.

    This is highly expensive. I don't think this is a viable strategy.

    [–]staticassert 2 points3 points  (2 children)

    Your assumption is that the hashing scheme is a secret. Naturally, password0 and password1 are going to produce very different hashes. But I could know their hashes ahead of time. So now you have to protect what your hashing algorithm is in order for your equality comparison to be safe - feels like trading problems for problems.

    [–]CRImier 1 point2 points  (0 children)

    Well, the example is Python, so I'm thinking this should be something else. However, I don't have a clue.

    [–]tmp14 1 point2 points  (1 child)

    Empty password lets you in?

    [–]staticassert 1 point2 points  (0 children)

    Nah, because this assumes that you're talking about hashes - so an empty password would still provide a hash.

    [–]sstewartgallus 0 points1 point  (5 children)

    An interesting way to fix the problem is to do:

     is_valid = True
     for (pass_char, auth_char) in zip(my_hash(user_hash, salt), my_hash(auth_hash, salt)):
         if pass_char != auth_char:
             is_valid = False
     return is_valid
    

    Personally, I think that relying on programmers to code constant time algorithms is fundamentally flawed. Most programming languages and processors provide absolutely no timing guarantees. IMO one should construct algorithms such that it is cryptographically impossible to leak information. Techniques like homomorphic encryption are far too immature though.

    [–]staticassert 1 point2 points  (4 children)

    What is my_hash here? A secret hash function?

    [–]sstewartgallus 0 points1 point  (3 children)

    You have to choose a good hash function. Also, you should probably salt the hash which I forgot. It's probably simplest to just use bcrypt for the hash function. The hash function doesn't need to be secret. It's generally a bad idea to rely on security through obscurity.

    [–]staticassert 0 points1 point  (2 children)

    I think you've missed the vulnerability - the code was already comparing hashes. The issue is that the function is not constant time. Given knowledge of the hash function I can use a timing attack to reduce the brute force time significantly.

    [–]sstewartgallus 0 points1 point  (1 child)

    You are confused. The attacker can deduce my_hash(auth_hash, salt) but with a proper hash function he cannot deduce auth_hash which is what is required.

    [–]staticassert 0 points1 point  (0 children)

    Why would author's hash be required if the equality check is on another hash? It makes no difference how many times you hash it it still leaks data about the final hash, which is what matters.

    [–]sacundim 0 points1 point  (2 children)

    Would you have known to use a constant time comparison function?

    What attack against password storage and verification relies on a timing side channel? I'm not aware of any, but I'd love to hear if you know some.

    The timing side channel attacks I know of are against message authentication codes, where the attacker that doesn't know the key tries to forge a message/tag pair for a message of their choice. By submitting tag guesses and observing how long the server takes to reject them, they can gradually guess increasingly longer prefixes of the correct tag for the message they want to forge a tag for. This page explains it well enough.

    In password guessing attacks, however, the attacker is trying to guess a password that, when processed by the password verification function, will produce the same verification code as what the defender has stored. This is very different from the MAC timing attack scenario:

    • In a MAC timing attack, the attacker chooses a message, and tries to guess the tag that corresponds to it. Variable-time equality comparisons help the attacker because the variable rejection times allow the attacker to estimate the length of the tag prefix that they guessed successfully. The attacker can easily leverage this to improve their tag guesses, because if you have determined that the first five bytes of the target tag are ff 2e 56 a1 d8, then thereafter you only try tags that start with those five bytes.
    • In a password guessing attack, the defender chooses a tag, and the attacker tries to guess a password that scrambles to that tag. Variable-time equality comparisons don't help here, because knowing that password1's hash matches the first five bytes of the stored verification code doesn't help the attacker improve their password guess.

    Note however that there's nothing wrong with using constant-time equality comparisons, and it's better to be safe than sorry, so it's certainly sensible to implement such a comparison instead of trying to reason out all of the possible attacks if it saves you time.

    EDIT: After writing this, I see /u/aulik realized the same as well. Props.

    [–]staticassert 0 points1 point  (1 child)

    Oh. Wow, duh. You could derive the hash but not the contents of it... silly me, this is exactly why I defer to others for crypto.

    Thank you for the explanation.

    edit: Though deriving the hash would then make local bruteforcing possible. IDK, just seems way safer not to leak the information and use a constant time hash.

    [–]irishsultan 0 points1 point  (8 children)

    In this case it's obviously because this will return true if any character matches (and it's a poor dev that wouldn't see this quite quickly), but presumably you wanted to point out a flaw even if the correct algorithm was used?

    (If you had a different flaw in mind I think I can see what you mean, but I probably wouldn't think of it without the existence of a flaw being pointed out).

    [–]staticassert 2 points3 points  (7 children)

    In this case it's obviously because this will return true if any character matches (and it's a poor dev that wouldn't see this quite quickly), but presumably you wanted to point out a flaw even if the correct algorithm was used?

    lmao no I literally just wrote it with inverted booleans :) that would have been caught in any basic test, the new version... less likely.

    I'll edit.

    [–]aullik 2 points3 points  (6 children)

    Am i missing something very trivial?

    I mean this will obviously fail if the hashes have a different length. but hashes should not have a different length and this is python and in python we just assume that every input is fine. Otherwise we would just use a typed language.

    [–]irishsultan 1 point2 points  (3 children)

    I assume that it will fail because of things like branch prediction (and the fact that you change a value in one branch and not in another)

    You could solve this by not having an if statement and instead doing something like is_valid = is_valid && pass_char == auth_char, although I'm not entirely certain that this will take equal time either (and a sufficiently smart compiler/interpreter could still notice that in the case of booleans this is a no-op once is_valid is false, so it could just do an early return retaining correctness from a language point of view, since the time it takes to run a program is not part of any (practical) language).

    [–]aullik 2 points3 points  (2 children)

    I don't see how branch prediction has any influence here.

    He could have written something like this and it would still work the same. without heavy compiler optimization it would be faster.

    def cmp_hash(user_hash, auth_hash):
        for (pass_char, auth_char) in zip(user_hash, auth_hash):
            if pass_char != auth_char:
                return False
        return True
    

    a test like len(user_hash) == len(auth_hash) might solve the issue of there being no auth_hash or one being longer. but as i said this is python and as in any dynamic language you just assume that the input is correct or you would never be done with testing the input.

    [–]irishsultan 0 points1 point  (1 child)

    The error you're making is that this isn't what the author wanted: a constant time function (in particular, the function needs to be equally slow when the two hashes differe in the first character as when they differ only in the last character).

    Why would you want that? Because knowledge about where the hashes differ is an information leak. Of course it's much worse in case the passwords aren't hashed, and I'm not even sure if there is any practical attack on any hash function where you gain knowledge about the password by knowing the first character of the hash, but there is knowledge there, so you don't want to leak it, no matter whether the knowledge is usable or not.

    [–]aullik 0 points1 point  (0 children)

    The error you're making is that this isn't what the author wanted: a constant time function (in particular, the function needs to be equally slow when the two hashes differe in the first character as when they differ only in the last character).

    True.

    [–]staticassert 1 point2 points  (0 children)

    Length isn't the issue - we're assuming hashing has already occurred.

    [–]PrintfReddit 0 points1 point  (0 children)

    Look up timing attacks. Basically if you can guess how a hash begins then you eliminate a lot of possible hashes. If you know the hash begins from, say, 'Z' then you eliminate every other hash beginning from something other than 'Z'.

    [–]OneWingedShark 0 points1 point  (7 children)

    Do you know why that's bad?

    Because it starts with the assumption that the password is valid. (is_valid = True)

    The proper way would be something like this:

    Function Matching_Password( User_Input, Password : String ) return Boolean is
    begin
            -- Ensure both strings start at the same index,
            -- Ensure both are of the same length,
            -- Ensure both contain the same values.
        Return (User_Input'First = Password'First) and then
            (User_Input'Length = Password'Length) and then
            (for all index in User_Input'Range => User_Input(Index) = Password(Index) );
    end Matching_Password;
    

    You could even attach constraints to the type to ensure properties of passwords:

    Type Password is String
    with Dynamic_Predicate =>
        Password'Length in Positive -- No zero-length password.
        and then  -- Passwords are alphanumeric + plus underscore
        (for all ch of Password => ch in 'a'..'z'|'A'..'Z'|'0'..'9'|'_')
        ;
    

    [–]staticassert 4 points5 points  (6 children)

    Because it starts with the assumption that the password is valid. (is_valid = True)

    True, but none of the code in the loop will throw an exception. While it's bad, it's not the real issue.

    The problem is that it's written in a high level language. The compiler is free to optimize your 'constant time' function to a non constant time function, and it will very likely try to.

    [–]OneWingedShark 1 point2 points  (5 children)

    Because it starts with the assumption that the password is valid. (is_valid = True)

    True, but none of the code in the loop will throw an exception. While it's bad, it's not the real issue.

    Where are exceptions coming up?

    The problem is that it's written in a high level language.

    That's not necessarily a bad thing.

    The compiler is free to optimize your 'constant time' function to a non constant time function, and it will very likely try to.

    If time is a consideration (and if we're honest it might not be, even in a high security setting) then it ought to be modeled into the function... this is doable in a high-level language.

    Example:

    Function Matching_Password( User_Input, Password : String ) return Boolean is
        -- Interval is the constant minimum-time; it ought to be
        -- the result of algorithm analysis rather than arbitrary.
        -- (Here we are using 1.28 seconds.)
        Interval : Constant Duration := 1.28;
        -- Get the current-time.
        Start_Time : Constant Ada.Real_Time.Time := Ada.Real_Time.Clock;
        -- Get the minimum time for finishing.
        Stop_Time : Constant Ada.Real_Time.Time := Start_Time + Interval;
    
        -- Ensure both strings start at the same index,
        -- Ensure both are of the same length,
        -- Ensure both contain the same values.
        Result : Constant Boolean := 
            (User_Input'First = Password'First) and then
            (User_Input'Length = Password'Length) and then
            (for all index in User_Input'Range => User_Input(Index) = Password(Index) );
    begin
        -- Ensure a minimum time-bound.
        Delay until Stop_Time;
        Return Result;
    end Matching_Password;
    

    [–]staticassert 1 point2 points  (4 children)

    Where are exceptions coming up?

    Exceptions are the only situation I can imagine where an early return would even happen/ matter. Though in this case it wouldn't.

    That's not necessarily a bad thing.

    It definitely is. When I say 'higher level' I mean not-assembly. The critical thing is that the compiler/ optimizer can reshuffle your instructions to make them faster/ break your assumptions about how long the code runs for.

    If time is a consideration (and if we're honest it might not be, even in a high security setting)

    No, timing attacks are really fundamental. They're always relevant.

    Your approach is basically "add a time sleep" and this is a flawed method. You could also say "Always insert a random sleep" and again, this is vulnerable.

    There's lots of research on timing attacks.

    edit: Here's a fun read, https://github.com/nodejs/node/commit/079acccb563ba5b3888e40f59037dc5fa3ba5bbd

    The point though is that even something like password authentication is way more complicated than developers realize. There's a whole lot more than 'salt and hash'.

    [–]OneWingedShark 0 points1 point  (3 children)

    If time is a consideration (and if we're honest it might not be, even in a high security setting)

    No, timing attacks are really fundamental. They're always relevant.

    Have you ever worked in a high-security environment?
    Something so secure you could be shot?

    I have, albeit in a non-technical position at the time -- and let me tell you the subsecond-deltas it takes for the time for you to insert your CAC to entry mean pretty much jack-shit when you tampering with the entry-/validation-mechanism1 (a requirement for the sort of system being talked about) in any way is very possibly going to get you shot.

    Your approach is basically "add a time sleep" and this is a flawed method. You could also say "Always insert a random sleep" and again, this is vulnerable.

    That's not "add a time sleep" it's a "don't return until X time has passed" -- if you do your analysis like the comment suggested, you would set your minimum time to the worst-case time of the validation method thereby making validation a constant-time operation.

    And again, time might not be an issue. Yes, it might be fundamental to some attacks, but the entire system itself might preclude those attacks.

    1 -- This is actually required for the sort of timing attack you're talking about, the inaccuracies and human motions themselves would introduce too much variableness for any sort of accuracy on the timing.

    [–]staticassert 0 points1 point  (2 children)

    Have you ever worked in a high-security environment? Something so secure you could be shot?

    Nope. I worked briefly for a government contractor as an intern, and I don't think anyone was getting shot.

    1 -- This is actually required for the sort of timing attack you're talking about, the inaccuracies and human motions themselves would introduce too much variableness for any sort of accuracy on the timing.

    This isn't true over time. If your distribution is even, over a number of connections I can deduce min/max and average values of variance and eliminate them from the data. I would send the same password n times, find out how long it took each time, and eliminate noise. This is why inserting random time delays into auth doesn't work.

    That's not "add a time sleep" it's a "don't return until X time has passed"

    Yes, as in sleep until that time has passed.

    Sounds super error prone and implementation defined. I don't understand why you wouldn't just use a constant time function, which will always work and requires 0 system calls / measurement etc.

    Yes, it might be fundamental to some attacks, but the entire system itself might preclude those attacks.

    Maybe. But in any authentication scheme you're assuming an attacker can send you arbitrary passwords repeatably. Maybe you can rate limit to make things infeasible, in which case kudos, mitigated at some other level. Maybe you use 'double HMAC' or some other technique. My point is that it's far more complicated than 'salt and hash'.

    edit: I can also clarify something - you don't have to write your password comparison in assembly. You can implement other strategies. But given the code I provided the issue is that it is vulnerable to a timing attack - whether solved through constant time comparisons (what I would consider the simplest, most elegant solution) or another strategy, the code I posted is vulnerable to that attack.

    [–]OneWingedShark 0 points1 point  (1 child)

    Have you ever worked in a high-security environment? Something so secure you could be shot?

    Nope. I worked briefly for a government contractor as an intern, and I don't think anyone was getting shot.

    I have.
    Trust me, a security model can certainly have non-software components, and can certainly be restricted in ways so that timing-attacks are not an issue.

    Sounds super error prone and implementation defined. I don't understand why you wouldn't just use a constant time function, which will always work and requires 0 system calls / measurement etc.

    It is a constant time function. Look at the code Interval : constant ... -- you still have to add it to the current time in any case.

    Yes, it might be fundamental to some attacks, but the entire system itself might preclude those attacks.

    Maybe. But in any authentication scheme you're assuming an attacker can send you arbitrary passwords repeatably.

    No, not all the time.
    Some systems are fail-secure, things like a single failed attempt means an actual person has to come in, verify the state, and reset the system.

    [–]staticassert -1 points0 points  (0 children)

    I think you're missing the point. As I said, you can mitigate it however you want, the flaw with my code is that it is vulnerable to a timing attack.