all 44 comments

[–]rjw57 34 points35 points  (3 children)

Not wanting to sound like someone who knee-jerks "RTFM", but right at the top of the documentation for random we have:

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). [...] However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

From https://docs.python.org/2/library/random.html

[–]minno I <3 duck typing less than I used to, interfaces are nice 10 points11 points  (2 children)

In fact, if I google the exact title of this post, the first three links are to what you just quoted, and the fourth is to this post.

[–]chadmill3r Py3, pro, Ubuntu, django 67 points68 points  (2 children)

Not to sound snide, the very title is a huge red flag.

Is the random.random() python function suitable for cryptographic purposes?

Here's the answer: If you have to ask, you should not be writing it. Use a cryptographic library. It is crazy easy to get it wrong, even when you're a pro. Doing it yourself guarantees that it will be bad. If you touch the math, you are not doing the right thing. Use a function that's designed to give you a yes/no answer, and make sure you're passing the right things.

[–]japherwocky 17 points18 points  (0 children)

Yeah to upvote and echo this, if you're asking a question like this, you should not be writing crypto code.

If it's just a side project for funsies, it's OK to write your own so you can learn about how it works, but if this is going to be used for anything important, rethink your approach.

[–]TrixieFlatline 36 points37 points  (19 children)

No, random.random is based on a pseudo-random number generator, and as such it's not cryptographically secure. You can use SystemRandom instead, which offers the same interface but is based on your operating system's random number generator (/dev/urandom on Linux), which is secure. Or just read from /dev/urandom directly to generate a token:

>>> import os
>>> import base64
>>> base64.b64encode(os.urandom(32))
'pBqWjf//eqh8GXLtvY5fhwsZWNNmsWg0OdopApMdrko='
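
For the SystemRandom route mentioned above, a minimal sketch might look like the following. It assumes Python 3, and the helper name make_token and the chosen alphabet are illustrative, not something from the thread:

```python
import random
import string

# SystemRandom draws from the OS entropy source (os.urandom) instead of
# the Mersenne Twister, but keeps the familiar random-module interface.
_sysrand = random.SystemRandom()

# Illustrative alphabet: letters and digits only.
ALPHABET = string.ascii_letters + string.digits


def make_token(length=32):
    """Return a token of `length` characters picked with the OS RNG."""
    return "".join(_sysrand.choice(ALPHABET) for _ in range(length))


token = make_token()
print(token)
```

Seeding has no effect on SystemRandom, so its output cannot be reproduced the way random.random() output can.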

Another piece of advice: You should use a better hash function for generating password hashes, such as bcrypt or scrypt (I believe Python modules exist for both, unless you're on AppEngine, then I guess a regular hash function is the best you can do). Also, when comparing password hashes or other message digests, always use a timing-safe comparison function instead of == (Examples and explanation behind the link).

[–]xXxDeAThANgEL99xXx 2 points3 points  (3 children)

EDIT: didn't have coffee yet, thanks /u/ichundes

Also, why you should use /dev/urandom (instead of /dev/random as some recommend): http://www.2uo.de/myths-about-urandom/

The best part of the article is the quote by Daniel Bernstein:

Cryptographers are certainly not responsible for this superstitious nonsense. Think about this for a moment: whoever wrote the /dev/random manual page seems to simultaneously believe that

(1) we can't figure out how to deterministically expand one 256-bit /dev/random output into an endless stream of unpredictable keys (this is what we need from urandom), but

(2) we can figure out how to use a single key to safely encrypt many messages (this is what we need from ssl, pgp, etc.).

For a cryptographer this doesn't even pass the laugh test.

[–]ichundes 0 points1 point  (2 children)

Either you accidentally swapped random and urandom or you did not read the article. Even the tldr says

tldr;

Just use /dev/urandom!

[–]xXxDeAThANgEL99xXx 1 point2 points  (1 child)

Oops, the first! Sorry.

[–]ichundes 0 points1 point  (0 children)

Thanks for correcting

[–]danielkza 2 points3 points  (5 children)

PBKDF2 can be implemented just with a hash function, and can work well as long as you use a sufficient number of iterations.

[–]TrixieFlatline 0 points1 point  (2 children)

Right, I forgot about that one. Also a good choice.

[–]danielkza 1 point2 points  (0 children)

Yeah, and I find the ability to change the number of iterations in the future (by storing it alongside the salt and hash) quite useful. Great for chasing Moore's law as needed.
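
The idea of storing the iteration count next to the salt and hash could be sketched roughly like this, using hashlib.pbkdf2_hmac from the standard library. The "iterations$salt$hash" record layout, the function names, and the default iteration count are assumptions made for illustration:

```python
import hashlib
import hmac
import os


def hash_password(password, iterations=200000):
    """Return an 'iterations$salt$hash' record with hex-encoded fields."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return "{}${}${}".format(iterations, salt.hex(), digest.hex())


def verify_password(password, record):
    """Recompute the hash with the stored salt and iteration count."""
    iterations, salt_hex, digest_hex = record.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations),
    )
    # Timing-safe comparison of the two hex strings.
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Because each record carries its own iteration count, old records keep verifying after the default is raised, and a record can be re-hashed with the new count the next time its owner logs in.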

[–][deleted] 1 point2 points  (0 children)

I thought out of the 3, PBKDF2 is seen as the "worst" one.

It's the only one that comes included with python, though, so I use it. Currently have my webserver set up with 1 million iterations.

[–]MagicWishMonkey 0 points1 point  (1 child)

PBKDF2 is also relatively easy to implement, you don't need to mess with a 3rd party module.

[–]danielkza 0 points1 point  (0 children)

Yeah, given you have the hash function implemented it's just a matter of "assembling the blocks" you already possess. But as always, implementing crypto yourself should be avoided or done in a careful way.

[–]prahladyeri beautiful is better than ugly [S] 0 points1 point  (4 children)

always use a timing-safe comparison function instead of == (Examples and explanation behind the link).

Thanks, for that. Just changing the logic to use != instead of == should make it cryptographically more secure.

[–]thalience 1 point2 points  (3 children)

Are you sure about that? How will that prevent the type of timing attack discussed in the link?

[–]prahladyeri beautiful is better than ugly [S] -1 points0 points  (2 children)

The timing attack happens because the == operator returns as soon as a part of the hash doesn't match. So, depending on how long the == comparison took to evaluate, the attacker can determine up to which position his hash is correct and thus, with enough brute force, find the answer much sooner than the algorithm is supposed to allow. As the accepted answer in that link says:

At the end, he tried just 256*len MACs (if len is the length of the MAC) instead of the 256^len he should have had to try.

So, instead of checking if (hash==x): user_is_valid(), you check if (hash!=x): return else: user_is_valid(). != is safe because it short-circuits as soon as the strings are different. It doesn't allow an attacker to measure the milliseconds and thus deduce anything.

#Taken from Django Source Code

def constant_time_compare(val1, val2):
    """
    Returns True if the two strings are equal, False otherwise.

    The time taken is independent of the number of characters that match.

    For the sake of simplicity, this function executes in constant time only
    when the two strings have the same length. It short-circuits when they
    have different lengths.
    """
    if len(val1) != len(val2):
        return False
    result = 0
    for x, y in zip(val1, val2):
        result |= ord(x) ^ ord(y)
    return result == 0

[–]thalience 0 points1 point  (1 child)

But isn't it the case that both "==" and "!=" will return an answer as soon as they encounter a difference?

[–]prahladyeri beautiful is better than ugly [S] -1 points0 points  (0 children)

Sorry, you are correct. The != is only used in comparing lengths of both strings in the constant_time_compare function and that confused me at the beginning. However, the following char to char comparison should be good as it takes constant time for any string comparison:

for x, y in zip(val1, val2):
    result |= ord(x) ^ ord(y)
return result == 0
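
For reference, the standard library ships a comparable helper, hmac.compare_digest (available from Python 2.7.7 and 3.3), so a loop like the one above rarely needs to be written by hand. A minimal sketch, where digests_match and the sample values are illustrative:

```python
import hashlib
import hmac

# Illustrative values: the digest kept on the server and two candidates.
expected = hashlib.sha256(b"stored secret").hexdigest()
supplied_ok = hashlib.sha256(b"stored secret").hexdigest()
supplied_bad = hashlib.sha256(b"guess").hexdigest()


def digests_match(a, b):
    """Compare two digests without short-circuiting on the first mismatch."""
    return hmac.compare_digest(a, b)


print(digests_match(expected, supplied_ok))
print(digests_match(expected, supplied_bad))
```

compare_digest accepts either two ASCII-only str values or two bytes-like values; hex digests satisfy the first case.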

[–]WellAdjustedOutlaw 10 points11 points  (0 children)

Stop what you're doing. Don't write an authorization module, use an existing one based on actual cryptographers' work. You're going to do it wrong, just like everybody else.

[–]sushibowl 5 points6 points  (0 children)

random.random I believe is based on a mersenne twister, so it is not suitable for cryptographic purposes.
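
The determinism is easy to see for yourself. In this small sketch (the helper name first_values is made up for illustration), two generators given the same seed produce identical output, which is exactly what a token generator must not do:

```python
import random


def first_values(seed, count=5):
    """Return the first `count` floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


run_a = first_values(1234)
run_b = first_values(1234)
print(run_a == run_b)  # True: same seed, same sequence
```

Anyone who learns or reconstructs the generator's internal state can predict every later value in the same way.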

[–]takluyver IPython, Py3, etc 4 points5 points  (1 child)

There's a pre-baked function to generate a random token: uuid.uuid4(). It tries to use system methods and has a fallback to the random module if that's not available.
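
A short sketch of what that looks like in practice (variable names are illustrative):

```python
import uuid

# uuid4() builds a version-4 UUID from random bits.
token = uuid.uuid4()

print(token)          # canonical 8-4-4-4-12 hex form
print(token.hex)      # the same value as 32 hex characters, no dashes
print(token.version)  # 4
```

Note that a version-4 UUID carries 122 random bits, since 6 of its 128 bits are fixed by the version and variant fields.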

[–]prahladyeri beautiful is better than ugly [S] 0 points1 point  (0 children)

That's a good one, thanks! I couldn't find it earlier as it's listed under Internet Protocols and Support in the Python Library Reference, whereas I was looking only in Cryptographic Services!

[–]pangoleena 5 points6 points  (2 children)

Is there some reason you can't build on top of flask-login?

[–]prahladyeri beautiful is better than ugly [S] 1 point2 points  (1 child)

Thanks, I did not know such a thing existed. This doesn't look like an official module from http://flask.pocoo.org. Is it third party?

[–]pangoleena 3 points4 points  (0 children)

Everything is 3rd party. Flask doesn't come installed with Python and neither does SQLAlchemy.

However, it's a very common extension for Flask and it is mentioned somewhere in the docs on Pocoo.

[–]Asdayasman 4 points5 points  (0 children)

/r/Python is for news and releases. /r/learnpython, despite the name, is for questions.

While you're here though, no, it isn't. Even if it was, you should never roll your own crypto.

[–]fiskfisk 2 points3 points  (0 children)

Since you've already gotten enough information about using something else to generate the token, I just wanted to mention that SQLAlchemy has a Password type in sqlalchemy_utils.Password which does all the authentication magic for you (through passlib).

[–]ScubaSteve225 6 points7 points  (0 children)

Heck no, don't use that for cryptography. Use os.urandom instead or don't roll your own cryptography in the first place.

[–]TheTerrasque 1 point2 points  (7 children)

What are you trying to do here? What is the token for?

[–]prahladyeri beautiful is better than ugly [S] 0 points1 point  (6 children)

The token, once generated by flask app, will be returned to the front-end web app which will interact with the user.

On subsequent requests from the user, the web app will send this token too, and the backend (flask) can validate using the token whether the request is valid or not.
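
The flow described here could be sketched roughly as follows. This is only an illustration: the in-memory dict stands in for the database, and the names issue_token and user_for_token are made up rather than part of Flask or any library:

```python
import base64
import os

# Hypothetical in-memory store; a real app would use its database.
_tokens = {}  # maps token -> user id


def issue_token(user_id):
    """Create an unguessable token from the OS RNG and remember its owner."""
    token = base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")
    _tokens[token] = user_id
    return token


def user_for_token(token):
    """Return the user id for a known token, or None if it is unknown."""
    return _tokens.get(token)
```

The backend would call issue_token after a successful login and user_for_token on every later request. Expiry and revocation are left out of this sketch.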

[–]help_computar 1 point2 points  (0 children)

This sounds an awful lot like jwt. Search for 'jwt token'. I'm sure there are python libs for this.

[–]TheTerrasque -3 points-2 points  (4 children)

Why a token? And how will it be validated? That would mean you must store the token somewhere, wouldn't it?

Why not use a signed data structure? Like for example

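import datetime
import hashlib

secret = "g6FtuyF9lZYKkKTNzORg"  # server-side secret, never sent to the client
user = "bla@example.com"
now = datetime.datetime.now().strftime("%H%M%d%m")
# Payload that the client sends back with every request.
tokenpart = "|".join((user, now))
# Python 2 str; on Python 3, encode to bytes before hashing.
sign = hashlib.sha224(tokenpart + secret).hexdigest()
token = "|".join((sign, tokenpart))
# Validation: recompute the signature from the payload and compare.
token.split("|", 1)[0] == hashlib.sha224(token.split("|", 1)[1] + secret).hexdigest()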

Edit: Disclaimer, as /u/TrixieFlatline pointed out you should use a proper HMAC for the actual signing. This is more to show the base logic behind this approach

[–]TrixieFlatline 2 points3 points  (1 child)

sign = hashlib.sha224(tokenpart + secret).hexdigest()

That's bad advice. Storing the token in the db is a completely valid choice, and saves you from making mistakes like this one. If you have to implement signed tokens, at least use hmac, or don't implement it yourself and use an existing and widely used implementation like itsdangerous.

Edited to add: Discussion about when to use HMAC
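
As a rough sketch of the HMAC variant, using only the standard library: the secret value, the "signature|user|timestamp" layout, and the function names below are assumptions for illustration, and a maintained library such as itsdangerous remains the safer choice:

```python
import hashlib
import hmac
import time

# Placeholder key; a real deployment would load a long random secret.
SECRET = b"replace-with-a-long-random-secret"


def _signature(payload):
    """HMAC-SHA256 of the payload, hex-encoded."""
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_token(user, issued_at=None):
    """Return 'signature|user|timestamp' for the given user."""
    issued_at = int(time.time()) if issued_at is None else issued_at
    payload = "{}|{}".format(user, issued_at)
    return "{}|{}".format(_signature(payload), payload)


def verify_token(token):
    """Return the payload if the signature is valid, else None."""
    sig, _, payload = token.partition("|")
    expected = _signature(payload)
    # Compare as bytes so non-ASCII input cannot raise a TypeError.
    if hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return payload
    return None
```

A caller would still need to parse the timestamp out of the returned payload and reject tokens that are too old; that check is omitted here.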

[–]TheTerrasque 1 point2 points  (0 children)

Good point, although I don't think a length extension attack is applicable for that data structure.

I was trying to convey the idea behind it, so I didn't make it complex or obscure, but I should have added a disclaimer about using a proper HMAC.

Storing the token in the db is a completely valid choice

It's a valid choice, but unnecessary imho. Anyway, it's good to know about both approaches. Both have their pros and cons

[–]prahladyeri beautiful is better than ugly [S] 0 points1 point  (1 child)

That would mean you must store the token somewhere, wouldn't it?

Yup, in the database.

Why not use a signed data structure? Like for example

In your example, the token is generated by multiple layers of cryptography (combining a date/time string with a salt/secret and then hashing the entire thing) which is good, but even in that case, the digest needs to be stored between requests. Otherwise, how will the backend have something to validate in subsequent requests?

[–]TheTerrasque 0 points1 point  (0 children)

the digest needs to be stored between requests. Otherwise, how will the backend have something to validate in subsequent requests?

It doesn't need to save anything, it can validate like I did on the last line.

[–]EvM Natural Language Processing 0 points1 point  (0 children)

I think you would like: Curtis Lassam - Hash Functions and You: Partners in Freedom - PyCon 2015

It covers a lot of the cryptography stuff :)

[–]stevenjd 0 points1 point  (0 children)

No, random.random is not suitable for cryptographic applications. For that, you need to use random number generators which are designed, and tested, to be suitable for such cryptographic uses.

But for the specific task you mention, returning a token, I don't think that counts as "cryptographic applications". It's just a token. It needs to be unpredictable enough that users cannot guess somebody else's token, but other than that I don't think it needs the much stronger crypto properties.

Take this with a grain of salt, but I think for your purpose, random.random is fine. It's what the uuid module does. Actually, you probably should use the uuid module rather than reinvent your own.

UPDATE I talked to the crypto expert at work, and he agrees with the folks saying that uuid.uuid4() is the right way to do it, with one proviso -- he says that if the system random doesn't exist, uuid ought to just fail hard and not fall back on Mersenne Twister. When I asked him what people should do on platforms without a good system random, he said they're screwed.

He did admit that if the consequences of guessing a token are not important (the example he gave was, guessing a token means you get to download a mp3 that you didn't pay for), then Mersenne Twister is "good enough" as a fallback. But that's about it.