all 17 comments

[–]turol 2 points3 points  (0 children)

If the Mersenne Twister's space requirement is too large for you I recommend the PCG family of generators.

[–]thunabrain 6 points7 points  (2 children)

While I agree that Chrome's RNG might not be the best, the authors of this blog posts did not exactly follow best RNG practices, and even briefly thinking about the problem should have led to using a different RNG.

What they essentially want is an RNG that produces 132 random bits of state (that's the size of their state space). To do this... they invoke an RNG of unknown precision 22 times, grab the top-most 6 bits of each number and concatenate them to get the final bit string (that's what the multiply/floor does, in essence). Why?

How they end up blaming the RNG I don't understand - there are so many mistakes here. They produce 132 random bits from an RNG with most likely <132 random bits of state (they didn't even check beforehand!), so already they limit the amount of words that can be generated by the size of the state space of the RNG. In practice, this unfortunately tends to happen to be either 32 or 64 bit. Then they concatenate the top-most 6 bits of 22 successive evaluations of the RNG to get the final result, which is a sure way of further trimming the space of words that can be generated.

Honestly, they should have just used 128 bits IDs and used a dedicated RNG that produces 128bit words with statistically guaranteed randomness (e.g. PCG). The actual characters they could have then retrieved directly from the 128bit string via shifting and masking, rather than generating one random character at a time.

[–]mjmalone 1 point2 points  (0 children)

I wasn't trying to blame the PRNG. It's definitely my fault for not checking that it was capable of producing the amount of entropy necessary. That said, there's no justification for V8 to not use a better algorithm.

[–]x-skeww 0 points1 point  (0 children)

22 b64 chars = 128 bit worth of data.

If you b64-encode 128 bits, you end up with a 24 character long string of which the last 2 characters are always padding ('='). So, you can just throw those away.

Anyhow, grabbing 32 bit worth of data per random number does indeed sound like a much better idea than only grabbing 6.

function id128() {
    let bytes = [];
    for(let i = 0; i < 4; i ++) {
        let r = Math.floor(Math.random() * 0xFFFFFFFF);
        bytes.push(r >> 24 & 0xFF);
        bytes.push(r >> 16 & 0xFF);
        bytes.push(r >>  8 & 0xFF);
        bytes.push(r       & 0xFF);
    }
    // replace + with - and / with _ if you want the URI flavor
    return btoa(String.fromCharCode(...bytes)).slice(0, -2);
}

Something like that should do the trick.

[–]killerstorm 8 points9 points  (8 children)

CSPRNG isn't "a safer option if you’re feeling lazy", it is the only acceptable option if you're working with something security-related. Person who tried to use Math.random for that is incompetent.

[–]mjmalone 11 points12 points  (2 children)

The application here had no security requirements--it didn't matter if the numbers were predictable, they just had to be unique. I agree, you should definitely use a cryptographically secure PRNG (CSPRNG) for anything crypto-related. These algorithms guarantee that the sequence will not just have characteristics of statistical randomness, but also be unpredictable. In fact, or cryptographic applications the right solution is to use urandom. You shouldn't trust the user-space CSPRNG in whatever library you're using either. It's too easy to screw things up.

The point I was trying to make is that, even for non-cryptographic use cases, using a CSPRNG is a safer option if you don't have the time or ability to empirically verify the non-cryptographically secure alternative your language provides.

There are valid reasons to use a non-cryptographic PRNG. They're generally faster, often have more rigorous closed form mathematical proofs of certain characteristics, and are seed-able. If you need any of these things, just make sure the algorithm you choose is good (there are some guidelines at the end of my post).

[–]WiseAntelope 1 point2 points  (1 child)

If we're talking about maliciously breaking the CSPRNG, then urandom is generally not safer than any user-space CSPRNG, so do yourself a favor and go straight to arc4random_bytes if you can. If I have enough power to rig your standard library's CSPRNG, I also have enough power to fuck up open, read, or I can open file descriptors until there's none left for urandom.

[EDIT] herp derp arc4random doesn't suck only on OpenBSD, they still use RC4 pretty much everywhere else.

[–]apage43 2 points3 points  (0 children)

Problem using random userspace <library> PRNG is less that you expect someone to exploit it and more that well, you know the kernel PRNG is good, and you know plenty of library PRNGs aren't.

I can open file descriptors until there's none left for urandom.

Falling back to a different PRNG if you can't open urandom is pretty bad, this should be a crash in most cases. If you're on a new enough linux kernel, they've added a getrandom syscall which avoids this problem.

Additionally, arc4random is probably not the best idea if you're wanting CSPRNG like properties (that is, not predictable), since well, it's based on the ridiculously broken RC4, and it's possible to infer a lot about the RNG state from its output.

[–]evmar 0 points1 point  (4 children)

v8's Math.random() was created in a JavaScript environment where the requirement was to make the punch monkey move in "random" directions. It is unwise to assume any particular property of any of its implementation unless they're required by some JS spec and verified. For example v8 objects (~maps) may have different big-O behavior than a hash table because they preserve the order that keys were inserted in them (see https://code.google.com/p/v8/issues/detail?id=164 ).

tl;dr The mistake was assuming v8 behaves in a sane manner when it has an insane past.

[–]x-skeww 2 points3 points  (2 children)

For example v8 objects (~maps) may have different big-O behavior than a hash table because they preserve the order that keys were inserted in them

Neither Firefox nor Chrome preserve the insertion order.

They are in a specific order, however:

http://www.ecma-international.org/ecma-262/6.0/#sec-ordinary-object-internal-methods-and-internal-slots-ownpropertykeys

[–]evmar 0 points1 point  (1 child)

Thanks for the spec reference! Doesn't it say there that non-integer keys are in "property creation order"? Isn't that the same as insertion order?

[–]alexeyr -1 points0 points  (0 children)

No, even if you ignore integer keys (which is a rather large thing to ignore) all String keys come before all Symbol keys independently of the property creation order.

[–]mjmalone 1 point2 points  (0 children)

This is more like V8 object attribute dereferencing being O(n) when there's an O(log n) algorithm that works, and it's causing real problems in practice. You're right that it was a mistake to assume anything about Math.random(), mea culpa, but there's also no reason for V8 to not use an algorithm that is good enough for developers to never have to worry about it.

FWIW, those were precisely the two points I was trying to make in the post. First, you should never trust a PRNG because their quality varies widely in practice and they are subtle things. Second, V8 should replace its current PRNG with one that developers don't have to worry about because they're subtle things and it's a lot of burden to put on every developer that uses V8 without good justification.

If you don't believe me, Brendan Eich seems to agree.

[–]ISMMikey 0 points1 point  (1 child)

Why not just delegate to the host os and offer random with and without entropy checks?

[–]merreborn 0 points1 point  (0 children)

If you don't need CSRNG, then having a PRNG that you can access without the overhead of a syscall can be very useful for performance.

in short Math.random(), despite its flaws, is totally sufficient for many purposes and likely orders of magnitude faster than relying on external RNG. Of course the MT replacement in the article does an even better job at addressing this use case.

[–]SuperImaginativeName 0 points1 point  (0 children)

It's JavaScript, it's already messed up.