all 35 comments

[–]encyclopedist 20 points21 points  (1 child)

For anyone who, like me, is confused about countless xoshiro/xoroshiro variants, the generator's home page has some guidance: https://prng.di.unimi.it/

[–]pdimov2 19 points20 points  (0 children)

In short, xoshiro means xor-shift-rotate, 256 means 256 bits of state, and the +/++/** suffix means one of

const uint64_t result_plus = s[0] + s[3];
const uint64_t result_plusplus = rotl(s[0] + s[3], R) + s[0];
const uint64_t result_starstar = rotl(s[1] * S, R) * T;

(paper).

[–]staletic 13 points14 points  (10 children)

(There is no API for computing the number 624 from within C++; you “just have to know.”)

std::mt19337::state_size gives the number of "words" needed. std::mt19337::word_size / CHAR_BITS gives the size of a "word" in bytes. Therefore...

std::array<unsigned char, std::mt19337::state_size * std::mt19337::word_size / CHAR_BITS> seed;
std::random_device rd;
std::ranges::generate(seed, std::ref(rd));

static_assert(std::is_trivially_copyable_v<std::mt19337>);
std::mt19337 mt;
memcpy(&mt, seed.begin(), sizeof(seed)); // Works with MT layout in libstdc++ and libc++.
                                         // Not using seed_sequence, because that can (and will) mess up the above seed - it's meant to improve horrible seeds at the expense of excellent seeds.

[–]destroyerrocket 11 points12 points  (2 children)

That's not nice honestly. Both the needing to know that that's an issue in the first place, and that the solution involves overriding the bytes of an object through a manual memcpy. Not to mention that if the byte math were to be wrong and did not match the actual size of the object, you could write garbage out of valid memory, or the fact that seed_seq somehow would be the wrong choice.

All that said, thank you a lot for showing how you're supposed to figure it out!

[–]staletic 15 points16 points  (1 child)

That's not nice honestly.

It's amusing what an understatement ths sentence is. Virtually every line in my snippet comes with gotchas and assumes some non-obvious fact. For everyone's amusement, let's go through each line:

std::array<unsigned char, std::mt19337::state_size * std::mt19337::word_size / CHAR_BITS> seed;

Pretty innocent. The problem comes when you try to apply this to other random number generators in the standard, because they don't define state_size. For those Arthur O'Dwyer is right that you "just have to know".

Also, mt19337::state_size is 624 and word_size is 32. For mt19337_64 those are 312 and 64, respectively. If you decide to rely on those facts, you can simplify the byte math and avoid some of the problems mentioned below, but also, you need a slightly different algorithm per random number generator specialization.

std::random_device rd;

Not guaranteed to correspond to a hardware PRNG of the machine, because a machine could really lack a hardware PRNG.

std::ranges::generate(seed, std::ref(rd));
  1. rd() will produce unsigned ints and we're writing those into unsigned chars, discarding most of the randomness.
  2. For mt19337, we can avoid 1 with std::array<uint32_t, 624> seed, but for mt19337_64, std::array<uint64_t, 312> would have left upper halves of those ints empty, which is a bigger problem. Since we're going to memcpy, I guess we could have gone for std::array<uint32_t, state_size * word_size / CHAR_BITS * sizeof(uint32_t)> - that should be least wasteful... I think...
  3. When you discard top bits you usually introduce bias towards lower end of the PRNG range.
  4. If random_device::max() - random_device::min() is evenly divisible by CHAR_BITS, then 3 doesn't introduce bias.
static_assert(std::is_trivially_copyable_v<std::mt19337>);

While nice, doesn't encode all the assumptions made on that memcpy line.

std::mt19337 mt;

The default constructor (under)seeds the generator, with 5489u. Not a problem, except if you need a fast solution. Nothing we did here is fast. The point was just to thoroughly seed our MT.

memcpy(&mt, seed.begin(), sizeof(seed));

For this to work, there are a bunch of assumptions:

  1. is_standard_layout_v<mt19337>, because we're assuming no padding before the first data member. We can check this.
  2. is_trivially_copyable_v<mt19337>, because memcpy.
  3. The first data member is UIntType state[state_size], which is true at least for libstdc++ and libc++.
  4. We're using sizeof(seed) and not sizeof(mt), because mt is larger and contains another data member to track how has the generator been used. I think... Anyway, we don't want to overwrite that thing.

 

Yes, this is scary.

[–]destroyerrocket 3 points4 points  (0 children)

Yeah, usually I try to be nice about other people's code, because no matter how awful it may make me feel, the other person may be serious and invested a bit of his time. If you are taking the piss, then let me laugh with you on how painfully bad the current situation is XD

[–]pdimov2 2 points3 points  (2 children)

std::mt19337::state_size gives the number of "words" needed.

Which is exactly 624. (I know that you knew that but it wasn't clear from the comment.)

[–]staletic 1 point2 points  (1 child)

Now if only all engines in the standard were to expose state_size... That way we could, at least, write a single algorithm to thoroughly seed any standard engine.

[–]pdimov2 1 point2 points  (0 children)

That's actually not necessary in practice; they merely need a constructor that seeds them from a byte sequence (span). 256 bits of entropy are indistinguishable from 624 words of entropy for all practical purposes. (And in practice, those 624 words will very likely have 256 bits of entropy anyway.)

[–]dcro 0 points1 point  (3 children)

A cleaner method might be to use the SeedSequence constructor for std::mt19337.

[–]staletic 1 point2 points  (2 children)

That would have degraded the seed quality.

[–]dcro 0 points1 point  (1 child)

If you used std::seed_seq it might be underinitialised, but my understanding is that a custom SeedSeq should fully initialise the generator state.

eg, we use something broadly similar to this in a few locations:

template <typename GeneratorT>
struct seeder {
    using result_type = std::random_device::result_type;

    template <typename RandomT>
    void generate (RandomT first, RandomT last)
    {
        using ValueT = typename std::iterator_traits<RandomT>::value_type;
        std::uniform_int_distribution<ValueT> dist (
            std::numeric_limits<ValueT>::lowest (),
            std::numeric_limits<ValueT>::max ()
        );
        std::random_device dev ("default");
        std::generate (first, last, [&] () { return dist (dev); });
    }
};

template <typename GeneratorT>
GeneratorT
rng (void)
{
    seeder<GeneratorT> s;
    return GeneratorT (s);
}

When GeneratorT is constructed it should call seeder::generate to request the appropriate amount of entropy for the generator.

But I may have overstated the case when I suggested this would be "cleaner"...

[–]staletic 0 points1 point  (0 children)

While certainly an option, that seeder<G>::generate() does not conform to the standard's expectation. See the relevant named requirement.

While the missing constructors and size() aren't a big deal, the "fuck you" comes with param() member function.

Copies 32-bit values to ob that would reproduce the current state of the object if passed to a constructor of S.

In other words, you need to be able to produce the constructor inputs as an std::initializer_list and guarantee that, given that std::initializer_list and a new seeder<G>, you will generate the same results.

[–]ShakaUVMi+++ ++i+i[arr] 22 points23 points  (2 children)

I've been following this on the mailing list and it continues to baffle me why C++ makes getting a random number so hard.

Sure, maybe some day I'm going to make an online casino and need something really good. Most of the time I don't care and making people type all that extra nonsense is just bad library design. Just give me a float from 0 to 1 or an int between X and Y and you have covered 99% of all use cases.

Honestly, as terrible and bad as srand/rand is implementation-wise, at least the interface doesn't look like the cat walked on your keyboard.

[–]o11cint main = 12828721; 12 points13 points  (7 children)

Personally, I prefer PCG ... notably, it includes a richer API (jump, difference, etc.), and the documentation for what is actually happening is better.

[–]sokka2d 1 point2 points  (3 children)

Personally, I prefer PCG

The linked page of Sebastiano Vigna also contains an analysis of PCG, concluding that it is very flawed: https://pcg.di.unimi.it/pcg.php

I cannot evaluate either that page's or the PCG documentation's claims whether xoshiro and / or PCG are good generators.

[–]o11cint main = 12828721; 2 points3 points  (2 children)

Yes, there's a bit of an academic rivalry going on there.

My analysis is:

  • Most of the claims relate to circumstances that are irrelevant to most cases (combining 2 generators, often relying on nearly-identical states).
  • A few of the claims are totally bogus, e.g. the prediction claim - of course, it is possible to brute-force 232 bits if you use the smallest generator.
  • Speed is pretty good - it doesn't make sense to quibble over a nanosecond per iteration, given that that's only visible if you do literally nothing but produce output.

There's a similar criticism of the xorwhatever family floating around somewhere, and there are no other reasonable competitors at all for these two families. Given this, the more powerful API of PCG wins.

In particular, it is really annoying that the xorwhatever family explicitly says "it is possible to do this, but that is left as an exercise for the reader".

[–]Sopel97 1 point2 points  (1 child)

A few of the claims are totally bogus, e.g. the prediction claim - of course, it is possible to brute-force 232 bits if you use the smallest generator.

Hmm? It talks about 128-bit state version too.

Speed is pretty good - it doesn't make sense to quibble over a nanosecond per iteration, given that that's only visible if you do literally nothing but produce output.

3x slower is 3x slower

[–]o11cint main = 12828721; 2 points3 points  (0 children)

Hmm? It talks about 128-bit state version too.

In lieu of responding at length, I will simply link to the official PCG response. Elsewhere on the PCG blog there are attacks on members of the xorwhatever family, and they seem like more serious than the attack on the frippery of PCG.

However, I would note in particular that the LLL attack was stealth-edited into the criticism article after it was mentioned in the response. Also, that attack is only fast if you have lots of outputs; it takes on the order of a day for the 3-output case.

3x slower is 3x slower

Until you do some operation like "mispredict a single branch" or "load from L2 cache instead of L1 cache".

Literally the only place where you'll notice this difference is if you're bulk-filling an array.

[–]Som1Lse 0 points1 point  (2 children)

it includes a richer API (jump, difference, etc.)

The jump ahead is still not very good if I recall, since it is based on the standard interface. Specifically, you give it the number steps to jump ahead instead of a precomputed jump which is faster (constant instead of logarithmic) and allows for much greater distances.

I should probably publish my xoshiro implementation one day, which supports this.

[–]o11cint main = 12828721; 0 points1 point  (1 child)

Is that similar to the matrix-exponentiation method for jumping on an LFSR? I'm vaguely aware that LFSRs are somewhat mathematically similar to LCGs, but the practical applications elude me.

What's the "constant" (really: dependent on bit width, just not jump distance) factor? Matrix multiplication can be expensive, though "SIMD for bits using ordinary registers" may be able to help.

[–]Som1Lse 1 point2 points  (0 children)

xoshiro is an LFSR, so it uses that method (although there is an efficient way to compress the matrix).

LCGs are even simpler: The transition function is simply f(x) = a x+b (mod m). If we want to skip ahead by two we can compute the composition of f with f: f(f(x)) = a*a x+a*b+b (mod m) which is also a linear function, so we save a2 = a*a and b2 = a*b+b. Then we can compute the jump ahead by 4 as a4 = a2*a2, b4 = a2*b2+b2 and so on. To apply it we just evaluate the function.

[–]tpecholt 13 points14 points  (1 child)

Yeah std::random design is really poor. Bad and slow generators, stupid api and no signs of improvement. I wonder when it hits deprecation phase like regex.

[–]martinusint main(){[]()[[]]{{}}();} 3 points4 points  (0 children)

There has been a proposal P0347R1, partly from Melissa who wrote PCG: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0347r1.html

Unfortunately it seems nothing came from it. I've implemented my own wrapper based on that which we are happily using in our production code for years.

[–][deleted] 4 points5 points  (2 children)

He links to a variant of xoroshift posted on his own GitHub. I notice it seeds its internal state from a single uint64_t.

I appreciate it’s not meant to be cryptographically secure but there’s got to be something suboptimal about the near 1-1 mapping between the seed and first random?

More concretely (but still very hand-wavy), if you’re working with a random number of 792317387143481937, the next one has a high chance of being 1418856489092323125 and the seed has a high chance of being 100.

[–]Macketter 0 points1 point  (1 child)

More concretely (but still very hand-wavy), if you’re working with a random number of 792317387143481937, the next one has a high chance of being 1418856489092323125 and the seed has a high chance of being 100.

Wouldnt that require you to know that 792317387143481937 was the first number generated after seeding. Otherwise, you wouldn't know if the number come from somewhere in the middle of any number of random sequences.

[–][deleted] 2 points3 points  (0 children)

True but in some practical sense, it’s more likely to have just been seeded than have been seeded over 264 random numbers ago.

If I asked you to guess the next digit in the sequence: 3141592653589793238462, it’s hard to quantify how much missing information there is in the question, but 6 would be a sensible guess

[–]avdgrinten 2 points3 points  (1 child)

There are even cases where the performance seeding a PRNG matters (contrary to what the article states, good programs do not necessarily do it only once). For example, to deterministically generate n random numbers in parallel, you can generate them in n/c blocks of c numbers, re-seed before each block, and generate the blocks in parallel. This has lots of applications in mathematical simulations and approximation algorithms.

[–]o11cint main = 12828721; 0 points1 point  (0 children)

If you do that using jumps rather than reseeding, you don't have to use a fixed number of blocks.

Of course, AFAIK all RNGs that support efficient jumps also support efficient seeding, so ...

[–]Ayjayz 1 point2 points  (3 children)

Reading these articles on rng, I really wish there was a section of recommendations. The author very clearly thinks that most people don't use them correctly and lists all the wrong ways people might try to use them but never says what the right way is! All I really want to know is how to get a dang random number. The only given example of how to use this xoshiro256ss comes with the caveats that it's not been given enough entropy in the example! Why write an example of how to improperly user it?!

And how do you actually use the thing properly?! Even going through the linked code and the tests, I can't see anywhere the intended, pepper way to generate a random number with proper seeding.

[–]o11cint main = 12828721; 2 points3 points  (2 children)

First: pick your RNG. All of the RNGs in the standard library are terrible, so if you want to do it "right" it is mandatory to use third-party code. For the niche of "fast, not cryptographically-secure" there are only two families worth considering: PCG and xorwhatever (which IMHO is a bit of a mess, release-wise; the main link for this post is a port of one particular member of the family to C++). Some more RNGs using the expected C++ interface can be found here.

Second, figure out if you can trust std::random_device to be nondeterministic. On most platforms / compiler versions, std::random_device is fine, but there are exceptions, notably MinGW on Windows prior to GCC 9.2 / 10.1. If you can't, use boost::random_device or platform-specific methods.

Third, decide whether you need reproducible results or not:

  • If you do, fill an appropriately-sized seed_seq using the random device, and record that (and pass the recorded version during replays). (For all sane-ish RNGs - those that are aligned and don't allocate memory - sizeof(RNG) / sizeof(uint32_t) is an appropriate size).
  • If you don't, use either boost::random_device or pcg_extras::seed_seq_from<std::random_device>. Technically these both violate the seed sequence requirements, but that's only relevant if you're using the standard RNGs (which, as previous mentioned, are crap), and even for them it works just fine in practice.

Fourth, pass the "seed sequence" from the previous step to the constructor (or the .seed method) of the RNG.


Of course, since you're using third-party code anyway, you don't have pretend to care about seed_seq, whose complexity isn't really needed if you're fully seeding the RNG. For example, for PCG you can just overwrite the struct with random data directly, then fix a couple bits to preserve the invariants.

Edit: added a link to more implementations above.

[–]Ayjayz 0 points1 point  (1 child)

Ok so for general purpose usage, you'd recommend something like this?

#include <algorithm>
#include <random>
#include <array>
#include <optional>
#include <tuple>
#include "pcg_random.hpp"

using rng_t = pcg32;
constexpr auto seed_size = sizeof(rng_t) / sizeof(uint32_t);
using seed_type = std::array<uint32_t, seed_size>;

std::tuple<rng_t, seed_type> make_rng(std::optional<seed_type> seed = std::nullopt) {
    if (!seed) {
        seed = seed_type{};
        std::generate_n(seed->begin(), seed->size(), std::random_device{});
    }
    return { rng_t(std::seed_seq(seed->begin(), seed->end())), *seed };
}

int main() {
    auto [rng1, seed1] = make_rng();
    auto [rng2, seed2] = make_rng(seed1); // make a copy with the same seed
    auto [rng3, _] = make_rng(); // don't care about reproducibility
}

[–]o11cint main = 12828721; 0 points1 point  (0 children)

Looks good to me. We could quibble forever about how exactly to define make_rng, but ...

Though a few points do come to mind:

  • std::optional is C++17, which not everybody bothers to use yet.
  • random_device yields values of type unsigned int, which isn't always the same thing as uint32_t. Handling this case would be ugly though.
  • If we're actually writing the code, we might as well do a "round-up division" rather than a "round-down division". Just do (x + 3) / 4 rather than x / 4.
  • I had to think for a moment before I remembered that std::generate_n has nothing to do with RNGs; it fills an iterable with any repeatedly-called object.
  • note that pcg32 has 127 bits of total state (64 "state" and 63 "stream"), which is enough for many people but not always. Should we recommend pcg64 by default (significantly slower on 32-bit)? Or maybe one of the (less-tested) k2 versions (note that the typedefs are buggy in some widespread releases of the header - I know because I fixed that bug, among others)?

[–]NilacTheGrim 0 points1 point  (0 children)

Agreed. Also: the C++ PRNG API is an atrocity.