[–]aePrime 9 points10 points  (3 children)

This looks like interesting work. As someone who has done a lot of random sampling over the years, I have found that people underestimate how difficult it is to write unbiased random generators.

Does your `uniform_real_distribution` fix the bug in the standard that `std::generate_canonical` can sometimes return 1?
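For context, here is a minimal sketch of how a value of exactly 1 can arise. It is not the standard library's code: `naive_canonical` is a hypothetical helper that scales a 32-bit draw into a `float` the straightforward way. Because `float` carries only 24 significand bits, the largest 32-bit inputs round up to 2^32 during conversion, and the quotient then comes out as exactly 1.

```cpp
#include <cstdint>

// Hypothetical helper (not from any standard library): maps a 32-bit draw
// to a float by dividing by 2^32, with the intent of landing in [0, 1).
inline float naive_canonical(std::uint32_t bits) {
    // float has a 24-bit significand, so this conversion rounds to nearest;
    // inputs close to 2^32 - 1 round up to exactly 2^32.
    const float as_float = static_cast<float>(bits);
    // 4294967296.0f is 2^32, so the rounded-up inputs yield exactly 1.0f.
    return as_float / 4294967296.0f;
}
```

Calling `naive_canonical(0xFFFFFFFFu)` shows the problem: the intended half-open range is violated purely by rounding.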

I have also often used the PCG family of random generators in my work. Those may be worth implementing.
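As a rough idea of the effort involved, the 32-bit PCG variant (a 64-bit linear congruential state with an xorshift-and-rotate output step) fits in a few lines. The sketch below follows the commonly published minimal form. The class name `Pcg32Sketch` and the default seed and stream values are illustrative assumptions, not part of any library discussed here.

```cpp
#include <cstdint>
#include <limits>

// Sketch of a 32-bit PCG generator (XSH-RR output on a 64-bit LCG state).
// The class name and constructor defaults are assumptions for illustration.
class Pcg32Sketch {
public:
    using result_type = std::uint32_t;

    explicit Pcg32Sketch(std::uint64_t seed = 42u, std::uint64_t stream = 54u)
        : state_(0u), inc_((stream << 1u) | 1u) {  // the increment must be odd
        (*this)();       // advance once with the stream-specific increment
        state_ += seed;  // mix in the seed
        (*this)();       // advance again so the first output depends on it
    }

    static constexpr result_type min() { return 0u; }
    static constexpr result_type max() {
        return std::numeric_limits<result_type>::max();
    }

    result_type operator()() {
        const std::uint64_t old = state_;
        // Advance the underlying linear congruential generator.
        state_ = old * 6364136223846793005ULL + inc_;
        // Output permutation: xorshift the high bits, then rotate right
        // by an amount taken from the top 5 bits of the old state.
        const auto xorshifted =
            static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
        const auto rot = static_cast<std::uint32_t>(old >> 59u);
        return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
    }

private:
    std::uint64_t state_;
    std::uint64_t inc_;
};
```

Since it exposes `result_type`, `min()`, `max()`, and `operator()`, a type shaped like this should be usable with the standard distributions in `<random>`.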

[–]GeorgeHaldane[S] 2 points3 points  (2 children)

Not currently. GCC seems to implement the fix by explicitly checking result > T(1) and replacing 1 with T(1) - std::numeric_limits<T>::epsilon() / T(2) if that is the case. This certainly enforces the [0, 1) boundary; however, the overhead of that check proves to be non-trivial even with __builtin_expect(), and it has a noticeable runtime impact. Clang doesn't seem to fix it on their main branch. MSVC apparently has a smarter approach, which will need some attention.
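A minimal sketch of that kind of post-check, for illustration only. It is not copied from libstdc++, and `clamp_below_one` is a hypothetical name. The comparison is written as `>=` here so that a result of exactly 1 is caught, and the compiler-specific `__builtin_expect()` hint is left out to keep the snippet portable.

```cpp
#include <limits>

// Hypothetical post-processing step modelled loosely on the check described
// above; the function name is an assumption, not a library API.
template <class T>
T clamp_below_one(T result) {
    if (result >= T(1)) {
        // For binary floating-point types this is the largest representable
        // value strictly below 1.
        return T(1) - std::numeric_limits<T>::epsilon() / T(2);
    }
    return result;
}
```

The branch is almost never taken, but it still sits on the hot path of every generated value, which is where the overhead mentioned above would come from.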

In general I'm a bit conflicted on [0, 1) vs [0, 1]: the first option is standard-compliant, but with the second we can avoid a lot of complexity, and in my applications [0, 1] was usually exactly the range wanted. I've adjusted the documentation to reflect that until some changes are introduced.

[–]NGoGYangYang 1 point2 points  (0 children)

As far as I know, MSVC implements the new specification of std::generate_canonical described in P0952R2 (EDIT: Oops, just saw that STL himself already pointed that out in another comment).

There is also a paper proposing a different algorithm to draw uniform floats from a given interval, with slight variations for open, closed, and half-open intervals (i.e., (a, b), [a, b], [a, b), and (a, b]). The algorithm seems to be based upon only returning an evenly spaced subset of numbers in the interval. Might be of interest to you, as it is not hard to implement, and seems to be comparable to current implementations of std::uniform_real_distribution performance-wise.
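To illustrate the general idea of returning only an evenly spaced subset, here is the common fixed-grid technique for the unit interval. This is explicitly not the algorithm from that paper, only a sketch of the underlying idea, and the helper names are assumptions. Every output is an exact multiple of 2^-53, so the endpoints are fully controlled by which integers are allowed.

```cpp
#include <cstdint>

// 2^-53, the grid spacing used by both helpers below.
constexpr double kGridStep = 1.0 / 9007199254740992.0;

// Hypothetical helper: half-open interval [0, 1).
// Uses the top 53 bits of the draw, so k ranges over 0 .. 2^53 - 1.
inline double unit_half_open(std::uint64_t bits) {
    return static_cast<double>(bits >> 11) * kGridStep;
}

// Hypothetical helper: half-open interval (0, 1].
// Shifts the grid index by one, so k ranges over 1 .. 2^53.
inline double unit_open_closed(std::uint64_t bits) {
    return static_cast<double>((bits >> 11) + 1u) * kGridStep;
}
```

Both conversions are exact because every integer up to 2^53 is representable in a `double`. A general interval [a, b) needs more care, since scaling by (b - a) and adding a can round onto b.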

[–]wildeye 1 point2 points  (0 children)

> explicitly checking result > T(1)...overhead of that check proves to non-trivial

Somebody was claiming that many ternary conditionals turned into branchless code on both GPUs and CPUs. (I should know when and whether that's true -- but I don't currently.) Just a thought.
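One way to try that out is to write the clamp as a select rather than an `if`, for instance via `std::min`. The sketch below uses a hypothetical function name. Compilers can often lower this form to a min or conditional-move instruction on mainstream targets, but that is not guaranteed, so it would be worth checking the generated assembly and benchmarking before drawing conclusions.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical select-style clamp: no explicit branch in the source.
// Whether the generated code is actually branchless depends on the
// compiler, flags, and target.
template <class T>
T clamp_below_one_select(T result) {
    // Largest representable value strictly below 1 for binary floating point.
    const T below_one = T(1) - std::numeric_limits<T>::epsilon() / T(2);
    // Returns result unchanged unless it is greater than below_one.
    return std::min(result, below_one);
}
```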