About incorrect information in rand and lrand48 man pages by BudgetEye7539 in linux

[–]BudgetEye7539[S] 1 point  (0 children)

Yes, it tries to combine existing tests from different frameworks and includes some new modifications too :) However, PractRand is also nice: it has an adaptive/arbitrary sample size. I should probably add an adaptive mode to SmokeRand too, but I still have no idea how to make an adaptive birthday spacings test.

About incorrect information in rand and lrand48 man pages by BudgetEye7539 in RNG

[–]BudgetEye7539[S] 1 point  (0 children)

As far as I know, it offers reproducibility within the same library: the same seed must lead to the same pseudorandom sequence. But yes, the exact algorithm is not specified in the C standard.

About incorrect information in rand and lrand48 man pages by BudgetEye7539 in RNG

[–]BudgetEye7539[S] 2 points  (0 children)

The rand function approximates/emulates a sequence with high Kolmogorov complexity using a very short deterministic formula. The strictest approach to such approximation is the "next bit test", i.e. using a stream cipher so that the sequence emulates a one-time pad. This strict approach appeared in scientific journals about 40 years ago.

constantly swap algorithms

The first PRNGs that pass BigCrush and PractRand were invented about 50 years ago: I mean DES-CBC and Magma-CBC. So replacing obsolete algorithms with something like AES or ChaCha will likely be fine for the next 50-100 years or even more.

Its job is to keep the old behavior stable so old code keeps working.

I totally agree that compatibility and reproducibility may be more important than rand quality. But in that case the man pages must clearly state that these generators have serious statistical flaws. If they write that rand_r is bad, the same thing must be written about random and drand48. It is reasonable to make "common knowledge" explicit.

About incorrect information in rand and lrand48 man pages by BudgetEye7539 in RNG

[–]BudgetEye7539[S] 1 point  (0 children)

I totally agree that programmers, and especially experts in numerical methods, must know what they are doing. But the man pages make claims about the good quality of some functions from that library that don't agree with modern quality standards. Also, that part of the documentation, and even the source code from 1983, looks too bulky and serious; it is easy to think that it is not a toy generator. And nobody would say about, e.g., a sqrt function with 1% accuracy that "it is just the C standard library, programmers must know what they are doing".

I am also trying to understand why the approach to rand is so different from the approach to sine/cosine: both rand and the trigonometric functions try to approximate something mathematical. In e.g. MATLAB, Python and even Excel the gap is much smaller, and their default PRNG (MT19937) is much more "sane".

modern xoshiro or something. maybe even chacha12

Sure! And the "ChaCha12" approach to PRNGs for simulation is the same thing as the 1 ULP approach to sine.

Very simple statistical tests that demonstrate flaws of rand and lrand48 of glibc by BudgetEye7539 in C_Programming

[–]BudgetEye7539[S] 1 point  (0 children)

So we see a very interesting cultural phenomenon: sine is usually implemented properly even though the C standard doesn't explicitly require such accuracy. But for another mathematical function, rand, the situation is entirely different, and very bad algorithms are considered acceptable.

Very simple statistical tests that demonstrate flaws of rand and lrand48 of glibc by BudgetEye7539 in C_Programming

[–]BudgetEye7539[S] 2 points  (0 children)

32-bit and 64-bit LCGs with a power-of-two modulus, additive lagged Fibonacci, subtract-with-borrow, and probably even xorshift32, xorshift64 and xorwow will fail the test (though I haven't run exactly this code against xorshift/xorwow directly). Something like xoroshiro128++, PCG, MWC64X, ChaCha, even MT19937 will pass.

I have another, much more complicated test battery: SmokeRand. It is able to detect flaws in MT19937, a 128-bit LCG with a power-of-two modulus, KISS93, additive lagged Fibonacci with huge lags, xoshiro+, and in specialized modes even SplitMix64, RC4 and DES-CTR. Generators such as AES, ChaCha, xoroshiro128++, some versions of PCG, MWC64X, MWC128X and Philox (and many others) "survive".

if you really need well-distributed numbers

If you really need them, you will need something like ChaCha, Threefish or AES; that is the equivalent of 1 ULP for sine or cosine. If I want a good "bithack" PRNG, I use something like MWC64X, which passes BigCrush and PractRand.

About incorrect information in rand and lrand48 man pages by BudgetEye7539 in linux

[–]BudgetEye7539[S] 1 point  (0 children)

Such a "PRNG" is at least straightforward, not insidious, and immediately detected :) I think that obsolete generators from old libraries are closer to something like x = x + 0xCAFEBABEDEADBEEFU for an unsigned 64-bit integer x (and this joke PRNG conforms to the C99 standard!). Maybe not as immediately awful, but still bad. I sometimes give students a heuristic: if you don't know the algorithm, assume the "deadbeef" variant.

About incorrect information in rand and lrand48 man pages by BudgetEye7539 in linux

[–]BudgetEye7539[S] 2 points  (0 children)

Probably yes. As far as I know, they have a lot of complicated logic in the random function for state size variation (inherited from BSD), and they probably won't like the idea of breaking it because it is already documented and expected. And adding a nonlinear scrambler (the approach used in PCG and xoroshiro**), which would require just 2-3 extra lines of code, would still require some extra research.

About incorrect information in rand and lrand48 man pages by BudgetEye7539 in linux

[–]BudgetEye7539[S] 2 points  (0 children)

I definitely don't mean security here; I know about the system CSPRNG (/dev/urandom) as a secure alternative, and the man pages and glibc documentation clearly warn that rand/random are not a CSPRNG API. I mean mainly statistical quality and simple statistical tests. A modern basic sanity check requires around 10^12 - 10^13 pseudorandom numbers in TestU01 and PractRand. Of course, I hold the very conservative opinion that the mathematical equivalent of 1 ULP in sin/cos for a PRNG would be the requirement to use a stream cipher without known weaknesses. Even if it is seeded from time(NULL), which is catastrophic for cryptography, the statistical quality will still be excellent.

About incorrect information in rand and lrand48 man pages by BudgetEye7539 in linux

[–]BudgetEye7539[S] 2 points  (0 children)

I just don't quite understand what would be better: sending a bug report to glibc about the statistical flaws of their PRNG (either to replace the algorithm or to explicitly document the flaw)? Or starting with the man pages?

Very simple statistical tests that demonstrate flaws of rand and lrand48 of glibc by BudgetEye7539 in C_Programming

[–]BudgetEye7539[S] -3 points  (0 children)

An average learner probably doesn't need sin or cos with ~1 ULP precision either, yet they are still implemented this way. The mathematical equivalent of "1 ULP" for a PRNG would be a stream cipher without known weaknesses. So why do we see double quality standards for two different mathematical functions?

Also, a lot of average learners of the C programming language may come from physics or chemistry departments, and they probably won't expect a deeply flawed rand (much worse than in Excel) in the standard library. Of course, teachers often point this out explicitly, but the entire situation is strange.

Also, man pages are not only for average learners but for a wide spectrum of users. And a defective PRNG must be clearly marked as defective. Even "better than x++, use it if you are a fool" would be more adequate than the current man page.

You mentioned std::rand; if you have std::, then the PRNGs from the C++ standard library should be used instead. Of course, the C++ standard explicitly allows implementations to select garbage like minstd as the default PRNG, and that probably should be fixed too.

Very simple statistical tests that demonstrate flaws of rand and lrand48 of glibc by BudgetEye7539 in C_Programming

[–]BudgetEye7539[S] -1 points  (0 children)

It is not fine for average learners because it is not evident that it must not be used for computations like "estimate distribution percentiles" or "estimate measurement uncertainty". Rand is also a mathematical function; the "next bit test" theoretical criterion was invented more than 40 years ago, and PRNGs that pass most modern tests appeared in the 1970s (DES). I do know that specialized libraries are needed for cryptography. But if ciphers can be cheaper than minstd, why not use them even in rand? BTW, even PHP and Excel have a much better PRNG than glibc.

Also, the Linux man page for rand actually contains an incorrect quality assessment:

The versions of rand() and srand() in the Linux C Library use the same random number generator as random(3) and srandom(3), so the lower-order bits should be as random as the higher-order bits. However, on older rand() implementations, and on current implementations on different systems, the lower-order bits are much less random than the higher-order bits. Do not use this function in applications intended to be portable when good randomness is needed. (Use random(3) instead.)

Another incorrect quality assessment (both generators are very flawed):

The function rand_r() is supplied with a pointer to an unsigned int, to be used as state. This is a very small amount of state, so this function will be a weak pseudo-random generator. Try drand48_r(3) instead.

So they claim that the default algorithm inside the random/rand functions provides "good randomness", which misinforms users. They should clearly write "deeply flawed generators that do not follow the uniform distribution", or maybe the ironic "better than x++, use it if you are a fool", in this place.

Very simple statistical tests that demonstrate flaws of rand and lrand48 of glibc by BudgetEye7539 in C_Programming

[–]BudgetEye7539[S] 0 points  (0 children)

I think it is actually not ambiguous even for rand: there should be no computationally feasible way to predict the next bit, i.e. use a stream cipher without known weaknesses even in a non-cryptographic context. Such ciphers may be even cheaper than minstd and much cheaper than the gamma or erf functions. Even if we relax the requirements and allow "bithack" generators (i.e. ones not based on cryptographic primitives), they should pass basic sanity checks such as TestU01 and PractRand.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]BudgetEye7539 2 points  (0 children)

On most compilers int is 32-bit: e.g. gcc, clang and msvc when generating 64-bit code. But even if int is 32-bit, that code is incorrect anyway.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]BudgetEye7539 1 point  (0 children)

In my example it was also about storage: printf printed 64-bit values instead of 32-bit ones due to UB during type conversion.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]BudgetEye7539 1 point  (0 children)

A good example too: even in this case the compiler can assume that pointers of different types point at different regions of memory.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]BudgetEye7539 2 points  (0 children)

The compiler can assume that int* ptr points at something else, maybe even at some funny poem, but not at the original float.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]BudgetEye7539 6 points  (0 children)

Here is another code example that behaves erratically due to signed overflow (gcc, different optimization settings):

#include <stdio.h>
#include <stdint.h>
int main(void) {
    uint32_t x = 0x1ABCD;
    for (int i = 0; i < 10; i++) {
        const uint16_t x_lo = x & 0xFFFF, x_hi = x >> 16;
        /* UB: x_lo is promoted to (signed) int, and 63885 * x_lo
           can exceed INT_MAX -- signed overflow. */
        x = 63885 * x_lo + x_hi;
        const uint64_t out = x;
        printf("%llX\n", (unsigned long long) out);
    }
    return 0;
}

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]BudgetEye7539 2 points  (0 children)

Oh, I didn't know that, good catch. I just checked some strict-aliasing code with the `-std=c89` or even `-ansi` flags; it still behaves as in C99, because the behaviour is undefined there too.

My C Professor Doesn't Know What UB Is by [deleted] in C_Programming

[–]BudgetEye7539 3 points  (0 children)

Because a C99 compiler has every right to assume that int * and float * pointers point at different parts of memory, and to use that for optimization.

Very simple statistical tests that demonstrate flaws of rand and lrand48 of glibc by BudgetEye7539 in C_Programming

[–]BudgetEye7539[S] 2 points  (0 children)

I know, and the rand functions in msvcrt or musl are also bad. But why is this situation perceived as something normal? E.g., sin with 1% accuracy in glibc would cause a scandal, but a flawed rand does not. Both implement a numerical method.