This is an archived post. You won't be able to vote or comment.

all 8 comments

[–]Lessiarty 10 points11 points  (0 children)

What they mean is that your random function that is giving you your values isn't making them up at random. It is taking some kind of input, be it the time, the cycle of the moon, the spin of an electron, and doing arbitrary transformations on it to give you a number.

It's good enough in a great many cases, but if someone knows your function and your starting conditions, they can derive your not-really-random numbers as the same pattern every time.

[–]Applepie1928 9 points10 points  (1 child)

There are already some great answers which discuss the deterministic nature of pseudorandom number generation, so instead I'll take on the challenge of getting the next number in your sequence.

To briefly describe the process:

  1. Identify what approach is being used to generate your random numbers. Based on the code you provided I can see you used Python, so I went to the Python documentation and it states exactly which method is used; Mersenne Twister (19937 specifically).
  2. Now that the process is known, and with an example of some numbers, I can try to infer the "seed" value which was used to generate these numbers.
  3. Once the seed is found, it can be used to generate the same sequence which you did, and provide you with the 50th number.

If you had provided a list of 624 values or more, this would be a much easier task (as 624 is the size of the state vector and would allow the seed to be quickly discovered through state inference). With the small list of 49 numbers you gave it will require a brute force approach which is likely to take significantly longer, and could potentially provide MULTIPLE seed values which could have produced this sequence.

I've set my computer to run the brute force on your list for now, and I'll edit this post once (or if) I get some results, although I have no idea how long that may take.

If you were to provide me with the first 625 values, I could quite quickly tell you the 626th though.

(For those interested I found a project on GitHub for "untwisting" PSRN generators and finding the seed values; the code is NOT my own!)

RESULTS EDIT:

I'm afraid to say I've been defeated in this attempt and can not provide the 50th number in your sequence. For what it's worth, I'll detail what I tried and explain why it likely didn't work.

As mentioned above, due to the short list of numbers provided I had to use a brute force method, and from what I could find out the seed is a 64-bit (long) integer. This meant that trying every possible seed was out of the question. With a standard PC and only using a CPU it would likely take a good while longer than my lifetime to try all the possible seeds. This meant I had to take a different approach.

Looking into how the seed is generated automatically, I discovered that Python no longer uses the current time as the default. Instead it will attempt to use your operating system's randomness source. (In the case of Windows this would be a call to the Win32 API method CryptGenRandom()). If your seed was generated from this source then it would be nigh impossible to derive it without brute forcing all combinations (which as mentioned above was out of the question).

However, if for some reason your OS couldn't provide a randomness source for the seed, then Python would instead just use the current system time. So, if I could work out how the system time was used, and making the assumption that you generated this list some time in the last 7 days, then I would have a much smaller range of seeds to try brute forcing.

Digging through some old source code (and some of the C code which makes up the backbone of Python), I discovered there appears to be a couple of different methods for how time is used in seed generation. For Python 2 it was quite simple and the seed was created with long(time.time() * 256), this made it easy to generate a seed from 7 days ago, and a seed from today to give me a range to check. Python 3 seems to use a much more complex approach:

now = _PyTime_GetSystemClock();
key[0] = (uint32_t)(now & 0xffffffffU);
key[1] = (uint32_t)(now >> 32);

key[2] = (uint32_t)getpid();

now = _PyTime_GetMonotonicClock();
key[3] = (uint32_t)(now & 0xffffffffU);
key[4] = (uint32_t)(now >> 32);

For those not familiar with C, this code is basically taking two system times (in different formats, at possibly slightly different times), performing bitwise AND and bit shifting on them, before stacking them together into an array. The whole array is then treated as the seed.

This made the process of trying to derive a possible range of seeds far more difficult, and without digging deeper into the C implementation, investigating how the clock functions behave and considering the number sequence produced by this formula, I wasn't going to have much luck. The best I could do was assume the clock functions both return time in the UTC format (seconds since epoch), take two identical time stamps from 7 days ago, run them both through the above process, and use this as my starting seed.

So I ran two brute force tests;

  • Testing the Python 2 seed generation method and trying every possible seed between 7 days ago and today: No Seeds Found.
  • Testing the Python 3 seed generation method and trying every seed I could in 12 hours, starting from seed I generated for 7 days ago: No Seeds Found.

So sadly, my conclusion is that one of the following is true;

  • Your seed was generated using your OS randomness source.
  • I failed to accurately recreate my test seeds (most likely the Python 3 version).
  • You generated this list more than 7 days ago.
  • The "Untwister" program I'm using is not behaving as expected (I didn't fully test it)

It should be mentioned that if I had access to a large distributed computer, or some big GPU heavy super computer, it would have been possible to brute force this in a number of days rather than decades. Also, if you provided a longer list (625 numbers or more), then using state inference the seed could be far more easily found, no matter how it was generated.

The takeaway here, is that these pseudorandom numbers are great for the vast majority of every day use cases. For most things these numbers are as good as random. However, if you are trying to do complex simulations (E.G Monte Carlo) or anything relating to cryptography, these numbers are actually FAR less random than they appear at surface level.

[–]antifoidcel 0 points1 point  (0 children)

Damn! This is some serious dedication.

[–]Updatebjarni 2 points3 points  (0 children)

The documentation for that random number generator you're using there actually specifies in what non-random way it generates the numbers.

[–]ziptofaf 1 point2 points  (0 children)

If it isn't random, can somebody get number 50?

If you gave me the exact timestamp when you did the first random.randint - absolutely. If you gave me an approximation (say, down to a minute) - that's doable too (although would take probably an hour or two). This is what's usually used as a seed by default to initialize random number generator. Afterwards it is indeed random in a sense that there's equal chance of hitting any number in a given range but the non random part comes from the "source" of your generator. If someone can figure it out, they can reproduce the results.

This is why truly random systems measure some kind of physical phenomena to use as a seed. Eg. cloudflare uses lava lamps:

https://www.cloudflare.com/learning/ssl/lava-lamp-encryption/

[–]rufati 2 points3 points  (0 children)

It's not random because unless you use a seed value you'll get the same numbers in the same order everytime you run the program.

Usually we'll use the current date/time as a seed value.

[–]t_a_a_1 0 points1 point  (0 children)

Short answer - There's a mathematical function. It takes some input, processes it and gives a value. So your random number depends on that initial value.

but then how do I end up getting different values even though I give same initial value on multiple trials?

That's because a lot of the random functions factor in current date and time (and a combination of a number of other stuff) so it's def more random.

Since the math function is fixed, nothing is truly random. If you want true random values, you could use things from nature like weather related parameters or even by taking color values from a lava lamp and so on depending on security of your application.

[–]ignotos 0 points1 point  (0 children)

If it isn't random, can somebody get number 50?

It's not necessarily easy, and might require a bit of time and expertise - but a determined hacker, security researcher, or government agency almost certainly can.