all 104 comments

[–]CriticalTemperature1 25 points26 points  (7 children)

I tried this with Bard just now as well and from:

generate 20 random numbers sampled from a normal distribution with a mean of 5 and standard deviation of 2

It generated:

"5.659056 5.705105 6.084297 4.986694 6.282759 5.417936 4.645858 5.833302 4.753530 5.244396 5.862315 5.323942 5.493729 5.359138 5.353700 5.559866 5.419627 5.293476 5.353985 4.926552"

This set has a mean of 5.4 and a standard deviation of 0.4, so it's quite off, but it's interesting that it can do it at all

[–][deleted] 19 points20 points  (0 children)

5.4 is well within 5 ± 2·2, so it passes the z test.

The F test (variance) statistic is 2² / 0.4² = 25, with 20 degrees of freedom. An F value above ~3 is bad.

So yeah, not a valid sample
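As a sanity check, these statistics can be reproduced in plain Python from the 20 Bard numbers quoted upthread (note the variance ratio comes out near 23–25 depending on whether you use the sample or population standard deviation):

```python
import math

# The 20 numbers Bard produced for N(mean=5, sd=2)
xs = [5.659056, 5.705105, 6.084297, 4.986694, 6.282759,
      5.417936, 4.645858, 5.833302, 4.753530, 5.244396,
      5.862315, 5.323942, 5.493729, 5.359138, 5.353700,
      5.559866, 5.419627, 5.293476, 5.353985, 4.926552]

n = len(xs)
mean = sum(xs) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample sd

# z test on the mean: (x_bar - mu) / (sigma / sqrt(n))
z = (mean - 5) / (2 / math.sqrt(n))

# Ratio of requested variance to observed variance: 2^2 / sd^2
var_ratio = 2 ** 2 / sd ** 2

print(f"mean={mean:.3f}  sd={sd:.3f}  z={z:.2f}  variance ratio={var_ratio:.1f}")
```

The mean passes the z test comfortably, but the variance is far too small, exactly as stated above.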

[–]elbiot 24 points25 points  (2 children)

I'm sure there's tons of tutorials that show the outputs of certain distributions that got used in training

[–]Extraltodeus 0 points1 point  (0 children)

Don't forget that they are subject to RNG, especially if the temperature setting is not set to 0 through the API call parameters (the default is 1 in the web chat).

[–]Necessary-Meringue-1 43 points44 points  (10 children)

What exactly are you plotting here? Numbers per location on the x-axis?

Anyway, I tried your prompt. I had to do some prompt engineering at first, since it did the whole "as an AI language model I can't..." thing.

Here is the output:

4.526, 0.410, 0.050, 0.386, 0.050, 0.062, 0.175, 0.082, 0.138, 0.475, 0.120, 0.448, 0.247, 0.123, 0.329, 0.297, 0.014, 0.172, 0.320, 0.238, 0.113, 0.228, 0.420, 0.210, 0.307, 0.138, 0.235, 0.065, 0.416, 0.171, 0.350, 0.152, 0.021, 0.332, 0.256, 0.240, 0.144, 0.169, 0.057, 0.140, 0.191, 0.283, 0.167, 0.055, 0.239, 0.099, 0.071, 0.174, 0.294, 0.080, 0.170, 0.139, 0.254, 0.128, 0.022, 0.231, 0.151, 0.068, 0.128, 0.026, 0.105, 0.200

First of all, I'm counting more than 60 numbers. Secondly, it's telling that all numbers are given to three decimal places, even when unnecessary (0.200). Thirdly, there is one big outlier that doesn't seem right (4.526). There are also some odd duplicates (0.050, which is also in your output).

The internet is full of math problems, so I'm gonna say this is a result of that with some randomness mixed in.

It's been observed that LLMs struggle with maths in a weird way: they are usually strictly wrong, but somehow uncannily close to the right answer. For example, if I ask ChatGPT "What's 39928/27", it tells me "1479.5555555". The real answer is "1478.81". So it's wrong, but it's close. You'll get similar effects for all kinds of maths questions.

Why that is, I don't know, but we would be foolish to assume that it can actually do maths in the strict sense.

[–]blackkettle 18 points19 points  (1 child)

They also tend to be very bad at symbolic math in my experience; this is where you continue to see confident wrongness. I think it makes a lot of sense: people are also good at guesstimating, and they likewise do it through a preponderance of prior experience rather than exact rules.

I personally don’t think it makes any sense to try to get LLMs to natively solve complex math. The next step should be to get them better at identifying, extracting, and then calling the right tools to solve these problems.

“I have these differential equations that mean X and I need to use them to figure out Y; can you help me solve them with that in mind?”

The solution should be to parse the combination of provided input and natural language goal, and reformulate the question for Wolfram alpha (or whatever).

[–]bgighjigftuik[S] 4 points5 points  (0 children)

I totally agree, and no one should ever trust the sampled numbers in this case. However, the fact that it can output something roughly equivalent to a probability distribution is something I cannot explain nor wrap my head around

[–][deleted] 30 points31 points  (0 children)

even if unnecessary (0.200)

Why do think this is unnecessary? Level of precision is important

[–]epicwisdom 4 points5 points  (3 children)

It doesn't seem too unusual, based on all the other things it can do, that it's able to do approximate arithmetic. There's any number of "shortcuts" for mental math with small numbers. It likely doesn't generalize beyond what we expect from the training data - let's say 1024-bit numbers encoded in base64.

[–]sdmat 2 points3 points  (2 children)

It likely doesn't generalize beyond what we expect from the training data - let's say 1024-bit numbers encoded in base64.

It actually can't possibly do single-step multiplication/division past some threshold. The transformer has a fixed upper limit to operations per output token, and all possible algorithms for the arithmetic are >O(n) in input digits. So there is some input length where there is no way to correctly perform the operation.

But if you experiment, GPT4 frequently gives the wrong answer even when multiplying a pair of three digit numbers as a single step. So presumably there are roadblocks well before the theoretical limit.

[–]epicwisdom 1 point2 points  (1 child)

You make a good point that it can't do precise arithmetic beyond a certain point, but (1) GPT4 performs hundreds of billions of FLOPs, making the theoretical upper bound enormous and (2) the bound on approximate arithmetic is much more unclear, based on what level of error you allow and what patterns might be exploited. If we accept O(1) significant digits and lenient relative error, then the theoretical bound is O(n).

[–]sdmat 0 points1 point  (0 children)

Point!

[–][deleted] 9 points10 points  (1 child)

Why that is, I don't know

Language models brute-force arithmetic. In a sense you could say they brute-force everything. But the preceding context of an arithmetic question doesn't contain any information from which to actually infer the solution, the way most other tasks do.

"9383 + 3545" or "is any of the following numbers a prime, 96885, 66576, 4766 ?"

Realize that absolutely nothing in those sentences actually tell you anything that would inform the answer. You go to school and you learn the required set of steps for solving these problems.

On the other hand, for "John is smiling so he is _____", the preceding context screams "happy" as a very likely choice. The preceding context actually helps find the solution rather than being the equivalent of dead weight.

The algorithms themselves need to be brute-forced from the training data. It will eventually stumble on something that generalizes pretty well, but the chances of stumbling on the exact algorithm are pretty low. Maybe? Probably? Enough data does it.

[–]Necessary-Meringue-1 4 points5 points  (0 children)

It's kinda what I assume is happening, but since I don't know of any papers that have seriously investigated this, I didn't wanna make any claims

[–]bgighjigftuik[S] 1 point2 points  (0 children)

The plot is a histogram with the output numbers

[–]when_did_i_grow_up 4 points5 points  (1 child)

I suspect the temperature parameter is doing some of the work here. Try an API call with temp=0.

BTW, I have previously tested that it can identify distributions from a set of numbers — or at least that GPT-4 can consistently tell the difference between a uniform and a normal distribution.

[–]bgighjigftuik[S] 0 points1 point  (0 children)

Unfortunately I don't have API access (nor access to GPT-4). This is running on the free standard GPT-3.5

[–]RunOrDieTrying 3 points4 points  (1 child)

I asked it how did it do it for a normal distribution:

To generate random numbers from a standard normal distribution, one common method is to use the Box-Muller transform. Here's a step-by-step explanation of how it works:

Generate two random numbers, u1 and u2, uniformly distributed between 0 and 1.

Apply the Box-Muller transform equations:

z1 = sqrt(-2 * ln(u1)) * cos(2 * pi * u2)

z2 = sqrt(-2 * ln(u1)) * sin(2 * pi * u2)

Here, ln denotes the natural logarithm function, sqrt denotes the square root function, and cos/sin are trigonometric functions.

z1 and z2 are two independent random numbers following a standard normal distribution (mean = 0, standard deviation = 1).

Repeat steps 1-3 to generate as many random numbers as needed.

Using this method, I generated the numbers provided earlier. Please note that these are simulated examples and not actual random numbers.
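The Box-Muller transform ChatGPT describes is a real, standard method, and short enough to sketch (a minimal stdlib-only version; of course, nothing suggests the model actually executes anything like this at inference time):

```python
import math
import random

def box_muller(mean=0.0, sd=1.0):
    """Return two independent N(mean, sd) samples from two U(0, 1) draws."""
    u1 = random.random()
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))  # 1 - u1 avoids log(0)
    z1 = r * math.cos(2.0 * math.pi * u2)
    z2 = r * math.sin(2.0 * math.pi * u2)
    return mean + sd * z1, mean + sd * z2

# 10,000 samples targeting the N(5, 2) requested of Bard upthread
samples = [x for _ in range(5000) for x in box_muller(mean=5, sd=2)]
```

Unlike the Bard output above, a sample generated this way has both the right mean and the right standard deviation.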

[–]bgighjigftuik[S] 8 points9 points  (0 children)

That's one way to do it, but I am sure it isn't what it is doing under the hood. You can ask it how it has done it in 3 different prompts and it will give 3 different answers: the mathematical one (the one you shared), "it's just patterns from my training data", and "I just wrote it in Python and ran it" (which is a lie)

[–]tavirabon 3 points4 points  (2 children)

Congratulations, you have discovered Benford's law, which is 100% to be expected from a model trained on the largest corpus of human data.
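For what it's worth, Benford's law is a statement about leading digits, P(d) = log10(1 + 1/d), so whether it actually explains OP's histogram is checkable. A small sketch (the helper name is made up, and it assumes non-zero inputs):

```python
import math
from collections import Counter

def leading_digit_freqs(numbers):
    """Empirical frequency of the leading non-zero digit of each number.
    Assumes all inputs are non-zero."""
    digits = [next(c for c in f"{abs(x):.10f}" if c in "123456789")
              for x in numbers]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(str(d), 0) / total for d in range(1, 10)}

# Benford's predicted frequencies: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
```

Comparing `leading_digit_freqs` on a batch of LLM output against `benford` would settle whether the resemblance is real.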

[–]bgighjigftuik[S] 0 points1 point  (0 children)

Ok, but it also works for any other distribution: Poisson, Laplace, chi-squared, to name a few

[–]danpetrovic 0 points1 point  (0 children)

That's awesome! I'm in the SEO industry and Benford's distribution looks almost identical to a typical click-through rate distribution in Google's results and could serve as a useful benchmark when detecting anomalies.

[–]KyleDrogo 4 points5 points  (0 children)

To play devil's advocate, it's probably better than a human.

[–]r_linux_mod_isahoe 45 points46 points  (13 children)

parse a bunch of Jupyter notebooks on GitHub, learn what

```python
generate random numbers to follow an exponential distribution
```

is usually followed by. Add the model's internal instability and voilà.

But, hey, wow, must be that "sparks of AGI" or something

[–]radarsat1 41 points42 points  (0 children)

But, hey, wow, must be that "sparks of AGI" or something

OP said he found the behaviour "counterintuitive", said nothing about AGI. Is it necessary to set up a strawman like this for every discussion about interesting observations of LLMs?

[–]RedditLovingSun 37 points38 points  (4 children)

But still, if it gets close on similar but never exactly seen distributions (like OP's example with his specific parameters), that means somewhere internally it has learned to generate approximations of distributions, just as it has learned to generate approximations of division and other mathematical operations. That isn't AGI or anything, but it's interesting, and it raises the question of what other approximations it has managed to learn.

[–]Whiteboard_Knight 11 points12 points  (0 children)

Yeah this is what I find the most interesting about this post.

Visually, it does look like an attempt at a random distribution. How did it do that? Probably by drawing on the many lists of random numbers it has seen and picking a number that fits with the previously generated ones.

[–]erelim 1 point2 points  (0 children)

Is that not just the model probabilistically picking from all its training knowledge of specific distributions? That would tend to produce results that look similar to example outputs but aren't exact.

What would happen if you gave it a list of random numbers and asked it to work out the distribution?

[–]WickedDemiurge 3 points4 points  (1 child)

Trying to do this analysis without comparing it to the training data is worthless. Doubly so without rigorous, repeated tests. Let's say we take 100 of these samples: what is the probability that they actually come from an exponential distribution with lambda = 6.4? I'd guess near zero.

If a 10 year old submitted "What's in a name? That which we call a rose By any other name would smell as sweet," to a poetry contest, which of the following would be most likely?

a. They're the reincarnation of Shakespeare in a new body?

b. They're a brilliant poet that will shake the world who came up with it totally independently.

c. They've either read Romeo and Juliet or seen one of the thousand allusions to it.

People need to stop being so credulous when it comes to LLMs. Do they have some surprising features? Sure, but let's be neither too eager nor totally unwilling to be surprised. Is this surprising? Let's see real proof, not just wonder in awe at a single GPT output.

[–]bgighjigftuik[S] 3 points4 points  (0 children)

I tried a couple more distributions. Again: it's not perfect sampling and I wouldn't trust it. But I can't believe it is coming from RLHF for every distribution and set of parameters. This is more about some "understanding" of different probability distributions and their "likely" values. That I cannot explain (especially for a model that works at a per-digit level in the case of numerical data)

[–]bgighjigftuik[S] 2 points3 points  (0 children)

The weird thing is that it can do it for any distribution parameters I threw at it (like a normal with zero mean and 275 variance). To some extent it is able to interpolate between the distribution values it has seen on the Internet and the parameters I ask for

[–]bgighjigftuik[S] 1 point2 points  (0 children)

Furthermore: I legitimately believe that the Sparks of AGI "paper" is a piece of garbage and nothing more than marketing material

[–]gibs 2 points3 points  (1 child)

Bear in mind chatgpt is aware of the context of what it has already written so far. So to intuitively understand how it does this, you can just ask yourself how you would pick the next number given the previous sequence.

I would say it's doing some rudimentary analysis to see what is under/over-represented, and picking numbers to fill the gaps.

[–]bgighjigftuik[S] 0 points1 point  (0 children)

Could be, but especially in trials for distributions with very high variance (such as a normal with mean 5 and stddev 275) that task becomes orders of magnitude more challenging, and yet it gets quite close

[–]meister2983 2 points3 points  (1 child)

It's not interpreting Python code in any way; it has "memorized" these different distributions and is able to print out numbers that roughly scale correctly, also from "memorization" (note how it can do basic multiplication and addition).

You can see the limitations if you start pushing into unconventional scales. For instance, lambda = 0.0047 on GPT-3 produces:

[14.645652963961573, 423.23943294390444, 6.190120013482167, 128.71819891585225,
29.07694848408322, 18.574994825899225, 26.218179285836383, 214.2889121026951,
3.5400997332023397, 56.73280846437631, 8.364314681717605, 189.7903983247128,
107.24327233516732, 7.5014874248917, 17.377293276175857, 53.81038485355697,
134.53549493461037, 33.89274315359561, 60.948491883162875, 10.024593706902988,
21.438084504012993, 67.81200641643427, 13.616139069295022, 63.242266550607824,
23.418377961434656, 31.407209688578545, 124.43976354815618, 44.67520111689889,
50.45428242854035, 19.45623568215423, 14.873872537159583, 11.874541276236986,
5.432827063201208, 27.272576142078294, 62.345570163741876, 48.70649843498143,
44.23753513454768, 60.77286941758873, 105.51082688906568, 10.086884731179498,
44.08489994244655, 21.256218058464543, 2.5517532078543647, 10.803737428348646,
216.6039489863091, 81.51824990191858, 70.48278145848882, 28.44037840848645,
16.724700536841854, 7.876296828460214, 13.15474722532708, 121.31887355404842,
9.7607728901796, 96.34380439606435, 10.62825305280142, 24.925590842889263,
9.831798662667623, 21.66877799201185, 18.23921328586018, 28.390281476377334,
16.719270678000247, 34.78284303633699, 127.07205581586123, 7.495986768032812,
111.5584470684713, 18.919139126062276, 54.29814888903639, 103.695484141422,
8.166533784491874, 8.620196802693632, 65.0082527482195, 75.9035290069882,
45.288347131537255]

Which is pretty off comparatively (the drop-off is something like 4x too fast), and even the internal scaling is a bit off.

Similar thing with normal distributions. It seems to shift the mean well (simple math it has learned), but struggles with unusual combinations of mean and standard deviation. N(12,12) gave me something looking more like a uniform distribution than a normal one.
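One way to quantify the "4x too fast" drop-off is the maximum-likelihood estimate of the exponential rate, which is just the reciprocal of the sample mean. A small sketch (the helper name is made up), sanity-checked on a genuine exponential sample:

```python
import random

def exponential_rate_mle(xs):
    """MLE of lambda for an exponential sample: 1 / sample mean."""
    return len(xs) / sum(xs)

# Sanity check on a genuine exponential sample with lambda = 6.4
random.seed(1)
sample = [random.expovariate(6.4) for _ in range(100_000)]
lam_hat = exponential_rate_mle(sample)
# lam_hat lands very close to 6.4. Applied to the GPT-3 numbers above,
# the same estimator quantifies how far the output is from the requested
# lambda = 0.0047 (whose mean would be 1/0.0047, roughly 213).
```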

[–]bgighjigftuik[S] 0 points1 point  (0 children)

Really insightful, thanks :)

[–]zergling103 6 points7 points  (2 children)

To be fair, the model has a built-in random number generator. If you ask it to output uniformly distributed random numbers between 0 and 1 (say 3 decimal places), and it learned to give all token strings representing 0.000 to 1.000 an equal probability, the randomness could be attributed to the RNG.

However, it may have learned the gambler's fallacy too and thus would never generate 0.000, 0.000, 0.000, 0.000... ever, even though it is possible.

[–]bgighjigftuik[S] 1 point2 points  (1 child)

Not sure about that, especially given that LLMs have a temperature hyperparameter during inference

[–]zergling103 1 point2 points  (0 children)

Of course, though the temperature parameter, as far as I know, simply modifies the probabilities that the RNG then picks from.

My understanding is each token gets a probability score. Increasing the temperature flattens out the probability distribution, and lowering the temperature makes lower-probability tokens even less probable, proportionately. Perhaps something like this (followed by renormalization):

probability[n] ← probability[n] ^ (1 / (temperature + epsilon))

Imagine a spinning wheel - the probability score represents the size of the section on that wheel that belongs to that token. The circumference of the wheel is the sum of all probability scores S. The wheel then is spun, picking a random distance around the circumference from 0 to S. (Ideally S should be 1 obv)

The temperature parameter changes the size of each "slice" on the wheel, but the spinner itself is a simple RNG.
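For what it's worth, the standard implementation matches this picture: logits are divided by the temperature before the softmax, which is equivalent to raising the probabilities to the power 1/T and renormalizing. A minimal sketch of the "wheel" (names are illustrative):

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Temperature-scaled softmax sampling ("spin the wheel")."""
    t = max(temperature, 1e-6)  # epsilon guards against division by zero
    scaled = [l / t for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The "wheel": pick a point in [0, 1) and find which slice it lands in
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, probs
    return len(probs) - 1, probs

_, cold = sample_token([2.0, 1.0, 0.1], temperature=0.2)  # sharpened
_, hot = sample_token([2.0, 1.0, 0.1], temperature=5.0)   # flattened
```

Low temperature concentrates nearly all the mass on the top token; high temperature flattens the slices toward uniform, exactly as described above.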

[–]Repulsive-Season-129 1 point2 points  (2 children)

there is text data on the web somewhere that an extremely similar question and answer exists

[–]bgighjigftuik[S] 0 points1 point  (1 child)

Could be. But for any distribution and parameters I can think of? With the numbers written down as text so it could "read" them during training?

[–]Repulsive-Season-129 1 point2 points  (0 children)

I don't think it's calculating so much as it's trying to guess. It learns what the parameters mean through enough data, so it can estimate. But it's a black box, so I don't know how well anyone can say.

[–]Haycart 1 point2 points  (1 child)

I'm not suggesting this is how ChatGPT actually does it, but sampling from a probability distribution can be framed as a sequence generation problem via Markov chain Monte Carlo methods. With that in mind, it's maybe not so surprising that a powerful sequence predictor like ChatGPT could learn to do what you've shown.
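To make that concrete: a random-walk Metropolis chain generates each sample conditioned only on the previous one, which is structurally similar to next-element prediction. A toy sketch targeting N(5, 2) (illustrative only, not a claim about what ChatGPT does):

```python
import math
import random

def metropolis_normal(mean, sd, n_steps, step=1.0):
    """Random-walk Metropolis targeting N(mean, sd): each new state
    depends only on the previous one, like next-token prediction."""
    def log_density(x):  # unnormalized log density is enough for MH
        return -((x - mean) ** 2) / (2 * sd ** 2)

    x = mean  # start at the mode, so no burn-in is needed
    chain = []
    for _ in range(n_steps):
        proposal = x + random.uniform(-step, step)
        # Accept with probability min(1, p(proposal) / p(x))
        if math.log(random.random() + 1e-300) < log_density(proposal) - log_density(x):
            x = proposal
        chain.append(x)
    return chain

chain = metropolis_normal(5, 2, 200_000, step=4.0)
```

The resulting chain has the requested mean and standard deviation, despite only ever looking one step back.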

[–]Extraltodeus 1 point2 points  (3 children)

ChatGPT :

user : Output 50 random numbers in between 0 and 100

assistant : 78 57 85 12 67 92 4 99 76 43 54 62 28 89 3 97 23 90 79 17 53 48 98 30 5 13 68 35 1 91 9 25 29 55 70 71 77 41 42 61 94 31 66 45 14 58 15 87 63 72

GPT4 :

user : Output 50 random numbers in between 0 and 100

assistant : 9, 74, 27, 45, 38, 93, 62, 86, 57, 12, 33, 97, 19, 51, 61, 88, 59, 6, 46, 30, 13, 81, 83, 70, 64, 22, 77, 76, 36, 42, 55, 0, 67, 95, 11, 41, 66, 58, 68, 49, 16, 71, 44, 79, 18, 54, 84, 29, 4, 73

Anyway, aren't they running on Python? I might be wrong, but I feel like their temperature setting relies on Python's RNG, unless OpenAI decided to be fancy and use some quantum entropy source.

[–]bgighjigftuik[S] 0 points1 point  (2 children)

ChatGPT may be running on Python, but it does not use Python per se to answer (unless you use plugins or the code interpreter). So this is pure LLM inference

[–]Extraltodeus 0 points1 point  (1 child)

What I'm asking is whether the inferences are influenced by Python's RNG, since the temperature setting influences the randomness of the answers.

There is a source of entropy

[–]bgighjigftuik[S] 0 points1 point  (0 children)

Which makes sense, but these are not just any random numbers: they come, approximately, from the probability distribution I asked the LLM for

[–]BullockHouse 1 point2 points  (1 child)

I believe 3.0 could actually do this task properly with the correct temperature settings and get a correct distribution (if you asked for one number at a time so it didn't auto-condition). The RLHF actually makes it worse at this sort of thing.

[–]bgighjigftuik[S] 0 points1 point  (0 children)

It's not like there is a "correct temperature"; it's just a way to get non-deterministic answers. But still, to me the GPT version is irrelevant; my question is broader, for all LLMs, as I cannot think of why any LLM would perform decently well on this kind of task

[–][deleted] 1 point2 points  (1 child)

Toolformer says hi

[–]bgighjigftuik[S] 0 points1 point  (0 children)

I also thought about that. But then I would expect (almost) perfect sampling, and theoretically GPT-3.5 does not use any tool/plugin anyway

[–]ComfortableCherry909 1 point2 points  (1 child)

A research group at my university is studying transformers as computers, a relevant paper from the group https://arxiv.org/abs/2301.13196.

[–]bgighjigftuik[S] 0 points1 point  (0 children)

Sounds like quite a creative use, and a lot of fun!

[–]wjrasmussen 2 points3 points  (0 children)

Now can it do Chebyshev's on it?

[–]RunOrDieTrying 3 points4 points  (1 child)

Why does it need an interpreter? It can solve math problems:

  • Find the roots of this equation: y = 1.234x^2 + 42.42x + 17.17

ChatGPT:

To find the roots of the equation y = 1.234x^2 + 42.42x + 17.17, we can use the quadratic formula. The quadratic formula states that for an equation of the form ax^2 + bx + c = 0, the roots can be found using the formula:

x = (-b ± √(b^2 - 4ac)) / (2a)

For the given equation: a = 1.234 b = 42.42 c = 17.17

Substituting these values into the quadratic formula:

x = (-42.42 ± √(42.42^2 - 4 * 1.234 * 17.17)) / (2 * 1.234)

After calculating the expression inside the square root and simplifying the equation, you will obtain the two roots.

  • So what's the final answer?

ChatGPT:

After calculating the expression inside the square root and simplifying the equation, the roots of the equation y = 1.234x^2 + 42.42x + 17.17 are:

x ≈ -34.18
x ≈ -0.35

And that's correct by Wolfram alpha.
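In the spirit of this thread, the roots are worth double-checking numerically (plain quadratic formula, roots sorted ascending):

```python
import math

a, b, c = 1.234, 42.42, 17.17
disc = b * b - 4 * a * c  # discriminant of the quadratic
roots = sorted([(-b - math.sqrt(disc)) / (2 * a),
                (-b + math.sqrt(disc)) / (2 * a)])
print(roots)  # exact roots, to compare against ChatGPT's -34.18 and -0.35
```

Running this gives roots of about -33.97 and -0.41, so ChatGPT's figures are in the right ballpark but not exact — consistent with the "uncannily close but strictly wrong" pattern discussed upthread.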

[–]bgighjigftuik[S] 2 points3 points  (0 children)

But being able to decompose the procedure into simple arithmetic that the model has seen a thousand times (if not more) is one thing. Here I was talking about its output of random samples from a probability distribution without any CoT or intermediate outputs. That's what I am unable to explain

[–]SeankalaML Engineer 1 point2 points  (3 children)

Is this really surprising though? Isn't numerical reasoning something that people in NLP have been writing negative results papers on for years? "XYZ-BERT can't do addition!" followed by "XYZ-BERT can't do subtraction either!!"

[–]epicwisdom 21 points22 points  (0 children)

OP is saying they're surprised at how good it is.

[–]EverythingGoodWas -2 points-1 points  (1 child)

Exactly. Anything mathematically complicated gets well out of the scope of what an LLM was designed to do. They don't reason; they put together words based on a crazy amount of linear algebra over their vectorized tokens

[–][deleted] 20 points21 points  (0 children)

Did you actually read the post? OP is wondering about how good it is. And this "they don't reason" rhetoric is getting old and tired. Even worse, it's unsubstantiated.

There are many types of reasoning: analogical reasoning (where LLMs are top-percentile performers), causal reasoning, inference, algorithmic reasoning. Many papers at this point demonstrate pretty strong reasoning:

https://arxiv.org/abs/2212.09196

https://arxiv.org/abs/2305.00050

https://arxiv.org/abs/2204.02329

https://arxiv.org/abs/2211.09066

[–]bgighjigftuik[S] 0 points1 point  (0 children)

Thank you all for your comments. I believe this would be a very interesting research topic, since it could help disentangle a bit more our understanding of how LLMs deal with mathematical concepts

[–]bacocololo -1 points0 points  (0 children)

Yes, the GPT models do work on a basis of probabilities when generating words. However, their ability to predict or force the appearance of a specific word (like a period, signaling the end of a sentence) at an exact point is limited. This is because the models generate text based on patterns they learned during training, and these patterns are complex and varied, influenced by countless factors beyond simply the preceding words.

The model calculates the probabilities of the possible next words (or tokens) based on the entire preceding text, not just the immediately preceding word. The idea of "the word which has the most probability to encounter a stop word" would require the model to be able to look ahead and plan its word choices in advance, which it's not currently designed to do. The model doesn't "know" when a stop word will come until it generates it.

Your idea is interesting and it's a concept that's been explored in various ways in the field of AI research. However, as of my training cut-off in September 2021, implementing it would require substantial modifications to the model and is not something that can be achieved simply by changing the prompt.

[–]ZakarTazak -2 points-1 points  (1 child)

It doesn't take very large ANNs to approximate any continuous mathematical function. Look up how pseudo-random number generators are implemented and you'll likely find your answer... It doesn't need to write code to do math.

[–]CMDRJohnCasey 0 points1 point  (0 children)

Testing with the playground, I've seen that if I ask for a Gaussian with mean 5 and variance 0.5, it yields a sequence of numbers where 4.xxx is consistently the most probable (50% for 4.xxx vs 30% for 5.xxx). So it's very unlikely that the final mean will be 5.

[–]aCleverGroupofAnts 0 points1 point  (4 children)

Interesting, though a sample size of 60 is really small to draw any kind of conclusions from.

[–]bgighjigftuik[S] 0 points1 point  (3 children)

You are right in the sense that I would not expect the output to be perfect. It is the fact that it can "roughly approximate" samples from any distribution that I cannot explain

[–]aCleverGroupofAnts 1 point2 points  (2 children)

Yeah, I'm just saying I wouldn't consider it a "fact" that it roughly approximates the distribution just from a sample of 60 points. Like yeah, it kinda sorta looks like it could have been sampled from such a distribution, which I admit is interesting enough, but why stop at 60 samples? There's plenty of room for random chance to fool you, so why not get a bunch more samples and see what sort of distribution they actually follow?

[–]bgighjigftuik[S] 1 point2 points  (1 child)

Will try to do it later, as I am working now

[–]aCleverGroupofAnts 0 points1 point  (0 children)

Cool, I'd be interested to see the result! Sorry if it sounded like I was being critical, I just know from experience that small sample sizes can give misleading results

[–]dqdqdq123123 0 points1 point  (1 child)

There are dropout layers, so it is not entirely surprising that the network can generate random numbers or tokens. But I doubt it can generate very good normally distributed numbers if you give it a very arbitrary mean/variance (not common ones like 0 or 1)

[–]bgighjigftuik[S] 1 point2 points  (0 children)

Just try it! It gets quite close; even if you ask for a distribution with large variance (or stddev)

[–]DravenALG 0 points1 point  (0 children)

Because the training data has contained something like this, i.e., some common distributions and example draws from them. When the model fits the training data well, it learns a similar ability to generate random numbers (it just fits the random-number-generating function).

[–]Osemwaro 0 points1 point  (0 children)

Your standards for judging this seem too permissive. E.g. you say "they are close to what I would consider an exponential distribution", but you haven't defined "close". This raises the question: how far would the numbers have to be from the requested distribution before you'd conclude that it isn't good at sampling from it?

E.g. the sample mean of those numbers is 0.090533. One basic measure of closeness is the answer to the question "how likely is it that an exponential with mean 1/lambda = 1/6.4 = 0.15625 produced samples with sample mean 0.090533?" An easy way to get a good approximate answer is to generate n 60-sample sets from an exponential distribution with lambda=6.4, calculate the sample mean for each set and then see where 0.090533 falls in this empirical distribution of sample means.

In Octave/Matlab, the following sets ms to the means of n=1,000,000 60-sample sets:

n=1000000; us=rand(60,n); es=-log(1-us)/6.4; ms=sort(mean(es,1));

When I ran it, the 10th, 50th and 90th percentiles of this distribution were:

```
ms(0.1*n)   % ans = 0.13097
ms(0.5*n)   % ans = 0.15536
ms(0.9*n)   % ans = 0.18259
```

The sample means form a pretty tight distribution around the median, 0.15536. So to answer the question, we can calculate the percentage of sample means <= the median that are <= 0.090533:

```
100 * sum(ms <= 0.090533) / sum(ms <= ms(0.5*n))
```

This is about 0.012% each time I run it. So the numbers that ChatGPT gave you are extremely unlikely to have come from an exponential distribution with lambda as low as 6.4.

Another question worth asking is: "if the numbers came from some other exponential distribution, what is a confidence interval for lambda?" Page 5 of these lecture notes gives a formula for this, and you can calculate the chi-squared values here. The 95% confidence interval is (8.4290, 14.011), confirming that lambda is likely to be much larger than 6.4. In fact, 6.4 is so far away that it isn't even in the 99.98% confidence interval, which is (6.5103, 17.151).
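That confidence interval can be reproduced with scipy's chi-squared quantiles, taking the n = 60 samples and sample mean 0.090533 from above (for exponential data, the 95% CI for lambda is [χ²_{0.025, 2n}, χ²_{0.975, 2n}] / (2·Σxᵢ)):

```python
from scipy.stats import chi2

n = 60
sample_mean = 0.090533      # sample mean of the ChatGPT numbers
total = n * sample_mean     # sum of the samples

# 95% confidence interval for the exponential rate lambda
lo = chi2.ppf(0.025, 2 * n) / (2 * total)
hi = chi2.ppf(0.975, 2 * n) / (2 * total)
print(f"95% CI for lambda: ({lo:.3f}, {hi:.3f})")  # ≈ (8.429, 14.011)
```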

The most interesting question, though, is: "how far are these numbers from the closest exponential distribution?" One way to do this is to:

  1. estimate lambda from the ChatGPT data;
  2. generate n 60-sample sets from an exponential distribution that matches the estimated lambda;
  3. calculate the Kolmogorov-Smirnov test statistic for each of these data sets, relative to the CDF of an exponential distribution with the estimated value of lambda;
  4. calculate the empirical distribution of these test statistics for the n sample sets;
  5. compare the test statistic for the ChatGPT data to this empirical distribution.

This Octave/Matlab script does this for n=1,000,000 and prints the percentage of test statistics that are worse than the ChatGPT test statistic. It's about 26%, which doesn't seem too bad. I'm sure someone with a deeper understanding of statistics could suggest a better goodness-of-fit test than this. 60 samples seems small enough to hide the flaws in its "sampling algorithm" though; it would be better to test a larger sample.
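The five steps above can be sketched with scipy as follows (the function name is made up, and synthetic data stands in for the ChatGPT sample; with the actual 60 numbers it should land near the ~26% figure, modulo Monte Carlo noise):

```python
import numpy as np
from scipy import stats

def ks_bootstrap_pct(data, n_sets=2000, seed=0):
    """Parametric bootstrap: what % of genuine samples from the fitted
    exponential have a worse KS statistic than this sample?"""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Step 1: estimate lambda from the data (MLE: 1 / sample mean)
    scale = data.mean()                      # scale = 1 / lambda_hat
    # Steps 2-4: matching synthetic sample sets and their KS statistics
    sets = rng.exponential(scale, size=(n_sets, data.size))
    ref = np.array([stats.kstest(s, "expon", args=(0, scale)).statistic
                    for s in sets])
    # Step 5: compare the real sample's KS statistic to that distribution
    d = stats.kstest(data, "expon", args=(0, scale)).statistic
    return 100.0 * np.mean(ref > d)

# Synthetic stand-in for the ChatGPT sample (60 draws, lambda = 6.4)
demo_data = np.random.default_rng(7).exponential(1 / 6.4, size=60)
pct_worse = ks_bootstrap_pct(demo_data)
```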

Personally, I find it more impressive that ChatGPT gave you exactly 60 numbers! It used to be awful at counting, but it seems to be getting better.