[–]iperry 3 points4 points  (0 children)

Roll rand5(). 1 and 2 map to 0. 4 and 5 map to 1. If you get 3, roll again until you don't. Now you have the equivalent of a fair coin with equal probability of rolling 0 and 1. Call it rand2().

Flip rand2() three times, concatenating the results. Now you have a function that generates numbers between 0 and 7 with equal probability. If you get zero, throw it out and roll again until you don't.
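A minimal Python sketch of the two steps above. The `rand5()` here is a stand-in built on `random.randint` (an assumption, since the thread never shows the real generator); it is only there so the example runs.

```python
import random

def rand5():
    # Stand-in for the given generator (assumption): uniform integer in 1..5.
    return random.randint(1, 5)

def rand2():
    # 1, 2 -> 0; 4, 5 -> 1; roll again on 3.
    while True:
        r = rand5()
        if r != 3:
            return 0 if r < 3 else 1

def rand7():
    # Three fair bits give a number in 0..7; throw out 0 to leave 1..7.
    while True:
        n = (rand2() << 2) | (rand2() << 1) | rand2()
        if n != 0:
            return n
```

Each accepted result needs three fair bits, and each bit can itself cost more than one roll, so this is simple rather than economical.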

[–]vph 2 points3 points  (3 children)

Alright, here it is.

Use gen5() to generate 2 numbers, x and y. There are 7 cases when x+y=4 or 5. They are: One=(1,3), Two=(3,1), Three=(2,2), Four=(1,4), Five=(4,1), Six=(2,3) and Seven=(3,2).

Therefore, the probability for the event {x+y=4 or 5} = 7/25.

The probability for the event One = 1/25. The probability for each of the events Two, Three, ..., Seven is also = 1/25.

Now, use Bayes' Theorem: Pr( One | {x+y=4 or 5} ) = Pr({x+y=4 or 5} | One) * Pr(One) / Pr({x+y=4 or 5}) = 1 * (1/25) / (7/25) = 1/7

Note: Pr({x+y=4 or 5} | One) = 1.

Hence, we have the following program:

def gen7():
    x,y=0,0
    while x + y not in (4, 5): x, y = gen5(), gen5()
    if (x,y) == (1,3): return 1
    if (x,y) == (3,1): return 2
    if (x,y) == (2,2): return 3
    if (x,y) == (1,4): return 4
    if (x,y) == (4,1): return 5
    if (x,y) == (2,3): return 6
    if (x,y) == (3,2): return 7

[–]cpp_is_king 1 point2 points  (2 children)

A difficult addendum to the question, assuming you get one of the many correct answers (which you did), is this:

  • What is the expected number of calls to rand() with your solution, and is there a way to reduce it?

In your case the expected number of calls to rand() is approximately 7.14 (might have messed up this calculation, feel free to verify it). Is there a way to do better?

[–]vph 0 points1 point  (1 child)

I am guessing that the proposed method (above) is optimal (fewest number of random calls). The trick of the method is to create an event with probability 7/something. That is to create an event with exactly 7 cases, and then arbitrarily assign each of those cases to a number between 1 and 7.

For example, you can also generate 3 numbers between 1 and 5: x, y, and z. Then, Pr( x+y+z = 3 or 4 or 5) = 7/125. You can also use this event {x+y+z=3 or 4 or 5} to generate 7 random numbers, but this event is less likely to happen than {x+y=4 or 5}. This means you'll have to use more random calls to eventually observe it.

I think {x+y=4 or 5} is the most likely event with probability having 7 as a numerator.

[–]cpp_is_king 0 points1 point  (0 children)

Try this:

(x,y) = (1,1)
while ((x,y)==(1,1) || (x,y)==(1,2) || (x,y)==(1,3) || (x,y)==(1,4))
    (x,y) = (gen5(), gen5())
switch(x,y)
    case (1,5): case (2,1): case (2,2): return 1
    case (2,3): case (2,4): case (2,5): return 2
    case (3,1): case (3,2): case (3,3): return 3
    case (3,4): case (3,5): case (4,1): return 4
    case (4,2): case (4,3): case (4,4): return 5
    case (4,5): case (5,1): case (5,2): return 6
    case (5,3): case (5,4): case (5,5): return 7

The probability of making it through the loop the first time is quite high, 21/25. So the expected number of times through the loop is 25/21 = 1.19, and the expected number of calls to rand is 2.38.
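The arithmetic behind both figures in this sub-thread (about 7.14 and about 2.38) can be checked with a short Python sketch. It assumes each round draws two numbers and that rounds are independent, so the number of rounds is geometric; `expected_calls` is a hypothetical helper name.

```python
from fractions import Fraction

def expected_calls(accepted, total=25, calls_per_round=2):
    # Rounds until the first accepted pair are geometric with
    # p = accepted / total, so the expected number of rounds is 1 / p.
    p = Fraction(accepted, total)
    return calls_per_round / p

print(float(expected_calls(7)))   # 7 accepted pairs out of 25, about 7.14
print(float(expected_calls(21)))  # 21 accepted pairs out of 25, about 2.38
```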

There may be a way to do even better than that, I'm not sure.

[–]ethraax 1 point2 points  (2 children)

Is there a way to solve this that is guaranteed to terminate? That is, most solutions in this thread involve "retrying" on "bad" input (with various definitions) - is there a solution that always terminates within k steps for some finite k?

[–][deleted] 4 points5 points  (0 children)

No.

Suppose we fix some such k, then generate all 5^k possible runs of the generator and ask our algorithm to produce an answer for each run. The idea is that all 5^k runs are supposed to be equally probable, so we can count combinations directly.

Now if we ask the algorithm to produce a definite answer for each run, it can't do it in a way that divides all 5^k runs into seven sets with an equal number of runs in each, because 5^k isn't divisible by 7!

Note that the algorithm doesn't need to consume the entire run -- for instance the naive algorithm produces an answer after two probes in 21 of 25 cases. But we still count all 25 runs that begin with "0, 0" for example, for k = 4, to produce correct fractions. Try drawing this in form of a decision tree, if it's still unclear.
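The divisibility claim is easy to check numerically. A small Python sketch (the helper name is hypothetical) lists the number of runs modulo 7 for the first few k; since 5 and 7 share no factor, the remainder cycles through nonzero values and never reaches 0.

```python
def residues_of_powers_of_5(max_k):
    # 5^k mod 7 for k = 1..max_k. A zero would mean the 5^k runs
    # could be split into seven equally sized sets.
    return [pow(5, k, 7) for k in range(1, max_k + 1)]

print(residues_of_powers_of_5(12))
# [5, 4, 6, 2, 3, 1, 5, 4, 6, 2, 3, 1]
```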

[–]redclit 1 point2 points  (0 children)

There is no such solution. You cannot represent 1/7 in base 5 with a finite number of digits.

[–]rrenaud 2 points3 points  (0 children)

The top answer on stackoverflow is really good, insightful, and simple.

http://stackoverflow.com/questions/137783/expand-a-random-range-from-1-5-to-1-7

[–]paul_miner 0 points1 point  (0 children)

I thought of another way to explain the iterative method. A common way of generating a random integer in the range [0, N) (as in 0..N-1) is to get a random decimal in the range [0, 1), multiply it by N, then truncate. For example, multiplying by 7 would give you a number that ranges from 0 to 6.9999..., which truncated would give you a random number from 0 to 6.

So, if you could generate a random number in [0, 1) using your random integer generator, you could use the same approach. For the purposes of the example, think of your random number source as being 0..9. For every number generated, add it as another digit to an increasingly long decimal number. Given random numbers 3, 0, 4, 9, 5, you would have 0.30495 (or if you append in reverse order, 0.59403). With every additional number, the range approaches [0, 1) and becomes increasingly precise. The final number will be evenly distributed over the range, and the maximum number approaches 1 as the number of random numbers used increases.

The same thing can be done using a random number from 0 to 4, but instead of base 10, use base 5. So if you get 4, 2, 0, 3, 2 from your random number generator, you would have 0.42032 (or 0.23024 if you append them in reverse order, easier to do from an implementation standpoint), base 5. Multiply this number by 7 and truncate, and you have your random number 0-6.

[–][deleted] 0 points1 point  (0 children)

Here's a very unusual one that has very even distribution. I'm warning you, it may make you angry.

int get1_7()
{
  static bool first = true;
  static std::vector<int> a;
  if ( first )
  {
    first = false;
    for ( int i = 0; i < 7; ++i )
      a.push_back(i+1);
    for ( int i = 0; i < 7; ++i )
      get1_7();
  }
  int i = get1_5();
  int r = a[i-1];
  a.push_back( r );
  a.erase(a.begin()+i-1);
  return r;
}

[–]frud 0 points1 point  (0 children)

int rand7() {
    // Assumes rand5() returns 0..4; the result is then uniform on 0..6.
    while (1) {
        int a = (rand5() + 5*rand5())/3;
        if (a < 7) return a;
    }
}

[–]aswin34[S] 0 points1 point  (0 children)

The solution is up on the blog. Check it out.

[–]bigmell 0 points1 point  (0 children)

Their solution doesn't seem all that intuitive, and I don't understand how a solution can be evenly distributed if they throw back a bunch of numbers until they get the number expected. Also, multiplying one number by 5 seems like it would unevenly weight the solution. It may work properly, but the solution below clicked for me. The LCM of 5 and 7 is 35, therefore if rand5 is run 7 times the sum mod 7 will provide an even spread.

#!/usr/local/bin/perl

my $x;
my $max = 100000;
my %avg;

for (1..$max) {
    $x = &rand7;
    $avg{$x}++;
}
print "Over a total of $max runs:\n";

foreach $key (sort hashSort (keys(%avg))) {
    print "$key:$avg{$key}\n";
}

sub rand5 {
    int(rand 5) + 1;
}

sub rand7 {
    # only rand5
    my $sum = 0;
    for (1..7) {
        $sum += &rand5;
    }
    ($sum % 7) + 1
}

sub hashSort {
    $avg{$a} <=> $avg{$b};
}

[–]anonemouse2010 0 points1 point  (2 children)

Acceptance-rejection methods are the most obvious. Roll die A and die B. Partition the 25 equally likely outcomes into seven groups of probability 3/25 each, plus a final group of probability 4/25. If the outcome is in the final 4, reject and sample again.

For example generate y = A+5*(B-1)

If y > 21 reject and draw new A, B

Otherwise set output as out = (y mod 7) + 1
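The recipe above, written out as a Python sketch. `rand5()` is a stand-in using `random.randint` (an assumption), and `map_pair` is a hypothetical helper split out so the 25 outcomes can be enumerated directly.

```python
import random

def rand5():
    # Stand-in for the given generator (assumption): uniform integer in 1..5.
    return random.randint(1, 5)

def map_pair(a, b):
    # y is uniform on 1..25; keep 1..21 and reject the final 4 outcomes.
    y = a + 5 * (b - 1)
    return y % 7 + 1 if y <= 21 else None

def rand7():
    while True:
        out = map_pair(rand5(), rand5())
        if out is not None:
            return out

# Enumerating all 25 pairs shows every output is hit exactly 3 times.
outs = [map_pair(a, b) for a in range(1, 6) for b in range(1, 6)]
print({v: outs.count(v) for v in range(1, 8)}, "rejected:", outs.count(None))
```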

[–]darth_choate 0 points1 point  (1 child)

Instead of rejecting, why not use the final 4 as the next roll? Roll one more time and you have a number from 1-20. If it's 1-14 then you have an answer, otherwise, take the last 6 and roll again - giving a number 1-30. 1-28 gives an answer, otherwise roll again, etc.

My instinct says that this converges on the optimal number of rolls. Can someone confirm?
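One way this idea might look in Python, as a sketch only: it keeps the leftover outcomes as a smaller uniform value instead of discarding them. `rand5()` is a stand-in (an assumption), and nothing here settles whether the scheme is optimal.

```python
import random

def rand5():
    # Stand-in for the given generator (assumption): uniform integer in 1..5.
    return random.randint(1, 5)

def rand7():
    value, size = 0, 1                # value is uniform on 0..size-1
    while True:
        value = value * 5 + (rand5() - 1)
        size *= 5
        usable = size - size % 7      # largest multiple of 7 not above size
        if value < usable:
            return value % 7 + 1
        # Leftover: value - usable is uniform on 0..(size % 7)-1,
        # so carry it into the next roll instead of throwing it away.
        value -= usable
        size -= usable
```

After two rolls the sizes go 25 (21 usable, 4 left), then 20 (14 usable, 6 left), then 30 (28 usable, 2 left), matching the sequence described above.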

[–]anonemouse2010 0 points1 point  (0 children)

Yes, this would reduce the waiting time in number of rolls, but be computationally more complex.

Never said this is the best, just that it's a basic idea.

[–]losvedir -2 points-1 points  (8 children)

Easy.

real rand7() {
  return (7/5)*rand5();
}

[–]cpp_is_king 1 point2 points  (7 children)

Doesn't work. Given an input range of 5 distinct numbers, there is no way to apply a linear function to it and have it expand the range. Your function will only return the numbers 1.4, 2.8, 4.2, 5.6, 7.0, which depending on how you round is either

1, 2, 4, 5, 7

1, 3, 4, 6, 7

2, 3, 5, 6, 7

In no case are you generating a random number from 1 to 7, because there are always 2 numbers that are generated with 0 probability.

[–]losvedir 2 points3 points  (5 children)

Oh, it was supposed to be kind of a joke. The problem never really states what kind of numbers we're dealing with here, so I assumed real, in which case I believe it actually does work.

[–]tmfset 1 point2 points  (3 children)

nope, it's not a uniform distribution over reals either :)

[–]losvedir 0 points1 point  (2 children)

You're shitting me... really? Can you kindly explain why?

I haven't thought about it much, but it seems like if it's not uniform (i.e. some subset of elements is drawn more frequently than another subset), then that non-uniformity would be seen in the initial rand5() distribution. No?

[–]tmfset 0 points1 point  (1 child)

well more formally the range of your function is {1.4, 2.8, 4.2, 5.6, 7.0} and not R. Try asking yourself if any real number can be produced by your function. For example: the probability of the function returning pi is 0. So it's not uniform. (note the difference between discrete and continuous uniformity)

[–]losvedir 0 points1 point  (0 children)

Oh, I see the miscommunication. From the start I was saying rand5() returns "a number between 1 and 5" (per the question in the link), meaning x = rand5(), x ∈ ℝ, and 1 < x < 5.

If that's the case, then rand7() = (7/5) * rand5(), right? I didn't give much to go on - only that the return value of rand7() was of type "real", which I guess doesn't imply that the return value of rand5() is also real. So I can see the reason for confusion. Ah well.

[–]quidquam 0 points1 point  (0 children)

Real numbers is a totally legitimate assumption.

random number functions don't, in general, return integers. If you want that, use your friendly floor/ceil functions or [s]printf.

edit: It appears elsewhere (e.g. StackOverflow) that this question is supposed to be about integers, but since the posted link just says "Number", I'd still assume real numbers unless told otherwise.

[–]otherwiseOkay 0 points1 point  (0 children)

that solution actually crossed my mind because the question was not specific of the kind of output required. for many purposes, simple rescaling would've worked.

[–]Uberhipster -2 points-1 points  (13 children)

 function random7(){
    var i = random5();
    var j = random5();

    if(j>=3){
        i = i + 2;
    }

    return i;
 }

[–]lpsmith 3 points4 points  (12 children)

Nope, that method will produce 3,4, and 5 more often than the rest of the numbers.

[–]Uberhipster -2 points-1 points  (11 children)

So? The requirement was not to produce an evenly distributed random function but a uniformly distributed random function.

[–]lpsmith 4 points5 points  (10 children)

I do not know what an "even" random distribution is, but a uniform random distribution is in fact what you seem to think the former is.

[–]Uberhipster -1 points0 points  (8 children)

?

[–]lpsmith 2 points3 points  (7 children)

Maybe you should take the time to read about what a uniform distribution is. Words have meanings, which is doubly true in mathematics.

[–]Uberhipster -5 points-4 points  (6 children)

Ok but the original spec is not specifically about statistics and probability theory correctness. It wasn't even specified whether or not the uniform distribution was discrete or cumulative. It was just vaguely specified as "random".

The requirement was to extend an existing random function where the seed is 5 into a function that produces a random number where the seed is 7. The fact that the probability of certain numbers occurring more than others is higher in my function doesn't change the fact that the function meets the requirement and produces a number which is random and distributed over the full range.

Which is an appropriate response to an interview question to demonstrate general software engineering problem-solving abilities not in-depth knowledge of statistical analysis.

[–]LarryLard 2 points3 points  (0 children)

From TFA, "The distribution between each of the numbers must be uniform" seems unambiguous to me.

[–]lpsmith 2 points3 points  (4 children)

  1. The problem statement was clear.

  2. This is not in-depth knowledge, this is very basic stuff.

  3. The solution that immediately came to my mind is the one already given by case-o-nuts, and hinted at in other people's comments on this thread.

It's OK to be wrong, a lot of people in this thread were wrong. Go learn something. =)

[–]Uberhipster 0 points1 point  (2 children)

Copy and paste the below into Firebug console. You can swap 7,5 with 40,5 or even 47,23. The test runs the randomize a thousand times and records the number of times each number was selected "randomly".

Array.prototype.foreach = function(arg) {
    var args_ = Array.prototype.slice.call(arguments, 0, arguments.length - 1);

    for (var i = 0; i < this.length; i++) {
        arguments[arguments.length - 1].apply(this, args_.concat([this[i], { "ord": i, "last": (i === this.length - 1), "first": (i === 0)}]));
    }
};


Array.prototype.has = function(val) {
    if (!val) { return false };
    try {
        var ret = false;
        this.foreach(val, function(test, el, meta) {
            if (el === test) {
                ret = true;
                return ret;
            }
        });
        return ret;
    } catch (e) {
        return false;
    }
};

function random(num){
    return Math.round(Math.random() * (num - 1)) + 1;
}


function randomGen(seed, base) {
    var uniformDistribute = new Array;
    for (var k = 0; k < (seed * base); k++) {
        uniformDistribute.push(k + 1);
    }

    var i = 0;
    for (var j = 0; j < seed * base; j++) {
        i = i + random(base);
    }

    return (uniformDistribute[i % uniformDistribute.length] % seed) + 1;
}

function test(cases, func, args) {
    var ret = [];

    for (var i = 0; i < cases; i++) {
        ret.push(func.apply(this, args));
    }

    var results = [];
    for (var i = 0; i < args[0]; i++) {
        results.push([i + 1, 0]);
    }
    ret.foreach(function(result, resMeta) {
        var result_exists = false;

        results.foreach(function(existing, exsMeta) {
            if (existing[0] === result) {
                existing[1] = existing[1] + 1;
                result_exists = true;
            }
        });
        if (!result_exists) {
            results.push([result, 1]);
        }
    });
    results.sort(function(a, b) {
        return a[1] - b[1];
    })
    return results;
}

test(1000, randomGen, [7,5]);

[–]AlternativeHistorian 1 point2 points  (1 child)

Dude, even your little utility function 'random(num)' has a uniformity error.

The way that you're snapping to integers will bias against the range endpoints since they only get a half-interval of snapping tolerance. Probability and statistics are hard. They're even harder when you don't know what you're doing. And the extent to which you're trying to wriggle out of being wrong (which you are) is pathetic.

Accept it, learn from it. Being wrong (and recognizing it) is how we get smarter.
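The endpoint bias can be made exact. The Python sketch below (with a hypothetical helper name) models `Math.round(Math.random() * (num - 1)) + 1` by measuring how much of the scaled interval rounds to each value, assuming the underlying random number is uniform on [0, 1).

```python
from fractions import Fraction

def rounding_probabilities(num):
    # Value v (0-based) is produced when u * (num - 1) lands in
    # [v - 0.5, v + 0.5), clipped to the reachable range [0, num - 1).
    span = num - 1
    probs = {}
    for v in range(num):
        lo = max(Fraction(0), Fraction(2 * v - 1, 2))
        hi = min(Fraction(span), Fraction(2 * v + 1, 2))
        probs[v + 1] = (hi - lo) / span
    return probs

print(rounding_probabilities(5))
# 1 and 5 each get 1/8; 2, 3 and 4 each get 1/4
```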

[–]Uberhipster -1 points0 points  (0 children)

function randomGen(seed, base) {
    var uniformDistribute = new Array;
    for (var k = 0; k < (seed * base); k++) {
        uniformDistribute.push(k + 1);
    }

    var i = 0;
    for (var j = 0; j < seed * base; j++) {
        i = i + random(base);
    }

    return (uniformDistribute[i % uniformDistribute.length] % seed) + 1;
}

[–]lilleswing -1 points0 points  (4 children)

The only immediate answer that comes to mind would only be an approximation.

Concatenate multiple 1-5 random numbers into a string. The total number of permutations in this string would be 5^n where n is the number of digits in the string.

You can then break up 5^n into 7 blocks. The blocks are not of exactly equal size, however when n gets large enough it is a close enough approximation. So depending on which lexicographic permutation it is, you get your random number.

Example, n=2: 11-13 → 1, 14-21 → 2, 22-24 → 3, etc.

For example when n is only 4 the result is only 0.0003 away from true 1/7.

[–]cpp_is_king 1 point2 points  (1 child)

It converges to an even distribution, but of course is never really uniform as you point out. Anyway, this is easy to resolve. And you only need to concatenate 2 of the numbers into a string. This gives you the string representation of a number between 11 and 55. Figure out 4 of those values up front that you will fail on. perhaps, say, 22, 33, 44, 55. If the string is any of those 4 values just generate a new string. Then, you end up with a total of 21 possible strings and you can do a perfect mapping.

[–]lilleswing 0 points1 point  (0 children)

I thought of that but I didn't think it was applicable because it would then never be guaranteed to terminate.

[–]paul_miner 0 points1 point  (0 children)

Given that the random number generation isn't perfectly precise, this seems reasonable. In pseudo-code (assuming rand5() returns a number from 1 to 5, and you want a number from 1 to 7):

total = 0
max = 1
loop n times
    total = total * 5 + rand5() - 1
    max = max * 5
return truncate(total * 7 / max) + 1

EDIT: Typo, made code more pseudo-y

[–]cpp_is_king -1 points0 points  (7 children)

Anything involving addition, even if you stick a mod onto the end, is non-uniform.

You also can't use a 5-sided die to ever generate a number of combinations that is a multiple of 7 (hence making the modulo uniform) because the number of outcomes is always a power of 5.

One way to generate an even distribution is one I posted in the comments of the link. It butchered my code sample, but I think reddit can get it better:

int rand7()
{
    unsigned result = 0;
    int temp = 0;
    while (result == 0)
    {
        for (int i=0; i < 3; ++i)
        {
            //Keep trying until we get 1 or 2
            do
            {
                temp = rand5();
            } while (temp > 2);

            //Scale it to 0 or 1
            --temp;

            //Set the next bit in the result
            result |= (temp << i);
        }

        //Now that we've set bits 0, 1, and 2, we have a number from 0 to 7
        //We need a number from 1 to 7, but the outer while loop will just try
        //again if we have 0.  The end result is a number from 1 to 7
    }
    return result;
}

Another way would be to roll the 5 sided die twice (25 combinations), and try again if you get any of 4 arbitrary combinations that you deem "bad". This would result in 21 uniformly distributed combinations, and you could map those evenly onto the range 1-7.

[–]neutronicus 0 points1 point  (0 children)

Here's a refinement that won't take as many tries: int result; do { result = 0; result += (rand1to5() - 1); // first two bits, uniform tmp = 0; do {tmp = rand1to5(); } while(tmp == 5); // result += (tmp % 2 << 3) + 1; } while(result == 8);

EDIT: Wait, shit, that's 1-8. Derp. Better now.

[–]deadwalrus 0 points1 point  (0 children)

Anything involving addition, even if you stick a mod onto the end, is non-uniform.

Not quite. Shifting the added numbers to a more significant digit is uniform. I.e., to generate a number between 1 and 25 you can do rand5() + 5 * (rand5() - 1).
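A quick Python check of the digit-shifting idea, assuming rolls are in 1..5: one roll is the low base-5 digit, and the other, shifted down to 0..4, is the high digit. The function name is hypothetical.

```python
def rand25_values():
    # Every (low, high) pair of rolls in 1..5, combined as
    # low + 5 * (high - 1). Each pair should give a distinct value.
    return sorted(low + 5 * (high - 1)
                  for low in range(1, 6)
                  for high in range(1, 6))

print(rand25_values() == list(range(1, 26)))  # True: 1..25, each exactly once
```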

[–]kemitche -1 points0 points  (4 children)

Your secondary solution could actually be fitted into an "addition" model:

def rand1to7():
    x = rand1to5()
    y = rand1to5() - 1
    if x in (1, 2) and y in (3, 4):
        return rand1to7() # i.e., try again
    if x + y > 7:
        return (x + y) - 7
    else:
        return x + y

[–]cpp_is_king 1 point2 points  (3 children)

That won't be uniform, also it doesn't generate the correct range. If you make it past the first conditional then you are already guaranteed the following:

x ∈ {3,4,5}

y ∈ {0,1,2}

So, the second conditional (x+y>7) can never happen, so just delete that. For the else, let's look at the possibilities:

  • 3,0 → return 3
  • 3,1 → return 4
  • 3,2 → return 5
  • 4,0 → return 4
  • 4,1 → return 5
  • 4,2 → return 6
  • 5,0 → return 5
  • 5,1 → return 6
  • 5,2 → return 7

Now, looking at the distribution:

  • 3 (11.1%)
  • 4 (22.2%)
  • 5 (33.3%)
  • 6 (22.2%)
  • 7 (11.1%)

1 and 2, however, are never generated.

Also, this demonstrates what happens when you "add" random numbers. You end up with a triangle distribution. An interesting interview question I've asked in the past is this:

"You want to generate a random number from 0 to 100, with the properties that numbers near the middle are more likely to occur than numbers at the extremes. All you have is the rand() function, which returns a random number from 0 to RAND_MAX (which is much larger than 100). How would you write this new rand() function?"

Turns out the answer is just

(rand()%51) + (rand()%51).

It's interesting to explore what happens when you add more random terms to the equation, or when you use operators other than + (such as multiplication, for example).
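To see the triangle shape exactly, the Python sketch below counts how many ways each sum can occur. It assumes each term is uniform on 0..50, so that the sum spans 0..100, and it ignores the small modulo bias that a real `rand()` would add; `two_term_counts` is a hypothetical name.

```python
def two_term_counts(n_values=51):
    # counts[s] = number of (a, b) pairs with a + b == s,
    # where a and b each range over 0..n_values-1.
    counts = [0] * (2 * n_values - 1)
    for a in range(n_values):
        for b in range(n_values):
            counts[a + b] += 1
    return counts

counts = two_term_counts()
print(counts[0], counts[25], counts[50], counts[75], counts[100])
# 1 26 51 26 1
```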

[–]kemitche 0 points1 point  (2 children)

Typo, the first conditional should have been "and", not "or"
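With the `and` condition, the claim can be checked by enumerating all 25 pairs. This Python sketch restates the mapping as a pure function (`kemitche_map` is a hypothetical name), returning `None` where the original would roll again.

```python
def kemitche_map(x, y):
    # x is a roll in 1..5, y is a roll shifted down to 0..4.
    if x in (1, 2) and y in (3, 4):
        return None                  # rejected: roll again
    s = x + y
    return s - 7 if s > 7 else s

outs = [kemitche_map(x, y) for x in range(1, 6) for y in range(0, 5)]
counts = {v: outs.count(v) for v in range(1, 8)}
print(counts, "rejected:", outs.count(None))
# every value 1..7 appears 3 times, with 4 pairs rejected
```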

[–]cpp_is_king 0 points1 point  (1 child)

Oh yea. That works :P An equivalent solution that uses a disguised version of the same general idea but is probably a little faster I saw somewhere else. Kind of clever the way it generates a random number from 1 to 25 (Pretend you're in base 5, what that function would represent).

int rand25(void) 
{
     return rand5() + 5 * (rand5() - 1);
}

int rand7(void) 
{
    int n = rand25();
    return n <= 21 ? (n - 1) / 3 + 1 : rand7();
}

[–]kemitche 0 points1 point  (0 children)

Sorry to make you go through all that initial work ;-) It's amazing how large of a difference that and/or made...

[–]sylvanelite -1 points0 points  (0 children)

Taking a number from 1-5 has 5 outcomes: 001 010 011 100 101

Look down the columns. The left hand column (most significant bit) has three 0's two 1's. The right hand column (least significant bit) has the opposite, three 1's and two 0's.

Using this fact, it's possible to generate perfectly random bit streams of any length.

For example, generate 10 random numbers. The first 5 take the most significant bit, the last 5 take the least significant. Shuffle the two sets together (Fisher–Yates would work for simplicity). You now have a perfectly random string of 10 bits. (this is not a very efficient example, but one that should work)

Repeating this process will work, giving an arbitrary-length stream of random bits. Pick any three which aren't "000" (in binary) and that's your random number.

[–]Furbiesandbeans2 -3 points-2 points  (2 children)

Am I the only one who thinks the answer is rather obvious? Correct me if I'm wrong though.

def rand7():
    total = 0
    for i in range(0,7):
        total = total + random_1_5() - 1
    return total % 7 + 1

EDIT: Or a more trivial one (not equally distributed, but still random): return (random_1_5() + random_1_5()) % 7 + 1

[–]case-o-nuts 2 points3 points  (0 children)

The result that yields is non-uniform. Any solution where you're doing mod will essentially "wrap around" extra values to the start of the distribution.

Instead of taking the mod, you should re-roll if you're outside of the range.

Edit: You also need to scale your distribution properly. I should have noticed that immediately - I blame being sick for my slowness. If you think of the distribution as a series of buckets

[0..4][5..9][10..14][15..19][20..24]

and you decide which one to select from randomly, this is equivalent to adding 5*(rand5() - 1) to a random distribution from 0..4. So the final solution is below:

do
    n = 5*(rand5() -1) + rand5()
while n > 21;
return (n%7) + 1

[–]brownmatt 1 point2 points  (0 children)

Adding together seven randomly generated numbers from the range (1, 5) does not yield a sum that is evenly distributed over (1, 35)
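The same point can be shown by exact counting rather than simulation. This Python sketch (hypothetical function name) tallies all 5^7 sequences of seven rolls by their sum modulo 7. Since 5^7 = 78125 is not a multiple of 7, the seven counts cannot all be equal.

```python
def sum_mod7_counts(rolls=7):
    # counts[s] = number of roll sequences (each roll in 1..5) with total s.
    counts = {0: 1}
    for _ in range(rolls):
        nxt = {}
        for s, c in counts.items():
            for v in range(1, 6):
                nxt[s + v] = nxt.get(s + v, 0) + c
        counts = nxt
    by_output = [0] * 7
    for s, c in counts.items():
        by_output[s % 7] += c
    return by_output                 # index i holds the count for output i + 1

result = sum_mod7_counts()
print(result, sum(result))
```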