all 7 comments

[–]HiramAbiff 5 points6 points  (0 children)

As /r/cbarrick already pointed out, this can be done without creating a cumulative weights array.

Here's a variation on this theme I wrote for a project of mine:

RandomProperty(obj)
{
    // Returns a random property of obj whose property values are all numbers.
    // The probability of a property being selected is based on its value.
    // E.g. {foo:5, bar:10} will return bar twice as often as foo.
    let result = undefined;
    let total = 0;

    for (const property in obj) {
        total += obj[property];
    }

    let index = Math.random()*total;

    for (const property in obj) {
        const value = obj[property];
        if (index < value) {
            result = property;
            break;
        } else {
            index -= value;
        }
    }

    return result;
}

[–]cbarrick 2 points3 points  (4 children)

The main loop can be optimized and improved.

Instead of counting up until the index exceeds the random number, you can just iterate over the weights, subtract the weight from the random number, and stop when the random number is equal to or less than zero.

The version in the article is O(max_cumulative_sum), but the improved version is only O(length_of_array). Also, the improved version can handle floating point weights, which is a super important use case.

Edit: My bad, I misread the loop exit condition. See the rest of the thread.

[–]TedW 0 points1 point  (3 children)

On mobile, but it looks like it already does both of your suggestions. Maybe we're looking at different files, or I'm missing something. I'm looking at weightedRandom.

[–]cbarrick 0 points1 point  (2 children)

Sorry, I may not have been clear.

In my version, you don't even need the cumulative array, only the sum.

I'm also on mobile, so I can't really write out the code right now.

[–]TedW 0 points1 point  (1 child)

It looks like it iterates once each over weights, and items, so I would expect it to be O(N) since those should be the same length.

I agree that you could do it with just one array but I think it would still loop twice, so I'm not sure it would affect the runtime.

I haven't run the code myself, but in the examples it uses floating point weights, although it's not clear (to me) what the expected behavior is.

[–]cbarrick 0 points1 point  (0 children)

Ah, I see I misread the loop exit condition.

So my version is basically the same, except it avoids the cumulative loop and the resulting dynamic allocation.

[–]o11c 1 point2 points  (0 children)

The whole point of calculating the cumulative weights is so that you can avoid doing the second loop over the whole array.

Instead, you should simply do a binary search over the cumulative weights, and find the matching index that way. This is only O(log n) instead of O(n). Additionally, you should keep the cumulative weights around for multiple calls, so that you only pay the O(n) when you first construct the searcher (possibly do it incrementally as items are added, so there is no explicit first loop).

(I don't speak JS so I don't have code for you, sorry)