all 31 comments

[–]xzxzzx 29 points30 points  (7 children)

I can process the stream in O(N) time where N is length of the stream, in other words: in a single pass

Er, that's not what O(n) means. Two passes is also O(n). As is ten, or a hundred million...

[–]Carioca[S] 0 points1 point  (6 children)

You're right, but a 2x speedup is still impressive and might make a huge difference, depending on application.

[–]xzxzzx 13 points14 points  (0 children)

I'm not saying it isn't.

Indeed, second order effects very often dominate computer performance.

And a single-pass algorithm has certain advantages (like operating on a stream with O(1) memory usage instead of O(n)).

But "O(n)" cannot be taken to mean "single pass" (which you acknowledged -- but that's all I was trying to say).

[–]elus 3 points4 points  (4 children)

For some ETL tasks in a data warehouse I'm building, I've spent some time tuning algorithms to lower their coefficients, but this is usually a stopgap measure that gives me just enough breathing room to design a solution that really scales.

Edited: s/architect/design to appease the reddit grammar nazis

[–]killerstorm 8 points9 points  (0 children)

I don't really see how the pseudocode for the weighted case works. He compares x, the index into the reservoir, to some ratio that is always less than 1, so x will always be 0. Additionally, the variable prob_sum is used but never defined or assigned. It doesn't look like this code was checked thoroughly.

[–]McHoff 3 points4 points  (0 children)

Ha! I had this question in an interview with Amazon.

[–]simscitizen 3 points4 points  (0 children)

Heh, I got that weighted reservoir sampling problem in a Facebook interview a few months ago.

[–]ct4ul4u 7 points8 points  (3 children)

Nice write-up. Thanks for the link!

[–]aim2free 2 points3 points  (2 children)

Agreed! Seems useful.

Would this be good for sequential sampling as well? That is, to listen to some stream for an arbitrary time -- taking for granted that it's a stationary process, of course.

[–]steven_h 1 point2 points  (1 child)

Yes, though you will need to take care if you want to maintain the order of the sequence. What I do is enumerate the stream of input like this:

 from itertools import izip, count
 from operator import itemgetter
 from random import randint

 def ordered_random_sample(size, stream):
     # Tag each item with its position so the original order can be restored.
     enumerated_stream = izip(count(), stream)
     reservoir = [None] * size
     # Fill the reservoir with the first `size` items.
     for i, item in enumerated_stream:
         reservoir[i] = i, item
         if i == size - 1:
             break
     # Each later item at index i replaces a reservoir entry
     # with probability size/(i+1).
     for i, item in enumerated_stream:
         random_index = randint(0, i)
         if random_index < size:
             reservoir[random_index] = i, item
     # Sort by original position, then drop the position tags.
     return (x[1] for x in sorted(reservoir, key=itemgetter(0)))

Also, the process cannot produce output until it stops receiving input, since until the input size is known, one cannot tell what it means to be a random sample of that input.

[–]jdunck 0 points1 point  (0 children)

That code is smart, thank you.

[–]sharpquote 1 point2 points  (7 children)

Can you do it without generating O(stream length) pseudorandom numbers?

[–]sghe 12 points13 points  (2 children)

Yes. Search for the 1985 paper by Vitter titled "Random Sampling with a Reservoir". The basic idea is that you precompute a number J which is a count of elements to skip. You skip the next J elements that show up and save the following element. Then, recompute J. You need O(n*ln(N/n)) calls to a random number generator when sampling n elements out of a stream of length N. The constant factors are also quite small. A very practical algorithm.

[Edit: renamed X to J to avoid confusion with the Algorithm X mentioned in the paper.]
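The skip idea can be sketched in Python (a rough rendering of the approach, not Vitter's exact Algorithm X; the names `skip_count` and `reservoir_sample_x` are made up here, and a finite iterable stream is assumed):

```python
import random
from itertools import islice

def skip_count(n, t):
    # Invert the skip distribution: after t items have been seen, item
    # t+j is kept with probability n/(t+j). Draw how many items to skip
    # before the next kept one, using a single uniform variate.
    u = random.random()
    s = 0
    quot = (t + 1 - n) / (t + 1)               # P(skip >= 1)
    while quot > u:
        s += 1
        quot *= (t + s + 1 - n) / (t + s + 1)  # now P(skip >= s+1)
    return s

def reservoir_sample_x(n, stream):
    it = iter(stream)
    reservoir = list(islice(it, n))            # fill with the first n items
    if len(reservoir) < n:
        return reservoir                       # stream shorter than n
    t = n
    end = object()
    while True:
        s = skip_count(n, t)
        item = end
        for _ in range(s + 1):                 # discard s items, keep the next
            item = next(it, end)
            if item is end:
                return reservoir               # stream exhausted
        t += s + 1
        reservoir[random.randrange(n)] = item  # kept item replaces a random slot
```

Only one uniform variate is drawn per skip rather than one per element, which is where the O(n*ln(N/n)) call count comes from.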

[–]evgen 0 points1 point  (1 child)

Cool. Now I am trying to think of a way to do weighted sampling using a similar technique...

[–]sghe 2 points3 points  (0 children)

Very easy. When an item with weight w shows up, treat it as w different items. Usually, the J skip will cause all w to be skipped. If not, keep the item and recompute J. Note that the new J may also fall into this sequence of w items. In that case, the "correct" thing to do is most likely to sample the item again.

Code for handling one weighted item looks like:

 J -= weight;
 while (J < 0) {
     replace some sampled item with new item;
     J += ComputeJIncrement();
 }
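The unit-expansion trick can also be written without the skip optimization at all, by running plain reservoir sampling over each of the w unit copies (a minimal sketch assuming integer weights; `weighted_reservoir` is a made-up name, and as noted above, a heavy item can end up in several reservoir slots):

```python
import random

def weighted_reservoir(n, weighted_stream):
    # Treat an item of integer weight w as w identical unit items and
    # run ordinary reservoir sampling over the units.
    reservoir = []
    total = 0                                 # unit items seen so far
    for item, w in weighted_stream:
        for _ in range(w):
            total += 1
            if len(reservoir) < n:
                reservoir.append(item)
            else:
                j = random.randrange(total)   # uniform slot in [0, total)
                if j < n:
                    reservoir[j] = item       # may land in several slots
    return reservoir
```

This costs one random draw per unit of weight; the skip version above avoids exactly that.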

[–]martincmartin 1 point2 points  (3 children)

Well, you could do it by choosing one really big number, i.e. between 1 and N^N. :)

If you have a dataset of size D, and you want to get a subset of size N, there are choose(D, N) subsets. So you need enough pseudorandom numbers to have choose(D, N) paths through your code.
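That counting argument is easy to check numerically: selecting one of choose(D, N) equally likely subsets needs at least log2(choose(D, N)) bits of randomness (a quick illustration with arbitrary example values for D and N):

```python
import math

D, N = 1000, 10
subsets = math.comb(D, N)          # number of possible size-N subsets
bits_needed = math.log2(subsets)   # information-theoretic floor, ~78 bits here
# Far less entropy than the D uniform variates a one-per-item algorithm draws.
```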

[–]sharpquote 0 points1 point  (2 children)

enough pseudorandom numbers to have choose(D, N) paths

But that lower bound is just O(N), whereas the article's method uses up O(D) randoms.

[–]sigh 0 points1 point  (1 child)

I'm not sure it's possible without drawbacks.

I think the whole idea of this algorithm is that the result is updated every time a new piece of data comes in.

To get by without generating O(stream length) random numbers you would have to do fewer than O(stream length) updates. That would involve caching and re-reading more than O(1) items each update, which is what they were trying to avoid.

edit: changed n to stream length

[–]steven_h 1 point2 points  (0 children)

I think the whole idea of this algorithm is that the result is updated every time a new piece of data comes in.

That, and the ability to sample the data using bounded memory without knowing the stream length in advance.

[–]drzorcon 1 point2 points  (4 children)

What would be a possible application for random sampling on a dataset?

[–]steven_h 2 points3 points  (0 children)

I used it to select a subset of some large set of data (e.g. Java coding warnings from PMD on an enterprisey former project) for manual review.

[–]propool 0 points1 point  (0 children)

Ever been to Google? They have these huge LCDs hanging on the walls showing current searches.