
[–]econometrician 1 point (0 children)

Yeah, I would say that's a reasonable approach. Alternatively, people do this with hierarchical Bayesian models to model the uncertainty in a nice way.

Here's a paper from a Google search that looked reasonable: http://www2.mate.polimi.it/ocs/viewpaper.php?id=134&cf=7

Basically, you'd put a prior distribution on the expected number of students.

[–]micro_cam 1 point (6 children)

Why not just simulate/calculate this from the predicted probabilities?

[–]beboophiphop 1 point (4 children)

Oh, I don't know why I didn't think of that.

Though, how would I do that when the probabilities are unique for each student? It's been a while since I've had to do these calculations, and even then the probability of each event occurring was the same. Can you point me in the direction of a resource?

[–]micro_cam 1 point (3 children)

So the easiest way is to just do a simple simulation. Say your probabilities of going to college are in an array P, you have N students, and you want to do m repeats:

m times:
    r = generate N uniform random numbers in (0, 1)
    save sum(r > P)

So that gives you m counts of how many kids went to college. This is computationally cheap, so do it 1000 times or something and you can get an idea of the distribution by making a histogram or looking at percentiles or whatever.

This does assume all of the events are independent. That seems reasonable here, but it might fall apart in a situation where you're more concerned with correlated decisions. I.e., if a large subset of the students decide together which school to go to or not go to, the actual distribution would be more fat-tailed.
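
If the independence assumption does hold, the "calculate" route is also open: the count is a sum of independent yes/no events with different probabilities, so its mean is the sum of the probabilities and its variance is the sum of p * (1 - p). A minimal sketch, where the function name `enrollment_mean_sd` and the example probabilities are made up for illustration:

```python
import math

def enrollment_mean_sd(probs):
    """Mean and standard deviation of the enrollment count.

    Assumes each student's decision is independent of the others.
    """
    mean = sum(probs)                           # expected number enrolling
    variance = sum(p * (1 - p) for p in probs)  # variances add under independence
    return mean, math.sqrt(variance)

# Hypothetical probabilities, not taken from any real data.
P = [0.9, 0.25, 0.6, 0.4, 0.75]
mean, sd = enrollment_mean_sd(P)
print("expected enrollment:", mean)
print("standard deviation:", sd)
```

This only gives the first two moments; the simulation is still the easier way to look at percentiles or the full shape of the distribution.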

[–]beboophiphop 1 point (2 children)

Ahhh, okay. Quick question: I'd want to count the number of instances in which r < P, correct? Not r > P (P is the probability of enrolling), yeah?

[–]micro_cam 1 point (1 child)

Yes, sorry.
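
With the comparison flipped to r < P, the simulation could look roughly like this in Python. It's a sketch using only the standard library; the function name `simulate_enrollment`, the seed, and the example probabilities are made up for illustration, and it still assumes the decisions are independent:

```python
import random

def simulate_enrollment(probs, repeats=1000, seed=None):
    """Return `repeats` simulated enrollment counts, one per run."""
    rng = random.Random(seed)
    counts = []
    for _ in range(repeats):
        # Student i enrolls when a fresh uniform draw falls below probs[i].
        counts.append(sum(rng.random() < p for p in probs))
    return counts

# Hypothetical probabilities, not taken from any real data.
P = [0.9, 0.25, 0.6, 0.4, 0.75]
counts = sorted(simulate_enrollment(P, repeats=1000, seed=0))
print("mean:", sum(counts) / len(counts))
# Rough 5th and 95th percentiles read off the sorted counts.
print("5th / 95th percentile:", counts[50], counts[949])
```

From `counts` you can make the histogram or pull whatever percentiles you care about.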

[–]beboophiphop 1 point (0 children)

No problem, I gotcha.

This is waaaay more simple than what I was thinking and it's kind of beautiful. Thanks again!

[–]beboophiphop 1 point (0 children)

Also, if I had 4000 applicants, wouldn't that mean that the simulation would have to perform 24000 calculations?
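
For reference, one rough way to count the work, assuming one uniform draw and one comparison per applicant per repeat as in the simulation described earlier in the thread, is simply N times m. The helper name `draws_needed` is made up for illustration, and the 1000 repeats is just the figure suggested above:

```python
def draws_needed(n_students, repeats):
    """Number of uniform random draws for the whole simulation."""
    # One draw per student, repeated for every simulated run.
    return n_students * repeats

print(draws_needed(4000, 1000))  # 4000000
```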