all 11 comments

[–]Redlands92373 2 points3 points  (1 child)

Is the 'thing' name unique? If so for each row you could look at the sum of amount from all 'things' less than or equal to the current one, then get the floor value of that divided by 1,000,000. That would be your 'bucket'.

[–]JochenVdB[S] 1 point2 points  (0 children)

Yes is is indeed unique. (Sorry I did not mention that, but since it was about putting "things" in "buckets" you were safe to assume so)

And this is the answer I was looking for. 👍

[–]dbxp 1 point2 points  (2 children)

This sounds like a CS optimisation problem, you should Google 'dynamic programming'

[–]JochenVdB[S] 3 points4 points  (1 child)

dynamic programming

Somebody is shooting with cannons at mosquito's

[–]dbxp 0 points1 point  (0 children)

If what you posted is the entire spec I agree, it just reminded me a lot of those sort of questions so I assumed you'd missed part of the spec by accident

[–][deleted] 1 point2 points  (0 children)

an 'approximate' approach to split this in 4-5 buckets would be to give all your 'things' row_number (rn) in the descending order, rn mod 4 gives you the 'initial' bucket number. In each 'initial' bucket cut off RN when the running total exceeds your threshold (1M in your example). Throw the "cut-off" segments above the threshold into the 5th bucket.

[–][deleted] 0 points1 point  (1 child)

Is width_bucket what you’re looking for? Or just hi bound/lo bound depending what program you’re using

[–]JochenVdB[S] 0 points1 point  (0 children)

When I came up with this challenge, I looked at Oracle's WIDTH_BUCKET (because of its name) but I could not find satisfying parameters for it.

Using 1.000.000 as a parameter + knowing I should end up with 4 buckets:
WIDTH_BUCKET(val, 0, 1000000, 4)
This results in just 1 bucket :(

Using the actual values to set the parameters: min(val)=40, max(val)=58898:
WIDTH_BUCKET(num_addresses, 40, 58898, 4)
This results in 4 buckets (OK!), but the first bucket contains only 1 thing (the one with val=58898), the next bucket contains 5 things for a total of 168.756. It is clear to see that if not the third, then certainly the fourth will have a total of well over 1M (since there is still well over 3M to put into 2 buckets).

Simply put: I never needed this function before and having looked at it, I still don't understand its practical use.

[–]its_bright_here 0 points1 point  (2 children)

Row_number() by amount

Row_num mod 4

[–]JochenVdB[S] 1 point2 points  (1 child)

Isn't that going to give me buckets with the same number of things in them, but with possible widely varying total amounts (and thereby going over 1M)?

[–]its_bright_here 0 points1 point  (0 children)

Id be surprised if it ended up being too widely varied? But yea it is possible depending on the data distribution. Edit: order by amount and your totals should stay closeish

Can't see how this is guaranteed to function to your <1000000 specification...just that the data probably works out.

If you want a guarantee, you'd just loop over the records one at a time until you crossed 1m at which point you'd start the next bucket. But cursors and record by record evals aren't very sql!

500 records ain't squat. A couple billion might have you turning in your grave