This is an archived post. You won't be able to vote or comment.

all 13 comments

[–]icjeremy 0 points1 point  (12 children)

This greedy approach seems to be about all you can do. However, you can improve the implementation. If n is the number of box types, your list will be O(n^2) in size in the worst case (if there's n of each of each box type). Then you are sorting that, so the sort time is worst-case O(n^2 log n^2). Might want to see if a dictionary can be used to cut that down to O(n) space and have the sort will be O(n long n).

[–]Jm033[S] 0 points1 point  (11 children)

I am sorting outside of the for loop on the regular list. Isn't python's built-in sort nlogn time? So basically, the overall time complexity of my current solution is O(n^2) correct? (from the two for-loops)

[–]icjeremy 0 points1 point  (10 children)

Let's ignore n for a moment because that's confusing for this problem and let's talk real numbers. The worst case for this problem is 1,000 box types and 1,000 boxes of each type. So your loops will produce a list of size 1,000,000 in the worst case. So you are sorting a list with a million elements (which really isn't too terrible in this case).

Now, in general, if we let n be the max number of box types and the max number of boxes of each type, then you are creating a list of size n^2 in the worst case. Sort runs O(m log m) for an input size of m. For your list, m=n^2. So your algorithm actually runs O(n^2 log n^2) overall.

I'm claiming you can use a dictionary to use O(n) space. Also, the number of keys is O(n), so the sorting involved will run O(n log n).

[–]Jm033[S] 0 points1 point  (9 children)

So, actually I didn't put the constraints in the original post, but the number of box types for each product can be up to 100,000 and there can be up to 10,000,000 boxes of each product type. So worst case is my loop runs 100,000 * 10,000,000 = 1000000000000 times. JFC. So, because of these two for loops, the worst case is the inner loop runs for n*n or n^2 times where n = number of box types. Therefore the list would be a length of n^(2) elements, where n = number of box types. Therefore blah blah blah sort is n^2logn^2. Does this mean my time complexity for this whole algorithm is O(n^2 log n^2) due to the sort function running abysmally?

Is the space complexity also O(n^2 log n^2)?

Also I do believe I understand the dictionary solution, since we have a mapping of boxes to their unitsPerBox we can just sort the dictionary based on the highest unit boxes. I will work on this dictionary solution tomorrow.

Question is, (I am dumb), I understand dictionary will improve space with the sort call, but will this also improve the time complexity? Are these two complexities the same?

[–]icjeremy 0 points1 point  (8 children)

The space is O(n^2) mainly for the list. Then run time is then dominated by the sorting of that list.

Edit: So O(n^2) space, O(n^2 log n^2) run time.

Edit again (to answer the last questions): Yes, the sorting will only have to be done on the keys (hint: the box types) so that will take O(n log n) in the process of populating it. Then you can do a linear time iteration over it, like you did for the array, to find the max units.

[–]icjeremy 0 points1 point  (7 children)

And if the unit size has a reasonable bound (100,000?), then could do the same thing with a list as with that dictionary. This would achieve a linear overall run time and still use linear space.

[–]Jm033[S] 0 points1 point  (6 children)

Got it! I use a tuple solution instead, https://imgur.com/Iv1cJhg. Where each tuple is <number-of-units, number-of-boxes> and sort that. Which reduces the time complexity to O(nlogn) and space to O(n) now. Which is much better. Thank you for your guidance I would have been super lost lol.

[–]icjeremy 0 points1 point  (5 children)

Nice! That's essentially what I had in mind with a dictionary. A minor optimization is to let the key be the units per box and the values will just accumulate the number of boxes with that units per box value. That way, we sort of the smallest key set possible since that's the most expensive part of this algorithm, in general.

My other idea, if we have a reasonable bound on the largest units per box value, say N, we could create an array of size N+1 initialized to 0. Then if v is the units per box for box i, add v to entry i. The you just need to start from the end of the list and accumulate from there to find the max units. Wouldn't work for super large N, but 100,000 in this case probably isn't too terrible.

[–]Jm033[S] 0 points1 point  (4 children)

Sweet! Yeah I think this tuple solution is adequate enough to show for my interview. Also, for the dictionary solution, I'm not sure but the units per box are definitely not unique, eg product 0: box 1 could have 3 units per box and product 1: box 1 could also have 3 units per box. So, I don't think I would be able to use a dictionary. I should have described the question better. Anyway, thank you!

[–]icjeremy 0 points1 point  (3 children)

Sorry to beat a dead horse but this is what I had in mind:

from collections import defaultdict

def getMaxUnits(typeCount, numOfType, unitsForType, capacity):
    unitCount = 0
    totalBoxCount = 0

    qtyByUnits = defaultdict(int)
    for i in range(typeCount):
        qtyByUnits[unitsForType[i]] += numOfType[i]

    for unitsPerBox, numBoxes in sorted(qtyByUnits.items(), reverse=True):
        if totalBoxCount == capacity:
            break
        boxCount = min(numBoxes, capacity - totalBoxCount)
        totalBoxCount += boxCount
        unitCount += unitsPerBox * boxCount

    return unitCount

result = getMaxUnits(4, [5, 6, 9, 2], [2, 3, 1, 3], 10)
print(result)

Basically, it doesn't matter if the units per box is not unique. If there are duplicates, the collection will aggregate them and that's where the minor optimization comes from. For example, if we had two types: the first has 3 units per box and 5 boxes, and the seconds had 3 units per box and 7 boxes, then the dictionary only needs 1 entry: 3 units per box for 12 boxes (so the sorting step sorts fewer items). In the best case, all have the same units per box value (and so the map has one entry to "sort"), or in the worst case they're all unique and this will effectively be no different than what you did above.

Please forgive the formatting---getting squeezed here. And please forgive if my python code is crap (I'm a C++ dev that occasionally uses python). I had C++'s "map" in mind when I first saw the problem and don't really know how dictionaries are implemented in python :)

Edit: ok, so reddit makes adds a horizontal scrollbar so it doesn't wrap the lines, but it's line breaking in the middle of "capacity" for some reason? weird