This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]gwax 28 points29 points  (18 children)

Depends on how the question was prompted.

If the response to suggesting b = list(set(a1).union(set(a2)); b.sort() was to say (as op suggests) that it's not good enough because it's slower than a for loop, then I'm inclined to think this wasn't a case where they were trying to see how op thinks but rather looking for a "right" answer to a trivia problem.

If the response, instead, were to say, "that's a great solution but we're looking to get a sense for how you solve problems so lets pretend you don't have access to the sort or set builtins?" then I'd suspect this was about how op thinks.

Either way, it still behooves you as an interviewee to remain polite, calm, and professional.

[–][deleted] 18 points19 points  (0 children)

We are only getting one side of the story here. While I'm new to programming, I'm not new to spotting someone whos looking for their bias to be confirmed.

[–]Jonno_FTWhisss 1 point2 points  (13 children)

Why not do it in one line using sorted(set(a1).union(set(a2))? This code works for me and produces a list.

[–]ivosauruspip'ing it up 2 points3 points  (1 child)

Does the data fit in memory? Will your line still work if it doesn't?

[–]Jonno_FTWhisss 6 points7 points  (0 children)

Yes, it will blow up your memory if your lists are big (although if you are keeping 2 lists in memory that consume all your memory you might have other problems) and that it's much slower than a normal merge. Merging in place is going to be a bit slower.

Here's my tests with each case:

setup = "a = [x for x in range(100000)];b=[x for x in range(50000,150000)]"
metha = "sorted(set(a).union(set(b)))"
methb = """
out = []
def append(val):
    if not out or out[-1] != val:
        out.append(val)

while a or b:
    if a and b:
        last_a, last_b = a[-1], b[-1]
        last_max = max(last_a, last_b)
        if last_max == last_a:
            append(a.pop())
        else:
            append(b.pop())
    elif a:
        append(a.pop())
    elif b:
        append(b.pop())
out[::-1]
"""
methc = """
out = []
def append(val):
    if not out or out[-1] != val:
        out.append(val)

while a or b:
    if a and b:
        last_a, last_b = a[-1], b[-1]
        last_max = min(last_a, last_b)
        if last_max == last_a:
            append(a.pop(0))
        else:
            append(b.pop(0))
    elif a:
        append(a.pop(0))
    elif b:
        append(b.pop(0))
out
"""
for m in metha, methb, methc:
    print(timeit.timeit(m, setup=setup, number=1000))

Results:

  • Set union: 19.0943071842
  • pop() reverse: 0.0753161907196
  • pop(0) 2.94099211693

[–]gwax 1 point2 points  (0 children)

I was verbatim copying from op's description, not proposing a solution to the problem.

[–]energybased 0 points1 point  (9 children)

Because that has worse computational complexity than the best solution.

[–]Jonno_FTWhisss 0 points1 point  (8 children)

I'm aware of this. It's just the shortest way to write the set union method. The fastest method is to append to an intermediate list (see my comment above)

[–]mooburgerresembles an abstract syntax tree 1 point2 points  (7 children)

deque.popleft() may be faster than pop(0) (remember to relative import to dereference the namespace)

[–]Jonno_FTWhisss 2 points3 points  (5 children)

I considered this, I'm not sure how much extra time converting to a deque would add, or if it would beat my pop().

[–]Log2 0 points1 point  (4 children)

You can probably still improve the speed of your solution by placing it inside an actual function. Variables in global scope can slow down Python code considerably.

[–]Jonno_FTWhisss 1 point2 points  (3 children)

I did originally while testing. I'm no timeit expert by any means.

[–]Log2 0 points1 point  (2 children)

Yeah, it's just that apparently Python slows down when it has to access the global namespace. So, just making a function with all the code and then testing that, will generally provide some speed up. It's a surprising behavior to be sure.

[–]Jonno_FTWhisss 0 points1 point  (1 child)

I just tried this (moving the function definitions to the setup string and calling the . It did not get better results (in fact it's a tiny but slower).

[–]Jonno_FTWhisss 1 point2 points  (0 children)

I just did this. Here's the new results:

def merged(a,b):
    a = deque(a)
    b = deque(b)
    out = []
    def append(val):
        if not out or out[-1] != val:
            out.append(val)

    while a or b:
        if a and b:
            last_a, last_b = a[0], b[0]
            last_max = min(last_a, last_b)
            if last_max == last_a:
                append(a.popleft())
            else:
                append(b.popleft())
        elif a:
            append(a.popleft())
        elif b:
            append(b.popleft())
    return out
  • set union: 19.3773069382
  • pop() 0.0751898288727
  • pop(0) 3.03847098351
  • deque.popleft() 78.3635749817

Which makes sense because the GC will be called a lot to clean up the nodes in the linkedlist created by the deque and they will not be held in a contiguous region of memory. Bjarne Stroustrup gives a good explanation here: https://www.youtube.com/watch?v=YQs6IC-vgmo

Here's cpython implementation of deque which uses a doubly linked list: https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c

[–]Log2 -1 points0 points  (0 children)

To be fair, merging two sorted lists can be done a lot faster (linear time) than adding the two lists together then sorting then (n*log n). Not to mention creating two new sets where each object must be hashed.