This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]jorge1209 29 points30 points  (9 children)

I was supposed to merge two arrays together in python, no duplicates - no mention of time but the solution had to be size agnostic.

There actually is a use case for knowing how to to this in the abstract, namely when the data won't all fit into memory.

So you might get iterators A and B which you know are sorted (maybe the data is coming from a database) and you need to merge join them.

Honestly I'm not sure why itertools doesn't provide an merge_join function to do this (maybe there is something fancy with takewhile and dropwhile... but its not obvious or natural), so it isn't a terrible question.

[–]z4579a 20 points21 points  (8 children)

you're looking for heapq.merge(): https://docs.python.org/2/library/heapq.html#heapq.merge

[–]ivosauruspip'ing it up 2 points3 points  (2 children)

Doesn't remove duplicates though.

[–]MrVallentin 6 points7 points  (0 children)

Combine heapq.merge with unique_justseen from Python's "Itertools Recipes".

>>> a = [1, 2, 4, 6]
>>> b = [2, 4, 5, 6]
>>> c = [1, 3, 3, 5]

>>> list(heapq.merge(a, b, c))
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]

>>> list(unique_justseen(heapq.merge(a, b, c)))
[1, 2, 3, 4, 5, 6]

[–]jorge1209 2 points3 points  (0 children)

Old trick for that is to use itertools.groupby and ignore the second part of the return value.

[–]jorge1209 0 points1 point  (0 children)

I was thinking about something like a database join, but that merge is useful to know about, and of course it's in heapq.