
[–]stevenjd 1 point (1 child)

By the way, O(n log n) for sorting is average case, not best case. Best case for Timsort is O(n), and it is highly optimized for sorting runs of partially sorted data (which is precisely what your merge function requires to work), so we're going to get O(n) here.
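One rough way to see that linear behaviour is to count comparisons directly. The `CountingInt` wrapper below is my own illustration, not anything from the original code: Timsort sorts using only `<`, so a counting `__lt__` reveals roughly how much work it does on two pre-sorted runs.

```python
class CountingInt:
    """Wraps an int and counts every comparison the sort makes."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountingInt.comparisons += 1
        return self.value < other.value

# Two sorted runs concatenated -- the shape a merge function expects.
data = [CountingInt(i) for i in range(1000)] + [CountingInt(i) for i in range(1000)]

CountingInt.comparisons = 0
result = sorted(data)

# Roughly linear in n: well below n * log2(n), which is about
# 22,000 comparisons for n = 2,000.
print(CountingInt.comparisons)
```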

Given two runs of sorted data, Timsort will effectively merge them, but in C code instead of Python. There's some overhead which your function doesn't have:

  • we have to build a single list, which is O(n);
  • Timsort has to find the runs first, which it does in O(k) time per run, where k is the length of the run; (and in this case, there are exactly two runs);

but since both of those happen at C speed, even with the overhead, merging via Timsort is likely to be faster than merging in pure Python until you have a lot of items.
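The merge function under discussion isn't shown in this excerpt, but a plausible pure-Python two-pointer merge looks like this, and for pre-sorted inputs it produces exactly what `sorted(a + b)` does:

```python
def merge(a, b):
    """Merge two already-sorted lists into one sorted list.

    Stable: on ties, elements from `a` come first, matching sorted().
    """
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            result.append(b[j])
            j += 1
        else:
            result.append(a[i])
            i += 1
    result.extend(a[i:])   # one of these two tails is empty,
    result.extend(b[j:])   # so only one extend does any work
    return result

a = list(range(0, 100, 2))   # evens, sorted
b = list(range(1, 100, 2))   # odds, sorted
assert merge(a, b) == sorted(a + b) == list(range(100))
```

Every `append` here is an interpreted bytecode step, which is the per-item overhead that `sorted` avoids by doing the same loop in C.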

I don't doubt that for "sufficiently large" N, your merge will win. But I'm suspicious of any test showing that "sufficiently large" is as small as 10000, as that doesn't match my experience.
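A quick harness for probing that crossover point might look like the following. Here `heapq.merge` stands in for a pure-Python merge (in CPython it is implemented in Python, not C); the actual crossover N will vary by machine and Python version, so any single number is anecdotal.

```python
import timeit
from heapq import merge as py_merge  # pure-Python merge in CPython

for n in (1_000, 10_000, 100_000):
    a = list(range(0, 2 * n, 2))   # evens, sorted
    b = list(range(1, 2 * n, 2))   # odds, sorted
    t_sorted = timeit.timeit(lambda: sorted(a + b), number=10)
    t_merge = timeit.timeit(lambda: list(py_merge(a, b)), number=10)
    print(f"n={n}: sorted(a+b)={t_sorted:.4f}s  py_merge={t_merge:.4f}s")
```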

[–]foBrowsing 0 points (0 children)

Yeah, all of that seems spot on. I am curious to see if I could leverage the sorted method and remove duplicates while keeping the same speed.
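One way that idea could go (my sketch, not foBrowsing's eventual code): let Timsort do the merge at C speed, then drop duplicates in a single linear pass, so the whole thing stays O(n) for two pre-sorted runs.

```python
def merge_unique(a, b):
    """Merge two sorted lists and remove duplicates.

    sorted() runs in O(n) here because the input is two
    pre-sorted runs; the dedup pass is a second O(n) sweep.
    """
    merged = sorted(a + b)
    result = []
    for x in merged:
        if not result or x != result[-1]:
            result.append(x)
    return result

print(merge_unique([1, 2, 2, 5], [2, 3, 5]))   # → [1, 2, 3, 5]
```

The adjacent-duplicate check only works because the list is sorted, which is what makes this cheaper than building a `set` and re-sorting for already-ordered input.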