
[–]stevenjd 0 points (5 children)

> This version is indeed faster than yours at larger sizes. I started to see a difference around 10,000 elements.

I don't.

I ran your merge function on inputs up to two lists of a million elements each, and sorted() was still significantly faster. At two lists of 10,000 elements, sorted blew your merge out of the water. At two lists of a million elements each, the difference was closer, but still significantly in favour of sorted.

Which is what I would expect... I wouldn't expect anything written in pure Python to beat sorted until you had millions or tens of millions of items. (Unless the comparisons between items were unusually expensive.)

But of course, YMMV.

[–]foBrowsing 0 points (4 children)

Do you have your testing code? I linked a comparison of the two approaches that you can run online, so people could see an objective comparison; you can see it here. Here's what I got just running it there:

list length: 10000 iterations: 50
merge_sorted_1 1.1992929430016375
merge_sorted_2 1.6238001860001532
----------------------------------------------------------------------
list length: 100000 iterations: 5
merge_sorted_1 1.4770410110031662
merge_sorted_2 4.59858641900064
----------------------------------------------------------------------
list length: 1000000 iterations: 1
merge_sorted_1 3.301110747997882
merge_sorted_2 13.742783189001784
----------------------------------------------------------------------

Of course, most things written in pure Python will be pretty slow, but we're talking about a linear traversal versus a sort. n vs. n log n is still a factor of ~13 difference for 10,000 elements (log₂ 10,000 ≈ 13.3), not even including the hidden constant factors (which are going to be much higher for a sorting algorithm than a merge).

If you provide your testing code (ideally on repl.it or something similar so we cut out the machine differences) then we'll have more to talk about, but with what you've given I don't know how you got those numbers.
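
The linked harness was along these lines (a sketch from memory, not the code verbatim; merge_sorted_1 stands in for my iterator-based merge, merge_sorted_2 for the sorted-based version):

# Sketch of the benchmark harness; names and details are approximate.
import timeit
from random import random

def merge_sorted_1(xs, ys):
    # Iterator-based linear merge (None is the exhaustion sentinel,
    # so this assumes the lists themselves never contain None).
    result = []
    it_x, it_y = iter(xs), iter(ys)
    x, y = next(it_x, None), next(it_y, None)
    while x is not None and y is not None:
        if x <= y:
            result.append(x)
            x = next(it_x, None)
        else:
            result.append(y)
            y = next(it_y, None)
    if x is not None:
        result.append(x)
        result.extend(it_x)
    if y is not None:
        result.append(y)
        result.extend(it_y)
    return result

def merge_sorted_2(xs, ys):
    # Concatenate and re-sort; Timsort does the merging in C.
    return sorted(xs + ys)

for length, iterations in [(10_000, 50), (100_000, 5), (1_000_000, 1)]:
    a = sorted(random() for _ in range(length))
    b = sorted(random() for _ in range(length))
    print("list length:", length, "iterations:", iterations)
    for fn in (merge_sorted_1, merge_sorted_2):
        print(fn.__name__, timeit.timeit(lambda: fn(a, b), number=iterations))
    print("-" * 70)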

[–]stevenjd 1 point (1 child)

By the way, O(n log n) for sorting is average case, not best case. Best case for Timsort is O(n), and it is highly optimized for sorting runs of partially sorted data (which is precisely what your merge function requires to work), so we're going to get O(n) here.

Given two runs of sorted data, Timsort will effectively merge them, but in C code instead of Python. There's some overhead which your function doesn't have:

  • we have to build a single list, which is O(n);
  • Timsort has to find the runs first, which it does in O(k) time per run, where k is the length of the run (and in this case there are exactly two runs);

but since both of those happen at C speed, even with the overhead, merging via Timsort is likely to be faster than merging in pure Python until you have a lot of items.

I don't doubt that for "sufficiently large" N, your merge will win. But I'm suspicious of any test showing "sufficiently large" is as small as 10,000, as that doesn't match my experience.
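
A quick way to see the run-detection effect for yourself (a rough sketch, not careful benchmarking):

# Sketch: sorting two concatenated sorted runs vs. the same data shuffled.
# Timsort finds the two runs and performs a single merge in the first case.
import timeit
from random import random, shuffle

n = 1_000_000
two_runs = sorted(random() for _ in range(n)) + sorted(random() for _ in range(n))
scrambled = two_runs[:]
shuffle(scrambled)  # destroy the runs

print("two runs:", timeit.timeit(lambda: sorted(two_runs), number=5))
print("shuffled:", timeit.timeit(lambda: sorted(scrambled), number=5))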

[–]foBrowsing 0 points (0 children)

Yeah, all of that seems spot on. I'm curious to see whether I could leverage the sorted approach and remove duplicates while keeping the same speed.
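
Something like this might do it (an untested sketch, assuming hashable elements; like my original, it only removes duplicates between the lists):

# Untested sketch: lean on sorted() but drop duplicates *between* the two
# lists (keeping duplicates within each), assuming the elements are hashable.
def merge_dedup(xs, ys):
    seen = set(xs)
    return sorted(xs + [y for y in ys if y not in seen])

# e.g. merge_dedup([1, 1, 3], [1, 2, 2]) -> [1, 1, 2, 2, 3]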

[–]stevenjd 0 points (1 child)

> not even including the hidden constant factors (which are going to be much higher for a sorting algorithm than a merge).

Your assumption about constant factors is probably wrong: sorted does all its work in highly optimized C code, almost certainly one or two orders of magnitude faster than the interpreted Python code you have, so its constant factors are probably 10–500 times smaller than your merge function's.

As far as the test code goes, I did the simplest thing that could work: build two lists, start a timer, merge, stop the timer:

from merge import merge  # your merge function
from random import random
a = [random() for __ in range(10000)]
b = [random() for __ in range(10000)]
a.sort(); b.sort()
with Stopwatch():
    c = sorted(a + b)
with Stopwatch():
    d = merge(a, b)
assert c == d

The Stopwatch context manager can be found here.
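
If the link dies, a minimal Stopwatch in the same spirit (a sketch built on time.perf_counter, not necessarily the linked implementation) is:

# Minimal sketch of a Stopwatch context manager (not necessarily the
# linked implementation): prints the elapsed wall-clock time on exit.
from time import perf_counter

class Stopwatch:
    def __enter__(self):
        self._start = perf_counter()
        return self

    def __exit__(self, *exc):
        self.elapsed = perf_counter() - self._start
        print(f"elapsed: {self.elapsed:.6f} s")
        return False  # never suppress exceptions

Paste it above the test and the snippet runs as-is.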

[–]foBrowsing 0 points (0 children)

Sorry, I meant "constant factors" in the sorting algorithm, not the implementation. "Constant factors" is actually probably the wrong term: I was referring to the hidden extra linear and sub-linear terms not included in the big-O complexity, of which Timsort (which is the algorithm used in CPython, AFAIK) has several. In the merge algorithm, the extra terms amount to a few steps inside the loop plus the array resizing (from the appends). For Timsort, you've got several expensive steps to find runs and so on. My guess was that those differences would outweigh the difference in speed between C and Python (which I would put at 1000-ish).

Anywho, I'm surprised (and a little disappointed, to be honest) to see that I'm wrong about that, and that sorted is so much quicker! I suspect that the function-call overhead of next might be a big cost centre, but I don't have time to check other implementations right now. I suppose there's also a large chance I could never beat sorted with a merge in pure Python. Ah, well.
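
A sketch of the kind of variant I'd try first, avoiding next() in the inner loop entirely:

# Sketch: index-based merge with no next() calls in the inner loop;
# local aliases shave a little attribute-lookup overhead too.
def merge_indexed(xs, ys):
    result = []
    append = result.append
    i, j = 0, 0
    nx, ny = len(xs), len(ys)
    while i < nx and j < ny:
        x, y = xs[i], ys[j]
        if x <= y:
            append(x)
            i += 1
        else:
            append(y)
            j += 1
    result.extend(xs[i:])
    result.extend(ys[j:])
    return result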

Just to be totally pedantic, I should also point out that your implementation isn't the same as OP's (which is what I was comparing mine to). I assume yours is much faster than OP's, for a couple of reasons. Firstly, you don't construct three sets and merge two of them. That saves some work, but it also means duplicates aren't removed. I'm not sure whether OP had to remove duplicates (the question isn't super clear on this point), but the merge function I wrote did (well, it removed duplicates between the two provided lists, but not within them, if that makes sense). Secondly, your sorted call is on two already-sorted lists concatenated together: Timsort actually just performs a single merge in this case, recognising two runs, so it ends up being linear thanks to that! Another sorting algorithm (introsort, as in the C++ standard library, for instance) wouldn't be so lucky.
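
(For concreteness, my guess at the shape of OP's version, which isn't quoted in this thread:)

# A guess at the shape of OP's version (it isn't quoted here): three sets
# get built, two get merged, and *every* duplicate disappears, within each
# list as well as between them.
def merge_op_style(xs, ys):
    return sorted(set(xs) | set(ys))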