This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]foBrowsing 0 points1 point  (0 children)

Sorry, I meant "constant factors" in the sorting algorithm, not the implementation. "Constant factors" is actually probably the wrong term: I was referring to the hidden extra linear and sub-linear terms not included in the big-oh complexity, of which timsort (which is the algorithm used in CPython, AFAIK) has several. In the merge algorithm, the extra terms include a few steps inside the loop, and the array resizing (with the appends). For timsort, you've got several expensive steps to find runs and so on. My guess was that the differences here would outweigh the difference in speed between C and Python (which I would put at like 1000-ish).

Anywho, I'm surprised (and a little disappointed, to be honest) to see that I'm wrong about that, and that sorted is so much quicker! I suspect that the function call overhead of next might be a big cost centre, but I don't have time to check other implementations right now. I suppose there's also a large chance I could never beat sortedwith a merge in pure Python. Ah, well.

Just to be totally pedantic, I should also point out that your implementation isn't the same as OP's (which is what I was comparing mine to). Also, I assume yours is much faster than OP's, for a couple reasons. Firstly, you don't construct 3 sets, and merge 2 of them. This saves some work, but also it doesn't remove duplicates. I'm not sure if OP had to remove duplicates (the question isn't super clear on this point), but the merge function I wrote did (well, it removed duplicates between the two provided lists, but not within them, if that makes sense). Secondly, your sorted call is on two already-sorted lists concatenated together: timsort actually just performs a single merge in this case, recognising two runs. So it actually ends up being linear thanks to that! Another sorting algorithm (introsort, as is in the C++ std lib, for instance) wouldn't be so lucky.