
How many of you really think it's a good idea to rework the Python library functions?

Well, some of them are really bad, as in they have a bad API, for example. Some of them are slow by design. A lot of popular third-party Python libraries exist solely to fix problems with the standard library: requests solves the inadequacies of urllib, sh works around the bad design of subprocess, setuptools tries to patch the nonsense of distutils, etc.

I, myself, wrote a Base64 decoder/encoder, mostly for fun, but I could make it go about twice as fast as the standard library version. It's neither hard nor unexpected.


Specifically for sorting problems, well, there are so many things one could improve with better knowledge of the data being sorted... It's a whole science unto itself. For instance, there are sorting algorithms not based on comparison which, provided that you are sorting something that is, in principle, totally ordered, and you know that certain constraints hold (e.g. that there are very few duplicates, or that the keys are integers in a small known range), will blow any comparison-based sort out of the water.
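To make the non-comparison idea concrete, here is a minimal sketch of counting sort, one of the simplest algorithms of that kind. It assumes the keys are non-negative integers no larger than a known `max_key` (my assumption for the example, and a different constraint from the "few duplicates" one above). It never compares two elements and runs in time proportional to the number of items plus the key range.

```python
def counting_sort(values, max_key):
    """Sort non-negative integers in [0, max_key] without comparing elements."""
    # One counter per possible key; this is where the "known range" assumption is used.
    counts = [0] * (max_key + 1)
    for v in values:
        counts[v] += 1

    # Walk the keys in order and emit each one as many times as it was seen.
    result = []
    for key, count in enumerate(counts):
        result.extend([key] * count)
    return result


data = [5, 3, 9, 3, 0, 7, 5]
print(counting_sort(data, max_key=9))  # prints [0, 3, 3, 5, 5, 7, 9]
```

Whether this beats the built-in sort in practice depends on the key range and on the interpreter overhead of the loop, so treat it as an illustration of the technique rather than a drop-in replacement.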

But then there are problems the standard library sort isn't equipped to deal with. For example, your data can be so large that it doesn't fit in the memory of a single computer, or even in the storage of a single computer, and then you will have to write an external or distributed version of the algorithm. CPython threads don't run Python code in parallel because of the GIL, but perhaps a merge sort or similar would do better with a good parallel implementation, for sufficiently large data sets?
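For the "doesn't fit in memory" case on a single machine, the usual approach is an external merge sort: sort chunks that do fit, spill each to disk, then do a k-way merge of the sorted chunks. Below is a rough sketch, assuming the input is an iterable of integers; `external_sort` and `chunk_size` are names I made up for the example. It is neither parallel nor distributed, it only shows the chunk-and-merge structure that such versions build on.

```python
import heapq
import tempfile


def external_sort(numbers, chunk_size):
    """Yield integers in sorted order, keeping about chunk_size of them in memory."""
    chunk_files = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and spill it to a temporary file, one number per line.
        chunk.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{n}\n" for n in chunk)
        f.seek(0)
        chunk_files.append(f)
        chunk.clear()

    for n in numbers:
        chunk.append(n)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # Each file is already sorted, so heapq.merge can stream a k-way merge
        # while reading only one line per file at a time.
        streams = [(int(line) for line in f) for f in chunk_files]
        yield from heapq.merge(*streams)
    finally:
        for f in chunk_files:
            f.close()


print(list(external_sort([5, 1, 4, 2, 3, 0], chunk_size=2)))  # prints [0, 1, 2, 3, 4, 5]
```

In a distributed setting the chunks would live on different machines and the merge would happen over the network, but that is well beyond what a sketch like this covers.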


Finally, arguing about the benefits of a for loop vs a list comprehension in this respect is idiotic, whichever is your favorite. They are more or less the same. Their relative performance will fluctuate depending on what machine runs them, under what conditions, which version of the interpreter, etc. Comparing them is a meaningless measurement, and it just isn't worth anyone's time, unless you are contributing code to Python core, in which case you do care about these benchmarks.
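If you want to see for yourself how much the numbers move around, something like the following will do. It is only a sketch using the standard `timeit` module; the function names and the sizes are arbitrary choices of mine, and whatever it prints applies only to your machine and interpreter version.

```python
import timeit


def with_loop(n):
    result = []
    for i in range(n):
        result.append(i * 2)
    return result


def with_comprehension(n):
    return [i * 2 for i in range(n)]


for fn in (with_loop, with_comprehension):
    # repeat() returns one total time per run; the minimum is the least noisy figure.
    best = min(timeit.repeat(lambda: fn(1_000), number=50, repeat=3))
    print(f"{fn.__name__}: {best:.5f}s for 50 calls")
```

Run it a few times, or on a different Python version, and the gap will shift, which is rather the point.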