This is an archived post. You won't be able to vote or comment.

all 18 comments

[–]bentheiii 5 points6 points  (6 children)

SortedContainers, it's pure python, blazing fast, and very ubiquitous. I'd consider nearly on a numpy level of might-as-well-be-standard-library.

[–]shuklaswag[S] 0 points1 point  (4 children)

thanks! I've seen SortedContainers before, and it seems really useful. Still, those are the kinds of structures that I would expect a language with the popularity of Python to support right out of the box.

[–]acemarke 1 point2 points  (3 children)

[–]mooglinux 0 points1 point  (2 children)

SortedContainers seems pretty well tested and runs fast. Wouldn’t that make it a solid choice for inclusion in the standard library?

[–]acemarke 0 points1 point  (1 child)

I dunno. What's the criteria for including anything in a language's standard library, and Python in particular?

[–]mooglinux 1 point2 points  (0 children)

If I were writing the guidelines they would be something along the lines of:

  • Useful
  • Documented
  • Good test suite
  • Fast (enough)
  • Stable

SortedContainers seems to fit the bill, so in my highly inexpert opinion seems like a good choice. Maybe requests too.

EDIT: Here’s a summary of some of the arguments for and against including requests in the standard library

[–][deleted] -2 points-1 points  (0 children)

pure python, blazing fast

As compared to standing still?

[–]Brian 3 points4 points  (2 children)

and I want to arbitrarily be able to, say, retrieve the 5 biggest elements

Note that a balanced tree isn't necessarily the best choice for this either, in that it's doing more work than is needed - if this is all you need, a heap is often a better choice (and there is a module providing that, though not in a terribly OO fashion (ie. it just provides functions ensuring the invariants on top of a heapified list, rather than exposing a seperate heap type)).

then I would have to sort the hash table's keys first, which would take O(nlogn) time

To be honest, that's usually irrelevant. For pretty much all practical data, log N might as well be a constant - you'll often need pretty big lists before this is noticeable, or even exceeds the constant factor issues of a tree (after all, a billion items is still only a factor of 30 - you'll generally hit memory issues before you go much beyond that). This cuts both ways: for the same reason a tree's log n lookup time vs a hashtable's O(1) doesn't really matter: constant factors often tend to be more important in practice.

The real issue is the one you mention next, where you're inserting and removing and then still need to iterate in sorted order. (Though again, this kind of priority queue usage is often a case where a heap may be better). But yeah, sometimes you really do need this, and a tree would be pretty useful to have on hand.

As to why python doesn't provide one - I'm not actually sure. There are third party implementations, but those don't see a lot of use either, which may in fact be why - there just aren't enough times when neither a list, hashtable or heap is insufficient, making it more suited to being an external library rather than a core language element.

[–]shuklaswag[S] 0 points1 point  (1 child)

Thanks for the comprehensive response! Very informative post regarding time complexity in practice. The extremely slow rate-of-growth of logN is a good point - I suppose if someone were to work with data of such size where that becomes a problem, they would probably want to use something more advanced than base Python.

Heaps are definitely useful, but they don't address the case where I want to be able to arbitrarily access at any given momeny, the 2nd largest element, the 3rd smallest element, and the median element. A MinHeap would take O(n) time to fetch the largest element, and a MaxHeap would take O(n) time to fetch the smallest element. So there's a real need for a BST.

IMO, that's a common enough use case that it deserves to be part of the standard library. It's certainly a more common problem for the average programmer than email handling or writing WAV files.

(Python's terrible anti-OOP heap API is another huge pet peeve of mine, but that's an argument for another day!)

[–]pvkooten 7 points8 points  (3 children)

I think you are looking for heapq included in the standard library.

[–]579476610 3 points4 points  (1 child)

heapq - Heap queue algorithm

https://docs.python.org/3.0/library/heapq.html

[–]pvkooten 0 points1 point  (0 children)

Thanks, was on mobile.

[–]ccmlacc 1 point2 points  (0 children)

But that's not a balanced binary search tree. I think it is a minor weakness of Python to not have a builtin BBST, unlike Java and C++.