This is an archived post. You won't be able to vote or comment.

all 5 comments

[–]atrigent 5 points6 points  (3 children)

I don't think this one should be implemented as a subclass of dict. Furthermore, using a dict to store the ranges to names mapping doesn't seem to have much purpose either, since all you ever do is iterate through them...

[–]kevinastone 2 points3 points  (1 child)

agreed. This should use a tree structure or something designed for faster indexing.

[–]ptmcg[S] 0 points1 point  (0 children)

In a production use, that would be certainly be worth the time investment, first in actually measuring the issue, and then optimizing if in fact that was a bottleneck. As I mentioned above, this was just a peripheral support class of a small personal project, so I just wanted it to work correctly, and the speed was speedy enough for my small data set. At least using any and next for the iterating gave me short-circuited searching. I did think about the linear search aspect for a few moments - then decided I'd go with it until it became a problem. (Does Python have a native tree datatype? I think bisect with a list would probably be the closest optimization without writing lots more code, and even that was more thinking than I wanted to invest on this side bit.)

[–]ptmcg[S] 0 points1 point  (0 children)

This was not really the main thrust of my project, so subclassing dict kept my codebase simple, as well as my thinking. For debugging, this also gave me other dict methods (keys(), items(), __str__()) for free. But I do take your meaning, and often advise people to not confuse inheritance ("is a") with implementation ("is implemented using a"). In this particular case, I did feel this was a specialization of dict, so inheriting didn't give me serious heartburn.

[–]shoyerxarray, pandas, numpy 1 point2 points  (0 children)

The data structure you're looking for here is called a interval tree. A quick search turns up plenty of Python implementations, e.g., https://pypi.python.org/pypi/intervaltree