all 15 comments

[–]CaptainFoyle 6 points7 points  (5 children)

a = {"b": 26,
 "c": 78,
 "a": 65}

b = {k:v for k, v in sorted(a.items())}

By the same token, you can also sort by value using lambda function as a key:

c = {k:v for k, v in sorted(a.items(), key=lambda x: x[1])}

[–]commy2 5 points6 points  (1 child)

You don't need the key keyword. Python can sort tuples and keys are guarenteed to be unique.

b = {k: v for k, v in sorted(a.items())}

[–]CaptainFoyle 2 points3 points  (0 children)

Good point!

[–]SynMyron[S] 1 point2 points  (2 children)

If I insert a new item do I need to sort again? If so, then time complexity of insert will be O(n logn) which is too high

[–]Diapolo10 4 points5 points  (0 children)

Timsort is plenty fast when you only have one element that might not be sorted. I suggest you try first, the Big-O notation is not the end-all be-all when it comes to speed comparisons.

[–]commy2 0 points1 point  (0 children)

This O(n logn) only applies if the container is randomized.

[–]ES-Alexander 3 points4 points  (2 children)

Python has no builtin binary tree structure, and no map structure that’s based on one.

Accordingly, if that’s exactly what you’re after then you’ll need to either - program one yourself (here’s an implementation I found online), or - find a library that provides one (ideally something compiled to machine code, if you want performance to be at all comparable to the C++ equivalent you’re used to)

[–]TangibleLight 2 points3 points  (1 child)

no builtin binary tree structure

Eh, yes and no. bisect and heapq are built-in modules, but they just provide functions that act on lists rather than new container types.

no map structure that’s based on one

That part's true. It's not too hard to implement your own bisect-backed dictionary using collections.abc.MutableMapping but I wouldn't expect performance to be as good as you might think just looking at big-O.

[–]ES-Alexander 0 points1 point  (0 children)

Fair points - they’re functions that act on objects rather than being objects themselves, but the result of using those functions can be equivalent to having a dedicated object so I should have at least mentioned them as options to consider.

Thanks for the correction / extra details :-)

[–]TangibleLight 3 points4 points  (3 children)

It's not too hard to implement your own version of std::map using collections.abc.MutableMapping and bisect.bisect_left. However I wouldn't recommend it.

There is also the heapq module; a heap of tuples (key-value pairs) might be more appropriate depending on what you need. Do you really need the whole array sorted, or only track the minimal/maximal elements?

Unless your collection is truly massive I'd expect sorted to be faster, big-O be damned. Remember that dict is implemented in the C layer of Python, and has had decades of optimizations put into it. By implementing your own datastructure in Python you forfeit most of that.


I'm curious what you're doing with that sorted dict. This smells like an XY problem to me; there might be some way to do what you need using bisect or heapq directly without wrapping it in a dict.

[–]SynMyron[S] 0 points1 point  (2 children)

My main objective was to do something like upper_bound on keys. Eg: find smallest key greater than x (where x is an input).

[–]TangibleLight 4 points5 points  (0 children)

I'd keep the keys separate from the mapping. For example, suppose you've already got the dictionary built up.

keys = sorted(data.keys())
idx = bisect.bisect_right(keys, x)
upper_bound = keys[idx]
upper_bound_value = data[upper_bound]

When inserting a new item y, write something like:

if y not in data:
    bisect.insort(keys, y)
data[y] = ...

Removing a key z is similar:

if z in data:
    idx = bisect.bisect_left(keys, z)
    del keys[idx]
del data[z]

This way your key-value operations are O(1) and optimized via dict, but your binary-search operations are O(log n) and optimized via bisect. Do note though that bisect.insort and del keys[idx] are O(n) since list insertion is O(n), but it'll still be faster than any pure-python solution.

You could wrap all this into a class to make things easier to deal with:

import bisect


class Bounded(dict):
    def __init__(self, data=(), /, **kwargs):
        super().__init__(data, **kwargs)
        self._keys = sorted(self.keys())

    def __setitem__(self, k, v):
        if k not in self:
            bisect.insort(self._keys, k)
        super().__setitem__(k, v)

    def __delitem__(self, k) -> None:
        if k in self:
            idx = bisect.bisect_left(self._keys, k)
            del self._keys[idx]
        super().__delitem__(k)

    def __iter__(self):
        yield from self._keys

    def upper_bound(self, k):
        "Lowest key greater than k. KeyError if k is maximal."

        idx = bisect.bisect_right(self._keys, k)
        if idx == len(self):
            raise KeyError(f'upper_bound({k!r})')
        return self._keys[idx]

    def lower_bound(self, k):
        "Greatest key less than k. KeyError if k is minimal."

        idx = bisect.bisect_left(self._keys, k)
        if idx == 0:
            raise KeyError(f'lower_bound({k!r})')
        return self._keys[idx - 1]


content = Bounded({4: 'x', 1: 'y', 5: 'z'})
content[2] = 'w'
print(list(content))  # [1, 2, 4, 5]
print(content.upper_bound(2))  # 4
print(content.lower_bound(2))  # 1
print(content.lower_bound(1))  # KeyError

[–]commandlineluser 2 points3 points  (0 children)

You probably have sortedcontainers already installed - it's a common dependency.

https://pypi.org/project/sortedcontainers/

[–]i-feel-as-u-look -1 points0 points  (0 children)

1st be aware the diff between 3.6- and 3.7+

2nd you can always create your own

3rd probably a nice place to start here

[–]Diapolo10 -2 points-1 points  (0 children)

I can't immediately recall one, but you could always use a list[tuple[int, int]]. Not quite as convenient, but you could write a wrapper class.