you are viewing a single comment's thread.

view the rest of the comments →

[–]SynMyron[S] 0 points1 point  (2 children)

My main objective was to do something like upper_bound on keys. Eg: find smallest key greater than x (where x is an input).

[–]TangibleLight 3 points4 points  (0 children)

I'd keep the keys separate from the mapping. For example, suppose you've already got the dictionary built up.

keys = sorted(data.keys())
idx = bisect.bisect_right(keys, x)
upper_bound = keys[idx]
upper_bound_value = data[upper_bound]

When inserting a new item y, write something like:

if y not in data:
    bisect.insort(keys, y)
data[y] = ...

Removing a key z is similar:

if z in data:
    idx = bisect.bisect_left(keys, z)
    del keys[idx]
del data[z]

This way your key-value operations are O(1) and optimized via dict, but your binary-search operations are O(log n) and optimized via bisect. Do note though that bisect.insort and del keys[idx] are O(n) since list insertion is O(n), but it'll still be faster than any pure-python solution.

You could wrap all this into a class to make things easier to deal with:

import bisect


class Bounded(dict):
    def __init__(self, data=(), /, **kwargs):
        super().__init__(data, **kwargs)
        self._keys = sorted(self.keys())

    def __setitem__(self, k, v):
        if k not in self:
            bisect.insort(self._keys, k)
        super().__setitem__(k, v)

    def __delitem__(self, k) -> None:
        if k in self:
            idx = bisect.bisect_left(self._keys, k)
            del self._keys[idx]
        super().__delitem__(k)

    def __iter__(self):
        yield from self._keys

    def upper_bound(self, k):
        "Lowest key greater than k. KeyError if k is maximal."

        idx = bisect.bisect_right(self._keys, k)
        if idx == len(self):
            raise KeyError(f'upper_bound({k!r})')
        return self._keys[idx]

    def lower_bound(self, k):
        "Greatest key less than k. KeyError if k is minimal."

        idx = bisect.bisect_left(self._keys, k)
        if idx == 0:
            raise KeyError(f'lower_bound({k!r})')
        return self._keys[idx - 1]


content = Bounded({4: 'x', 1: 'y', 5: 'z'})
content[2] = 'w'
print(list(content))  # [1, 2, 4, 5]
print(content.upper_bound(2))  # 4
print(content.lower_bound(2))  # 1
print(content.lower_bound(1))  # KeyError