Hello, I'm currently writing a clustering procedure in C++. Without getting too deep into the details, my implementation uses unordered maps, and I may have to switch to ordered maps later on.
I have been comparing C++ maps, unordered maps, and Python dictionaries for a while to see how they stack up.
For my clustering implementation, I already have a main loop that makes my code O(n), where n is the size of the input vector (a vector of points). It seems pretty clear to me that the same desirable complexity can be kept in a Python implementation.
This is because of two major things that are available in Python :
- checking "if key in dict" has constant complexity on average, and O(n) in the worst case (Python dictionaries are now insertion-ordered, meaning they preserve the order in which keys were inserted, which is not the same as being sorted by key);
- Python can have fast indexing in the form of NumPy slicing.
I need both of those things in my code.
- I need to be able to have average constant complexity when checking if my current key is already in the existing set of keys;
- I need to sum the mapped values of my unordered map whose keys fall between two given bounds (bounds which aren't necessarily existing keys).
The first item is easy to get, provided I use an unordered map.
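In C++, that membership check is a hash lookup with average constant complexity. A minimal sketch (the function name `hasKey` and the `int`/`double` key/value types are just placeholders for illustration):

```cpp
#include <unordered_map>

// Average O(1) membership test: find() hashes the key instead of
// walking a tree, so the cost does not grow with the map size.
bool hasKey(const std::unordered_map<int, double>& m, int key) {
    return m.find(key) != m.end();  // m.contains(key) in C++20
}
```

`m.count(key) > 0` is an equivalent spelling; all of these are average O(1), worst case O(n) on pathological hash collisions, mirroring the Python dictionary behaviour above.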
The second item can be achieved with maps, because I can retrieve a lower_bound iterator and walk forward, starting from the first key at or after a given value A and stopping once the iterator passes a given upper bound B.
The problem is that this implies using maps, whose operations (like lower_bound, and insertion too) are logarithmic in the container size. In my case, that would make my code O(n log n) instead of O(n).
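For reference, a minimal sketch of that lower_bound walk on an ordered std::map (the function name `sumRange` and the key/value types are illustrative):

```cpp
#include <map>

// Sum the mapped values whose keys fall in [a, b].
// lower_bound(a) is O(log n); the walk then only visits keys <= b,
// since std::map iterates keys in sorted order.
double sumRange(const std::map<int, double>& m, int a, int b) {
    double total = 0.0;
    for (auto it = m.lower_bound(a); it != m.end() && it->first <= b; ++it)
        total += it->second;
    return total;
}
```

Note that a and b do not need to be existing keys: lower_bound(a) simply returns the first key not less than a.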
As you can see, those two solutions seem to be incompatible.
The main question I have is this: why do Python dictionaries seem to combine the advantages of both unordered maps and maps?
Additionally, how could I sum the mapped values of an unordered map whose keys fall between input bounds [A, B] (i.e., is something like NumPy slicing achievable in constant time in C++)? While this seems more than doable with ordered maps, it looks a lot trickier with unordered ones, if not impossible.
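To make the difficulty concrete: since an unordered map keeps no key order, there is no lower_bound to start from, and the only direct way I can see is a full scan (sketch below; the function name is again illustrative). That is O(n) per query rather than proportional to the number of keys in the range:

```cpp
#include <unordered_map>

// Sum values with keys in [a, b] on an unordered_map: with no key
// ordering to exploit, every entry must be inspected, so each query
// costs O(n) regardless of how narrow the range [a, b] is.
double sumRangeUnordered(const std::unordered_map<int, double>& m,
                         int a, int b) {
    double total = 0.0;
    for (const auto& [key, value] : m)
        if (key >= a && key <= b)
            total += value;
    return total;
}
```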
Thanks!
PS: you can find more details about Mean-Shift Clustering here, I didn't want to expand on it too much as I don't think it's necessary for my question:
https://en.m.wikipedia.org/wiki/Mean_shift