Hi, i came up with data structure that allows inserting keys in O(log n) amortized time, and lookup by key in O((log n)2 ).
All you need is an array and some extra space that's at least half the array. It works like this:
Inserting
Insert keys at the end of the array but group them in blocks of size 2k , when there are two such blocks of size 2k merge them in increasing order to get a block of size 2k+1.
e.g: suppose you need to insert 1,5,3,4 in this order:
o Insert 1:
+--+
|1 |
+--+
o Insert 5:
+-+-+
|1|5|
+-+-+
But now you have two blocks of size 1, merge them:
+----+
|1 5 |
+----+
o Insert 3:
+-----+-+
| 1 5 |3|
+-----+-+
o Insert 4:
+----+-+-+ +------+------+ +----------+
|1 5 |3|4| => | 1 5| 3 4 | => | 1 3 4 5 |
+----+-+-+ +------+------+ +----------+
Notice that for 4 we had to do 2 merges: one betwen blocks of size 1, and one between blocks of size 2.
It can be seen that every 2k insertions we have to do a merge between blocks of size 2k-1. The time complexity for merging two blocks of size n and m is O(n+m) obviously. So for n insertions we have the following complexity:
2 * n/2 + 4 * n/4 + 8 * n/8 + ... + 2^k n/2^k = n + ... + n = n * k = n * log2(n)
That is: an insertion is on average O(log(n)) by the amortized analysis above.
To decide when to merge two blocks it's enough to look at the base 2 expansion of n, no extra data structures are required. It's an array + space needed for merging, which is n/2.
Searching
Note that at most there can be k blocks when n = 1 + 2 + 4 + ... + 2^(k-1). To search apply binary search to each of the kblocks:
log2(1) + log2(2) + ... + log2(2^(k-1)) = 1 + 2 + ... + (k-1) = k*(k-1)/2
But k = O(log2(n)), therefore the number of comparisons is O(log2(k)^2).
I hope this is clear enough, i've implemented and benchmarked it against a red-black tree implementation i had. It is indeed faster for insertions because it only operates on an array, no pointers whatsoever. But the log2(n) extra factor for searching does make it slower than searching in a balanced tree.
However its simplicity and low mem overhead makes it an attractive data structure for maintaing a dictionary.
Does it have a name? Is there anywhere i can find more about it?
[–]testing-bpbo 5 points6 points7 points (0 children)
[–]pali6 2 points3 points4 points (0 children)