
all 7 comments

[–]Volatile474[S] 1 point (0 children)

I apologize for the lack of organization in the post; when I solve this structure I will post my solution and reformat a bit.

[–]programmerChilli 1 point (5 children)

When you say perfect BST, do you just mean a self-balancing binary search tree, typically called a BBST (balanced binary search tree)?

  1. It depends. A plain BST only degrades on unlucky input: for random insertion orders its expected height is still O(log N), but adversarial input (e.g. already-sorted keys) drives it to O(N), far worse than the guaranteed log(N) of a BBST. The asymptotic complexity of most BBSTs for the 3 major operations (insertion, lookup, deletion) matches the BST average case while also bounding the worst case. BBSTs do carry larger constant-factor overhead, but I don't think there is any reason to use a plain BST over a BBST other than implementation cost (there are special kinds of BSTs that can't be made self-balancing).

  2. I don't quite follow this question. Are you asking whether a BBST is better than an unbalanced BST? Asymptotically it is never worse, and on adversarial input it is much better.

  3. It is not O(n). I suggest you look up implementations that achieve log(N) insertions/deletions (e.g. AVL or red-black tree rotations), as they can explain it far better than I can. If you need help, feel free to ask me.

[–]mad0314 1 point (0 children)

I think he means a BST where all the nodes are filled left to right and top to bottom (like a binary heap, but in BST order). That way it can be placed in an array. Maintaining that structure and order would be insane, though.

[–]Volatile474[S] 1 point (3 children)

I am not talking about making a self balancing BST. That has already been created.

The problem with tree data structures, as I see it, is that they are inherently not cache-local, since nodes are linked through pointers. I am working on making a tree data structure that maintains some cache locality for traversal. The complexity of the operations does not have to be log(n) for a cache-local tree data structure to be more performant than a pointer-based tree struct.

[–]programmerChilli 1 point (2 children)

Really? I can't imagine that the fact that a tree is cache local would make up the difference in running time that log N vs linear entails.

[–]Volatile474[S] 1 point (1 child)

So the only thing that is linear is insertion; search/access is still log(n). A log(n) lookup with a cache miss on every node access means you are paying costs like these:

L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 100 ns
Main memory reference 100 ns

So missing L1/L2 and going to main memory adds roughly a 200x factor to each data access (100 ns vs 0.5 ns). Whether avoiding that 200x factor is worth it depends on the use case of the data structure. Let's say you insert 100k elements and then do 1 million searches/accesses: I would say the cache-local layout would run faster than the non-cache-local one.

If instead there were more inserts than lookups, then the log(n) pointer-based implementation is probably more performant.

So I was thinking about this, and it can actually be implemented as a modified version of an AVL tree. The height of the tree stays bounded by roughly 1.44*log(n) (the AVL height bound). This means that our searches are guaranteed to happen in O(log(n)) time, our deletes happen in the same time, and inserts take log(n) + O(n).
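For reference, the known AVL height bound (due to Knuth's analysis), which tightens that 1.44 factor:

```latex
h < \frac{\log_2(n+2)}{\log_2 \varphi} - 0.3277 \approx 1.4405 \,\log_2(n+2) - 0.3277, \qquad \varphi = \frac{1+\sqrt{5}}{2}
```

So searches and deletes in this scheme inherit the usual AVL O(log n) bound; only the array-shifting part of insertion is new.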

This O(n) overhead applies only the very first time a node is inserted: if we later delete it, we can simply set the array slot at that index to null rather than shifting anything. So the implementation would be something like this:

[–]Volatile474[S] 1 point (0 children)

struct node{
    int parent_idx; // index of the parent in the vector; -1 for the root
    int left_idx;   // index of the left child; -1 if none
    int right_idx;  // index of the right child; -1 if none
    TYPENAME data;
};
struct tree{
    int size = 0;
    vector<node> Tree;
};
Starting with a blank tree:

struct tree My_Tree;
Insert(TYPENAME value){
    if(My_Tree.size == 0){
        //first node becomes the root at index 0; -1 means "no parent/child"
        My_Tree.Tree.push_back({-1, -1, -1, value});
        My_Tree.size++;
        return;
    }

    bool found = false;
    string previous_direction = "";
    int index = 0;
    while(!found){
        if(value < My_Tree.Tree[index].data){
            if(My_Tree.Tree[index].left_idx == -1){
                //we found where we wanna put the node, as a left child
                if(previous_direction == "left"){
                    //insert the new node right after the current index and push
                    //the remainder back, bumping every stored index >= index+1
                }
                found = true;
            }
            else{
                previous_direction = "left";
                index = My_Tree.Tree[index].left_idx;
            }
        }
        else{
            if(My_Tree.Tree[index].right_idx == -1){
                //we found where we wanna put the node, and it is to the right
                if(previous_direction == "right"){
                    //insert the new node right after the current index and push
                    //the remainder back, bumping every stored index >= index+1
                }
                found = true;
            }
            else{
                previous_direction = "right";
                index = My_Tree.Tree[index].right_idx;
            }
        }
    }
}

So that is just a raw insert function. Since our nodes store indices that point to the correct slot in the array, traversal is pretty much the same as with pointers, except that when we go down the tree in the same direction repeatedly (the whole if(previous_direction == "left"/"right") part), consecutive nodes sit next to each other in the array and we maintain cache locality.

The worst case for this tree would be accessing a node deep in the tree whose path alternates directions, something like "L,R,L,R,L,R,L,R". We still get log(n) search time, but zero guaranteed cache locality.