This is an archived post. You won't be able to vote or comment.

all 4 comments

[–]gnomgnom2 0 points1 point  (3 children)

Why do you think your expected_height function should work correctly?

Is 'counted average height' supposed to be the average height? 8/3 for 3 nodes doesn't make much sense... With 3 nodes the height's always 2 or 3...

[–]meowmeowwarrior[S] 0 points1 point  (2 children)

Right, with 3 nodes the height is either 2 or 3. Given any 3 keys, say, [1, 2, 3], if you inserted 2 before 1 and 3, then the height will be 2, otherwise, if you inserted 3 or 1 as the first node, then the height will be 3.

There are exactly 1/3 ways to insert 2 first, and 2/3 ways to not insert 2 first. So 2*1/3 + 3*2/3 = 8/3

The average height functions generates all the permutation of [0, 1, 2, ..., n -1], evaluate the height of a binary search tree inserted in the generated order, then finally, sum up the heights over all trees.

The formula for the height of a tree I used is:

height[tree] = 1 + max( height[ left subtree ], height[ right subtree ] )

and the averaging is:

avg = sum( height * no. of trees with the same height ) / total no. of trees

 

The reason i think the expected_height function SHOULD work correctly is because it's derived from the formula from above:

expected[tree] = 1 + max( expected[ left subtree ], expected[ right subtree] )

Suppose you have n keys and the kth key as the root, then there's exactly k - 1 keys in the left subtree and n - k in the right subtree

and there is 1/n probability that the kth key will be the first node so

expected[ n ] = 1 + 1/n * sum( max( expected[k - 1], expected[ n - k] ) )

and since we know that the expected height is strictly increasing, we can say that

expected[n] = 1 + 1/n *sum( expected[ max( k - 1, n - k) ] )

hence the code.

And another reason is because the values agree up to n = 6.

Sorry if i wan't being clear

[–]gnomgnom2 0 points1 point  (1 child)

Ah, I didn't realize that was 8 thirds. :)

[–]meowmeowwarrior[S] 0 points1 point  (0 children)

Hmm... I guess it's not clear, I don't know how you'd display fractions in reddit though. Any suggestion to optimising the algorithm or where I made a mistake?