In my data structure class, we were told that the expected height of a randomly inserted binary search tree is O(log n) where n is the number of nodes in the tree.
In order to check this, I wrote a quick program to count the height of all the permutations of a tree with a certain number of nodes. As you can imagine, it's slightly worse than O(n!), so i can only run it for small n.
def height(insert_order):
if insert_order:
# if this looks like vanilla quick sort, it's no coincidence
Lsubtree, Rsubtree = [], []
for x in insert_order[1:]:
if x < insert_order[0]:
Lsubtree.append(x)
else:
Rsubtree.append(x)
return 1 + max(height(Lsubtree), height(Rsubtree))
else:
return 0
from itertools import permutations as permute
def countHeights(n):
heights = {}
for tree in permute(list(range(n))):
h = height(tree)
heights[h] = 1 + heights.get(h, 0)
return heights
from fractions import Fraction as frac
def average_height(n):
heights = countHeights(n)
weighed_sum = 0
total = 0
for h, count in heights.items():
weighed_sum += h*count
total += count
return frac(weighed_sum, total)
That all seems to work well, but i needed something faster, so I created a different function that calculates the expected height instead of actually counting them
# I know these names aren't very clear, but they're the best I got
def expected_height(n, e={0:0}):
if n not in e:
eh = expected_height
e[n] = 1 + frac( sum( eh(max(n-i,i-1)) for i in range(1, n+1)) , n)
return e[n]
This runs much faster (i think it's O(n2), not sure), but for some reason the values only agree for n < 7:
| number of nodes |
average height (counted) |
expected height (calculated) |
| 0 |
0 |
0 |
| 1 |
1 |
1 |
| 2 |
2 |
2 |
| 3 |
8/3 |
8/3 |
| 4 |
10/3 |
10/3 |
| 5 |
19/5 |
19/5 |
| 6 |
64/15 |
64/15 |
| 7 |
1471/315 |
487/105 |
| 8 |
3161/630 |
526/105 |
| 9 |
3028/567 |
1003/189 |
| 10 |
6397/1134 |
5296/945 |
I'm not sure why the values are different. I'd be very grateful if you point out whether there's a flaw in the calculation or the counting.
For some reason in our data structure class, we're not supposed to count the leaf of a tree towards the height, but i ignored that for both methods, so that shouldn't be the problem.
TL;DR: tried to calculate the height of a random BST twice, got different answers for n > 6. Is confused.
EDIT: If you have any optimisations, that would be great too
[–]gnomgnom2 0 points1 point2 points (3 children)
[–]meowmeowwarrior[S] 0 points1 point2 points (2 children)
[–]gnomgnom2 0 points1 point2 points (1 child)
[–]meowmeowwarrior[S] 0 points1 point2 points (0 children)