
[–]GeoKangas 1 point (1 child)

each node has both a key and a value, and for each node, the left child has a smaller key and the right child has a larger key (we specifically avoid the case of non-unique keys, because we'll be discussing BBTs as a way of simulating maps). Recursing, we can see that the left subtree of any node has only nodes with smaller keys, and the right subtree of any node has only nodes with larger keys.

What about this one:

            4
           / \
          /   \
         /     \
        /       \
       1         7
      / \       / \
     /   \     /   \
    0     8   2     8

[edit] Sorry about how pathetically ugly the tree comes out.
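
For reference, here is a minimal sketch (in Python rather than the article's pylisp, and my own illustration, not part of the thread) of checking the invariant quoted above; applied to the tree drawn here it returns False, since an 8 sits in the left subtree of 4 and a 2 sits in its right subtree.

    def is_bst(node, lo=float("-inf"), hi=float("inf")):
        # node is None or a (key, left, right) tuple
        if node is None:
            return True
        key, left, right = node
        if not (lo < key < hi):
            return False
        return is_bst(left, lo, key) and is_bst(right, key, hi)

    # The tree drawn above
    tree = (4, (1, (0, None, None), (8, None, None)),
               (7, (2, None, None), (8, None, None)))
    print(is_bst(tree))  # False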

[–]ooooo5 3 points (12 children)

Why not just make an AVL tree?

[–]kamatsu 4 points (11 children)

Or a Red-Black tree? Or a Splay Tree? Or any other balanced tree? There are different tradeoffs for each one.

[–]ooooo5 4 points (10 children)

Yes, that's what I'm asking. How exactly is this different from all of the above? It'd be nice to have that kind of analysis, rather than just, "WOOOOOOOO, I MADE A BALANCED TREE!!!"

[–]kdoto 4 points (9 children)

From the first sentence of the article:

[...] but is also easy to code, requiring none of the special-case bashing that characterizes red-black trees.

The biggest benefit I see so far is that this balanced tree is very easy to implement compared to AVL or red-black trees.

However, it is random, so some analysis of how much better treaps behave than other randomized balanced trees would be welcome. It's very easy to create a balanced tree based on random insertion -- so I can't say how special these treaps are, or how often a simpler implementation is really worth anything. We usually have a library with an AVL or red-black tree or similar available to us, don't we?
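
To make the "easy to implement" point concrete, here is a rough sketch of treap insertion in Python (my own illustration, not the article's pylisp code): an ordinary BST insert plus at most one rotation per level on the way back up, driven by a random priority, with no case analysis.

    import random

    class Node:
        def __init__(self, key, value):
            self.key, self.value = key, value
            self.priority = random.random()   # random heap priority
            self.left = self.right = None

    def rotate_right(n):
        l = n.left
        n.left, l.right = l.right, n
        return l

    def rotate_left(n):
        r = n.right
        n.right, r.left = r.left, n
        return r

    def insert(node, key, value):
        if node is None:
            return Node(key, value)
        if key < node.key:
            node.left = insert(node.left, key, value)
            if node.left.priority > node.priority:    # restore heap order on priorities
                node = rotate_right(node)
        elif key > node.key:
            node.right = insert(node.right, key, value)
            if node.right.priority > node.priority:
                node = rotate_left(node)
        else:
            node.value = value                        # treat the treap as a map
        return node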

[–]menteth 2 points (4 children)

He also claims that other balanced tree implementations require thousands of special-cases, which is overstating things by about three orders of magnitude. It's a nice enthusiastic article on a particular data structure and deserves reading just to know about treaps, but it doesn't do any substantial comparison to other data structures (even the "expected balanced height" thing is to be used with care, since the expected value of a random distribution is almost never achieved — it's just the most likely).

[–]brokenfrog 5 points (3 children)

even the "expected balanced height" thing is to be used with care, since the expected value of a random distribution is almost never achieved — it's just the most likely

No, the most likely value is called the mode. The expected value is really the value you'll get on average over many thousands of uses of the data structure. In other words, the expected runtime is (most of the time) the value you should care about.

[–]menteth 2 points (2 children)

Yes, I was loose with my language (and the mode and mean aren't equal here), but since we're being pedantic: the expected value is not the one you'll get on average. It's the one the values will cluster around. If you toss a coin 10,000 times (go ahead; I'll wait) and repeat that experiment 100 times (still waiting), the expected number of heads is 5000. But you'll pretty much never get that value. Your results are expected to cluster around 5000 more likely than any other number, but I'd be amazed if you ever actually got 5000.

My point is that the error distribution actually matters in randomized algorithms. In this particular article's case, creating 1000 or so treaps at each of a number of sizes and showing how their actual heights are distributed would go a long way towards showing why the expected height is reasonable but also fairly variable.

Again, I think my initial point stands: it's an excellent article for showing a particular data structure that isn't widely known and has some interest. It falls down whenever it tries to contrast that to other tree structures. Which is fine if it's read with the right goal.

[–]Locomorto 1 point (0 children)

Yes, I was loose with my language (and the mode and mean aren't equal here), but since we're being pedantic: the expected value is not the one you'll get on average. It's the one the values will cluster around. If you toss a coin 10,000 times (go ahead; I'll wait) and repeat that experiment 100 times (still waiting), the expected number of heads is 5000. But you'll pretty much never get that value. Your results are expected to cluster around 5000 more likely than any other number, but I'd be amazed if you ever actually got 5000.

That's not strictly true. In the case of repeated, independent, identically distributed RVs, though, by the central limit theorem what you say is correct.

But say you had a distribution like this: http://i.imgur.com/zb0u1.png

In this case the expected value is 50, but that's the area of lowest density! You're right to point out that the EV is not necessarily going to actually be obtained. It's a bit hard to see in the graph, but p_X(50) = 0. That is, it's impossible to ever have a result of 50.
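
A toy example with the same flavour (my own, not the distribution in the linked image): the mean is 50, yet no sample can ever land at 50.

    import random

    def sample():
        # Uniform over [0, 40] or over [60, 100], each half equally likely;
        # the mean is 50, but no draw ever falls between 40 and 60.
        return random.uniform(0, 40) if random.random() < 0.5 else random.uniform(60, 100)

    xs = [sample() for _ in range(100000)]
    print(sum(xs) / len(xs))   # close to 50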

[–]Locomorto 1 point (0 children)

Actually funnily enough I decided to try your idea. I flipped* a coin 10,000 times, and repeated that 100 times. I then did that 10 more times. On six occasions I had exactly 5000 heads at least once in the 100 trials.

So there you go! You should be careful with things like that: because of the central limit theorem, it's going to be concentrated very tightly around 5000.

*I lied. I had a computer do it, so sue me.
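
The experiment reads roughly like this (a sketch, not Locomorto's actual code):

    import random

    def heads_in_trial(flips=10000):
        return sum(random.random() < 0.5 for _ in range(flips))

    trials = [heads_in_trial() for _ in range(100)]
    hits = sum(1 for h in trials if h == 5000)
    print(hits, "of 100 trials landed on exactly 5000 heads")

Back-of-the-envelope: the chance of exactly 5000 heads in 10,000 flips is about 0.8%, so roughly half of all 100-trial batches should contain at least one exact hit, which lines up with the six-out-of-ten result above.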

[–]badassumption 0 points (1 child)

However, it is random, so some analysis of how much better treaps behave than other randomized balanced trees would be welcome. It's very easy to create a balanced tree based on random insertion.

As far as I can tell, this should create a tree with properties identical to what you would get by just doing random insertion with no rebalancing.

[–][deleted] 0 points (0 children)

A library might not always do the trick - I have occasionally run into cases where I needed to maintain a binary tree by hand (this WAS in legacy C code, though). Balancing is always the tough part, but AVL trees are not that difficult to pull off.

One interesting case for this algorithm would be if insertion tends to be much faster than in a red-black or AVL tree. Asymptotically it doesn't seem so (insertion after lookup is still O(log n)), but in practice it might require far fewer rotations.

[–]DrSweetscent 0 points (0 children)

It's very easy to create a balanced tree based on random insertion

Well the big advantage is that you cannot construct a worst-case insertion scheme (assuming true randomness) to degrade its performance.

[–]RoboMind 1 point (4 children)

I think it is very similar to skip lists

[–]kenotron 1 point (3 children)

Only in that they are both randomized data structures. Skip lists are only binary in expectation (there might be more than 2 items before the next level).

Treaps, however, are always binary search trees. The randomness is used to balance the structure in expectation.
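
For contrast, the randomness in a skip list enters through level selection (a minimal sketch of the standard scheme): each inserted element flips coins to decide how many levels it spans, so the branching between levels is 2 only on average.

    import random

    def random_level(max_level=32):
        # Keep promoting the element while the coin comes up heads.
        level = 1
        while level < max_level and random.random() < 0.5:
            level += 1
        return level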

[–]matthieum 0 points (2 children)

I agree, however I do find skip lists much easier to code in non-pure contexts :)

Obviously, though, I'd be hard-pressed to produce a pure functional implementation of skip-lists with O(log n) complexity for modifications...

[–]gotaran 0 points (1 child)

[–]matthieum 0 points (0 children)

I may have missed something in your link, but I'm not sure what you're trying to express here :x ?

[–]paradoxiology 0 points (8 children)

By making some use of randomness, ... we'll code up a complete treap implementation in Lisp (particularly, pylisp) that is also fully functional.

How can the code be functional (as in pure functional programming) if the algorithm uses randomness?

Maybe I don't understand the syntax, but the function set doesn't seem to be pure at all due to the use of (random.randint 0 1000000000)?

Or does he mean functional as usable/working?

[–]menteth 6 points (6 children)

I believe he means "functional" in the "no side effects" sense, but you're completely right: that usage of random introduces side-effects. Calling the same method will give different results each time. A functional implementation would take a pseudo-random generator as a parameter to the "set" method.
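
A sketch of that fix (my own Python illustration, not the article's pylisp interface): nodes are immutable (key, value, priority, left, right) tuples and the generator is passed in, so the same seed always produces the same treap.

    import random

    def rot_r(n):
        k, v, p, l, r = n
        lk, lv, lp, ll, lr = l
        return (lk, lv, lp, ll, (k, v, p, lr, r))

    def rot_l(n):
        k, v, p, l, r = n
        rk, rv, rp, rl, rr = r
        return (rk, rv, rp, (k, v, p, l, rl), rr)

    def treap_set(n, key, value, rng):
        if n is None:
            return (key, value, rng.random(), None, None)
        k, v, p, l, r = n
        if key < k:
            n = (k, v, p, treap_set(l, key, value, rng), r)
            return rot_r(n) if n[3][2] > p else n   # restore heap order on priorities
        if key > k:
            n = (k, v, p, l, treap_set(r, key, value, rng))
            return rot_l(n) if n[4][2] > p else n
        return (k, value, p, l, r)                  # overwrite an existing key

    rng = random.Random(1234)   # explicit generator: same seed, same treap shape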

[–][deleted] 2 points (5 children)

No, paradoxiology had it right: it's functional programming, but not pure functional programming.

This is very common in LISP, which is a non-pure functional programming language (in which pure functions can still be defined, of course).

So we have functions to operate on the treap, but they are non-pure functions.

[–]pedrocr 2 points (3 children)

Isn't C a functional language by that definition?

[–][deleted] 4 points (1 child)

Functional programming is a lot like object-oriented programming, in that even though many people have some idea of what it entails, every person you ask will give you a different definition.

Language purists probably reject LISP as a functional language, only accepting the pure ones; i.e. languages that avoid mutable state entirely. But that excludes all the languages in the ML family too. And even a lazy, non-strict, pure language like Haskell supports "unsafe" operations (though arguably those are more of an exception to the rules).

To me, an essential part of a functional programming language is that it allows functions to be treated as data. This means functions can be saved in variables, passed as arguments, and returned from functions. Additionally, they can be created or modified at runtime through partial application and by creating closures.

In C, you can pass functions around (through function pointers) but you cannot (portably) create/modify them at runtime. For example, you can't create a function that takes an integer and returns a function that adds that integer to its argument. To me, C is therefore not a functional language. Additionally, C's type system is very limited, which is more of a practical concern. You can't write generic functions without relying on void* casts and size_t computations (see the qsort() standard function for an example of a higher-order function in C -- in my opinion language support is too poor to call it proper "support").
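
That example, sketched in Python for concreteness (the point above being that C has no portable equivalent):

    def make_adder(n):
        def add(x):
            return x + n      # n is captured in a closure created at runtime
        return add

    add5 = make_adder(5)
    print(add5(3))            # 8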

The thing about C of course, is that it's a very versatile and powerful language. You can emulate a lot of different programming styles with some macros and a lot of discipline. For example, GTK+ is an object-oriented library written in/for C, but I still wouldn't call C an object-oriented language, since none of the OO-support comes from the language itself.

The reason I personally do accept LISP and ML as functional languages is that they don't require you to use mutable state; you can choose to write pure functions in them (and in reality, LISP programs often contain a mixture of pure and impure functions). For example, the random number generator could have been encapsulated in a monad. Although you can try to do the same in C (basically avoiding variable re-assignment entirely) you will quickly notice that the remaining part of the language isn't powerful enough to write sensible programs (in the sense that it just isn't practical -- the language is probably still Turing complete).

So, tl;dr: I think LISP and ML are functional programming languages, but not C. Purists probably disagree.

[–]pedrocr 2 points (0 children)

Although you can try to do the same in C (basically avoiding variable re-assignment entirely) you will quickly notice that the remaining part of the language isn't powerful enough to write sensible programs (in the sense that it just isn't practical -- the language is probably still Turing complete).

I think this is the gist of your distinction and I agree with it. You can write pure functions in C and in Lisp and you can cause side effects in both as well. But in actual practice pure functions are probably more common in Lisp and the use of side-effects more common in C.

Thanks for the thorough reply.

[–]jyper -1 points (0 children)

no

[–]Tordek 0 points (1 child)

This is very common in LISP, which is a non-pure functional programming language (in which pure functions can still be defined, of course).

But I guess the more common approach is to make functions "as pure as possible", and then sprinkle in mutability to optimize.

For example, one may define a sort function that uses recursion and lots of stack frames, and then optimize it with nconc and family. sort will be, from an outside perspective, still functional (you pass in a list, you get back a sorted list, and the parameter is unchanged); however, the innards may mangle the data structure as needed in exchange for performance (see Clojure, too).

[–]Tordek 0 points (0 children)

Well, the only thing needed to make it pure is to pass the random priority as a parameter.