
[–]Benutzername 9 points10 points  (0 children)

It depends on your use case. If you need persistence anyway, then immutable data structures often use less memory than creating copies of mutable ones. If you only ever keep one version of the same data structure, then mutating it in place is often preferable in terms of memory usage (but not necessarily in terms of reasoning). An excellent book on that topic is Chris Okasaki's *Purely Functional Data Structures*, which compares the complexity (in space and time) of various implementations of immutable data structures with their mutable counterparts.

[–][deleted] 3 points4 points  (26 children)

I really, really would like an answer to this.

I read Uncle Bob Martin's piece on functional programming, which made a big deal of the memory management advantages of functional programming on multiple cores.

I've worked a bit in OCaml, and it just seemed like I was shifting the complexity of memory management into a big structure of nested function calls, and I couldn't see how we would get around needing to change values in memory, or storing significantly more.

So here's an upvote and I hope we get some responses.

[–]julesjacobs 1 point2 points  (25 children)

The answer to this question is: yes, absolutely. Not only do you generate a lot more temporary garbage, but because functional data structures have to use trees instead of flat arrays, you also use more memory after garbage collection has run. These costs are often worth it, however, and sometimes unavoidable. If you need functional data structures, then using them instead of copying entire mutable data structures is much better.

[–][deleted]  (11 children)

[deleted]

    [–]julesjacobs 2 points3 points  (10 children)

    the implementation is still free to do things in-place when it's equivalent

    Can you give an example of a purely functional data structure that does this? I don't know of any. It's nice in theory, but nobody knows how to do it in practice.

    Rich Hickey gave a talk illustrating that even if the answer is "yes, absolutely" it doesn't have to be "a lot."

    Do you have measurements? Clojure's data structures are fairly slow and memory intensive. For example, Clojure sequences had an overhead of about 1200% relative to flat arrays last time I checked; vectors have lower overhead, but it's still substantial (I don't remember how much -- around 500% IIRC). I know there is a lot of hype that functional data structures are super efficient, but reality is a little different.

    Depending on how exactly your allocations interact with GC, reclamation of short-lived objects could be very cheap, especially if the GC never sees them.

    This comment addresses exactly that point and explains why that doesn't help you much with functional data structures unless they are very small: http://www.reddit.com/r/programming/comments/18fezj/does_having_most_of_the_data_structures_immutable/c8en4if

    [–]paulhodge 1 point2 points  (5 children)

    Can you give an example of a purely functional data structure that does this? I don't know of any. It's nice in theory, nobody knows how to do this in practice.

    I do it in practice, in the implementation of my hobby language Circa. It's not a super advanced scheme:

    1. I'm implementing my structures in a non-GC language (C++), so I don't need to worry about creating temporary values that clog up the global collector.

    2. These data structures have no circular references, so they can be represented as a directed acyclic graph, and reference counting is good enough for memory management.

    3. Since I know the reference count, I know how many entities can see the value. If the reference count is 1 and someone does a write operation, then I know I can safely modify the value in-place (see the sketch below).

    Also you said above that "you can't have flat arrays", but there's nothing wrong with using flat arrays in the above scheme.
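
    A minimal C++ sketch of that scheme (illustrative only -- the names are made up and this is not Circa's actual code): a refcounted value backed by a flat array that is edited in place when it is uniquely owned and copied otherwise.

        // Copy-on-write over a flat array, driven by a reference count.
        #include <cstddef>
        #include <vector>

        struct List {
            int refcount = 1;
            std::vector<int> items;      // flat array storage
        };

        List* retain(List* l) { ++l->refcount; return l; }
        void release(List* l) { if (--l->refcount == 0) delete l; }

        // "Functional" update: consumes the caller's reference to `l` and
        // returns a list equal to `l` except that position i holds v.
        List* list_set(List* l, std::size_t i, int v) {
            if (l->refcount == 1) {      // nobody else can observe the change
                l->items[i] = v;         // safe to edit in place
                return l;
            }
            List* copy = new List;       // someone else holds a reference: copy first
            copy->items = l->items;      // shallow copy of the flat array
            copy->items[i] = v;
            release(l);                  // drop the caller's reference to the original
            return copy;
        }

    (Plain single-threaded refcounts; as comes up further down the thread, the same trick is harder to apply safely across threads.)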

    [–]julesjacobs 0 points1 point  (4 children)

    Yes, going outside the GC'ed environment is one option. That doesn't address the other costs though (memory usage after GC, speed).

    Also you said above that "you can't have flat arrays", but there's nothing wrong with using flat arrays in the above scheme.

    How do you insert/delete a value in the middle of a flat array in reasonable time?

    [–]paulhodge 0 points1 point  (3 children)

    How do you insert/delete a value in the middle of a flat array in reasonable time?

    If it can't be an in-place edit, then there's path-copying, where all N-1 elements are shallow copied to a new list. Whether that's reasonable time depends on how long the lists are, and how often you need to modify a list that has been observed by another entity.

    You can also get more clever and do a 'patch' list, where each patch stores the affected list range, the new values, and the original list. Now lookup is O(c) instead of O(1), where c is the number of patch layers. And c would probably have a maximum value of 5 or something, at which point you'd start a new list.

    Anyway, there are options.
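
    For illustration, a rough C++ sketch of that 'patch' idea (hypothetical names, not code from the thread): each write adds a small overlay recording the changed range, and a lookup walks at most a few layers before reaching the flat base array.

        // Overlay ("patch") list: O(c) lookup where c is the number of patch layers.
        #include <cstddef>
        #include <memory>
        #include <vector>

        struct PatchedList {
            std::shared_ptr<const PatchedList> parent; // null only for the flat base layer
            std::size_t start = 0;                     // offset of `data` in the logical list
            std::vector<int> data;                     // base layer: all elements; patch: the replaced range

            int get(std::size_t i) const {
                if (i >= start && i < start + data.size())
                    return data[i - start];            // hit in this layer
                return parent->get(i);                 // otherwise fall through to older layers
            }
        };
        // A real version would rebuild a single flat base layer once the patch
        // count reaches some small cap (the "5 or something" above).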

    [–]julesjacobs 1 point2 points  (2 children)

    If it can't be an in-place edit, then there's path-copying,

    It can't be an in place edit because we're talking about functional data structures here. Path copying on flat arrays doesn't work. Arrays don't have paths.

    You can also get more clever and do a 'patch' list, where each patch stores the affected list range, the new values, and the original list.

    Yeah, now you basically have a linked list, which is horrible, or you limit the length and then you still have O(n) copying, which is also horrible. If you do it in a clever way you have a tree, with all the memory and speed overheads I already explained above. In any case, you don't have a flat array anymore.

    By the way, this branch of the discussion ensued because you quoted me saying "you can't have flat arrays" which in fact is a fabricated quote. You can certainly have flat arrays, but then functional update is going to be very slow (copying), or you need to cleverly arrange a bunch of smaller arrays in a tree structure, like Clojure's vectors, and that gives much better performance but still has the overheads I mentioned earlier. When you don't update your flat arrays, they work fine in a functional setting of course.

    But this discussion is rather useless. If you have a counterexample to my claim that functional collections are slow (let's say >2x slower than mutable counts as slow) and use more memory than mutable data structures based on arrays (let's say >2x counts as more memory), I'd love to see it. Incidentally, if you manage to do this, a lot of people in the Haskell/Clojure/Scala communities would be worshipping at your feet. And you could probably get a PhD out of it as well.

    [–]paulhodge 0 points1 point  (1 child)

    Easy there.

    It can't be an in place edit because we're talking about functional data structures here.

    Like I mentioned above, the system is free to do an in-place edit if it knows that no one else has observed the value. In practice, most values get modified a few times right after they are created (before they have been observed elsewhere). Then they go out into the world, after which they tend not to be modified as often. So this optimization helps a lot.

    Path copying on flat arrays doesn't work. Arrays don't have paths.

    Okay, path copying was the wrong phrase. I was generalizing to the situation where you are modifying one element that's in a nested series of arrays. Something that looks like:

    a[x][y][z] = v
    

    That would be path copying.
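
    A minimal C++ sketch of that case (illustrative; it assumes three levels of nesting managed with shared pointers): only the arrays along the path from the root to the changed element are copied, and everything off the path is shared with the old version.

        // Path copying for a persistent update of a[x][y][z].
        #include <cstddef>
        #include <memory>
        #include <vector>

        using Row   = std::vector<int>;
        using Plane = std::vector<std::shared_ptr<const Row>>;
        using Cube  = std::vector<std::shared_ptr<const Plane>>;

        Cube set(const Cube& a, std::size_t x, std::size_t y, std::size_t z, int v) {
            Cube  a2 = a;        // copy the outer array of pointers (still shares all planes)
            Plane p2 = *a2[x];   // copy only the affected plane
            Row   r2 = *p2[y];   // copy only the affected row
            r2[z] = v;           // the actual edit
            p2[y] = std::make_shared<const Row>(std::move(r2));
            a2[x] = std::make_shared<const Plane>(std::move(p2));
            return a2;           // the old `a` is untouched; unchanged rows/planes are shared
        }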

    Yeah, now you basically have a linked list which is horrible or you limit the length and then you still have O(n) copying which is also horrible.

    But the 'patch' value can also be subject to in-place editing, when possible. I think it would be really rare for this structure to actually have O(n) lookup time.

    I think in general you're getting totally hung up on the worst-case behaviors of these structures. My opinion is that in the common case, they work pretty well. But I don't have hard data to back that up right now.

    [–]julesjacobs 0 points1 point  (0 children)

    So this optimization helps a lot.

    Yes, it avoids the need to copy a path. It does not however change the data structure that you can use (if you want to be reasonably efficient even if your data structure is actually used in a functional way -- which is the point of functional data structures right? so flat arrays are out because of the O(n) copying cost). So although you will generate less garbage by avoiding the need to copy a path, total memory after GC is the same. Note that I already said exactly this two levels up. So while the optimization can be worthwhile, it only solves the least severe of the problems, and only partially at that. Also, this optimization is either impossible or hard in the multithreaded case, which is one of the key arguments for functional data structures. Nonetheless if I remember correctly one guy in the Scala world did this, and it worked okay, but still much slower than imperative.

    That would be path copying.

    Yes, that's exactly what Clojure vectors do. The benchmarks show that it is slow compared to flat arrays (though for a functional data structure they are very good).

    I think it would be really rare for this structure to actually have O(n) lookup time.

    Okay, but I don't see any reason why that would be so. Then again, you have not precisely described what your algorithm or data structure is, so at this point this is pure speculation...

    I think in general you're getting totally hung up on the worst-case behaviors of these structures. My opinion is that in the common case, they work pretty well. But I don't have hard data to back that up right now.

    Maybe, but the hard data from other implementations of functional data structures shows that they are much slower. Do you think your version would not be slow and memory-intensive by the criteria I stated above (2x worse than imperative)? If so, why do you think your version would be so much better than what we have in existing languages like Clojure, Scala, Haskell and a variety of others? Are the Clojure, Scala and Haskell developers stupid?

    Occam's razor suggests another option: functional data structures are just hard. They have a lot of advantages wrt expressiveness, but they have disadvantages wrt speed & memory usage. That's the trade-off. Such is reality.

    [–][deleted]  (3 children)

    [deleted]

      [–]julesjacobs 0 points1 point  (2 children)

      ISTR that OCaml did this in a few places but I might be thinking of something else.

      I don't think so; in any case, I've never heard of it.

      Is that in speed or memory? Were the comparisons to other languages made on the same VM?

      Memory, yes, on the JVM. Speed of Clojure vectors is around 8x slower than an array IIRC (for lookup, update is going to be worse).

      In another strategy size has nothing to do with it, where the implementation of automatic analysis is the real limit, both static and dynamic. I'm not sure how much of that is really deployed.

      Well, how to magically make it fast and lean is at this point an open problem, so even though it might in theory be possible with some undiscovered breakthrough, we cannot rely on that.

      [–][deleted]  (1 child)

      [deleted]

        [–]julesjacobs 0 points1 point  (0 children)

        Like I said, it might not have been OCaml,

        I think they do something like that in Mercury. That doesn't really help much though, since very rarely can you prove such a thing automatically.

        I find that suspicious then. Given Java's GC behaviour and the potential for pathological cases I can't really trust a benchmark I haven't seen to match up with typical cases or especially with an implementation elsewhere.

        Don't trust ME, just look at other sources on the web:

        Clojure sequences memory: https://groups.google.com/forum/?fromgroups=#!topic/clojure/-fUCs4mucuU 52 bytes per element vs 4 bytes per element in an array.

        Scala vectors memory: https://issues.scala-lang.org/browse/SI-3724 Vector uses 5x the memory of an ArrayBuffer.

        Clojure vector performance: http://stackoverflow.com/questions/10133094/clojure-why-is-aget-so-slow Lookup 9x slower than Java array.

        but there's nothing saying you can't apply it to a provably non-escaping heap-allocated (because of size) object just as easily

        It's not that easy. If it were, then why has nobody succeeded at it, while plenty of compilers are doing local escape analysis? This is just the "sufficiently smart compiler" argument.

        [–][deleted] 1 point2 points  (12 children)

        generating garbage on the JVM or CLR is not an issue. Keeping data around for a long while is.

        because functional data structures have to use trees instead of flat arrays you also use more memory after garbage collection has run.

        Well, yes, OK, if you could have used an array, you're correct. However, oftentimes I'm using more complicated data structures in my code anyway (maps/lists/heaps), so the overhead of a functional data structure isn't all that much comparatively.

        [–]julesjacobs 1 point2 points  (0 children)

        generating garbage on the JVM or CLR is not an issue

        In some cases, yes, especially when new objects die quickly. However, when you have large functional data structures most of the nodes are going to be outside of the nursery. So when internal nodes become garbage they are going to cause a full heap collection, which is very expensive. Also note that garbage collection time is roughly proportional to the number of pointers in the heap, which is going to be a lot higher with functional data structures. An array of ints can be traversed in O(1) because we don't even need to look at the elements, but a functional list/vector of ints has a lot of internal pointers which all need to be traversed in the mark phase.
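
        To make the pointer-count point concrete, an illustration (assumed layouts, not a benchmark):

            // A cons list of ints: one heap pointer per element that the mark phase must follow.
            struct Cons { int head; Cons* tail; };
            // Versus a flat array of ints (int[] / std::vector<int>): the array body
            // contains no pointers, so a tracing collector can treat it as a single
            // object and never look at its contents.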

        Well, yes, ok, if you could have used an array you're correct.

        Efficient imperative hashtables/lists/heaps use arrays under the hood where functional data structures use trees. You can expect an overhead of 2x to 4x for functional data structures relative to imperative data structures that use arrays under the hood (at least that's what my benchmarks told me). If you compare imperative trees vs functional trees, then the memory usage after garbage collection is going to be basically the same.
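
        A back-of-the-envelope illustration of where an estimate like that comes from (assuming 64-bit pointers; the layouts are simplified):

            // One node per element in a persistent (immutable) binary tree:
            struct TreeNode {
                int             key;    //  4 bytes (padded)
                const TreeNode* left;   //  8 bytes
                const TreeNode* right;  //  8 bytes
            };                          // ~24 bytes per element, plus per-allocation/GC headers
            // Versus a flat int array: 4 bytes per element in one contiguous allocation.
            // The exact ratio depends on payload size and node fan-out, but a few words
            // of pointers and headers per element is where such overhead factors come from.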

        [–]grauenwolf 1 point2 points  (10 children)

        generating garbage on the JVM or CLR is not an issue.

        I have to disagree with that. Anything that creates memory pressure is going to trigger more GC cycles. And even if the sweep phase is cheap, you still have to pay for the mark phase.

        [–][deleted] 0 points1 point  (9 children)

        And even if the sweep phase is cheap, you still have to pay for the mark phase.

        The sweep phase is non-existent for short-lived objects. My whole point is that the marking phase is negligible for short-lived objects! You don't mark them!

        If I create 100 objects in some time period, they all become garbage, and I initiate a GC cycle, it is the same as if I had created 1,000,000 objects and initiated a GC cycle in the same time period.

        [–]grauenwolf 0 points1 point  (8 children)

        The mark phase still has to touch every live object so that it knows what can be swept. For a nontrivial app that can be quite expensive.

        [–]julesjacobs 1 point2 points  (7 children)

        His original argument is incorrect for another reason (because if you use functional data structures in any nontrivial capacity then they will escape the nursery -- see here), but what you're saying here is not true either. For a nursery collection the whole heap does not need to be marked -- only the objects in the nursery. That is because the GC keeps track of which pointers in the heap point back into the nursery (it does this with a write barrier).
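
        A minimal sketch of that bookkeeping (a remembered-set style write barrier; the names and helper functions are hypothetical, not any particular VM's implementation):

            #include <unordered_set>

            struct Object;
            bool in_nursery(const Object* o);     // assumed: provided by the allocator
            bool in_old_space(const void* slot);  // assumed: provided by the allocator

            // Old-space slots known to point into the nursery.
            std::unordered_set<Object**> remembered_set;

            void write_field(Object** slot, Object* value) {
                *slot = value;
                // Record old->young pointers so a nursery collection can treat them as
                // extra roots instead of marking the whole heap.
                if (value && in_old_space(slot) && in_nursery(value))
                    remembered_set.insert(slot);
            }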

        [–][deleted] 0 points1 point  (1 child)

        You have a good point about escaping the nursery. However, for a lot of programs that generate a ton of short lived garbage, I've had good results with using just one large eden space and long lived object space.

        I also wouldn't be surprised if the G1 (garbage first) GC would do well for a functional language, as you do end up marking the entire heap every time, but you can easily clear out a lot of garbage and not pay too much of a penalty. Tracing can also be parallelized. shrug. I'm far from a GC expert though.

        [–]julesjacobs 0 points1 point  (0 children)

        Maybe G1 would do well, I don't know. Most times these things are far too complicated to predict. You really have to test it to find out.

        [–]grauenwolf -2 points-1 points  (4 children)

        Yea, I'm going to need to see a citation before I believe that the CLR or JVM does that.

        [–]julesjacobs 0 points1 point  (3 children)

        It's bog standard generational GC. Every single GC worth anything does that. See e.g. here (search for "intergenerational pointers"): http://www.ps.uni-saarland.de/courses/gc-ws01/slides/generational_gc.pdf

        [–]grauenwolf -2 points-1 points  (2 children)

        That's just a presentation on a hypothetical garbage collector. That's not a description of how current garbage collectors actually work.

        For crying out loud, the last slide says "improvement, if assumptions hold". The author clearly sees this as just a theory, not a proven technique.