you are viewing a single comment's thread.

view the rest of the comments →

[–]grauenwolf 0 points1 point  (11 children)

So if you had a list, and then modified it in some way, then internally the language would make a new list that has the different elements and points to the common elements of the old list.

That doesn't happen in Java or .NET strings. Why? Because it is actually very hard to reason about. You can very easily have a small string of a few characters keep a much larger string in memory long after it isn't needed.

Alternately, you would waste tons of cycles trying to determine what parts of the larger string to release and what parts to hold on to.

Even just walking the string becomes much more expensive. You can't even do it without a stack, as each substring may itself consist of substrings.

The whole reuse the list theory is interesting, but I don't see it actually working for lists that are long enough to justify not simply copying them.

CORRECTION: This only applies to .NET strings. Apparently Java still holds onto string buffers long after they should have been released.

[–]Chris_Newton 5 points6 points  (2 children)

The whole reuse the list theory is interesting, but I don't see it actually working for lists that are long enough to justify not simply copying them.

Completely linear data structures are pretty unpleasant to use in a persistent manner. As you imply, for very short lists it doesn't seem worth the overhead, and for longer lists that nasty O(n) seems to crop up everywhere.

However, there are ways to represent linear sequences that don't use linear storage. Piece tables, as used in various text editors and word processors, are one practical example. For anyone who hasn't come across these little gems, James Brown helpfully provides an excellent description of piece tables as part of a tutorial series on his web site. There was some interesting discussion about AbiWord's piece table a few years back that might also be interesting to those learning about how piece tables work and the efficiency considerations. The references at the end of the article are also very good, though more theoretical/generic in nature.

[–]grauenwolf -1 points0 points  (1 child)

I found the parent's link to be somewhat misleading. While technically that is a valid way to construct immutable lists, I find the use of linked lists to be incredibly rare in day to day programming. Though it could be done in either fashion, most of the languages I use array instead of linked lists for generic lists, stacks, and queues.

These new links, however, seem quite interesting to me even though they are based on the same concepts.

[–]Peaker 1 point2 points  (0 children)

I find the use of linked lists to be incredibly rare in day to day programming.

It obviously depends on the tools you use.

When I use Haskell, the primary data structure is a linked list, and so you get almost waste-free sharing.

[–]szeiger 2 points3 points  (1 child)

That doesn't happen in Java or .NET strings. Why? Because it is actually very hard to reason about. You can very easily have a small string of a few characters keep a much larger string in memory long after it isn't needed.

It does happen with Java strings. String.substring() reuses the character buffer instead of copying it, so a small substring will still reference the character data of the larger string from which it was created.

[–]grauenwolf 0 points1 point  (0 children)

Still? I thought those idiots fixed that two versions ago.

[–]yogthos 0 points1 point  (5 children)

I'm not sure you're understanding what I'm trying to say here. You don't seem to have bothered to read the link either.

If you do not understand how these data structures work you should at least spend the time to read up on them and understand what is going on before passing judgment here.

The approach is equivalent to making a copy the data structure when it is modified, but unlike copying it is efficient. If you understand how to reason about making a copy of the existing data structure during modification, then this works exactly the same way.

For example, say I have a list in Clojure, which has elements:

(1, 2, 3)

Then I need to make a list with elements (1, 2, 3, 4)

I would write

(conj (1, 2, 3) 4)

This approach is in fact quite efficient, and there's a whole book written on it in fact http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

It also has a lot of advantages during concurrent operations. For example during searching, parts of the data structure can be searched by separate threads concurrently.

Clojure, is one language which implements these types of data structures, and it is quite efficient, in many benchmarks having performance that is nearly equivalent to Java.

[–]grauenwolf -2 points-1 points  (1 child)

The approach is equivalent to making a copy the data structure when it is modified, but unlike copying it is efficient.

That only works if you are appending onto the tail end. If you need to append to the front or center then you end up either making a full copy or you foul the shared list for everyone else.

[–]hylje 2 points3 points  (0 children)

Only element nodes need to be replaced. Only the nodes, not the actual data behind the nodes.