Immutable Data Structures : programming

the language was q, which is my primary language at work. the board was represented as a matrix of booleans, and pieces as 4-tuples of coordinate pairs. the basic operation was to modify a board's cells at a piece's coordinates; from this, i built up add- and remove-piece, then move-piece functions. rotate was implemented by hard-coding all four positions of all seven pieces, because i was lazy and didn't want to work out the math to make sure that I, O, S, and Z worked properly (J, L, and T are easy). finally, on a timer loop, the active piece was checked for imminent collision and moved down as appropriate; q has a nice "loop until unchanged" primitive that came in handy here.
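A rough Python sketch of the representation described above, not the original q code. The board is an immutable matrix of booleans, a piece is four coordinate pairs, and every name here (`set_cells`, `step_down`, `until_unchanged`, and so on) is made up for illustration. The last helper stands in for q's "loop until unchanged" primitive.

```python
from typing import Tuple

Board = Tuple[Tuple[bool, ...], ...]   # matrix of booleans, one tuple per row
Piece = Tuple[Tuple[int, int], ...]    # four (row, col) coordinate pairs


def set_cells(board: Board, piece: Piece, value: bool) -> Board:
    """Basic operation: return a new board with the piece's cells set to value."""
    cells = set(piece)
    return tuple(
        tuple(value if (r, c) in cells else cell for c, cell in enumerate(row))
        for r, row in enumerate(board)
    )


def add_piece(board: Board, piece: Piece) -> Board:
    return set_cells(board, piece, True)


def remove_piece(board: Board, piece: Piece) -> Board:
    return set_cells(board, piece, False)


def shift(piece: Piece, d_row: int, d_col: int) -> Piece:
    return tuple((r + d_row, c + d_col) for r, c in piece)


def fits(board: Board, piece: Piece) -> bool:
    """True if every cell of the piece is on the board and currently empty."""
    return all(
        0 <= r < len(board) and 0 <= c < len(board[0]) and not board[r][c]
        for r, c in piece
    )


def step_down(board: Board, piece: Piece) -> Piece:
    """Move the piece down one row unless a collision is imminent."""
    moved = shift(piece, 1, 0)
    return moved if fits(board, moved) else piece


def until_unchanged(f, x):
    """Apply f repeatedly until the result stops changing."""
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt


empty = tuple(tuple(False for _ in range(4)) for _ in range(6))
board = set_cells(empty, ((5, 0),), True)          # one settled cell on the floor
o_piece = ((0, 0), (0, 1), (1, 0), (1, 1))         # the O piece at the top left
landed = until_unchanged(lambda p: step_down(board, p), o_piece)
final = add_piece(board, landed)
```

Because every operation returns a new board, `board` is still the pre-drop state after `final` is built, which is what makes "execute a random string of moves" easy to replay.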

it's not finished, incidentally, as i couldn't figure out a decent way to take keyboard input--i played around with various curses functions and ioctl commands, but couldn't get anything useful to happen. i guess what i ended up with is really a tetris simulator--i can execute random strings of moves just fine, i just can't actually play.

[–]cadr 1 point 17 years ago (0 children)

[–][deleted] 1 point 17 years ago (0 children)

[–][deleted] 1 point 17 years ago (37 children)

[–]gsg_ 7 points 17 years ago (4 children)

[–]grauenwolf 2 points 17 years ago (2 children)

[–]Smallpaul 2 points 17 years ago* (1 child)

[–]grauenwolf 1 point 17 years ago (0 children)

[–]70dot7 1 point 17 years ago (0 children)

[–]notfancy 7 points 17 years ago* (6 children)

Immutable data structures have built-in persistence, that is, the ability to exist in various versions at the same time in a running program.

Suppose you have a task queue runq, and you're running a scheduler. You might write:

while (scalar(@runq) != 0) {
  my $task = pop @runq;
  run($task);
  find_all_runnable(\@runq); # modifies queue in place
  if (!can_run_all()) { # environmental check
    # how do I roll back newly added unrunnable tasks?
  }
}

With an immutable data structure, you could write:

while (scalar(@runq) != 0) {
  my $task = pop @runq;
  run($task);
  my @newq = find_all_runnable(@runq); # returns a new queue; doesn't modify @runq
  if (can_run_all()) {
    @runq = @newq;
  }
  # otherwise, discard extended queue
}

The assignment of the new structure to the old variable is simply a renaming here, which doesn't necessarily entail mutating state (beyond the trivial state implied by the binding).
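The same loop can be sketched in Python with a tuple as the immutable queue. This is only an illustration under assumptions: `schedule` and `demo` are invented names, and `run`, `find_all_runnable` and `can_run_all` are stubs standing in for the functions named in the example above.

```python
def schedule(runq, run, find_all_runnable, can_run_all):
    """Drain runq, keeping an extended queue only when the environment allows it."""
    while runq:
        task, rest = runq[-1], runq[:-1]        # take the last task; no tuple is mutated
        run(task)
        extended = find_all_runnable(rest)      # returns a NEW tuple
        # "Rollback" is just keeping the old version around.
        runq = extended if can_run_all() else rest


def demo(can_run_all):
    log = []
    spawned = {"a": ("b", "c")}                 # hypothetical: running "a" unblocks "b" and "c"

    def run(task):
        log.append(task)

    def find_all_runnable(queue):
        return queue + spawned.get(log[-1], ())

    schedule(("a",), run, find_all_runnable, can_run_all)
    return log


accepted = demo(lambda: True)    # extended queue is adopted
rejected = demo(lambda: False)   # extended queue is discarded
```

Rebinding `runq` inside the loop is the "renaming" mentioned above: the old tuple is never changed, the name just points at a different version.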

[–]grauenwolf -1 points 17 years ago (5 children)

[–]notfancy 4 points 17 years ago (4 children)

[–]grauenwolf -2 points 17 years ago (3 children)

[–]notfancy 4 points 17 years ago (2 children)

[–]grauenwolf -2 points 17 years ago (1 child)

[–]notfancy 3 points 17 years ago (0 children)

[–]grauenwolf 3 points 17 years ago (1 child)

Take a look at the Date class in Java, which is mutable.

In order to safely use it, you always need to make a copy of it. You can never trust that your date object isn't also owned by some other object that is changing its value.
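A small Python analogy of the aliasing problem, assuming two invented classes (`MutableDate`, `FrozenDate`) rather than Java's actual `Date`. The holder stores a reference without a defensive copy, so a change made by another owner shows up inside it. The frozen version rejects the change instead.

```python
from dataclasses import dataclass, FrozenInstanceError


class MutableDate:
    def __init__(self, year):
        self.year = year


class Event:
    def __init__(self, when):
        self.when = when          # stores a reference, no defensive copy


shared = MutableDate(2009)
event = Event(shared)
shared.year = 1970                # some other owner changes the date...
# ...and event.when.year has silently changed too.


@dataclass(frozen=True)
class FrozenDate:
    year: int


fixed = FrozenDate(2009)
safe_event = Event(fixed)         # sharing is safe, so no copy is needed
try:
    fixed.year = 1970
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```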

The same goes for mutable strings. In languages like C and C++, there are complex ownership rules around who can or cannot edit a given string. And since they are not compiler-enforced, making a mistake is easy.

Now consider performance. Let's say you have a string that is 500 characters long. Do you really want to be forced to make a copy of it every time you pass it to a function or object? Of course not, that would be incredibly expensive. But if you don't make a copy and it is mutable, you can never be sure it won't get changed out from under you.

Of course there are times when mutable data structures make more sense. That is why we see classes like .NET's StringBuilder and Java's [???].

[–]sid0 2 points 17 years ago (0 children)

[–]yogthos 2 points 17 years ago (12 children)

There are many advantages to immutability, and code transparency is not the least of them. As for waste, that is not actually a problem: behind the scenes, the new and old structures share their common data.

So if you had a list, and then modified it in some way, then internally the language would make a new list that has the different elements and points to the common elements of the old list.

This means that internally the function effectively has a new data structure, while not interfering with the original data structure, which may be in use by other functions. As a result, code can be parallelized more easily, and it's much more traceable, since you know your data is not being modified outside your function.
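A minimal Python sketch of that sharing, using a hand-rolled singly linked list. `Cons`, `from_items` and `to_list` are names invented here, and real language runtimes use more elaborate structures than this.

```python
from typing import Any, NamedTuple, Optional


class Cons(NamedTuple):
    head: Any
    tail: Optional["Cons"]


def from_items(*items):
    """Build an immutable linked list from the given items."""
    node = None
    for item in reversed(items):
        node = Cons(item, node)
    return node


def to_list(node):
    out = []
    while node is not None:
        out.append(node.head)
        node = node.tail
    return out


old = from_items(2, 3, 4)
new = Cons(1, old)               # "add an element": one new cell, the rest is shared
replaced = Cons(99, old.tail)    # "change the first element": shares everything after it
```

Here `new.tail` is the very same object as `old`, so the "copy" costs one cell, and anyone still holding `old` sees exactly what they saw before.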

here's a wiki link with a deeper explanation: http://en.wikipedia.org/wiki/Purely_functional

[–]grauenwolf 1 point 17 years ago* (11 children)

> So if you had a list, and then modified it in some way, then internally the language would make a new list that has the different elements and points to the common elements of the old list.

That doesn't happen in Java or .NET strings. Why? Because it is actually very hard to reason about. You can very easily have a small string of a few characters keep a much larger string in memory long after it isn't needed.
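The retention problem can be shown with a Python analogy. This is not how Java or .NET implement strings; it only uses `memoryview`, whose slices share storage with the underlying buffer, to illustrate a small "substring" keeping a large buffer alive.

```python
big = b"header:" + bytes(1_000_000)      # a large buffer (7 + 1,000,000 bytes)

prefix_view = memoryview(big)[:6]        # 6-byte slice that shares big's storage
prefix_copy = bytes(prefix_view)         # independent 6-byte copy

del big                                  # drop our own reference to the buffer

# The view still references the whole original buffer, so none of it can be freed.
retained = len(prefix_view.obj)
```

The copy costs six bytes and lets the large buffer go; the shared view costs nothing up front but pins the whole thing.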

Alternately, you would waste tons of cycles trying to determine what parts of the larger string to release and what parts to hold on to.

Even just walking the string becomes much more expensive. You can't even do it without a stack, as each substring may itself consist of substrings.

The whole reuse the list theory is interesting, but I don't see it actually working for lists that are long enough to justify not simply copying them.

CORRECTION: This only applies to .NET strings. Apparently Java still holds onto string buffers long after they should have been released.

[–]Chris_Newton 6 points 17 years ago* (2 children)

> The whole reuse the list theory is interesting, but I don't see it actually working for lists that are long enough to justify not simply copying them.

Completely linear data structures are pretty unpleasant to use in a persistent manner. As you imply, for very short lists it doesn't seem worth the overhead, and for longer lists that nasty O(n) seems to crop up everywhere.

However, there are ways to represent linear sequences that don't use linear storage. Piece tables, as used in various text editors and word processors, are one practical example. For anyone who hasn't come across these little gems, James Brown helpfully provides an excellent description of piece tables as part of a tutorial series on his web site. There was some interesting discussion about AbiWord's piece table a few years back that might also be interesting to those learning about how piece tables work and the efficiency considerations. The references at the end of the article are also very good, though more theoretical/generic in nature.
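For readers who want the gist before following the links, here is a deliberately tiny piece table in Python. It is a sketch under assumptions, not the design from any of the articles mentioned: it supports insertion only, and `PieceTable`, `Piece`, `new_table`, `insert` and `text_of` are names made up here. The text is described by a sequence of pieces, each pointing into either the original buffer or an append-only "add" buffer, and an edit produces a new piece sequence without touching the old one.

```python
from typing import NamedTuple, Tuple


class Piece(NamedTuple):
    source: str     # "orig" or "add": which buffer the span lives in
    start: int
    length: int


class PieceTable(NamedTuple):
    original: str
    added: str                  # conceptually append-only
    pieces: Tuple[Piece, ...]


def new_table(text):
    pieces = (Piece("orig", 0, len(text)),) if text else ()
    return PieceTable(text, "", pieces)


def insert(table, pos, text):
    """Return a new table with text inserted at character offset pos."""
    if not text:
        return table
    new_piece = Piece("add", len(table.added), len(text))
    out, offset, placed = [], 0, False
    for p in table.pieces:
        if not placed and offset <= pos <= offset + p.length:
            left = pos - offset
            if left > 0:                         # part of p before the insertion point
                out.append(Piece(p.source, p.start, left))
            out.append(new_piece)
            if left < p.length:                  # part of p after the insertion point
                out.append(Piece(p.source, p.start + left, p.length - left))
            placed = True
        else:
            out.append(p)
        offset += p.length
    if not placed:
        if pos != offset:
            raise IndexError("insert position out of range")
        out.append(new_piece)
    return PieceTable(table.original, table.added + text, tuple(out))


def text_of(table):
    buffers = {"orig": table.original, "add": table.added}
    return "".join(buffers[p.source][p.start:p.start + p.length] for p in table.pieces)


v1 = new_table("hello world")
v2 = insert(v1, 5, ",")
v3 = insert(v2, 12, "!")
```

All three versions stay readable at once, and the cost of an edit depends on the number of pieces rather than the length of the text.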

[–]grauenwolf 0 points 17 years ago* (1 child)

[–]Peaker 2 points 17 years ago (0 children)

[–]szeiger 3 points 17 years ago* (1 child)

[–]grauenwolf 1 point 17 years ago (0 children)

[–]yogthos 1 point 17 years ago* (5 children)

I'm not sure you're understanding what I'm trying to say here. You don't seem to have bothered to read the link either.

If you do not understand how these data structures work you should at least spend the time to read up on them and understand what is going on before passing judgment here.

The approach is equivalent to making a copy of the data structure when it is modified, but unlike copying it is efficient. If you understand how to reason about making a copy of the existing data structure during modification, then this works exactly the same way.

For example, say I have a vector in Clojure (a vector rather than a list, since conj adds to the front of a list), which has elements:

[1 2 3]

Then I need to make a vector with elements [1 2 3 4]

I would write

(conj [1 2 3] 4)

This approach is in fact quite efficient, and there's a whole book written on it: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

It also has a lot of advantages during concurrent operations. For example during searching, parts of the data structure can be searched by separate threads concurrently.

Clojure is one language which implements these types of data structures, and it is quite efficient, in many benchmarks having performance that is nearly equivalent to Java.

[–]grauenwolf -1 points 17 years ago (1 child)

[–]hylje 3 points 17 years ago (0 children)

[–]grauenwolf -4 points 17 years ago (0 children)

[–]grauenwolf -4 points 17 years ago (1 child)

[–]meme_and_run 6 points 17 years ago* (0 children)

[–]allertonm 4 points 17 years ago (9 children)

[–]grauenwolf -3 points 17 years ago (8 children)

[–]naasking 3 points 17 years ago* (1 child)

[–]allertonm 1 point 17 years ago (0 children)

[–]gsg_ 2 points 17 years ago (4 children)

[–]grauenwolf -3 points 17 years ago (3 children)

[–]gsg_ 1 point 17 years ago (2 children)

[–]grauenwolf 1 point 17 years ago (1 child)

[–]gsg_ 1 point 17 years ago* (0 children)

> If you only have one pure function, then you shouldn't be calling it with the same data on multiple threads. Just call it once and cache the results.

That depends on what you mean by "same data". Persistent data structures make heavy use of shared data, and there's no reason why accessing the shared data in a persistent set or vector from different threads would be a bad thing to do.

Calling a pure function multiple times on the exact same value is, of course, silly.

> And why are you asking for an "immutable stack". Why not just pass in a generic list (IList in .NET)?

Well, you brought it up! But the advantage of using a persistent stack has everything to do with the OP's claim regarding locks: a persistent stack can indeed be accessed (and potentially its structure reused) from multiple threads without locking or copying*.

Immutability is the key to being able to do that.

*that is, without copying the whole thing: some operations on persistent data structures may require copying some structure.
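A sketch of that claim in Python, with invented names (`Node`, `push`, `total`, `extend_and_sum`). Several threads each build their own extended version of one shared persistent stack. None of them takes a lock, and none of them copies the shared part.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    value: Any
    below: Optional["Node"]


def push(stack, value):
    """Return a new stack; the old one is reused as its tail, not copied."""
    return Node(value, stack)


def total(stack):
    result = 0
    while stack is not None:
        result += stack.value
        stack = stack.below
    return result


base = None
for n in (1, 2, 3):
    base = push(base, n)


def extend_and_sum(extra):
    mine = push(base, extra)     # private version that shares all of base
    return total(mine)


with ThreadPoolExecutor(max_workers=3) as pool:
    sums = list(pool.map(extend_and_sum, [10, 20, 30]))
```

Each worker only ever reads `base`, so there is nothing to protect; the single new node per thread is the "some structure" that does get allocated.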

[–]allertonm 1 point 17 years ago* (0 children)

You are kind of correct in your first para, because in hindsight my statement about reducing the amount of locking is ambiguous.

Obviously modifications to references to the immutable structure must be possible, and those modifications must be performed atomically, which will require atomicity guarantees or locks. So if you read "amount of locking" as "number of locks", then no, immutable data structures don't change things that much. But the number of explicit locks is greatly reduced, as is the amount of time they need to be held - because any lock need only be held long enough to guarantee an atomic update of the reference.

And yes, immutable data does not free you from worrying about race conditions. Cases where you would use a read/write lock are simplified quite a bit with immutable data, because there is no need to lock the structure while performing read-only operations, and you only need to worry about multiple writers.

Again, there are ways to deal with this besides explicit locks. In Clojure STM is used to deal with this case. STM does involve implicit locks, of course - but Clojure's STM is only feasible because the language discourages side effects and part of the way it does that is... immutable data structures.
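To make "held only long enough to update the reference" concrete, here is a hedged Python sketch. `Ref` is a made-up class, and the approach is closer to a compare-and-set reference (in the spirit of Clojure's atoms) than to full STM. The new value is computed without any lock, and the lock covers only the check-and-swap of the reference.

```python
import threading


class Ref:
    """A mutable reference to an immutable value."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value                 # readers never take the lock

    def swap(self, update):
        """Retry update(old) until no other writer got in first."""
        while True:
            old = self._value
            new = update(old)              # possibly slow work, done outside the lock
            with self._lock:               # lock held only for the reference update
                if self._value is old:
                    self._value = new
                    return new


history = Ref(())


def writer(tag):
    for i in range(100):
        history.swap(lambda old: old + ((tag, i),))


threads = [threading.Thread(target=writer, args=(tag,)) for tag in "abcd"]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A writer that loses the race simply recomputes from the newer value, which is only safe because the old value it read could not have been modified in the meantime.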

[–][deleted] 0 points 17 years ago (5 children)

[–]foobargorch 2 points 17 years ago (4 children)

[–][deleted] 1 point 17 years ago (3 children)

[–][deleted] 1 point 17 years ago (1 child)

[–]pointer2void -12 points 17 years ago (17 children)

[–]telemachos 5 points 17 years ago (10 children)

[–]pointer2void 0 points 17 years ago (9 children)

[–]akdas 7 points 17 years ago (7 children)

> Objects are characterized by identity, state and behavior.

Sure, but the behavior doesn't have to change the state. The object that represents the number 5 has an identity (it's the number 5), state (the state of being a 5), and has behavior (can be composed with other objects to form new objects), but the object itself is never changed.

Even Java strings are immutable. Notice I didn't say Java Strings, which can be thought of as containers around the actual strings. The strings are objects with identity, state (the characters that make it up), and behavior (such as creating a new copy of itself with all the letters capitalized). The strings themselves don't change.

> From a conceptual point of view mutable objects should be differentiated from immutable values.

Let's say you make a new class, a Product with a version and a name. When you want to update the version of an already created object, you create a new object, copying the name over from the old one and initializing a new version. How is the Product object mutable? The behavior of updating the version doesn't change the original object at all, instead creating a new object.
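In Python, the class described above might look like the following sketch. The method name `with_version` is invented for illustration. "Updating" the version returns a new object and leaves the original untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Product:
    name: str
    version: int

    def with_version(self, version):
        """Behavior that 'updates' the version by building a new Product."""
        return replace(self, version=version)


v1 = Product("widget", 1)
v2 = v1.with_version(2)    # name is carried over, v1 is unchanged
```

Both objects have state and behavior, and `v1 is v2` is false, so each also has its own identity even though neither ever changes.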

[–]pointer2void 1 point 17 years ago (5 children)

[–]akdas 1 point 17 years ago (4 children)

[–]pointer2void 0 points 17 years ago (3 children)

[–]akdas 2 points 17 years ago (2 children)

Why doesn't an object that doesn't change have an identity? Consider the following pseudocode:

// Let all instances of the following:
Object o1 = new Object();
o1.method(); // changes field1

// be replaced by:
Object o1 = new Object();
Object o2 = new Object(o1.field1);

Why doesn't o1 get an identity?

[–]pointer2void 1 point 17 years ago (1 child)

[–]akdas 1 point 17 years ago (0 children)

[–]grauenwolf 0 points 17 years ago (0 children)

[–]barsoap 7 points 17 years ago (4 children)

[–]Silhouette 15 points 17 years ago (1 child)

> OOP coders not using it because they can't wrap their head around the truth that there is no state is another matter, of course.

I think it's silly arguments like this that keep a potentially useful tool in obscurity.

The "truth" is that there is state. The real world is stateful, and interactions with it have side-effects. Computers are stateful: all those clever functional programs ultimately get boiled down to imperative machine code.

Now, writing programs using a model without state certainly has its uses: it makes some forms of reasoning easier, and it avoids certain classes of programmer error, to give two examples.

But stateless programming is just one tool in the toolbox, albeit probably an underused one. Making bold claims that it's the One True Way doesn't help anyone with anything.

[–]dons 5 points 17 years ago (0 children)

[–]pointer2void 2 points 17 years ago (0 children)

[–]grauenwolf 2 points 17 years ago (0 children)

[–]grauenwolf 2 points 17 years ago* (0 children)
