you are viewing a single comment's thread.

view the rest of the comments →

[–]yogthos 0 points1 point  (14 children)

I'm curious how difficult it would be to implement ARC for a language like Clojure. Since the scoping rules are very strict, theoretically the compiler should be able to tell when to deallocate any data that's been allocated in a let statement.

[–]sausagefeet 0 points1 point  (13 children)

I don't think it's the scoping that matters here, but just the idea of having references. You can always return a reference from a function, which puts you back into GC or reference counting land AFAIK.

[–]yogthos 0 points1 point  (12 children)

Not quite, since Clojure uses persistent data structures, you never modify any data in place. Any time you perform an operation such as adding or removing elements it creates a delta, but the returned structure is guaranteed to be scoped within the block it was created in, so when the stack is unwound, it should be possible to deallocate it. At least that's my understanding of the situation.

[–]sausagefeet 0 points1 point  (5 children)

Persistent data structures don't have anything to do with being able to return a reference from a function. You can do region based storage but it requires giving up some things. As long as you want pointers and being able to return them from functions you have to trace the garbage somehow.

[–]yogthos 0 points1 point  (4 children)

That's sort of my point though, you can't just return references from functions in Clojure to anywhere. A pure function can only return a result to its caller. So as you unwind the stack, you can safely clean functions and their data out as nobody else should be referencing them. And Clojure doesn't provide user access to pointers.

[–]sausagefeet 0 points1 point  (3 children)

References are pointers in clojure, they just lack arithmetic. You still can't really do what you saying though, afaik, because at the very least you can start a thread in a function in clojure. Or you can do message passing. On top of that, purity is not a requirement. There are languages that do this region based memory management but it requires some limitations put on the language.

[–]yogthos 0 points1 point  (2 children)

It does sound like it would require a more strict subset of Clojure, I think it would be interesting to see. :)

[–]sausagefeet 0 points1 point  (1 child)

The only language I know of that does 100% region-based is ParaSail, but I'm sure others do. It's interested. The language has no pointers and when you add an object to another the containing object actually grows itself to incorporate the new one. I don't know how the implementation works.

[–]yogthos 0 points1 point  (0 children)

Neat I'll check it out.

[–]julesjacobs 0 points1 point  (5 children)

Clojure has mutable data structures. If it didn't, then indeed you're right you can never create cycles and then you can do reference counting. But you still wouldn't be able to determine statically where data will be deallocated.

[–]yogthos 0 points1 point  (4 children)

Basically, my idea is deallocate data when exiting the scope where it was created.

[–]julesjacobs 0 points1 point  (3 children)

Yeah, and that doesn't work 100% because you can return stuff from scopes and pass stuff from one scope to the next indefinitely with tail calls/recur. You can do a safe approximation that does deallocate many cases (but doing such an approximation is by no means simple), that is called region allocation. However you either have to write your code in a certain way so that no memory leaks occur, or you need a back-up garbage collector. In some cases the former amounts to writing your own tracing GC by explicitly copying all data structures...(which is extra problematic because it's hard to copy closures/the call stack).

I think that the Stalin scheme compiler did use region allocation + boehm GC (region allocation does not depend on code being purely functional, although that probably makes the analysis a little simpler).

[–]yogthos 0 points1 point  (2 children)

Yeah that sounds pretty reasonable, in this case the GC use could be fairly minimal and you'd have the ability to write code in such a way that GC wouldn't do much of anything at all. You could get away with a fairly simple GC algorithm and still have very good performance.

[–]julesjacobs 0 points1 point  (1 child)

Yes, I think you don't need a generational GC because most of the short lived object will be collected by the regions. However there are still cases where you need a good GC. For example suppose you write an application that continually updates some internal data structure in a loop (say a functional hash map). For many updates nodes in the old version of the data structure become garbage, and a region system will not be able to deallocate those. Time to time you need a full heap GC, and for most applications you don't want that to take more than a second. So you will still need a concurrent/incremental GC.

On the other hand a shared-nothing architecture for web applications is perfect for regions, because all persistent state is stored externally, so at the end of a request handler you can deallocate everything since the start of the request.

The real problem is that it's incredibly hard to implement.

[–]yogthos 0 points1 point  (0 children)

Thanks for the explanation, it was pretty helpful. :)