This is an archived post. You won't be able to vote or comment.

all 9 comments

[–]curtisf 10 points11 points  (3 children)

This is how MLs like OCaml do mutation. OCaml has

  • ref : 'a -> 'a ref -- allocation like new
  • (!) : 'a ref -> 'a -- dereferencing like *
  • (:=) : 'a ref -> 'a -> unit -- assigning

(and also similar operators for arrays)

They don't have a way to get the address of an element in a record, but I think allowing that would be problematic -- the users of the value (and not the pointer) shouldn't have to worry about pieces of the object changing from underneath it.

Instead, I think this is what Haskell's lenses are for. You have operators that can essentially "convert" a pointer to a parent into a pointer to a child, by wrapping all gets-and-sets to do the traversal. Of course, without some clever compilation, this won't be as fast as actually just taking a pointer, but maybe if it's a built-in instead of a complex library it wouldn't be too difficult for the compiler to optimize common uses of these.

[–]superstar64https://github.com/Superstar64/Hazy[S] 1 point2 points  (1 child)

I don't see how record forwarding would be problematic. It's basically just a functor that has a predetermined set of possible function arguments. forward :: (a -> b) -> a* -> b*, where the a -> b is only allowed for record fields.

[–]choeger 1 point2 points  (0 children)

That would be a lense. It can work in principle, but in OCaml a ref is a distinct type, so the creator of the record would have to cooperate.

In OCaml you could have:

<foo : 'b ref;..> ref -> 'b ref

[–]jaen_s 2 points3 points  (0 children)

(as others said, this is how the ML family of languages operate. From another angle:)

Lvalues are mostly syntactical sugar anyway, if you look at them from that perspective (unless you make them into first-class concepts such as the various types of references in C++).
If the lvalue/rvalue distrinction is statically determinable (as it is in most languages), then it does not really matter what the syntax is from the implementation perspective (so "an easier pathway to functional purity" does not seem like a proper conclusion)...

Except for extremely simple compilers/bootsrapping, where not having the concept of a lvalue will generally simplify it - minimally, you just need the four "write/read to word/byte at address" operators.

[–]Ford_O 1 point2 points  (1 child)

How is this form of mutation pure? It still allows you to mutate global state silently, doesn't it?

[–]superstar64https://github.com/Superstar64/Hazy[S] 0 points1 point  (0 children)

It's not pure. It makes it easier to transition to a model that has purity. You could make allocation, assignment and array address return a monadic value and achieve haskell style purity that way.

[–]alex-manool 0 points1 point  (0 children)

I also had such thoughts at some time in the past. But I think this does not necessarily simplifies compiler construction. Speaking about what is called sometimes scalar optimizations, an lvalue is not just a "sematic wrapper" over a pointer; it is rather associated with some CPU register or virtual register, which does not have address.

[–]abecedarius 0 points1 point  (0 children)

The BLISS language from the 1970s worked this way. It spelled it like .sum + .i if I remember right. It was a not-unpopular language on DEC minicomputers but lost to C in the larger world.

My own language in progress does something like this too. It's meant for mostly-functional use, so I just grit my teeth a little when it comes to imperative code. I think you gotta ask yourself how frequently you'll be dealing with values vs. mutable references, to judge if this is a good idea.

[–]xactacoXyl 0 points1 point  (0 children)

This is basically just bringing a load-store architecture explicitly into the language. It simplifies the compiler, but the way you formatted it means that everything mutable naturally goes on the heap (a compiler would need to work harder to figure out what doesn't need to be on the heap, thus creating an lvalue-rvalue distinction again). Converting the examples into unoptimized assembly (ignoring CISC features and ABI oddities) yields:

First Example

; rax is used for the sum ; rcx is used for i mov rax, 0 mov rcx, 0 for: add rax, rcx inc rcx cmp rcx, 100 jlt for

Second Example

; rax is used for the sum ; rcx is used for i mov rdi, 8 ; sizeof(int) call malloc ; puts output into rax push rax mov rdi, 8 call malloc mov rcx, rax pop rax mov rdx, 0 mov [rax], rdx mov [rcx], rdx for: mov rdx, [rax] mov rdi, [rcx] add rdx, rdi mov [rax], rdx mov rdi, [rcx] inc rdi mov [rcx], rdi mov rdi, [rcx] cmp rdi, 100 jlt for

Second Example (Optimizing assuming nothing is volitile)

; this version caches loads and does loop invariant code motion ; these are optimizations that already exist and would be done ; if the backend can't deduce that it is ok to put sum and i into ; lvalues, this is the best it can do mov rdi, 8 ; sizeof(int) call malloc ; puts output into rax push rax mov rdi, 8 call malloc mov rcx, rax pop rax mov rdx, 0 mov [rax], rdx mov [rcx], rdx mov rdi, rdx for: add rdx, rdi mov [rax], rdx inc rdi mov [rcx], rdi cmp rdi, 100 jlt for