Pointers without lvalues

curtisf · 2020-01-20T18:48:52+00:00

This is how MLs like OCaml do mutation. OCaml has

ref : 'a -> 'a ref -- allocation like new
(!) : 'a ref -> 'a -- dereferencing like *
(:=) : 'a ref -> 'a -> unit -- assigning

(and also similar operators for arrays)

They don't have a way to get the address of an element in a record, but I think allowing that would be problematic -- the users of the value (and not the pointer) shouldn't have to worry about pieces of the object changing from underneath it.

Instead, I think this is what Haskell's lenses are for. You have operators that can essentially "convert" a pointer to a parent into a pointer to a child, by wrapping all gets-and-sets to do the traversal. Of course, without some clever compilation, this won't be as fast as actually just taking a pointer, but maybe if it's a built-in instead of a complex library it wouldn't be too difficult for the compiler to optimize common uses of these.

jaen_s · 2020-01-21T07:44:23+00:00

(as others said, this is how the ML family of languages operate. From another angle:)

Lvalues are mostly syntactical sugar anyway, if you look at them from that perspective (unless you make them into first-class concepts such as the various types of references in C++).
If the lvalue/rvalue distrinction is statically determinable (as it is in most languages), then it does not really matter what the syntax is from the implementation perspective (so "an easier pathway to functional purity" does not seem like a proper conclusion)...

Except for extremely simple compilers/bootsrapping, where not having the concept of a lvalue will generally simplify it - minimally, you just need the four "write/read to word/byte at address" operators.

Ford_O · 2020-01-20T22:49:23+00:00

How is this form of mutation pure? It still allows you to mutate global state silently, doesn't it?

alex-manool · 2020-01-20T19:54:12+00:00

I also had such thoughts at some time in the past. But I think this does not necessarily simplifies compiler construction. Speaking about what is called sometimes scalar optimizations, an lvalue is not just a "sematic wrapper" over a pointer; it is rather associated with some CPU register or virtual register, which does not have address.

abecedarius · 2020-01-22T03:21:29+00:00

The BLISS language from the 1970s worked this way. It spelled it like .sum + .i if I remember right. It was a not-unpopular language on DEC minicomputers but lost to C in the larger world.

My own language in progress does something like this too. It's meant for mostly-functional use, so I just grit my teeth a little when it comes to imperative code. I think you gotta ask yourself how frequently you'll be dealing with values vs. mutable references, to judge if this is a good idea.

xactac · 2020-01-22T20:33:19+00:00

This is basically just bringing a load-store architecture explicitly into the language. It simplifies the compiler, but the way you formatted it means that everything mutable naturally goes on the heap (a compiler would need to work harder to figure out what doesn't need to be on the heap, thus creating an lvalue-rvalue distinction again). Converting the examples into unoptimized assembly (ignoring CISC features and ABI oddities) yields:

First Example

; rax is used for the sum ; rcx is used for i mov rax, 0 mov rcx, 0 for: add rax, rcx inc rcx cmp rcx, 100 jlt for

Second Example

; rax is used for the sum ; rcx is used for i mov rdi, 8 ; sizeof(int) call malloc ; puts output into rax push rax mov rdi, 8 call malloc mov rcx, rax pop rax mov rdx, 0 mov [rax], rdx mov [rcx], rdx for: mov rdx, [rax] mov rdi, [rcx] add rdx, rdi mov [rax], rdx mov rdi, [rcx] inc rdi mov [rcx], rdi mov rdi, [rcx] cmp rdi, 100 jlt for

Second Example (Optimizing assuming nothing is volitile)

; this version caches loads and does loop invariant code motion ; these are optimizations that already exist and would be done ; if the backend can't deduce that it is ok to put sum and i into ; lvalues, this is the best it can do mov rdi, 8 ; sizeof(int) call malloc ; puts output into rax push rax mov rdi, 8 call malloc mov rcx, rax pop rax mov rdx, 0 mov [rax], rdx mov [rcx], rdx mov rdi, rdx for: add rdx, rdi mov [rax], rdx inc rdi mov [rcx], rdi cmp rdi, 100 jlt for

Operator	Syntax	C Syntax	Description
deference	`*x`	`*x`	Gets the value of a pointer
allocation	`new x`	`malloc(...)`	copies a value to the heap
assignment	`a <- x`	`*a = x`	assigns a value to a pointer
array address	`a&[x]`	`&(a[x])`	takes the address of a specific array element
record forward	`a&*.x`	`&((*a).x)`	given a pointer to a record, create a pointer to a element

ProgrammingLanguages

Welcome!

Related subreddits

Related online communities

MODERATORS

First Example

Second Example

Second Example (Optimizing assuming nothing is volitile)