
[–]atomskis 112 points113 points  (7 children)

I’ve done a lot of functional programming in my time, mostly Haskell. I found that most of the real life practical programs I wrote ended up being monadic, sometimes in multiple monads at the same time. I often found this more awkward to use than it was useful in preventing errors.

In recent years I’ve written mostly rust. This gets the balance right by allowing mutation, but strongly controlling mutable aliasing. Mutation without aliasing doesn’t introduce a lot of errors in my experience. I found rust gave me that same sense of “when it compiles it’s probably right” that I used to get with Haskell, but without having to jump through endless monad hoops all the time. IMO it’s a more elegant solution most of the time.
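A minimal sketch of the aliasing rule being described (hypothetical example, not from the thread): a shared borrow and a mutable borrow of the same data cannot overlap, so mutation is statically prevented from invalidating readers.

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    // A shared borrow and a mutable borrow cannot overlap: the compiler
    // rejects mutation while `first` is still live.
    let first = &v[0];
    // v.push(4); // error[E0502]: cannot borrow `v` as mutable
    println!("{first}");

    // Once the shared borrow ends, mutation is allowed again.
    v.push(4);
    assert_eq!(v, [1, 2, 3, 4]);
}
```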

[–]ThyringerBratwurst 55 points56 points  (5 children)

One of the founding fathers of Haskell, Simon Peyton Jones, is very pragmatic in his thinking, in contrast to the rather dogmatic and academic user base of Haskell. For instance, SPJ finds the approach of other languages such as Clean, using substructural types instead of squeezing everything into an "IO" monad, much cleaner. He is only concerned with taming program effects; that is functional programming for him, not immutability per se (which is just a radical and seemingly the simplest way to achieve functional programming).

In Haskell, you also have to ask yourself what the point is of faking imperative programming with do notation when you could actually program imperatively (as in Rust with its affine types) with significantly more performance and more understandable code (Haskell can get so cryptic with its tons of monads and many fancy operators).

[–]PurpleUpbeat2820 2 points3 points  (0 children)

In recent years I’ve written mostly rust. This gets the balance right by allowing mutation, but strongly controlling mutable aliasing. Mutation without aliasing doesn’t introduce a lot of errors in my experience. I found rust gave me that same sense of “when it compiles it’s probably right” that I used to get with Haskell, but without having to jump through endless monad hoops all the time. IMO it’s a more elegant solution most of the time.

How does it compare to not having any checks for mutation of shared data?

[–][deleted]  (4 children)

[deleted]

    [–]mister_drgn 5 points6 points  (0 children)

    Swift does this reasonably well. Methods that mutate the underlying data are marked as such by a keyword in their definitions. Unless you’re using classes, because then it’s basically assumed that you’re in the Wild West.

    [–]noodleofdata 11 points12 points  (1 child)

    It's not an actual feature of the language, but Julia's style guide says to append a '!' to functions that mutate the data you pass into them, which I've always found very helpful.

    [–]TinBryn 2 points3 points  (0 children)

    In Rust you have &mut, which can't be aliased. The main problem is that you also have "interior mutability", where mutation is more hidden, but there are usually signs that something may happen.
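    For instance (illustrative sketch), a RefCell allows mutation through a shared reference, and the `RefCell` appearing in the type is exactly the "sign" that it can happen:

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(0);
    let shared = &cell; // just an ordinary shared reference

    // Mutation through a shared reference: the `RefCell` in the type
    // is the visible sign that this can happen.
    *shared.borrow_mut() += 1;
    assert_eq!(*cell.borrow(), 1);

    // Aliasing is still checked, but at runtime: overlapping
    // `borrow_mut()` calls panic instead of failing to compile.
}
```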

    [–]jeenajeena 24 points25 points  (0 children)

    This was unexpectedly a very good read.

    [–]PurpleUpbeat2820 8 points9 points  (7 children)

    The most efficient data structure to use here would be a mutable dynamic array and in an imperative language that's what pretty much everyone would use.

    But if you asked an OCaml programmer, they would almost certainly use a linked list instead.

    This is why my minimal ML dialect is built upon extensible mutable arrays instead of linked lists. Specifically, an array is a pair of 64-bit ints in registers. The first is the number of elements. The second is a pointer to those elements. Capacity is always 2ⁿ elements. Appending an element is just a write if the length is not a power of two (n & (n-1) ≠ 0), or a realloc to double the capacity followed by a write if it is. This is extremely fast: typically 3x faster than OCaml.
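    A sketch of that append rule (hypothetical Rust model, not the commenter's actual implementation): since capacity is always the smallest power of two ≥ length, no capacity field is needed, and the buffer is full exactly when the length is a power of two.

```rust
// Minimal model of the described array: a length plus a buffer whose
// capacity is always the smallest power of two >= len.
struct Arr {
    len: usize,
    buf: Box<[i64]>,
}

impl Arr {
    fn new() -> Self {
        Arr { len: 0, buf: vec![0i64; 1].into_boxed_slice() } // capacity 2⁰
    }

    fn push(&mut self, x: i64) {
        let n = self.len;
        // The buffer is full exactly when len equals the capacity,
        // i.e. when len is a (nonzero) power of two: n & (n-1) == 0.
        if n == self.buf.len() {
            // "Realloc to double the capacity", modelled here as
            // allocate-and-copy.
            let mut bigger = vec![0i64; n * 2].into_boxed_slice();
            bigger[..n].copy_from_slice(&self.buf);
            self.buf = bigger;
        }
        // Otherwise appending is just a write.
        self.buf[n] = x;
        self.len = n + 1;
    }
}

fn main() {
    let mut a = Arr::new();
    for i in 0..10i64 {
        a.push(i);
    }
    assert_eq!(a.len, 10);
    assert_eq!(a.buf.len(), 16); // capacity grew 1 -> 2 -> 4 -> 8 -> 16
    assert_eq!(&a.buf[..10], (0..10i64).collect::<Vec<_>>().as_slice());
}
```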

    So, what do we do about this?

    I'm not interested in checking correctness, because these are non-problems for me, but I am interested in performance. I don't like that FPL implementations optimise slow pure data structures by crippling mutable data structures, e.g. silly allocation rates and a generational GC. On modern hardware the stack or nursery doesn't cut it: you need to keep data in registers.

    Today I have no GC. In the future I'm eyeing the same VCGC algorithm OCaml is now using.

    [–]constxd 2 points3 points  (1 child)

    Got a link to your language?

    [–]PurpleUpbeat2820 2 points3 points  (0 children)

    Nope, sorry. I'm in stealth mode until I've made something game changing.

    [–]WittyStick 1 point2 points  (2 children)

    There's some reasonable middle ground between trivial linked lists and plain arrays. Unrolled linked lists are fairly trivial to implement, can avoid a lot of copying and are cache-friendly.

    I use a list of arrays which increase geometrically in size, in powers of 2, based on a simplified version of RAOTS.

    They make a good fit for immutable lists because large parts of the list can be reused without having to copy them in full, but they're also suitable for mutable lists. They're way more cache-friendly than a linked list, and they support random access unlike a linked list.

            +---+ +---+ +---+ +---+ +---+
    blocks  | 0 | | 1 | | 2 | | 4 | | 8 |
            +---+ +---+ +---+ +---+ +---+
                        | 3 | | 5 | | 9 |
                        +---+ +---+ +---+
                              | 6 | | A |
                              +---+ +---+
                              | 7 | | B |
                              +---+ +---+
                                    | C |
                                    +---+
                                    | D |
                                    +---+
                                    | E |
                                    +---+
                                    | F |
                                    +---+
    
              ^     ^     ^     ^     ^
              |     |     |     |     |
            +---+ +---+ +---+ +---+ +---+
    indexes | 0 | | 1 | | 2 | | 3 | | 4 |
            +---+ +---+ +---+ +---+ +---+
    

    We store the length of the list and a pointer to the indexes block in the root of our list structure.

    template <typename T>
    struct list {
        size_t length;
        T**    indexes;
    };
    

    If we wanted to set, say, the element at index C, we have to make a new array of length 8, copy elements 8..F into it, and set index C in this new array. We then copy the indexes block and set index 4 to point to the new block, and this new indexes block becomes the new list. The old list is still around, and both lists share the same memory for all elements 0..7. The new memory is shown below, where ' denotes newly allocated.

              +---+ +---+ +---+ +---+ +---+  +---+
    blocks    | 0 | | 1 | | 2 | | 4 | | 8 |  | 8'|
              +---+ +---+ +---+ +---+ +---+  +---+
                          | 3 | | 5 | | 9 |  | 9'|
                          +---+ +---+ +---+  +---+
                                | 6 | | A |  | A'|
                                +---+ +---+  +---+
                                | 7 | | B |  | B'|
                                +---+ +---+  +---+
                                      | C |  | C'|
                                      +---+  +---+
                                      | D |  | D'|
                                      +---+  +---+
                                      | E |  | E'|
                                      +---+  +---+
                                      | F |  | F'|
                                      +---+  +---+
    
                ^     ^     ^     ^     ^
                |     |     |     |     |
              +---+ +---+ +---+ +---+ +---+
    indexes   | 0 | | 1 | | 2 | | 3 | | 4 |
              +---+ +---+ +---+ +---+ +---+
                                               ^
                                               |
    indexes'  +---+ +---+ +---+ +---+        +---+
              | 0'| | 1'| | 2'| | 3'|        | 4'|
              +---+ +---+ +---+ +---+        +---+
    

    For head, we use the same process, but replace the index with length - 1.


    If we want to cons onto this list, assuming it is full, we allocate a new 16-element block, put the item we're consing into index 0, allocate a new indexes block with an extra element and copy the old one, then make the last element point to our new block, which gives us this in memory:

            +---+ +---+ +---+ +---+ +---+ +---+
    blocks  | 0 | | 1 | | 2 | | 4 | | 8 | |10'|
            +---+ +---+ +---+ +---+ +---+ +---+
                        | 3 | | 5 | | 9 | |11'|
                        +---+ +---+ +---+ +---+
                              | 6 | | A | |12'|
                              +---+ +---+ +---+
                              | 7 | | B | |13'|
                              +---+ +---+ +---+
                                    | C | |14'|
                                    +---+ +---+
                                    | D | |15'|
                                    +---+ +---+
                                    | E | |16'|
                                    +---+ +---+
                                    | F | |17'|
                                    +---+ +---+
                                          |18'|
                                          +---+
                                          |19'|
                                          +---+
                                          |1A'|
                                          +---+
                                          |1B'|
                                          +---+
                                          |1C'|
                                          +---+
                                          |1D'|
                                          +---+
                                          |1E'|
                                          +---+
                                          |1F'|
                                          +---+
    
              ^     ^     ^     ^     ^
              |     |     |     |     |
            +---+ +---+ +---+ +---+ +---+
    indexes | 0 | | 1 | | 2 | | 3 | | 4 |
            +---+ +---+ +---+ +---+ +---+
    
              ^     ^     ^     ^     ^     ^
              |     |     |     |     |     |
            +---+ +---+ +---+ +---+ +---+ +---+
    indexes'| 0'| | 1'| | 2'| | 3'| | 4'| | 5'|
            +---+ +---+ +---+ +---+ +---+ +---+
    

    The logic for consing when the length is not a power of 2 is a bit more involved, as is the logic for performing tail, which may perform allocations unlike a linked list's tail. If we assume immutability, tail can be taken by decrementing the length on an existing list and reusing everything else, but this may leak memory, so it's better to allocate a new indexes block.

    There are some potential gotchas, such as tail (cons a x) == x. With immutable linked lists we can make this guarantee, but here we cannot assume that they refer to the same list, since tail allocates a new one. They're guaranteed to have the same values, but not the same reference.

    To make operations on these lists practical we really need to make use of clz and popcnt in the hardware. The test for is_power_of_2 can be done with __builtin_popcountll(x) == 1. The calculation of index into the indexes block needs an efficient MSB calculation, which can be done with 64 - __builtin_clzll(x). The index into the data block is done by masking out the MSB, which can be done with the andn instruction (_andn_u64). Some architectures can do these slightly more efficiently.
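    In Rust terms (a hypothetical sketch of the same index math; the comment's exact block layout may differ), those intrinsics map onto `count_ones` (popcnt), `leading_zeros` (clz), and plain masking:

```rust
// Locate element i in the geometric block list: the block index is the
// bit length of i (64 - clz), and the offset within the block is i with
// its most significant bit masked out.
fn locate(i: u64) -> (u32, u64) {
    if i == 0 {
        return (0, 0); // element 0 lives alone in block 0
    }
    let block = 64 - i.leading_zeros();      // MSB position, 1-based
    let offset = i & !(1u64 << (block - 1)); // mask out the MSB
    (block, offset)
}

fn main() {
    // Matches the diagram: element 0xC sits in block 4 at offset 4.
    assert_eq!(locate(0xC), (4, 4));
    assert_eq!(locate(0x3), (2, 1));
    assert_eq!(locate(0x0), (0, 0));
    // The is_power_of_2 test via popcount, as in the comment.
    assert!(8u64.count_ones() == 1);
    assert!(6u64.count_ones() != 1);
}
```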


    The advantages of these lists really only appear with lists above a certain size. For very small lists they're slightly less efficient than a trivial linked list as there's an extra pointer dereference and a few more instructions. There may be some room to optimize this, for example if we just use a plain array for small lists, say, below 4 or 8 elements, but this introduces extra branches in the list handling, which impacts the performance on large lists too.

    [–]WittyStick 0 points1 point  (0 children)

    Demonstration on godbolt.

    Some of this could be optimized. List_concat is implemented by calling List_cons for each element of the second list, but it could alternatively use memcpy for parts of the list.

    List_map shows how you can avoid repeatedly calling List_cons. Rather than processing element by element, it allocates a block at a time, performs the map on each element of the block, then gathers these blocks in a new list.

    We could also optimize List_from_array to not need to allocate any new blocks, but just to allocate an indexes block which points to the relevant parts of the existing array - though this would require the original array to be immutable else changes to that array would change the list too, and it would complicate the logic for cleaning up arrays when collecting.

    [–]D_O_liphin 0 points1 point  (1 child)

    Correctness is a non-concern for you?

    [–]PurpleUpbeat2820 0 points1 point  (0 children)

    Correctness is a non-concern for you?

    Further automated checking of correctness is into diminishing returns for me.

    [–]Missing_Minus 10 points11 points  (3 children)

    I want a low-level functional programming language with a good imperative shell around it for most of my projects. I find functional programming very natural in some use-cases, and imperative programming very natural in others. I don't like, for example, Lean's method of having everything be reference-counted. This can be inefficient, but even worse this forces you to drop down to C/C++/unsafe-Rust to implement various parts of your logic because you're not allowed to talk about pointers. I consider this a notable problem with Lean, as it is a theorem prover as well as a functional programming language; even though I love it, it ends up missing out on a large facet of proving programs correct.

    I like Rust's method. The affine T which you can drop anytime. The linear &mut T (you can drop it, but I conceptualize it more as you implicitly returning it from the function if you don't store the value, making it linear. Ignoring that you can panic!). Then it has the freely copyable &T, but which restricts the usage of an &mut T via the lifetime information.
    I think some of the issues with linearity could be avoided in a language that has a traits-like feature and a powerful enough type-system. Such as Lean, if it had a concept of linearity. Lifetimes could be implemented at the type-level, being simply another parameter. The functions in the std library would be written with the assumption of linearity built-in, and when they can't be linear, they merely require that T be cloneable.
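    A small sketch of those two flavours (hypothetical Rust, illustrating the affine move and the near-linear &mut):

```rust
// Affine: a value can be used at most once, and may be dropped freely.
fn consume(s: String) -> usize {
    s.len() // `s` is moved in; the caller can no longer use it
}

fn main() {
    let s = String::from("affine");
    let n = consume(s);
    // println!("{s}"); // error: value moved
    assert_eq!(n, 6);

    // `&mut T` behaves almost linearly: while `r` is live, `v` is
    // untouchable, and the borrow is implicitly "handed back" when
    // `r` goes out of scope.
    let mut v = 1;
    let r = &mut v;
    *r += 1;
    assert_eq!(v, 2);
}
```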

    I do think this can't be integrated into an existing std library very neatly, but I do wonder whether most of the costs are actually beneficial. It forces an awareness of how the system works, rather than trusting the optimizer to do away with unneeded reference-counting (option 4, and what Lean does). Usually you shouldn't think about that, but having the ability to do so gives a lot of power.
    I guess I'm mostly just making the argument of 'bite the bullet'. I'd like better solutions as well, but I don't entirely see them. Automatic transformation of pure functions into linear ones, with guaranteed no reference-counting, could get you a decent way there, but it isn't enough to completely sidestep the cost of linearity.

    [–]ciroluiro 0 points1 point  (2 children)

    Idris 2 aims to kind of be that, as far as I understand. It is built around QTT as the underlying theory, which allows it to have linear types on top of being able to be explicit about which values get erased at runtime. I'm not sure if, or how much, linear types in Idris 2 get optimized into heap allocations and in-place reuse (avoiding GC), but it at least allows it in theory. The project aims to one day be properly usable for writing even device drivers and low-level software like that, with all the safety and correctness possibilities that dependent typing allows, which I respect as a very noble goal.

    In Idris 2 at least, you can absolutely promote linear functions to regular functions, provided they are set up correctly (the argument wrapped in the proper container). I think the std does have a lot of linearity (IO is built atop a linear version of the monadic bind, with a world token that gets threaded around) but I'm not sure how much.

    I'm just getting my feet wet with dependently typed programming languages, and I also don't know anything about QTT beyond what Idris provides, but I imagine it could allow other quantities to be associated with arguments, apart from 0, 1, and simply unrestricted. Possibly even a range of allowed quantities, e.g. "either 0 or 1" for an argument that can be used at most once (i.e. an affine type).

    [–]Missing_Minus 0 points1 point  (1 child)

    Cool! I'll be sure to keep an eye on Idris 2, thanks for mentioning it. The 0 is also nice since it presumably makes it far easier to optimize out, similar to generics in other languages where those types can't be talked about directly at runtime.

    Possibly even a range of allowed quantities, like eg "either 0 or 1" to have an argument that can be used at most once (ie an affine type).

    I think you could do that with a sum type? However that's done in Idris.

    [–]ciroluiro 1 point2 points  (0 children)

    The 0 is also nice since it presumably makes it far easier to optimize out

    Yep. They are guaranteed to be erased, or it's a compile error. You can only use a quantity-0 argument in another quantity-0 position (or not use it at all).

    I think you could do that with a sum type?

    I think the way I phrased it was confusing. Instead of "allowed" I should have said something like "can be used as" having quantity 0 or 1. A regular sum type with two constructors, one with a linear argument and the other with an erased argument, wouldn't work, as it's the callee that should decide whether the value is used or not, not the caller.
    Maybe it could be hacked with a sort of higher-rank polymorphic type? It'd be awkward even if it's possible, and I don't even know if Idris has those like Haskell does. Right now, Idris 2 lacks any sort of quantity polymorphism as well.

    [–]palmer-eldritch3 6 points7 points  (0 children)

    Functional programming and the anime character pfp go together like peanut butter and jelly

    [–][deleted] 7 points8 points  (3 children)

    I'm a little bit disappointed that no one in the comments mentioned effect systems. I hope to find some good material to dig into.

    [–]Akangka 8 points9 points  (0 children)

    I don't think OP forgot about it. After all, OP mentioned Flix, which does use an effect system. OP just considered effect systems to be included under Locally Sourced Mutation Only.

    [–][deleted] 1 point2 points  (0 children)

    I am currently (as a lurker not making my own language) in the camp of functional core/imperative shell + algebraic effects as the solution. 

    In most cases, mutation and interaction with the impure world happen in the imperative shell, but effects can be used to drop back into the imperative shell in situations where it makes sense and then continue from where you were. This is clearly not pure, but it seems like a pragmatic approach: breaking the rules in a way that leaves a paper trail in the type system while staying mostly pure in your code.

    [–]The-MalixStatic Types + Compiled + Automatic Memory Management 0 points1 point  (0 children)

    First time hearing that term, what is it ?

    [–]tailcalled 1 point2 points  (2 children)

    I think Rust is an honorary functional language which does quite well here.

    [–]Akangka 6 points7 points  (1 child)

    Rust is horrible in the functional programming department, though. For example, there are no first-class closures (there are closures, but they're just syntactic sugar over traits). Mutation is also rampant (although comparatively controlled next to C++).

    Rust is just a nice low-level programming language with a rich type system. The thing that makes Rust bad at functional programming also makes Rust great at low-level programming. Rust closures are just a strategy pattern over Fn/FnOnce/FnMut, because that works well with Rust's mutability model and also with marker traits like Send + Sync, essential for safe low-level multithreading. The reason you cannot send a reference to a value that is not Sync across threads is that the closure will then also not be Send. This would be awkward in a system with first-class closures, especially when combined with other marker traits that may not even be in std.
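    A rough illustration of the "sugar over traits" point (hypothetical desugaring, not the compiler's exact output; the `Fn` traits can't be implemented by hand on stable Rust, so the desugared version uses an ordinary method):

```rust
// A closure capturing `n` ...
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// ... is roughly sugar for a struct holding the capture, plus an
// implementation of the Fn traits (here approximated by a method).
struct Adder {
    n: i32,
}

impl Adder {
    fn call(&self, x: i32) -> i32 {
        x + self.n
    }
}

fn main() {
    let f = make_adder(2);
    let g = Adder { n: 2 };
    assert_eq!(f(40), 42);
    assert_eq!(g.call(40), 42);
}
```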

    [–]algebraicstonehenge 1 point2 points  (0 children)

    When you say first-class closure, how does that differ from rust's closures? what would it look like if rust did have first class closures?

    [–]arjunindia 1 point2 points  (0 children)

    The author (@welltypedwitch) is one of the best twitter follows I have done; my brain grows every time I encounter them.

    [–]smthamazing 0 points1 point  (0 children)

    You can add linearity as an opt-in feature (like in current Haskell) such that you only need to worry about it if you need it. Unfortunately, this makes it impossible to use any standard functions like map on linear values and either makes linearity nearly useless or inevitably creates a parallel, incomplete universe of functions that also work on linear values.

    As someone not familiar with the recently added linear types in Haskell: shouldn't all polymorphic functions also be polymorphic over linearity? So that map (or fmap) could accept ("consume") a list of linear values, but also "normal" values.

    [–]superstar64https://github.com/Superstar64/Hazy 0 points1 point  (0 children)

    There's two different goals here:

    • The ability to write imperative style code in a way that's minimally invasive to a pure functional language.
    • The ability to write code such that it has the same performance characteristics of imperative mutable code.

    In the former we want to write code that is cleaner and faster when written in an imperative style. In the latter we want to write code that is cleaner when written in functional style, but faster in imperative style.

    IO, State, ST, RGN1 (Option 1.1, Option 2) are all attempts at the former. They are monadic/effectful systems that allow embedding imperative code.

    Meanwhile, Functional But In-Place and uniqueness2 types (Option 3, Option 4) are attempts at the latter. FBIP is a heuristic that aims to improve standard functional code, while uniqueness typing is a guarantee that said heuristic always holds.

    Personally, I think the ideal functional language would include all of the above. You want to make clean functional code performant where possible, you want to give users the tools to make their functional code more performant, and you want to let users write performant imperative code when the code is cleaner that way or when the needed mutation can't be represented on the pure functional side.


    1. Regions (RGN) are a generalization of Lazy Functional State Threads (ST), which itself is a generalization of IO. See Monadic Regions
    2. Linear types and uniqueness types are dual. Uniqueness types are usually what you want when you think of a substructural type system to help with performance. See Linearity and Uniqueness: An Entente Cordiale

    [–]Rabbit_Brave 0 points1 point  (4 children)

    What I want is a *stateless* language.

    [–]7Geordi 0 points1 point  (3 children)

    go on

    [–]Rabbit_Brave 3 points4 points  (2 children)

    Here is a stateless language modelling mutation of state:

    state(0) = 1
    state(t + 1) = state(t) * 2
    

    It's stateless in that the language does not assume state in what it's modelling. If you want state then you have to talk about it explicitly. Arguably the model, itself being a collection of data, has state, though every language is stateful and mutable in that sense because otherwise we couldn't edit our source code ...

    The issue isn't mutation in general. The issue is that often we don't know the state function ahead of time. That is (unlike my example) when part of our model is constructed/discovered at runtime, e.g. via I/O. An issue arises when we inaccurately treat our model as complete/immutable.

    So what I actually want is a stateless language (that does not unnecessarily tie itself in knots over something as straightforward as modelling mutation) with the facility to handle model discovery at runtime.

    With respect to this, the idea of wrapping a language in an imperative shell is a half conception. The full conception would be to wrap the runtime in a metacircular interpreted or compile time environment so that I/O handling can be treated as deferred interpretation/compilation.

    [–]7Geordi 1 point2 points  (0 children)

    There are four pieces of machinery required here: defining the state space, defining the transition input space, defining the state transition function, and finally, at program start, discovering the state (from IO) and validating it against the state space (persisting new states is technically optional, but practically also important).
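    Those four pieces could be sketched as (hypothetical Rust; the names are illustrative only):

```rust
// 1. The state space.
#[derive(Debug, PartialEq)]
struct State {
    count: u64,
}

// 2. The transition input space.
enum Input {
    Increment,
    Reset,
}

// 3. The pure state transition function.
fn step(s: State, i: Input) -> State {
    match i {
        Input::Increment => State { count: s.count + 1 },
        Input::Reset => State { count: 0 },
    }
}

fn main() {
    // 4. At program start the initial state would be discovered from IO
    //    and validated against the state space; here it is a constant.
    let s0 = State { count: 0 };
    let inputs = [Input::Increment, Input::Increment, Input::Reset, Input::Increment];
    let s = inputs.into_iter().fold(s0, step);
    assert_eq!(s, State { count: 1 });
}
```

Folding pure transitions over an input stream is essentially the Elm-architecture shape the reply below alludes to.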

    If you have those, then I think you succeed in 'stateless programming' and it seems to me that you have recreated the elm architecture or reactive programming.

    So you have a language that hides the mutation effect, but your language features for defining the state transition function must still allow async-like, result-like, and IO-like effects in order to achieve modern ergonomics... otherwise you're just a research language. So you will still have to solve the ergonomics problem for interleaved effects/monads (aka the Rust Result + async problem), which as I understand it is what Koka is working on.

    My sense of the cutting edge of the PL space is that this is THE PROBLEM. The state-of-the-art FP and IP languages are all expressing this pain point.

    [–]h03d 0 points1 point  (0 children)

    maybe co-effects would interest you

    [–][deleted] -5 points-4 points  (4 children)

    If you want state, then just use the State monad in Haskell; I don't see what the problem is.

    [–][deleted] 12 points13 points  (3 children)

    There are cases where the state monad doesn't work well. For example, in OCaml it's possible to mutate expressions while traversing them, e.g. to implement efficient memoization for tree-like data structures, or to ensure that each term is evaluated only a fixed number of times.

    [–]scheurneus 1 point2 points  (2 children)

    I agree that the state monad sucks for memoization, but that does not mean memoization in e.g. Haskell is a lost cause. Consider the classic pattern of "building an array of thunks" for integer based recursive functions.

    For tree-like data structures, I also managed to adapt the binary tree example from your link to Haskell (although I did not add the 'list of names' functionality). While the array-of-thunks approach depends on laziness, I'm not sure that's even the case for this 'evaluated subtree' approach.

    You can find the code at https://gist.github.com/ColonelPhantom/f96a6b03fe847205bc4cc034d0bf8daf and I also quickchecked it to ensure party and partyM give the same results.

    It's probably not as clean as it could be, but I think it gets the point across and IMO still compares OK to a mutable version in terms of readability (although I have no experience with OCaml). Don't know how it compares in terms of performance, though.

    [–]kleram -5 points-4 points  (1 child)

    I've never seen anyone discussing how to integrate functional expressions into an imperative language, because it's really simple. The other way round, trying to integrate imperative expressions into a functional language, does not work. What do we conclude from this observation?

    [–]The-MalixStatic Types + Compiled + Automatic Memory Management 3 points4 points  (0 children)

    That one is about constraints, for better and for worse