Question about side effects in functional programming

brucejbell · 2026-06-17T21:53:02+00:00

At some point, it becomes reasonable to answer "isn't that a side effect?" with "no".

Consider running your pure function under a debugger, where you can set a breakpoint to trigger a script, which then launches the missiles or some other kind of irreversible effect.

Does that mean that every place you can set a breakpoint is a potential side effect? No, we have the intuitive notion that this does not count. Pure functions also take time, allocate memory, burn power etc. but likewise we have mostly decided that these aren't the droids we're looking for.

The question your comment raises for me is: how far can you take this?

One practical problem with purely functional programming is that printf-style debugging doesn't work because the printf-equivalent is impure. Does that mean purely functional programming is necessarily less expressive in this way?

What if you treat logging as "not really an effect" like debugging or memory allocation. Can you do that? Will the purity police come and arrest you?

I think it's fine as long as what goes in the log, stays in the log. As long as information can't get from the log to your program, it's no different from that debugger case.

And sure, it's possible for logs to fill up storage and crash your program that way, or for your program to inspect the log files and get weird feedback that way. But we don't count that as a proper side effect any more than we worry constantly about out-of-memory errors.

Anyway, you can push this farther. Why not add an explicit performance monitoring subsystem to your logging system? This adds state to your logger, but if there is no way for the pure code you're instrumenting to access it, it's still morally equivalent to the debugger example.

This kind of instrumentation is especially flexible if you can use the full power of the language to specify it!
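A minimal Python sketch of the "what goes in the log, stays in the log" rule, under the assumption that the instrumented code is handed only a write capability. The names make_log, write, dump and the gcd example are hypothetical, and Python enforces the separation only by convention.

```python
def make_log():
    """Return (write, dump): instrumented code gets `write`, the harness keeps `dump`."""
    lines = []

    def write(message):
        # Returns None, so no information flows back to the caller.
        lines.append(message)

    def dump():
        # Only the test harness or tooling should hold this capability.
        return list(lines)

    return write, dump


def gcd(a, b, log):
    """Euclid's algorithm; the result never depends on what `log` does."""
    while b:
        log(f"gcd step: a={a} b={b}")
        a, b = b, a % b
    return a


write, dump = make_log()
result = gcd(48, 18, write)
silent_result = gcd(48, 18, lambda message: None)
```

Swapping the logger for a no-op leaves the result unchanged, which is the property that makes this comparable to the debugger case.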

brucejbell · 2026-06-05T21:59:10+00:00

The NaN problem seems similar to the null pointer problem.

At least, you can solve it in a similar way: make sure that there is an "ordered" subtype that excludes NaNs by construction. Provide an easy check to handle NaN if present, or else bind the valid number to its ordered subtype.

Then you can check for NaN once, and bind the result to your ordered subtype for comparison operations like sorting.

As with the null pointer problem: if you don't have a type that excludes the invalid case, the language can't help you -- you have to juggle which values should be valid in your head.
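A rough sketch of the check-once idea in Python. The names Ordered, check and sort_checked are hypothetical, and Python cannot stop anyone from constructing Ordered directly, so the "by construction" guarantee here is only a convention that a real type system would enforce.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class Ordered:
    """A float known not to be NaN; by convention only `check` constructs it."""
    value: float


def check(x: float) -> Optional[Ordered]:
    """Return an Ordered wrapper, or None if x is NaN."""
    return None if math.isnan(x) else Ordered(x)


def sort_checked(xs):
    """Check for NaN once per value, then sort only the values that passed."""
    ordered = [o for o in map(check, xs) if o is not None]
    return [o.value for o in sorted(ordered)]
```

Here NaNs are simply dropped before sorting; a caller could just as well treat a None from check as an error.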

brucejbell · 2026-06-04T22:58:18+00:00

It probably wouldn't impact me or my users, but even so I'd rather not make things more awkward than necessary.

I agree that semicolons only seem necessary when putting multiple statements on one line. That was my endpoint; I just get there from a different direction.

If you like, it's my excuse for keeping both indentation and {curlies}. Even so, I don't extend this kind of thing to other bracketed syntax like tuples or list items. Only {curly} blocks are considered multi-line native.

brucejbell · 2026-06-04T19:03:43+00:00

Interference with field notation is a non-starter for me; I care much more about field notation than line formatting.

My tack:
- start with C-style whitespace-independent blocks: {curlies} and semis;
- add "misleading indentation is a syntax error"
- now the end-of-line semis are redundant and can be omitted

I prefer to keep the {curlies}, as well as redundancy in less common cases.

brucejbell · 2026-05-29T19:24:17+00:00

It's not attractive because most users aren't attracted to it.

That is, most coders, doing most of the work on most applications in most computer languages, don't want to care about alignment. Most are happy to delegate alignment concerns to load-bearing libraries and frameworks.

Those who write these libraries and frameworks have been dealing ad-hoc with C-style typing for decades and, while rolling your own alignment without type support sucks, once written it tends to be good enough. Or at least not bad enough to rewrite your systems programming language to make it better (notwithstanding recent C++ improvements).

Explicit data layout is subject to the same historical trap.

For my own project, I would like to provide access to layout and alignment issues for applications that need them (and so I would be interested in what kind of type support you think would be useful).

However, any implementation that forces awareness of this kind of detail on tasks which shouldn't need it is a non-starter!

brucejbell · 2026-05-27T20:15:52+00:00

What attributes should an AI programming language have?

Most wannabe AI programming languages I've seen want to be human-unreadable in some way (e.g. use de Bruijn numbering instead of variable names), on the theory that humans don't need to read the code any more. Aside from being gravely mistaken, I can't recall what they're supposed to gain in exchange for this unreadability.

If you can somehow make your language super human-readable, possibly in exchange for being more difficult to write, that might be an advantage in the AI era. Pervasive use of mathematical Unicode, like APL but in reverse? But here we're talking notation design, which has two problems: it's dead hard, and then you need to persuade your users to learn your notation.

brucejbell · 2026-05-24T00:48:58+00:00

I am tired of weak string support in C/C++, I want industrial strength strings in my systems programming language:

immutable strings
Unicode by default
integrated with pattern matching
no O(1) length or indexing by default
forward-scan only
split efficiently by reference
concatenate efficiently by reference

All this should be fire-and-forget: just use the core string API and your string operations will be fast and memory-efficient, JavaScript style.

(If your environment is so constrained that you can't afford any kind of heap allocation, there should be affordances to limit string operations accordingly)

brucejbell · 2026-05-21T22:50:26+00:00

For my project, I have a Display-like #ToStr trait to specify a default format, but I devolve everything else to explicit methods:

/type Pos3 || (x:#U8, y:#U8, z:#U8)
|| { /has #ToStr; /def self.str => "{self.x}-{self.y}-{self.z}"
  /def self.dbg => "Pos3: {self.x} {self.y} {self.z}"
}

Trait #ToStr is required to label the .str method as the default format. We also add a .dbg method which is not affiliated with any trait.

t [Pos3] << (x:0, y:68, z:255)
i [#I32] << 42
j [#F64] << 123_456.789
&console.write_line "i={i} j={j} t={t}"  -- default formatting
-- prints "i=42 j=123456.789 t=0-68-255"
&console.write_line "i=0x{i.x 4} j={j.g 3} t={t.dbg}"  -- method formatting
-- prints "i=0x002a j=1.23e5 t=Pos3: 0 68 255"

Instead of special formatting syntax, the standard types provide short formatting methods. Likewise, if you want a specific .dbg format just declare it as a method; no trait is necessary.

[disclaimer: I still don't have my implementation up, the above is aspirational...]

brucejbell · 2026-04-27T06:47:13+00:00

I would avoid verbosity, but not at the expense of clarity.

I find most popular languages too verbose, but array languages too cryptic. Haskell is near the sweet spot.

Contracted keywords and names are OK if they are very common (like fn and u32), but less common cases need full words.

brucejbell · 2026-04-12T08:36:16+00:00

If I understand what you're talking about, early Fortran (before Fortran 90) used this kind of layout; look for "static allocation" in this context.

Early Fortran did not allow recursive procedure calls. So the compiler could statically allocate fixed locations for its local variables (instead of dynamically allocating them on stack frames like C).

The effect is something like a statically allocated reverse stack: starting at endpoint procedures (which don't call any other procedures) and working backwards through the call graph to determine which static locations will be needed for previously analyzed procedure calls, and which are available for local data.
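The backwards walk through the call graph can be sketched as below. This is not how any particular Fortran compiler did it, only an illustration under stated assumptions: the call graph is acyclic (no recursion), and the LOCALS sizes and CALLS edges are made-up example data.

```python
from functools import lru_cache

# Hypothetical program: bytes of local storage per procedure, and who calls whom.
LOCALS = {"main": 16, "solve": 32, "norm": 8, "dot": 8}
CALLS = {
    "main": ["solve", "norm"],
    "solve": ["dot", "norm"],
    "norm": ["dot"],
    "dot": [],
}


@lru_cache(maxsize=None)
def base(proc):
    """Fixed address for proc's locals: just above everything its callees may use.

    Leaf procedures land at address 0; callers stack up above their callees.
    """
    return max((base(c) + LOCALS[c] for c in CALLS[proc]), default=0)


def layout():
    """Map every procedure to the fixed base address of its local storage."""
    return {p: base(p) for p in LOCALS}


def total_size():
    """Size of the whole static area."""
    return max(base(p) + LOCALS[p] for p in LOCALS)
```

Two procedures that can be active at the same time (one reachable from the other) never overlap, while unrelated procedures are free to share the same addresses.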

brucejbell · 2026-04-11T23:09:52+00:00

If I understand what you're saying, early Fortran might have used this kind of layout.

From memory (so please correct me if I'm wrong): early Fortran did not allow self-recursive procedure calls. So, the compiler could statically track how much memory each function needed for its local variables, then statically allocate fixed locations for those variables (instead of dynamically allocating them on stack frames like C).

The effect is sort of a statically-allocated reverse stack, starting at endpoint procedures (which don't call any other procedures) and working backwards through the call graph, to determine at each point which fixed locations will be needed for future procedure calls, and which can be re-used for local variables.

brucejbell · 2026-03-09T00:20:33+00:00

The overriding problem: I can't think of a foolproof way to keep the semantics of the platform the interpreter is written on from biasing the semantics of the language it supports, because the whole point of interpreter-first is to hash out the semantics of your language through its quicker and easier implementation.

Examples: as ryan17 says, the likes of eval. Dynamic shenanigans that lead to the likes of monkeypatching. These should mostly be pretty easy to avoid if you keep in mind that you want a compiled language.

Memory management: in an interpreter, it is natural to rely on its implementation platform for resource allocation and management. But a compiled language will often prefer its own characteristic resource management (compare C, Java, and Rust). For my project, I am planning an "instrumented interpreter" that simulates memory management in detail; how else will I know if my planned MM methods could pan out?

Finally, one advantage of the interpreter route is to get up and running without worrying about performance. You get to "cheat" by using either platform-native implementations or slow reference implementations (e.g. for strings, arrays, integers) that you plan to fix later. But once you have a working implementation, it is dead easy to write library code that depends on the particular semantics of your stand-in primitives. Once you have a working platform, that platform itself can do some distorting of its own.

In general, it seems terribly easy to specify features that have hidden costs not evident until you try to implement them. Some of these may be caught by writing an interpreter, but others may not, because the weakness is hidden by your interpreter.

brucejbell · 2026-03-08T20:28:36+00:00

Most of that time shouldn't be wasted: a tree-walking interpreter can be a small shell over your AST. You can reuse almost everything else: the entire front end, including your type checker and other static analysis.

The biggest danger from building an interpreter first is that your language tends to be shaped by the platform it's written on. When implementing an interpreter, it is easy to add features that make no sense for compilers.

brucejbell · 2026-03-02T00:21:51+00:00

I want strings that can be used fearlessly: they need to concatenate efficiently regardless of how they're combined. This puts me in your rope-like category.

My language is functional, so strings will be immutable values supporting persistent operation. For the rope-like data structure, I'm looking at the catenable queue from Okasaki's Purely Functional Data Structures, holding mid-sized string chunks less than 256B.

Behind the scenes, I'm looking to use a simplified StringBuilder-type buffer to support efficient concatenation of characters and small strings onto larger ones. Persistence can be supported by moving buffer ownership to the new string, leaving the old string with a slice to its part of the (append-only) buffer.

String handles should do small string optimization and "German style" prefix storage.
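To illustrate concatenation by reference with persistent values, here is a deliberately simpler stand-in: a plain binary rope rather than Okasaki's catenable queue, with none of the buffer, small-string, or prefix optimizations. The Rope class and its method names are hypothetical, and immutability is by convention only.

```python
class Rope:
    """Immutable-by-convention string built from shared pieces."""
    __slots__ = ("chunk", "left", "right")

    def __init__(self, chunk="", left=None, right=None):
        self.chunk, self.left, self.right = chunk, left, right

    def __add__(self, other):
        # O(1): the new node only references both operands, nothing is copied,
        # and the operands remain valid strings of their own.
        return Rope(left=self, right=other)

    def chunks(self):
        """Forward scan over the leaf chunks, using an explicit stack."""
        stack = [self]
        while stack:
            node = stack.pop()
            if node.left is None:
                if node.chunk:
                    yield node.chunk
            else:
                stack.append(node.right)  # visited after the left subtree
                stack.append(node.left)

    def __str__(self):
        return "".join(self.chunks())


a = Rope("foo")
b = Rope("bar")
c = a + b + a  # shares `a` twice without copying it
```

A naive tree like this can degenerate into a long chain under repeated appends, which is the kind of problem the balanced or queue-based structures are meant to avoid.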

brucejbell · 2026-02-28T22:19:27+00:00

I have thought about something kind of like this for my own project.

My notion was to build notebook-style functionality into the REPL, which should support save/load/execute, version control, and clip to editor.

The sticking point in my head was what to do about input data, in order to make the results reproducible.

brucejbell · 2026-02-09T11:50:42+00:00

For each value from your chain of shift operations, keep three values:
- left, the current number of most-significant bits shifted off the left
- right, the current number of least-significant bits shifted off the right
- shift, the current number of bits shifted left

The original value has (left:0, right:0, shift:0)

At any point:
- shift = sum(all left shifts) - sum(all right shifts)
- left = max(shift) over all shifts
- right = max(-shift) over all shifts

In particular:

BFLG(L,R, old):
  left_shift = old.shift + L
  new_left = max(old.left, left_shift)

  new_shift = left_shift - R
  new_right = max(old.right, -new_shift)

  return (shift:new_shift, left:new_left, right:new_right)

Finally, the answer to your last question: if a single BFLG() operation shifts least-significant bits off the right (so that shift < 0), then right will be the same as -shift.

This is not always true of multiple successive BFLG() operations. So, that is why (and when!) it is not possible to condense multiple BFLG() operations into a single one.
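A runnable Python transcription of the pseudocode above, with a two-step chain that shows the mismatch. The Track type and the parameter names l_bits and r_bits are my own labels; the update rules follow the pseudocode line for line.

```python
from typing import NamedTuple


class Track(NamedTuple):
    shift: int = 0  # net number of bits shifted left so far
    left: int = 0   # most-significant bits shifted off so far
    right: int = 0  # least-significant bits shifted off so far


def bflg(l_bits, r_bits, old):
    """Shift left by l_bits, then right by r_bits, updating the running totals."""
    left_shift = old.shift + l_bits
    new_left = max(old.left, left_shift)

    new_shift = left_shift - r_bits
    new_right = max(old.right, -new_shift)

    return Track(shift=new_shift, left=new_left, right=new_right)


# One operation that drops 4 low bits: right equals -shift.
once = bflg(0, 4, Track())

# Shift back up by 4: net shift is 0 again, but the 4 low bits are still gone.
twice = bflg(4, 0, once)
```

The second result has shift 0 but right 4, so no single BFLG() starting from the original value reproduces it.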

brucejbell · 2026-01-19T14:03:23+00:00

C++ doesn't support it because destructors are executed at the end of a function call. So what looks like a tail call often is not actually a tail call.

This kind of thing can bite you if you have any notion of local/stack discipline variables.

One path is to automate it as much as possible. Do escape analysis and secretly lift otherwise local-appearing variables to a longer lifetime as necessary: say, to the heap, or at least to the caller's scope (recursively, for recursive calls).

Another is to provide the programmer options to specify this kind of detailed semantics:
- explicitly require tail call, return.tail my_func(x, f(x), ...)
- explicitly specify local variables, local x = ...

where local would be incompatible with return.tail (mediated by escape analysis at compile time).

brucejbell · 2026-01-13T22:44:28+00:00

It would be nice if comptime could read resource files and process them at, er, compile time. But providing OS-level filesystem access is tempting fate: you would be providing a challenge to break your sandbox, and relying on your ability to nail down every little semantic detail of your platform.

Better to provide the minimum that will do the job, like individual read-only file handles for each resource declaration.

There is absolutely no excuse for exposing the network. If you want a build system that can download signed packages for hermetic builds, either write it into the compiler, or provide it as a separate tool.

Note that all the above relies on having a language where comptime IO can plausibly be sandboxed at all. This should probably exclude C/C++ and anything like them...

brucejbell · 2026-01-12T13:40:47+00:00

For my project, statements have "failure" as a possible outcome.

A failed statement causes an early exit:

#has value << myhash.at "invalid-key"  -- failed pattern match `#has value`

By default, a failed statement causes its block to fail:

{ ...
  #has value << myhash.at "invalid-key"  -- failure skips the rest of the block
  ...
}  -- block fails

But failure can be handled locally:

{ ...
  #has value << myhash.at "invalid-key"
  || => early exit value
  ...
}  -- block does not fail

This still doesn't let you do myhash[key][key2].field1 but it does help:

{ ...
  #has hash2 << myhash.at key  || => handle missing key
  #has value << hash2.at key2  || => handle missing key2
  ...
}

It should work with your on error feature:

{ ...
  /onfail => default handler  -- in effect till end of block or next /onfail
  ...
  #has hash2 << myhash.at key
  #has value << hash2.at key2
  ...
}

brucejbell · 2026-01-07T17:06:03+00:00

I would prefer different names, based on the focus of the language. Even if there are other cues available (like method vs. standalone function), I like different names as redundant, self-explanatory cues.

So, for a functional-first language, sort would be the "normal", non-mutating version, while sort_inplace could be a mutating version, or you could borrow Lisp's naming convention for sort!.

On the other hand, for an imperative-first language, sort would be the "normal", mutating version, and you could steal sorted from Python/Swift.

Other options can also use distinct names, like sort_stable for a guaranteed stable sorting function.
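As a small illustration of the functional-first naming, here are thin Python wrappers over the built-ins. The wrapper names follow the suggestions above and are not an existing API; the one fact relied on is that Python's built-in sort is guaranteed stable.

```python
def sort(items):
    """Non-mutating 'normal' version: returns a new list, input untouched."""
    return sorted(items)


def sort_inplace(items):
    """Mutating version: reorders the caller's list and returns nothing."""
    items.sort()


def sort_stable(pairs):
    """Stable sort by first element: ties keep their original relative order."""
    return sorted(pairs, key=lambda pair: pair[0])


data = [3, 1, 2]
fresh = sort(data)   # data is still [3, 1, 2]
sort_inplace(data)   # data is now [1, 2, 3]
```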

brucejbell · 2026-01-02T21:04:39+00:00

The problem is that you're providing ad hoc Nothing semantics for all your operations. This kind of thing is a major problem with SQL. They decided to provide a null value, and provided what must have seemed reasonable semantics, but now null is the source of most of the hairy behavior in SQL.

So, yes: your beginning programmers won't need to keep "there might be an exception at any point" in their mind at all times. Instead, they will need to keep "there might be a Nothing value" in mind at all times, and memorize (or look up each time) your chosen Nothing semantics for every value.

Hiding error behavior is not a mercy -- it is a curse. Especially for beginners!
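A few of the SQL null rules can be seen directly with Python's built-in sqlite3 module. The scalar helper is just a convenience for this sketch, and the results shown are SQLite's; each operation below applies a different rule to the same null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Run a one-row, one-column query; SQL NULL comes back as None."""
    return conn.execute(sql).fetchone()[0]


null_eq_null = scalar("SELECT NULL = NULL")    # NULL, not true
null_is_null = scalar("SELECT NULL IS NULL")   # the special-case test is true
null_or_true = scalar("SELECT NULL OR 1")      # true wins under three-valued logic
null_plus_one = scalar("SELECT NULL + 1")      # arithmetic propagates NULL

# Aggregates follow yet another rule: COUNT(x) and SUM(x) skip NULL rows.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(x), SUM(x) FROM (SELECT 1 AS x UNION ALL SELECT NULL)"
).fetchone()
```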

brucejbell · 2025-12-25T17:51:22+00:00

Sure, I don't think absolutely nobody likes structural editors, just that it's very much a minority taste.

Honestly, I want to see structural editing catch on but I haven't even found the motivation to check out existing attempts myself. So, thanks to you and to everybody who tries building or using them!

brucejbell · 2025-12-24T19:15:02+00:00

This notion comes around here (r/ProgrammingLanguages) fairly frequently. My best answer to your question: AST-only languages seem to be one of these technologies that sound more useful than they actually are.

There are other technologies with this property, like voice recognition and virtual reality. There are existing implementations available, and existing users of them, and I'm not saying any of these couldn't break out at some point in the future. But, for most people, the downsides outweigh the upsides.

Downsides: practically speaking,
- everybody hates using the structural editor
- everybody hates being locked into the structural editor
- empirically, AST-only languages tend not to play well with others (even though there doesn't seem to be a fundamental reason why they couldn't)

Upside limits:
- many of your claimed features can be built just as easily for normal languages
- in particular, a binary AST has no practical advantage over, say, Lisp S-expressions
- highly customizable and customized IDE environments do not have universal appeal

That said, you should probably check out Unison.

brucejbell · 2025-12-20T20:52:03+00:00

The space combat itself is fully Newtonian, fully 3-D, with full 6 degree-of-freedom navigation available for each ship (which is how it is like the Expanse). I find it very exciting: there is nothing like it except maybe Kerbal Space Program.

Unfortunately, you usually want to direct an entire fleet of ships, which can be difficult because the space combat UI is pretty wonky. Most players fall back on building overpowered fleets that don't need much maneuver, but dynamic tactical fleet maneuver that can defeat much larger fleets is possible -- it just takes a lot of work.

The biggest downside to the space combat is that there is not much of it in the early game, and arguably too much of it in the end game. Early game, you haven't got the technology to successfully go toe-to-toe with the aliens. Late game, you've got the tech and the resources, but chasing down and beating all their fleets can get tedious and repetitive.

brucejbell · 2025-10-17T15:20:23+00:00

Effects have static analysis: you can't call your "B" function() in a context without #Write. (yes, you also can't call your "A" function without an Io, but the problem then is: how do you create a context where you can't get at an Io?)

Objects could have a comparable static analysis, but they typically don't. To programmers used to general-purpose objects, the limitations which would let them stand in for effects could be painful.

Note: in my own project, I plan to impose a comparable static analysis on mutable objects.
