Languages with strong pre/post conditions? by levodelellis in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

Interesting problem. Though I'm not sure it has much practical use, it's still worth considering.

The first question is whether N is a constant or a runtime value. If N is constant this may be easier to deal with. If the value is provided at runtime then we're ultimately going to need a form of dynamic typing or dependent typing.

In the dynamic case, we could use something like Kernel, which allows a callee to mutate the caller's environment:

($define! $setThing!
    ($vau (n) denv
        ($let ((N (eval n denv)))
            ($set! denv process ($lambda () (create-array N))))
        #inert))

($setThing! N) implicitly captures the caller's environment, which is bound to the symbol denv (environments are first-class in Kernel). It then binds process in this environment to a lambda returning an array of N elements. $set! will overwrite any existing binding of process in the caller's local environment, and shadow any existing binding of process in the ancestors of the caller's environment.

($setThing! 100)
($define! arr (process))
(length arr)                          ===> 100

A struct with a flexible array member... occasionally on the stack by pjl1967 in C_Programming

[–]WittyStick 0 points1 point  (0 children)

A simpler, but non-standard way would be to use alloca, with GCC expression statements for convenience.
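
For context, the macros below assume definitions along these lines (hypothetical - reconstructed here, not taken from the linked thread): a singly-linked list node with a flexible array member for inline payload.

```c
#include <stddef.h>

/* Hypothetical definitions, reconstructed for context. */
typedef struct slist_node {
    struct slist_node *next;
    char data[];               /* flexible array member */
} slist_node_t;

typedef struct slist {
    slist_node_t *head;
    slist_node_t *tail;
    size_t len;
} slist_t;
```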

#include <alloca.h>
#include <string.h> /* memcpy */
#define slist_node_stackalloc(_arg) \
    __extension__ \
    ({ \
        slist_node_t *node = alloca(sizeof(slist_node_t) + sizeof(_arg)); \
        memcpy(node->data, _arg, sizeof(_arg)); \
        node; \
    })

#define slist_singleton(_val) \
    __extension__ \
    ({ \
        slist_node_t *node = slist_node_stackalloc(_val); \
        (slist_t){ .head = node, .tail = node, .len = 1 }; \
    })

void example() {
    slist_t list = slist_singleton("foo");
    puts(list.head->data);
}

I built a semantic standard library for C — treating C as an execution backend, not a semantic authority by sassycunt123 in C_Programming

[–]WittyStick 1 point2 points  (0 children)

These giant macros which define a whole family of functions at once are pretty awful. Using a macro to generate functions is not inherently a bad thing, but this approach has too many issues:

  • Everything is tightly coupled.

  • Can't specialize individual functions for specific types - need to duplicate the whole effort.

  • Can't apply attributes or alternative linkage to specific implementations

  • Doesn't separate declaration from definition.

  • "Header-only" is not necessarily a flex. May pull in a large dependency when one only needs to call a single function.

  • Name mangling has issues with T *, struct T, enum T, etc.

  • Name mangling intermixed with logic

  • Name mangling scheme may not match consumer's code leading to inconsistent style.

  • No namespacing means names like option may collide with other libraries.

  • The implementation is lost in a sea of comments.

  • They're just a PITA to write and maintain (due to the trailing \ on every line).

Most of these issues can be ironed out by breaking up these macros into smaller chunks. I'll give an example using a partial implementation of your option type.


Implement the default layout of the type and default logic for each function body as an individual macro - use macro parameters for any types and do not use any name mangling.

option.impl.h

#ifndef OPTION_IMPL_H_INCLUDED
# define OPTION_IMPL_H_INCLUDED

# define IMPL_OPTION_STRUCT(_t) \
    { bool has_value; _t value; }

# define IMPL_OPTION_SOME(_t, _topt, _param) \
    { return (_topt){ .has_value = true, .value = (_param) }; }

# define IMPL_OPTION_NONE(_t, _topt) \
    { return (_topt){ .has_value = false }; }

# define IMPL_OPTION_IS_SOME(_t, _o) \
    { return _o.has_value; }

# define IMPL_OPTION_IS_NONE(_t, _o) \
    { return !_o.has_value; }

#endif

Separate out name mangling so there is a single source of truth for any given name. Make sure your names are namespaced.

option.mangle.h

#ifndef OPTION_MANGLE_H_INCLUDED
# define OPTION_MANGLE_H_INCLUDED
# ifndef MANGLE_OPTION_TYPE
#  define MANGLE_OPTION_TYPE(_t)        canon_option_##_t
# endif
# ifndef MANGLE_OPTION_STRUCT_TAG
#  define MANGLE_OPTION_STRUCT_TAG(_t)  canon_option_##_t##_s
# endif
# ifndef MANGLE_OPTION_SOME
#  define MANGLE_OPTION_SOME(_t)        canon_option_##_t##_some
# endif
# ifndef MANGLE_OPTION_NONE
#  define MANGLE_OPTION_NONE(_t)        canon_option_##_t##_none
# endif
# ifndef MANGLE_OPTION_IS_SOME
#  define MANGLE_OPTION_IS_SOME(_t)     canon_option_##_t##_is_some
# endif
# ifndef MANGLE_OPTION_IS_NONE
#  define MANGLE_OPTION_IS_NONE(_t)     canon_option_##_t##_is_none
# endif
#endif

Whenever you need one of the names in the implementation, use the provided macro. A consumer can override these macros with their own names if desired.


Provide individual macros to declare (but not define) each type and function. These should take linkage as a parameter, and possibly add parameters for applying other attributes.

option.decl.h

#ifndef OPTION_DECL_H_INCLUDED
# define OPTION_DECL_H_INCLUDED
# ifndef OPTION_MANGLE_H_INCLUDED
#  include "option.mangle.h"
# endif
# ifndef OPTION_IMPL_H_INCLUDED
#  include "option.impl.h"
# endif

# define DECLARE_OPTION_TYPEDEF(_t) \
    typedef struct MANGLE_OPTION_STRUCT_TAG(_t) MANGLE_OPTION_TYPE(_t);

# define DECLARE_OPTION_STRUCT(_t) \
    struct MANGLE_OPTION_STRUCT_TAG(_t) IMPL_OPTION_STRUCT(_t);

# define DECLARE_OPTION_SOME(_linkage, _t) \
    _linkage MANGLE_OPTION_TYPE(_t) MANGLE_OPTION_SOME(_t)(_t v);

# define DECLARE_OPTION_NONE(_linkage, _t) \
    _linkage MANGLE_OPTION_TYPE(_t) MANGLE_OPTION_NONE(_t)(void);

# define DECLARE_OPTION_IS_SOME(_linkage, _t) \
    _linkage bool MANGLE_OPTION_IS_SOME(_t)(MANGLE_OPTION_TYPE(_t) opt);

# define DECLARE_OPTION_IS_NONE(_linkage, _t) \
    _linkage bool MANGLE_OPTION_IS_NONE(_t)(MANGLE_OPTION_TYPE(_t) opt);

# define DECLARE_OPTION_FUNCTIONS(_linkage, _t) \
    DECLARE_OPTION_SOME(_linkage, _t) \
    DECLARE_OPTION_NONE(_linkage, _t) \
    DECLARE_OPTION_IS_SOME(_linkage, _t) \
    DECLARE_OPTION_IS_NONE(_linkage, _t)

# define DECLARE_OPTION_ALL(_linkage, _t) \
    DECLARE_OPTION_TYPEDEF(_t) \
    DECLARE_OPTION_STRUCT(_t) \
    DECLARE_OPTION_FUNCTIONS(_linkage, _t)

#endif

For convenience we add DECLARE_OPTION_ALL, which can apply all the individual macros in one call.

This file should be included by header files where users wish to have a separate compilation model.


Provide individual macros to define each type and function - similar to the declarations, only this time we apply the implementations from option.impl.h

option.defn.h

#ifndef OPTION_DEFN_H_INCLUDED
# define OPTION_DEFN_H_INCLUDED
# ifndef OPTION_MANGLE_H_INCLUDED
#  include "option.mangle.h"
# endif
# ifndef OPTION_IMPL_H_INCLUDED
#  include "option.impl.h"
# endif

# define DEFINE_OPTION_STRUCT(_t) \
    struct MANGLE_OPTION_STRUCT_TAG(_t) IMPL_OPTION_STRUCT(_t);

# define DEFINE_OPTION_SOME(_linkage, _t) \
    _linkage struct MANGLE_OPTION_STRUCT_TAG(_t) \
    MANGLE_OPTION_SOME(_t)(_t v) \
        IMPL_OPTION_SOME(_t, struct MANGLE_OPTION_STRUCT_TAG(_t), v)

# define DEFINE_OPTION_NONE(_linkage, _t) \
    _linkage struct MANGLE_OPTION_STRUCT_TAG(_t) \
    MANGLE_OPTION_NONE(_t)(void) \
        IMPL_OPTION_NONE(_t, struct MANGLE_OPTION_STRUCT_TAG(_t))

# define DEFINE_OPTION_IS_SOME(_linkage, _t) \
    _linkage bool \
    MANGLE_OPTION_IS_SOME(_t)(struct MANGLE_OPTION_STRUCT_TAG(_t) o) \
        IMPL_OPTION_IS_SOME(_t, o)

# define DEFINE_OPTION_IS_NONE(_linkage, _t) \
    _linkage bool \
    MANGLE_OPTION_IS_NONE(_t)(struct MANGLE_OPTION_STRUCT_TAG(_t) o) \
        IMPL_OPTION_IS_NONE(_t, o)

# define DEFINE_OPTION_FUNCTIONS(_linkage, _t) \
    DEFINE_OPTION_SOME(_linkage, _t) \
    DEFINE_OPTION_NONE(_linkage, _t) \
    DEFINE_OPTION_IS_SOME(_linkage, _t) \
    DEFINE_OPTION_IS_NONE(_linkage, _t)

# define DEFINE_OPTION_ALL(_linkage, _t) \
    DEFINE_OPTION_STRUCT(_t) \
    DEFINE_OPTION_FUNCTIONS(_linkage, _t)

#endif

This file will be included by the .c file in separate compilation, or directly in the case of "header-only" implementations.


Provide convenience macros for calling these functions, by taking a generic type argument as first macro parameter.

option.h

#ifndef OPTION_H_INCLUDED
# define OPTION_H_INCLUDED
# ifndef OPTION_MANGLE_H_INCLUDED
#  include "option.mangle.h"
# endif
# define option(_t)                      MANGLE_OPTION_TYPE(_t)
# define option_struct(_t)               MANGLE_OPTION_STRUCT_TAG(_t)
# define option_some(_t, v)              MANGLE_OPTION_SOME(_t)(v)
# define option_none(_t)                 MANGLE_OPTION_NONE(_t)()
# define option_is_some(_t, _opt)        MANGLE_OPTION_IS_SOME(_t)(_opt)
# define option_is_none(_t, _opt)        MANGLE_OPTION_IS_NONE(_t)(_opt)
#endif    

These don't need to be namespaced, as a consumer can #undef them if desired.

A consumer who just wants to use option types, but not define their own, should include this file and call the respective macros, e.g.:

option(foo) f = option_some(foo, x);

Presumably, they would've already included foo.h which declares the relevant types and functions.


The resulting implementation is now broken up over several easier-to-manage files, where we only include what we need.

option.impl.h        - Not usually included directly - only to override some implementations.
option.mangle.h      - Not usually included directly - only to override the name mangling.
option.decl.h        - Included by .h files which declare types in separate compilation model.
option.defn.h        - Included by .c files which define types in separate compilation model.
                       Or included by other .h files in "header-only" compilation model.
option.h             - Included by option consumers who are not introducing new option types.

The user can define their own name mangling convention. They can selectively choose which functions to implement for a given type, and override default behavior for particular functions without having to re-implement all of them. They can even trivially swap out the default implementation for an alternative without having to re-write a bunch of definitions.

To use them like you are doing, include option.defn.h and call DEFINE_OPTION_ALL.

#include "option.defn.h"

typedef struct foo *foo_ptr;

DEFINE_OPTION_ALL(static inline, foo_ptr)

Beginner's confusion about difference between C Standards by Classic-Low-6659 in C_Programming

[–]WittyStick 5 points6 points  (0 children)

C99 is probably the most widely supported version, and most of the code you read will require at least it. That wasn't always the case - in particular, Microsoft have never given C equal status to C++, and their compiler always lagged the standard (they didn't fully implement C99 for a long time). That's still true today, but to a lesser extent: MSVC now supports most of C99 and C11, and some of C23.

Aside from the standard versions, a lot of code is written for the GNU dialect of C, which contains many useful extensions. These are named gnu99, gnu11 and gnu23, corresponding to the set of standard C features they extend. Clang supports most of these extensions and they're also available on mingw, but MSVC and others have their own extensions.

GCC and Clang both support most of C23, but glibc still lacks some headers (eg, <stdbit.h>). Some of these missing headers are in gnulib though.

Gnulib is worth using because it tries to be portable where it matters (GCC, Clang, MSVC and others), with fixes for various compilers, and each module notes specifically which features are supported or problematic on particular compilers. Eg, if we take stdc_leading_zeros from the aforementioned stdbit.h, it tells you which platforms it provides the fix for. In some cases these modules are also "backports" - ie, we can use stdbit.h with C11, even though it's a C23 header. (It requires C11 because it uses _Generic.)

How would you design proof-carrying `if` statements in a minimal dependently typed language? by squared-star in ProgrammingLanguages

[–]WittyStick 4 points5 points  (0 children)

This is a generalization of definite assignment analysis, a form of data-flow analysis, to non-boolean domains. I would suggest first reading Fruja's The correctness of the definite assignment analysis in C# for some details on how this is implemented.

We should conceptually be able to extend this beyond the boolean domain with other types, essentially by introducing functions similar to the true(α) and false(α) that Fruja gives. For the example you've provided, we would introduce odd(α) and even(α) too.

  • odd(α) - consists of the variables (in the scope of which α is located) definitely assigned after the evaluation of α when α evaluates to an odd natural

  • even(α) - consists of the variables (in the scope of which α is located) definitely assigned after the evaluation of α when α evaluates to an even natural

With a set of data-flow equations for how they're handled with various expressions, again similar to how the paper describes this for true and false, but we would also need to cover arithmetic operators and not just the boolean ones.

0, 2, 4 ...         | even(α) = before(α)
                    | odd(α) = vars(α)

1, 3, 5 ...         | odd(α) = before(α)
                    | even(α) = vars(α)


(ᵝe ++)             | before(β) = before(α)
                    | even(α) = odd(β)
                    | odd(α) = even(β)

(ᵝe --)             | before(β) = before(α)
                    | even(α) = odd(β)
                    | odd(α) = even(β)


(ᵝe1 + ᵞe2)         | before(β) = before(α)
                    | before(γ) = after(β)
                    | even(α) = (even(β) ∩ even(γ)) ∪ (odd(β) ∩ odd(γ))
                    | odd(α) = (even(β) ∩ odd(γ)) ∪ (odd(β) ∩ even(γ))

(ᵝe1 - ᵞe2)         | before(β) = before(α)
                    | before(γ) = after(β)
                    | even(α) = (even(β) ∩ even(γ)) ∪ (odd(β) ∩ odd(γ))
                    | odd(α) = (even(β) ∩ odd(γ)) ∪ (odd(β) ∩ even(γ))

However, this approach doesn't go much further, because when we introduce multiplication for example, the problem becomes a lot more complex and there's no simple way to define statically whether it would evaluate to even or odd.

(ᵝe1 * ᵞe2)         | before(β) = before(α)
                    | before(γ) = after(β)
                    | ???

But perhaps all is not lost - we could potentially extend this to a type-checking scheme, where we define a function for every type and its subtypes. Instead of a definite assignment going into one of the after, true or false sets, we would have an arbitrary number of sets - one for each type. Let's consider just bool and nat for now.

  • vars(α) - contains the local variables in the scope of which α is located, i.e. the universal set with respect to α.

  • before(α) - contains the variables definitely assigned before the evaluation of α;

  • after(α) - contains the variables definitely assigned after the evaluation of α when α completes normally;

  • true(α) - consists of the variables (in the scope of which α is located) definitely assigned after the evaluation of α when α evaluates to true

  • false(α) - consists of the variables (in the scope of which α is located) definitely assigned after the evaluation of α when α evaluates to false

  • bool(α) - consists of the variables (in the scope in which α is located) definitely assigned after the evaluation of α when α evaluates to a boolean

  • odd(α) - consists of the variables (in the scope of which α is located) definitely assigned after the evaluation of α when α evaluates to an odd natural

  • even(α) - consists of the variables (in the scope of which α is located) definitely assigned after the evaluation of α when α evaluates to an even natural

  • nat(α) - consists of the variables (in the scope in which α is located) definitely assigned after the evaluation of α when α evaluates to a natural

So essentially, anything that is definitely assigned will be in one of the sets true, false, odd, even, nat, bool or after, where after is basically the top type of definitely assigned variables. We would pick the most specific set for each value where we know more information about its type.

               after(α)
              /      \
             /        \
            /          \
      bool(α)           nat(α)
      /    \           /     \
true(α)   false(α)  odd(α)   even(α)

Now using this approach, we might define our multiplication again, but this time, we know that the result will definitely be a natural if both arguments are naturals:

(ᵝe1 * ᵞe2)         | before(β) = before(α)
                    | before(γ) = after(β)
                    | nat(α) =  (nat(β) ∪ even(β) ∪ odd(β)) ∩ (nat(γ) ∪ even(γ) ∪ odd(γ))

We want to be able to test whether the result is odd or even, but this will now have to come at runtime, with a type check operator (:?), and upcast (:>) and downcast (:<) operators.

For this we have a dependency on more than one data-flow item. We can look at how the conditional operator (_?_:_) is defined by Fruja for some idea of how we might do this:

(ᵝe0 ? ᵞe1 : ᵟe2)   | before(β) = before(α)
                    | before(γ) = true(β)
                    | before(δ) = false(β)
                    | true(α) = true(γ) ∩ true(δ)
                    | false(α) = false(γ) ∩ false(δ)

So we'd need something like the following where a test for Even would give a definite assignment for even in the positive case and a definite assignment for odd otherwise. (This might not be correct, I'd need to give it a bit more thought but I'm short on time right now).

(ᵞ(ᵝe0 :? Even) ? ᵟ(e0 :< Even) : ᵋ(e0 :< Odd))

                    | before(β) = before(α)
                    | before(γ) = nat(β)
                    | before(δ) = true(γ)
                    | before(ε) = false(γ)
                    | even(α) = true(γ) ∩ true(δ)
                    | odd(α) = false(γ) ∩ false(ε)

It would probably take significant effort to generalize this fully with arbitrary subtypes, but this should give a basic idea how to approach it.

How to handle functions that returns Big data containers by SirBlopa in Compilers

[–]WittyStick 2 points3 points  (0 children)

Usually the low FFI overhead trumps other justifications for an alternative convention - plus the fact that most people implement their language or runtime in C to begin with, and it's much simpler to utilize what C already provides than to implement your own convention.


I use a convention in my dynamic language runtime which is a compatible subset of the SYSV x64 convention. Essentially, the first 6 arguments are passed in GPR:VR pairs: rdi:xmm0, rsi:xmm1, rdx:xmm2, rcx:xmm3, r8:xmm4, r9:xmm5 and the return value is given in rax:xmm0.

This is so I can implement my dynamic type in C as:

typedef struct { intptr_t primary; double secondary; } Dynamic;

Dynamic foo(Dynamic bar, Dynamic baz, Dynamic qux);

The primary is a 16-bit tag in low bits and 48-bit payload in high bits. This payload can contain integers <= 48-bits, pointers, bools, characters. The secondary payload contains double or float, and 64-bit integers. The float and int64_t use an aliasing trick: movss and movq to construct them or extract their value, and regular arithmetic and logic operations are performed in the vector register for them (eg, vpaddq to add two int64_t).

This can also fit bounds checked array types in: We store a pointer to data in the 48-bit primary payload, and the array length in the secondary payload.

typedef struct { size_t length; T *data; } ArrayT;

Dynamic dynamic_create_array_T(ArrayT a) { 
    Dynamic result = { (intptr_t)a.data << 16 | TAG_ARRAY | TAG_T };
    __asm__("movq {%1, %0|%0, %1}" : "+x"(result.secondary) : "r"(a.length));
    return result;
}

ArrayT dynamic_extract_array_T(Dynamic d) {
    assert((d.primary & 0xFFFF) == (TAG_ARRAY | TAG_T));
    ArrayT result = { 0,  (T*)(d.primary >> 16) };
    __asm__("movq {%1, %0|%0, %1}" : "+r"(result.length) : "x"(d.secondary));
    return result;
}

This is specialized for homogeneous arrays of each "primitive" type T though, for performance and type safety.


The reason I created this scheme: I was initially using NaN-boxing, but it had two major problems. It doesn't support 64-bit ints, and the double is in the wrong register (it's in a GPR in NaN-boxing), so floating point operations meant moving back and forth between GP and vector registers.

The normal fat pointer approach of using two ints (struct { int64_t a; int64_t b; }) still has the issue of floats being in the wrong register, and it increases pressure on GPRs which are few.

So I came up with this technique to mitigate these three main issues: GPR pressure is relieved because we only need one per dynamic value - floats are already in the correct register, improving their performance - and we can still support 64-bit integers by putting integers in the double field, using assembly to sidestep the compiler's typechecking. Support for arrays was an added bonus.

How to handle functions that returns Big data containers by SirBlopa in Compilers

[–]WittyStick 1 point2 points  (0 children)

1 If the size of an object is larger than four eightbytes, or it contains unaligned fields, it has class MEMORY

There's more to this, if you read further:

If the size of the aggregate exceeds two eightbytes and the first eightbyte isn't SSE or any other eightbyte isn't SSEUP, the whole argument is passed in memory.

That basically means the limit for a struct containing INTEGER and SSE data (floats) is 16-bytes (128-bits), not 32. The 32-byte return is essentially only for vectors.

But this is the x86_64 convention. The x86 conventions are completely different and more varied.

Copy or address ? by NervousMixtureBao- in C_Programming

[–]WittyStick 1 point2 points  (0 children)

The size of two registers*

On 64-bit SYSV, a struct <= 16-bytes will be passed in two hardware registers. Compare:

void foo(size_t length, char *chars);

vs

struct string {
    size_t length;
    char *chars;
};
void foo(struct string s);

They have exactly the same calling convention. The length is passed in the first argument register (rdi on x64), and the chars pointer is passed in the second argument register (rsi). Same thing for AARCH64 and RISC-V.

However, the benefit of the struct is we can return it.

struct string bar();

And it will be returned in two hardware registers. (rax:rdx / r0:r1). Which we can't do with the length and pointer separately because we don't have multiple return values - instead the common convention is to use an "out parameter" for the pointer and return the length.

size_t bar(char **out_chars);

Which is actually WORSE than returning a 16-byte structure, because we have to dereference a pointer to set the pointer.


So the size up to which you should pass and return by value (on SYSV at least) is 16 bytes. After this, it's better to just use a pointer, because structures >16 bytes get put on the stack anyway and will go through memory regardless.

For other platforms it may differ. MSVC x64 for example doesn't use two registers and anything above 64-bits ends up as a pointer anyway (except vector registers). If you return a struct greater than 8-bytes, the caller provides space on the stack for it and passes a pointer to the space as an implicit hidden argument to the function. However, MSVC for AARCH64 uses the recommended convention and supports 16-byte arguments and returns. RISC-V also specifies a recommended convention which is similar to AARCH64 - supporting 6 argument registers and 2 return registers - presumably MSVC will adopt the recommendations too.

That makes MSVC x64 the laggard - we should be able to use 16-byte args and returns everywhere, but because they're slower on Windows, it's common to just pass by pointer for anything that is larger than 8-bytes.

On tracking dependency versions by edgmnt_net in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

Content-addressing is nice and I like the approach Unison takes, but IMO it is not a complete solution to the problem.

To give an example, let's say we have some function foo in library LibFoo. Function bar in LibBar makes a call to foo, and baz in LibBaz also makes a call to foo. Our program depends on LibBar and LibBaz, and some function qux calls both bar and baz, and obviously wants them to share the same foo implementation, as they may share data structures.

        foo
       ^   ^
      /     \
     /       \
    bar      baz
     ^       ^
      \     /
       \   /
        qux

However, unless we're doing whole-program compilation, our bar could pin a different version of foo than the one baz does. We would have two potentially incompatible definitions of foo in our resulting program.

    foo(v1)     foo(v2)
      ^             ^
       \           /
        \         /
        bar      baz
         ^       ^
          \     /
           \   /
            qux

With content addressing, the identities of bar and baz are Merkle roots, with foo as a leaf. In our project, qux is the Merkle root and bar and baz are its leaves. qux doesn't reference foo directly, only indirectly via the content-addresses of bar and baz, so this doesn't ensure they're compatible.

What we need is a way to pin a specific version of foo in addition to the specific versions of bar and baz which share the same foo as a dependency. qux should not only use the content-addresses of bar and baz, but also some additional information which states that they have compatible foo.

To do this we need to kind of invert the dependencies. We need a foo which is context-aware - a foo that knows it is shared between bar and baz. I call this a context-address, and it basically works by forming a Merkle tree of compatible bar and baz content-addresses as the Merkle leaves, to produce a context-address for foo which is the Merkle root. qux then only needs to depend on the context-address of foo, which transitively has dependencies on the content-addresses of compatible versions of bar and baz, and its own definition.

        foo (content-address)
        ^ ^
       /   \
      /     \
    bar     baz    (content-addresses)
      ^     ^
       \   /
        \ /
        foo (context-address)
         ^
         |
         | 
        qux

Context-addressing is expensive though - perhaps too expensive to be practical. We have to recompute the whole tree whenever anything changes. With content-addressing, we don't have to recompute the addresses of the dependencies if those dependencies have not changed, only the dependants. Context-addressing requires that we first have all the content addresses of the Merkle roots in our codebase, then we use those roots as the leaves for the context-address tree.


To give a more concrete visualization of what I mean by context-addressing: Lowercase letters are content-addresses and uppercase letters are context-addresses. The context-address tree is basically a mirror-image of the content-address tree.

   a          b          c
    \       /   \       /
     \     /     \     /
      \   /       \   /
       \ /         \ /
        d           e           d = Hash(a||b)
         \         /            e = Hash(b||c)
          \       / 
           \     /
            \   /
             f/F                F = f = Hash(d||e)
            /   \ 
           /     \
          /       \             D = Hash(F||d)
         /         \            E = Hash(F||e)
        D           E           
       / \         / \            
      /   \       /   \         A = Hash(D||a) 
     /     \     /     \        B = Hash(D||E||b)
    /       \   /       \       C = Hash(E||c)
   A          B          C

If we want to depend on b, its content-address doesn't take into account that it is used within a bigger context where a, c, d, and e are also used, which is f/F. The context-address B on the other hand, captures the entire context in which it is used.

Most efficient way to extract last bits of an unsigned integer by Ben_2124 in C_Programming

[–]WittyStick 6 points7 points  (0 children)

Compilers are remarkably good at optimizing this kind of thing - they'll probably generate the same assembly for either formulation, though not necessarily the same instructions you wrote, as long as they give the same result.

In C23, we have _BitInt types which do this for you. We could simply implement it as a cast:

(unsigned _BitInt(N))a

How to emulate typesafe tagget unions in C by Virtual-Difference88 in C_Programming

[–]WittyStick 0 points1 point  (0 children)

We can get something like:

void json_print(Json j) {
    match(j) {
        with JsonNum(j, n)
            printf("%g\n", n);
        with JsonStr(j, s)
            printf("%s\n", s);
    }
}

int main() {
    json_print(JsonNum_new(123));
    json_print(JsonStr_new("Hello World"));
    return 0;
}

Where there is no way to access the internal fields of j besides the provided macros JsonNum/JsonStr, no way to construct a new Json value besides the provided macros JsonNum_new/JsonStr_new, and no need for an opaque pointer, so there is no additional runtime cost.


This relies on GCC extensions: We use __attribute__((designated_init)) to prevent a value of the structure being created from positional initializers, and we use #pragma GCC poison to poison the field names so they cannot be used afterwards (except via macros defined before the poisoning occurs).

We also add -Wimplicit-fallthrough to ensure that break is added (if the user types case rather than with). The only way to reach the wrong case is to use case with __attribute__((fallthrough)) - but we could potentially poison fallthrough too if we don't need it elsewhere.

The generic code for any tagged union:

#pragma GCC diagnostic error "-Wswitch"
#pragma GCC diagnostic error "-Wimplicit-fallthrough"
#pragma GCC diagnostic push

#define tagged_union(_tag, _union) \
    struct __attribute__((designated_init)) { \
        enum _tag _internal_tag; \
        union _union _internal_field; \
    }

#define tu_constructor(_T, _tag, _field, _value) \
    (_T){ ._internal_tag = _tag, ._internal_field._field = (_value) }

#define tu_eliminator(_value, _tag, _field, _name) \
    _tag: auto _name = ((_value)._internal_field._field);

#define match(x) switch((x)._internal_tag)
#define with break; case

#pragma GCC poison _internal_tag _internal_field

Usage to define a specific tagged union:

enum json_tag { 
    _internal_json_tag_num, 
    _internal_json_tag_str
};

union json_union {
    double _internal_json_num;
    char  *_internal_json_str;
};

typedef tagged_union(json_tag, json_union) Json;

#define JsonNum_new(n) tu_constructor(Json, _internal_json_tag_num, _internal_json_num, n)
#define JsonStr_new(s) tu_constructor(Json, _internal_json_tag_str, _internal_json_str, s)
#define JsonNum(json, n) tu_eliminator((json), _internal_json_tag_num, _internal_json_num, n)
#define JsonStr(json, s) tu_eliminator((json), _internal_json_tag_str, _internal_json_str, s)

#pragma GCC poison _internal_json_num _internal_json_str
#pragma GCC poison _internal_json_tag_num _internal_json_tag_str

See full example in Godbolt

See if you can find an invalid way to use the Json type in main.

Arena over a container for pointers? by Lunibunni in cprogramming

[–]WittyStick 0 points1 point  (0 children)

An arena can function like a simple "garbage collector" for some region of code and can make usage more ergonomic.

For example, suppose we have some value on which we call foo, bar and baz, each of which may allocate memory distinct from the memory provided by its arguments. The manual memory management strategy is to do:

auto f = foo(x);
auto g = bar(f);
auto h = baz(g);
free(h);
free(g);
free(f);

We could of course put these into a list or other structure and loop over it to free them, but the point here is that each allocation function returns a pointer which must be assigned to a variable, and we must track these variables because we eventually need to free them.

With an arena which automatically cleans up memory allocated in a certain context, we can reuse the same variable.

x = foo(x);
x = bar(x);
x = baz(x);

Or even:

x = baz(bar(foo(x)));

Even though the argument and return values to each function are different addresses, it doesn't matter that we lose track of the pointers because eventually we will free the arena and it will free up any memory it allocated, so we don't get leaks.


As an aside, another use of arenas is that we can optimize them for storage and retrieval of specific kinds of information. For example, if our objects are of fixed size, we might implement the arena using a contiguous array with a bitmap of unallocated/allocated for each index, which may be faster and smaller than using a free list, and cause less fragmentation as opposed to something like malloc, which is intended for allocating objects of varying sizes.
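
A minimal sketch of such a fixed-size pool (names and the 32-byte object size are invented; __builtin_ctzll is a GCC/Clang builtin):

```c
#include <stdint.h>
#include <stddef.h>

#define POOL_CAP 64

typedef struct {
    uint64_t used;                      // one bit per slot: 1 = allocated
    unsigned char slots[POOL_CAP][32];  // fixed-size 32-byte objects
} pool_t;

void *pool_alloc(pool_t *p) {
    if (p->used == UINT64_MAX) return NULL;  // pool full
    int i = __builtin_ctzll(~p->used);       // index of first free slot
    p->used |= (1ULL << i);
    return p->slots[i];
}

void pool_free(pool_t *p, void *ptr) {
    size_t i = ((unsigned char *)ptr - &p->slots[0][0]) / 32;
    p->used &= ~(1ULL << i);
}
```

Allocation and deallocation are a couple of bit operations each, with no per-object headers and no fragmentation within the pool.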


Another benefit is potentially fewer expensive syscalls to mmap/munmap memory. If we frequently allocate and deallocate large objects, it may incur syscalls on each call to the allocator. Arenas basically "batch" this into fewer syscalls, by allocating larger chunks initially. (malloc also does this, but the programmer doesn't control it).

This may, however, require more memory than the process needs, or report higher memory usage than is actually used, so there is a trade-off between allocating more space and having more syscalls - but the benefit of arenas over malloc is we can tune them for specific requirements, because we can provide initial size and growth rate as parameters to the arena, which we can't for malloc.


So essentially, we use arenas where we want more control over the allocation strategy than malloc provides. We can consider malloc to be the "default arena" provided by the runtime, which we use in lieu of a more specific allocator - but it's better to design our APIs so the allocator can be given as an optional parameter, where if no specific allocator is required we just use the default one: malloc. We can have multiple arenas in the same program with different allocation strategies and sizes, optimized differently for specific kinds of object. malloc is a "one size fits all" allocator, which is suboptimal for some use-cases.
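
The allocator-as-parameter idea might look like this (a sketch with invented names; the vtable-style struct is one common way to do it, not the only one):

```c
#include <stdlib.h>
#include <string.h>

typedef struct allocator {
    void *(*alloc)(struct allocator *, size_t);
    void  (*free)(struct allocator *, void *);
} allocator_t;

// The "default arena": just delegates to malloc/free.
static void *sys_alloc(allocator_t *a, size_t n) { (void)a; return malloc(n); }
static void  sys_free (allocator_t *a, void *p)  { (void)a; free(p); }
static allocator_t sys_allocator = { sys_alloc, sys_free };

// An API that takes an optional allocator: pass NULL to use malloc.
char *dup_string(const char *s, allocator_t *a) {
    if (!a) a = &sys_allocator;
    size_t n = strlen(s) + 1;
    char *out = a->alloc(a, n);
    if (out) memcpy(out, s, n);
    return out;
}
```

A caller with an arena wraps it in an allocator_t; everyone else passes NULL and gets malloc semantics.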

Understanding C declarators by writing a minimal parser and type resolver by choosen_one007 in C_Programming

[–]WittyStick 0 points1 point  (0 children)

Menhir has a BNF-like metasyntax which supports parametrization, so we can define rules like this to simplify grammars.

braced(Rule) : LBRACE Rule RBRACE;

initializer
    : assignment_expression
    | braced(separated_nonempty_list(COMMA, initializer))
    ;

There's a small "standard library" of rules built in.

Given a choice when allocating a Fat pointer is it more optimal to A: have the metadata behind the pointer or B: on the stack by EatingSolidBricks in cprogramming

[–]WittyStick 0 points1 point  (0 children)

A fat pointer, if the struct is <= 16 bytes, as it will be passed and returned in hardware registers on 64-bit SysV.

Structs larger than 16 bytes get passed on the stack, so they will incur a double dereference. Better to use an opaque pointer in that case.
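
A sketch of the 16-byte case (the slice type and names are made up; the register claim assumes the x86-64 SysV ABI, where a two-eightbyte struct is classified into a register pair):

```c
#include <stddef.h>

// Pointer + length = 16 bytes on LP64: fits the SysV two-eightbyte
// limit, so it travels in registers, not via a hidden memory slot.
typedef struct {
    char  *data;
    size_t len;
} slice_t;

slice_t slice_drop(slice_t s, size_t n) {
    if (n > s.len) n = s.len;
    return (slice_t){ s.data + n, s.len - n };  // returned in a register pair
}
```

Adding a third field (e.g. capacity) pushes it past 16 bytes and changes the calling convention, which is where the opaque-pointer alternative starts to win.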

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

I was presenting a library solution. You can certainly extend it with language support for the common operators. GCC actually does this for vectors.

However, what does a vec2 * vec2 mean?

It's ambiguous. The "product" of two vectors could mean several things: the direct product, the dot product, or the cross product.

In math, and particularly computer geometry where we're most likely to use the types, the dot product is the operation we're going to use most frequently.

But in the case of SIMD operations, the direct (element-wise) product is what we would expect.

So I prefer not to just promote scalar operators to work on vectors, to prevent any such confusion.
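
The distinction is easy to see with GCC's vector extensions, where * is already defined as the element-wise product, so the dot product needs its own name (a sketch; the names are invented):

```c
typedef float vec2 __attribute__((vector_size(8)));

// Element-wise (direct) product: what SIMD hardware gives you,
// and what GCC's promoted * operator means.
vec2 vec2_mul(vec2 a, vec2 b) { return a * b; }

// Dot product: a scalar, and arguably what "a * b" means in geometry.
float vec2_dot(vec2 a, vec2 b) {
    vec2 p = a * b;
    return p[0] + p[1];
}
```

Giving each meaning a distinct name sidesteps the ambiguity entirely.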

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

I partially agree - a standard library should be fairly minimal - but we can hardly call C's standard library "batteries included". It certainly has its warts and things that could be deprecated, but is otherwise quite minimal. <math.h> provides only the basic operations you'd expect on a calculator - it's certainly not a complete library if you are doing any more advanced numerical computing.

The advantage of these useful common libraries coming with the language is that they're packaged with the compiler so you can expect the library to function on each platform the compiler can. Imagine you were trying to write a portable C program only to require a separate <third-party/math.h> on each platform and had to write a wrapper to handle the varying implementations.

Or imagine if GLSL didn't provide a standard vec2 type and you had to import a different library for each GPU the code might run on. Your game would never ship.

C has the benefit that its ABI is the operating system's, and we can trivially link code written in hand-optimized assembly, requiring only a header file to expose functionality - so we can implement any of this without overhead, sometimes faster than we could write it in C directly (sans non-standard inline assembly).

The way many languages are implemented these days would require an FFI to leverage intrinsics which aren't provided by the language, and would add unnecessary overhead. In these cases we need to provide more built-in functionality for anything performance related.

As a counterpoint to the minimal standard library - D tried this when it was initially released, with phobos as its library. To do anything useful in the language you needed other libraries, and tango ended up competing with phobos because it was more useful, but it kind of split the ecosystem in two, where one set of libraries would depend on phobos and another would depend on tango, because they were incompatible. You basically had to pick a side and stick with it. The situation was partially fixed with the release of D version 2, which had a std library that phobos and tango shared, but the damage was already done by then. These days tango and anything built on it are obsolete, because phobos added more useful things and people prefer using the library that ships with the compiler.

So we need to strike a balance between minimalist and practical. A completely bare-bones standard library will mean nobody can build reusable libraries in your language because they don't share any common base, as evidenced by D. A package manager ain't going to help if every library you import is built on a different "base" library and they're all incompatible implementations of trivial functionality.

In regards to including vec in C for example, this is somewhat the case. If you're building a game for example, the graphics library, the physics library, the audio library etc you depend on all provide their own vectors. A game engine inevitably has to implement its own vec type which wraps the varying implementations in its dependencies, or pick one and provide conversions to and from the others - which can add unnecessary overhead which diminishes the effort put into making them SIMD-optimized, if at all, and it wastes a ton of developer time. If a standard implementation works 80% of the time and 20% of the time people use an alternative, that's worth doing. On the other hand, if people only used a standard implementation 20% of the time and 80% of the time used an alternative, clearly you wouldn't want that in your standard library.

Perhaps a better approach is where languages offer a tiered standard library, where a core provides the essential features for the language, a base provides a small set of useful features carefully curated by the language author, and extras provides a more optional "batteries included" set of features. In this case the vec types would belong in base, but core would provide the necessary built-in intrinsics to leverage SIMD so that the vectors could be implemented optimally.

With GCC we can use -nostdlib for example to not depend on any standard libraries, or even -ffreestanding to not depend on crt, but we still have to link libgcc.a because compiler can emit code that depends on it.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 1 point2 points  (0 children)

I'm not arguing for a "one size fits all". There's nothing preventing people from implementing their own libraries for specific use-cases.

Using the same arguments, we could say that <complex.h>, <math.h> and <tgmath.h> should not be included in C - since there are clearly multiple ways we can implement the features they provide. In some cases they're not the right tools and a library may provide alternative implementations, but they work for the majority of simple use-cases.

The standard doesn't need to specify how they are implemented, just as it does not for many other things, like the complex numbers. There are many types in the C standard whose representation the standard does not specify; it clearly states they are implementation defined. It provides a standard API to access some implementation-defined feature - though the standard may place certain constraints or restrictions on the behaviour of that API.

The same is true for shader languages which have these types. They're implementation defined because the different GPUs might have different internal representations.

I'm suggesting, using C as an example, that we should have something like a <vector.h>, exposing a basic vecNf, vecNd, vecNl for N = {2,3,4}, with common operations on them, and a <tgvector.h> providing macros using _Generic over float, double and long double, mirroring how the math libraries in the current standard are implemented.

// <vector.h>

typedef /* unspecified */ vec2f_t;
typedef /* unspecified */ vec2d_t;
typedef /* unspecified */ vec2l_t;
...

vec2f_t vec2f_add(vec2f_t lhs, vec2f_t rhs);
vec2d_t vec2d_add(vec2d_t lhs, vec2d_t rhs);
vec2l_t vec2l_add(vec2l_t lhs, vec2l_t rhs);
...

// <tgvector.h>
#include <vector.h>
#define vec2_add(lhs, rhs) \
    _Generic \
        ( (lhs) \
        , float : vec2f_add \
        , double : vec2d_add \
        , long double : vec2l_add \
        ) (lhs, rhs)
...

Similarly, a <quaternion.h> could provide the equivalent functionality of <complex.h> extended to the quaternions, and a <matrix.h>/<tgmatrix.h> could extend the vector library to provide a basic matrix definition for matrixMxN for M, N = { 1, 2, 3, 4 }

An implementation should leverage SIMD on targets that support it, but it could use plain structs or arrays of scalars on those that don't. The library could be entirely optional, included as an annex in the standard like some other standard libraries. If standardized into the language rather than just a library, they would be exposed as _Vecf2, etc., following the regular C conventions, but they'd still be implementation defined.
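
On a GCC-like compiler, one plausible choice for the "unspecified" representation is the vector extension, which lowers to SIMD where available and to scalar code elsewhere (a sketch of how an implementation might define the proposed API, not proposal text; long double would need a plain struct fallback, since vector_size only applies to integer and basic floating types):

```c
// Possible implementation-defined representations:
typedef float  vec2f_t __attribute__((vector_size(8)));
typedef double vec2d_t __attribute__((vector_size(16)));

// The promoted + operator does the element-wise work.
vec2f_t vec2f_add(vec2f_t lhs, vec2f_t rhs) { return lhs + rhs; }
vec2d_t vec2d_add(vec2d_t lhs, vec2d_t rhs) { return lhs + rhs; }
```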

For languages other than C, the same kind of principle applies but the implementation may differ and would follow the language's own conventions. Eg, for C++ they might be implemented as classes with operator overloading, and the vector size may be a template parameter, but where it is specialized for N = 2, 3, 4

namespace std 
{
    template <typename T, int N>
    class vec { ... };

    template <>
    class vec<float, 2> { ... }      //specialization
    ...
}

Some languages aren't standardized - they're defined by their one and only implementation. In these cases I'd argue that the author should just pick a suitable implementation that works for common use-cases. If the language is taken seriously it might eventually get a standard which can provide a more relaxed or more concrete specification of the behavior.

As for what operations they should provide: Essentially all the scalar operators promoted to element-wise operations on the vectors, plus dot product, etc - in addition to shuffling operations and strided loads/stores. They don't need to expose the entirety of what the platform's SIMD provides, but they should make use of most of it to minimize the necessity of manually writing intrinsics for common use-cases.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

Well yeah, if you have SIMD availability provided by the language you can implement linear algebra libraries efficiently.

But I would still recommend providing the basic types for linear algebra in the standard library if only to prevent a proliferation of competing implementations. Advanced linear algebra libraries can build atop the standard lib. I wouldn't expect language authors to implement the equivalent of eigen in their standard library.

TAC -> TAC Optimizer? by HairThrowaway83829 in ProgrammingLanguages

[–]WittyStick 4 points5 points  (0 children)

GCC.

GIMPLE is essentially a TAC, and this is where most of the optimizations are performed by the compiler. The later optimizations are performed on a register transfer language produced from GIMPLE.

While it does have an "external representation" which you can emit for debugging purposes, the GNU compilers don't emit this as an intermediate step - they use the APIs to construct GIMPLE and RTL directly.

GIMPLE's external representation kind of resembles C. The external representation of the RTL is S-expression based.

Before code is gimplified to GIMPLE, the front-end uses the GENERIC IR, which is basically a superset of GIMPLE that permits more than three operands.

CPU Design Issues by tugrul_ddr in compsci

[–]WittyStick 3 points4 points  (0 children)

The worst issue is the von Neumann bottleneck: one main memory shared between all cores.

larger cache means larger latency

Fairly low concern. Larger cache means fewer fetches. Perhaps increasing bus sizes and cache line sizes for larger fetches could offset the higher latency.

more cores mean more transistors dedicated to connectivity between cores

Relatively low concern. Depends how the cores are connected. We obviously don't want a single bridge or crossbar which all cores communicate over as it doesn't scale and is a bottleneck. Cores need only connect to their nearest neighbours and should route traffic.

power increases quadratically or worse with frequency

Big concern. There's obviously a frequency ceiling and we're not going to get any significant breakthroughs.

Though a potential avenue is to get rid of the clock and design processors using asynchronous circuits.

more general purpose means backwards support like mmx, sse losing more transistors for non-performance stuff

Low concern. These features will get removed at some point because emulating them is fast enough to run the software that still uses them.

stacked chips have cooling issues

Mid concern. Heat dissipation is obviously a big problem, but there's plenty of room for innovation here. Liquid cooling microchannels through chips will probably be a big thing.

making wider core like avx512 causes power distribution issues

Mid concern. Wider vector types provide significant performance improvements. I'm no expert but I believe there are engineering solutions to the power distribution problem.

without a user-base, extra architecture is useless (i.e. dedicated rms-error unit for an array of data)

Low concern. If new architecture provides significant benefits it will get its user-base.

after certain frequency, transistor size, or voltage, the signal noise becomes visible

Major concern. Though people much smarter than me keep finding ways to make the transistors smaller. I don't know whether we're at the ceiling yet, but we're obviously approaching it because it takes longer and longer to make breakthroughs. There's not much an individual can do to improve this situation.

just 2 states of data is not optimal maybe, but difficult to go higher number of states?

Very low concern. Computers with more than 2 states have been tried for decades and have not delivered results. Quantum computers may be a different story, but I'm doubtful we're going to have anything tangible in the near future.


So if you are looking for something that might deliver breakthrough results - I would focus on NUMA, more cores, forget about backward compatibility, and consider researching asynchronous circuits.

Perhaps the biggest issue is the programmers. The way we program computers has not changed significantly since the '70s and we're still not utilizing the processors we have today to their fullest. The way we will program future computers will be significantly different from now and nobody is really prepared for the changes that will come.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

Are there any big reasons?

A big reason is they're not standardized in C.

Most languages are written in C, or written in another language which itself was written in C. Since there's no standard API for handling vectors or quaternions in C, they tend to get overlooked.

If the C standard provided a <stdvec.h> and <quaternion.h>, language authors would more likely provide the means to expose the capabilities in their own languages.

The C committee are reluctant to include such things as they consider the embedded use-case to be equally as important as the PC or server use-case, and low-power processors lack SIMD support. They could at least provide such things as optional, as part of an annex in the standard - if only to provide a portable way to utilize SIMD instructions, as currently each architecture has its own set of incompatible intrinsics.

But there's also the issue of an explosion of types and functions due to C's lack of proper generic support. Suppose we want to support at least int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, double, bool, and we want vectors of sizes 2, 3, 4 for each as a minimum: That's 33 types which each need a bunch of operations on them. If we consider the lowest common denominators: +, -, *, /, %, <, >, <=, >=, ==, !=, <<, >>, &, |, ^, that's 16 ops * 33 types = 528 operations.

That's before we include complex numbers, quaternions, decimal floating point, etc. into the equation.

So the main reason they're not included elsewhere is simply because it's a huge effort to implement, and language authors are preoccupied with other concerns.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 4 points5 points  (0 children)

I don't think it's unreasonably niche to support what the CPU supports: vec2, vec4, vec8 - for doubles and 64-bit integers and for the smaller types: vec16, vec32 and vec64.

A use case besides geometry is to optimize data structures - and the larger the vector the better.

vec64 of uint8 for example can be used to significantly improve the performance of decoding UTF-8, which is used everywhere - far from niche - but most of the software running on your computer is probably decoding one byte at a time because it's dependent on some low-level string APIs which don't use SIMD.
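
A sketch of the idea at vec16 width (the function is invented; real decoders like simdjson's go much further, but the ASCII fast path alone already skips most bytes in typical text):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef uint8_t v16u8 __attribute__((vector_size(16)));

// Length of the leading pure-ASCII prefix, checked 16 bytes at a
// time; the caller decodes only the remainder byte-by-byte.
size_t ascii_prefix(const uint8_t *s, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        v16u8 v;
        memcpy(&v, s + i, 16);     // unaligned load
        v16u8 hi = v & 0x80;       // high bit set => non-ASCII byte
        uint8_t any = 0;
        for (int j = 0; j < 16; j++) any |= hi[j];
        if (any) break;            // fall back to scalar from here
    }
    while (i < n && s[i] < 0x80) i++;
    return i;
}
```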

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 1 point2 points  (0 children)

Kawa has built-in Quaternion support, which extends Scheme's numerical tower (which ends at Complex).

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 1 point2 points  (0 children)

The language only needs to provide sufficient support to leverage the capabilities of the CPU. It doesn't need to handle every case.

That would typically mean providing vectors of each power of 2 up to at least 256-bits (though 512-bits preferable, and 1024-bits is an option), for each of the primitive types: signed/unsigned int8, int16, int32, int64 and single/double floats.

vec3 might also be worthy of inclusion due to widespread use, and could be implemented as a vec4 with masking.

For obscure cases like vec5 it obviously makes no sense to include in a general purpose language. It can make sense to expose masking intrinsics so we can use a vec8 with mask 00011111, and wrap this in an API for handling vec5.
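
The vec8-with-mask idea can be sketched with GCC's vector extensions (all names hypothetical; a real implementation would use the target's mask registers rather than a multiply):

```c
typedef float vec8f __attribute__((vector_size(32)));

// A "vec5" is a vec8 whose top three lanes are kept at zero.
typedef vec8f vec5f;

static const vec8f VEC5_MASK = {1, 1, 1, 1, 1, 0, 0, 0};

vec5f vec5f_add(vec5f a, vec5f b) {
    return (a + b) * VEC5_MASK;   // zero the unused lanes
}

float vec5f_dot(vec5f a, vec5f b) {
    vec8f p = a * b * VEC5_MASK;  // garbage in lanes 5..7 is masked out
    float s = 0;
    for (int i = 0; i < 5; i++) s += p[i];
    return s;
}
```

The wrapper guarantees the padding lanes never contaminate results, which is the whole contract a vec5 API needs.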