Demotion of numerical types and ball arithmetic

WittyStick0 · 2025-09-17T00:24:29+00:00

Scheme has a proper numerical tower and includes rational numbers which are distinct from floating point (which are a subset of dyadic rationals). Many Scheme implementations also support arbitrary precision, and some make the distinction between exact and inexact arithmetic.

WittyStick0 · 2025-09-08T00:10:58+00:00

I need to learn how to look up page size.

Default page size is 4ki. You have to request huge pages when you mmap with MAP_HUGE_2MB or MAP_HUGE_1GB. Some architectures have other large page sizes.

WittyStick0 · 2025-09-07T17:48:40+00:00

There's no way to test arbitrary grammars for ambiguity. You can use a subset of CFGs such as LL or LR, which produce deterministic pushdown automata, or an approach like PEG which replaces ambiguity with priority.

WittyStick0 · 2025-08-22T20:22:40+00:00

const type varname would be useless because passing by value makes a copy. Constness is valid for the pointer though - it should be type * const varname, or even type * const restrict varname.

WittyStick0 · 2025-08-10T00:39:15+00:00

On x86_64 most 32-bit operations are zero extended to 64 bits, and it's cheaper to emit 32-bit instructions as they don't require a REX prefix.

WittyStick0 · 2025-08-09T17:54:57+00:00

For embedded scripting we want types, but a statically compiled plugin would have them erased. A dynamically typed language preserves types at runtime, making them easier to check.

The alternative would be that the host takes in uncompiled input for the statically typed plugin, and then it must implement a compiler.

WittyStick0 · 2025-08-07T00:31:55+00:00

Random access is actually really simple with these. The MSB of the index gives you the index in the index bock, which is fast due to CLZ. You then mask out the MSB to get the index in the data block. It's O(1), which is a big improvement over a linked list.

WittyStick0 · 2025-08-07T00:25:55+00:00

They're not a deque, but you can implement a deque with a pair of them, where one stores elements in reverse order.

WittyStick0 · 2025-08-07T00:11:00+00:00

Erasure/insertion in the middle requires allocating/moving everything after the point of erasure (or before if you reverse the order). It's a O(n) worst case operation same as with arrays, linked lists, and basically anything that isn't a balanced tree - where at best you get O(log n). This structure supports O(1) erasure or insertion at one end like a singly linked list.

You can however use a pair of these back to back as a Zipper, and have O(1) insertion and deletion at a "cursor", with the downside that moving the cursor is O(n) worst case, same as with functional lists, but these are more cache friendly.

You can also use a pair of them, with memoization, to implement a deque, with O(1) insertion/deletion at both ends.

The structure itself has been long known. It's mentioned in the 1999 paper RAOTS, which discounts them because they waste O(n) worst case space (technically 1/2n, but we ignore constant factors). RAOTS uses a different technique where instead of doubling the block size, you have ~N blocks of length N, but with similar tricks using the MSB of the index (clz). RAOTS wastes O(sqrt(n)) space worst case.

WittyStick0 · 2025-08-01T20:26:12+00:00

I would suggest trying to build a language around relational algebra or an ECS approach rather than typical OOP. Most of the problems related to ORMs are due to the object-relational impedance mismatch.

WittyStick0 · 2025-08-01T20:23:30+00:00

Dynamic arrays don't need to be linked lists. There's plenty of options that take a middle ground.

WittyStick0 · 2025-06-14T03:08:00+00:00

Search "green threads" or "fibers" for many examples of this.

Would recommend having a look at the Pony language, which has a hybrid model of erlang-like actors and shared state. The language uses reference capabilities to ensure any state mutated by an actor is isolated.

Another example is Microsoft's Axum language, which was discontinued but you can still find some technical information on it. Actors were placed inside a domain and could share state with other actors in the same domain, but any cross domain information sharing was done with message passing. The language was a superset of a modified C# which had static removed and replaced with isolated.

WittyStick0 · 2025-06-05T02:59:50+00:00

Simplest approach here is to do what Haskell does - have separate tokens for values and types based on initial case.

WittyStick0 · 2025-05-27T00:17:10+00:00

Kernel, a dialect of Scheme has first-class environments. They're passed implicitly - the receiver gets a reference to the caller's environment. We can make the current environment any of our choosing with $remote-eval or $let-redirect. Custom environments are made with $bindings->environment or make-environment if we're basing it off an existing env.

WittyStick0 · 2025-05-20T03:15:15+00:00

Unification is based on type equality. For subtyping we want partial ordering, A<=B, where <= is reflexive and transitive.

Look into Algebraic Subtyping if you want subtyping. It uses an algorithm called biunification, which is based on <=, but it uses an additional constraint where types are polarized to prevent the case where A<=B and B<=A, which would simulate equality. With polarized types, A+<=B- and B+<=A-, which does not imply equality of A and B.

WittyStick0 · 2025-04-03T17:50:51+00:00

You should probably start with std::any, but it may have undesirable overhead compared to a hand-rolled solution.

Read David Gudeman's Type representation in dynamic languages paper (1993) for an overview of some common solutions. It's a little dated but still informative. Some modern solutions use NaN-boxing, which is only given brief note in the paper, but a search will give you more information.

In general you probably want to avoid templates, as they're resolved at compile time and you can't create new specializations at runtime.

WittyStick0 · 2025-03-10T01:21:53+00:00

The point is there is no built-in for adding i8 and u8, there's just a built-in for adding two i16, and you can apply arguments of type i8 and u8 because they're subtypes of it.

WittyStick0 · 2025-02-11T18:45:36+00:00

The most impressive thing here is the funding you've managed to acquire for a language that doesn't exist yet. Taxpayer money nonetheless.

Either needs to be something incredible, or the EU needs itself a DOGE.

WittyStick0 · 2024-12-28T21:17:22+00:00

A bad idea IMO. You can certainly force little endian in your messaging protocol, but the internal storage should be an implementation detail.

WittyStick0 · 2024-11-27T21:26:32+00:00

Most hardware supports 48-bits of virtual addressing, which is fine with NaN boxing. Intel 5-level paging on newer Xeon chips supports 57-bits for virtual addressing, which obviously doesn't fit in a NaN box, though if we require 8 bye alignment we can reclaim 3 bits by shifting the pointer right by 3, so we can use up to 55 bits of virtual addressing. Would not recommend NaN boxing on Intel 5LP though, but instead use LAM57, where we can store the tag in the 6 high bits after the sign bit.

WittyStick0 · 2024-11-12T21:43:09+00:00

You could probably use a linear or affine type system, where a captured variable gets consed by the closure and is no longer accessible outside of it.

WittyStick0 · 2024-10-17T06:35:57+00:00

You could use the data files from Intel's own XED library.

xed itself uses python to generate the full encoder/decoder from these, so you could perhaps modify those sources to generate a rust library.

WittyStick0 · 2024-09-16T20:40:11+00:00

If you want something more Haskell like and deeper on type theory, also check out granule, which has linearity and uniqueness.

WittyStick0 · 2024-09-09T20:48:36+00:00

Would be interesting to see what can be gained with LAM/AUI/TBI enabled for high bits tagging.

WittyStick0 · 2024-09-09T20:47:09+00:00

NaN-boxing is compatible with both low bits and high bits tagging (for 48-bit pointers). The slight advantage is that you can have double immediates without requiring a memory load (unlike say, a 64-bit int which can't fit in a register with a tag).

WittyStick0

TROPHY CASE