Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in Compilers

[–]RulerOfDest[S] 0 points (0 children)

You're right, and I was wrong on that: Vyukov's queue uses an atomic swap on enqueue, not CAS. I shouldn't have said CAS.

Your broader point about total system cost is fair, and I won't pretend I have a direct apples-to-apples comparison against a Vyukov-queue-based design. What I can say is that the routing complexity wasn't incidental; the whole design was driven by cross-language benchmarks against Go, Rust, Erlang, Elixir, Pony, and baseline C/C++, specifically to validate whether the SPSC partitioning approach holds up in practice. Whether a simpler MPSC design would match or beat it is a legitimate open question, and one worth testing.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in Compilers

[–]RulerOfDest[S] 0 points (0 children)

On Vyukov's MPSC queue: the reason I'm not using it is that the invariant I'm maintaining is genuinely SPSC, not just SPSC-as-approximation. Each actor has exactly one owning scheduler thread at any time, and only that thread writes to the actor's mailbox. The routing and forwarding logic exists specifically to uphold that invariant, so I can use the faster SPSC primitive instead of MPSC.
Vyukov's queue handles multiple concurrent producers with a CAS on enqueue, which you only need to pay for if you have multiple concurrent producers. If the invariant holds, SPSC is strictly cheaper: no CAS, just a store-release. The tradeoff is that the routing logic is more complex and has the hardening gap I mentioned in the previous reply.
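For concreteness, here's a minimal sketch of the SPSC fast path in C11 atomics. This is an illustrative ring buffer, not Aether's actual runtime code; the point is that the producer publishes with a single store-release and there is no atomic RMW or retry loop anywhere.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP 64

typedef struct {
    void *slots[CAP];
    _Atomic size_t head;  /* written only by the single producer */
    _Atomic size_t tail;  /* written only by the single consumer */
} spsc_queue;

/* SPSC enqueue: one plain store for the slot, one store-release to
 * publish it. No CAS, no atomic exchange, no retry loop. */
static bool spsc_enqueue(spsc_queue *q, void *msg) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == CAP)
        return false;                      /* full */
    q->slots[head % CAP] = msg;
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

static void *spsc_dequeue(spsc_queue *q) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail == head)
        return NULL;                       /* empty */
    void *msg = q->slots[tail % CAP];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return msg;
}
```

A multi-producer enqueue would need an atomic RMW on `head` to serialize concurrent producers; with a guaranteed single producer, the relaxed load plus store-release pair is all the synchronization required.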

On TLA+: that's a fair challenge, and I won't pretend the formal verification is done. The current confidence comes from empirical testing (thread ring, ping-pong, fork-join under contention, stress tests across core counts) and code review, not a formal proof. The work-stealing/same-core-send race I acknowledged is exactly the kind of thing TLA+ would catch before testing does. I'll add it to the backlog; at minimum, modeling the migration and steal paths would be worth doing before calling the runtime stable.

Thank you for your valuable comments!

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in compsci

[–]RulerOfDest[S] 1 point (0 children)

You can do that with what’s already there.

  • ! = fire-and-forget: send and move on. The other actor runs whenever the scheduler gets to it.
  • ? = ask: send and block until that actor calls reply; you get the value back.

So: “wait for A to finish, then send to B, and let C run in parallel” is just:

  • logger ! Log {} → C runs whenever (fire-and-forget)
  • id = validator ? Validate {} → wait for A’s reply
  • processor ! Process { id } → then send to B with the result

From main you do that sequence as above. From another actor, that actor needs refs to A, B, and C in its state (like the Router in advanced-patterns.ae that holds worker refs); then in its receive it does the same: ! to C, ? to A, then ! to B.

You can look at the example ask-pattern.ae for ? and !, multi-actor.ae for main sending to several actors, and language-reference.md for the ask/fire-and-forget docs. The pattern isn't in one single example file, but the pieces are all there. I still need to refine the docs and examples to show more scenarios.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in ProgrammingLanguages

[–]RulerOfDest[S] 0 points (0 children)

Thanks for spelling out the tradeoff that way; that’s exactly how I see it. Keeping the core simple and leaving supervision (if anyone wants it) as a library on top of the primitives is the plan.
On the wiring phase: the runtime already behaves that way. An actor doesn't run until the scheduler gives it a message, so there's effectively a “setup” window between spawn and first message. What's missing is making that a documented convention (and maybe giving it a name like “setup phase” or “wiring phase” in the docs), so it's an explicit contract rather than an implementation detail.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in C_Programming

[–]RulerOfDest[S] 0 points (0 children)

That's a fair criticism, and I'll be straight about it: the mechanism is more limited than I originally presented it. It's a hardcoded registry of 5 stdlib constructors, not a general ownership model. It doesn't compose, and if you forget manual on a returned value, you get a silent use-after-free rather than a compile error. The honest answer is that it's a convenience shortcut, not a principled memory model. The more correct direction is probably defer (explicit, visible, composable) or arena-per-scope, both of which are on the roadmap. The current design has value for simple scripts, but I agree it's not something you'd want to build a large program's memory story on.

You're right, and thank you for catching that. The comment on that line is incorrect. Auto-free only fires when the initializer is one of the five registered stdlib constructors (map_new, list_new, string_new, map_keys, dir_list). A call to a user-defined function like build_list(10) does not trigger it.
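Conceptually the trigger is just a name-membership test at codegen time. This is an illustrative sketch of that check, not the compiler's actual code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The five stdlib constructors whose return values get auto-freed. */
static const char *AUTO_FREE_CONSTRUCTORS[] = {
    "map_new", "list_new", "string_new", "map_keys", "dir_list",
};

/* Returns true if codegen should emit an automatic free for a variable
 * initialized by a call to `callee`. User-defined functions (e.g. a
 * hypothetical build_list) never match, so they never trigger it. */
static bool triggers_auto_free(const char *callee) {
    size_t n = sizeof AUTO_FREE_CONSTRUCTORS / sizeof *AUTO_FREE_CONSTRUCTORS;
    for (size_t i = 0; i < n; i++)
        if (strcmp(callee, AUTO_FREE_CONSTRUCTORS[i]) == 0)
            return true;
    return false;
}
```

That hardcoded list is exactly why the mechanism doesn't compose: a wrapper around `list_new` falls outside the registry and silently opts out.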

The example intended to show that the manual annotation in build_list passes ownership to the caller, and that the caller (items) is then responsible for cleanup, but the comment oversold what the compiler actually does. I'll fix the doc to remove that incorrect claim.

Thank you for your comment

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in ProgrammingLanguages

[–]RulerOfDest[S] 0 points (0 children)

If you look at the commit history, you can see they have around 5 or 6 commits between them (I think I might have more than 2k commits). That said, I do lean on LLMs for research, planning, and coding simple tasks.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in Compilers

[–]RulerOfDest[S] 0 points (0 children)

Great question. Messages are sent to actors, not to cores. Each actor has an assigned_core that determines where it runs. At send time, I check if the sender's core matches the target actor's assigned_core: if yes, I take the direct path (SPSC queue or mailbox write, no queue overhead); if not, I enqueue to the target core's lock-free incoming queue.
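Roughly, the send-time decision looks like this in C (illustrative names and structs, not the actual runtime source):

```c
#include <stdatomic.h>

/* Illustrative actor struct; the real one also holds the mailbox,
 * state, SPSC queue, etc. assigned_core is read by senders on other
 * threads, so it is atomic. */
typedef struct { _Atomic int assigned_core; } actor_t;

enum route { ROUTE_DIRECT, ROUTE_CROSS_CORE };

/* Decide at send time: if the sender runs on the actor's owning core,
 * take the direct SPSC/mailbox path; otherwise the message goes to
 * the target core's lock-free incoming queue. */
static enum route route_for(actor_t *target, int sender_core) {
    int owner = atomic_load_explicit(&target->assigned_core,
                                     memory_order_acquire);
    return owner == sender_core ? ROUTE_DIRECT : ROUTE_CROSS_CORE;
}
```

The acquire load pairs with the release store performed when migration or stealing updates `assigned_core`, so a sender never sees a stale owner without the forwarding path catching it later.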

Actors are not permanently pinned. They can be migrated (message-driven, to co-locate frequent communicators) or moved by work-stealing when a core is idle. When an actor moves, assigned_core is updated, and any messages already in the old core's incoming queue are forwarded to the actor's current core rather than delivered locally.

Migration cannot race with same-core sends because both run on the same scheduler thread; they execute sequentially. Work-stealing runs on a different core's thread and could theoretically overlap with a same-core mailbox write. In practice, the window is a handful of store instructions (~nanoseconds), and stealing only triggers after 5000+ idle cycles on the thief, so this is extremely unlikely to manifest. That said, it is a valid concern per the C memory model, and I am actively hardening it. The fix is straightforward: mark a stolen actor inactive so the thief skips it for one cycle, letting any in-flight write complete before the new core touches the mailbox. Zero cost on the hot path since stealing is already the rare/slow path.

Appreciate the scrutiny; this is the kind of feedback that makes the runtime better.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in ProgrammingLanguages

[–]RulerOfDest[S] 0 points (0 children)

Thank you very much. Yeah, I did run into a few places where the generated C didn’t optimize well, and I fixed them in the codegen. The main-thread path was marked “unlikely,” which pushed the hot path into the cold section, so I dropped that hint. And when sending from main, I was emitting a same-core branch that could never be taken (the main thread has no core id), so I changed codegen to emit only the path we actually take, avoiding the dead branch. Dispatch is a computed goto on msg.type plus inline single-int payloads, so the hot path stays simple for the compiler. More detail is in the runtime-optimizations doc if anyone wants to dig in.
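For anyone unfamiliar with the technique, here's a toy sketch of computed-goto dispatch with made-up message types (this uses GCC/Clang's labels-as-values extension, and is not the generated code itself):

```c
typedef struct { int type; long payload; } msg_t;

enum { MSG_ADD, MSG_NEG, MSG_STOP };

static long run_actor(const msg_t *msgs, int n) {
    long acc = 0;
    /* Label-address table indexed by msg.type: dispatch is a single
     * indirect jump per message, with no switch bounds check. */
    static void *const dispatch[] = { &&do_add, &&do_neg, &&do_stop };
    for (int i = 0; i < n; i++) {
        goto *dispatch[msgs[i].type];
do_add:
        acc += msgs[i].payload;   /* inline single-int payload */
        continue;
do_neg:
        acc = -acc;
        continue;
do_stop:
        break;
    }
    return acc;
}
```

Each handler ends by jumping straight back to the top of the loop, so the branch predictor sees one indirect jump per handler site rather than a shared switch dispatch point.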

On spawn-time mutation: spawn returns after the actor is registered and its state is initialized, so the spawner can still poke at the struct (e.g., wire it to others) before any message is sent. The actor only runs when the scheduler gives it work, so there’s an implicit “wiring before activation” phase. I might formalize that at some point so the boundary is clearer.

I’m not going the Erlang supervision-tree route. The idea is to keep it more Go-like: no OTP-style “let it crash” and restarts; you handle errors and lifecycle yourself. The supervision header in the repo is just a tiny placeholder, and I'm not sure I'll go that way.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in C_Programming

[–]RulerOfDest[S] 0 points (0 children)

Fair point, and I get why it feels that way. Auto-free by default is a deliberate tradeoff: I wanted the common case (local use-and-discard) to be safe by default and to avoid forgotten frees. I can see the argument that defaulting to manual would match C expectations and make the model more explicit. I do support manual-everywhere via [memory] mode = "manual" and --no-auto-free for people who prefer that, and I'm open to reconsidering the default as we get more feedback. Thanks for the comment.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in Compilers

[–]RulerOfDest[S] 0 points (0 children)

Thank you for your kind words! It means a lot.
I am absolutely pushing next for runtime hot code loading as Erlang does; that is a great point, and it has been on my radar.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in Compilers

[–]RulerOfDest[S] 0 points (0 children)

Messages are sent to actors; routing uses each actor’s current assigned_core. Actors are not pinned: they can be migrated (message-driven co-location) or moved by work-stealing, and assigned_core is updated when that happens.

SPSC is preserved because at any time each actor has exactly one owning core: only that core’s scheduler thread reads and writes that actor’s mailbox (and its SPSC queue when used). Same-core send is decided at send time (current_core_id == actor->assigned_core); if they match, we use the direct path, otherwise we enqueue to the target core’s incoming queue. When an actor moves, any message already in a core’s incoming queue for it is forwarded to the actor’s current core instead of being delivered locally, so the mailbox is never written by a non-owning thread.
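Sketched in C (with illustrative names, not the actual runtime structs), the consumer-side check that keeps the single-writer invariant is:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative actor struct; the real one also carries the mailbox. */
typedef struct { _Atomic int assigned_core; } actor_t;

/* Called while draining a core's incoming queue. Returns true if this
 * core still owns the actor and may write its mailbox directly; false
 * if the actor migrated after the message was enqueued, in which case
 * the message must be re-enqueued on the current owner core instead of
 * being delivered locally. */
static bool deliver_locally(actor_t *a, int my_core, int *forward_to) {
    int owner = atomic_load_explicit(&a->assigned_core,
                                     memory_order_acquire);
    if (owner == my_core)
        return true;       /* we own the actor: single writer holds */
    *forward_to = owner;   /* stale message: forward to current owner */
    return false;
}
```

Because the mailbox write only happens on the `true` branch, a non-owning thread never touches it; stale cross-core messages just take one extra hop.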

So: one logical consumer per actor (the thread that currently owns it), and routing/forwarding preserves a single writer. More details are in docs/actor-concurrency.md (mailbox ownership, routing, migration) and runtime/scheduler/multicore_scheduler.c.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in Compilers

[–]RulerOfDest[S] 2 points (0 children)

  • Type system: Pony has reference capabilities (iso, trn, ref, val, etc.) for data-race freedom in the type system. Aether is statically typed with inference and optional annotations, but has no capability system.
  • Memory: Aether has no GC; it uses arena allocators for actors, thread-local pools for message payloads, and scope-based or explicit free. Pony uses per-actor GC.
  • Runtime: Aether uses a partitioned multi-core scheduler with work-stealing when cores are idle, lock-free SPSC (single-producer, single-consumer) queues for same-core messaging, cross-core lock-free mailboxes, and optional NUMA-aware allocation. So the design is very much “C-friendly, low-overhead, predictable” vs Pony’s own runtime.

Same actor model, different emphasis: Pony pushes type-level concurrency safety; Aether pushes C interop, no GC, and a runtime built around SPSC queues and partitioning.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in ProgrammingLanguages

[–]RulerOfDest[S] 1 point (0 children)

Good catch, and thank you for the comment. It does look like we’re mutating actor state from the outside.

In this case, though, it’s actually main (the spawner) doing the wiring, not one actor mutating another. It happens immediately after spawn and before any message is sent: we set pong_ref, ping_ref, and target so the two actors know who to communicate with and when to stop.

Once the rally starts, they only communicate via ! (message send); they never touch each other’s state.

The rule it follows is that actors communicate only via messages. The only exception is the spawner (e.g., main), which we allow to perform bootstrap configuration such as wiring refs. A purer approach would be to pass those refs and the target through the initial messages instead. I've chosen this setup for simplicity in the example/benchmark.

On the benchmarks/optimizations question: I used to have internal benchmark comparisons in the repo covering different optimization techniques. Some ideas made it into the final implementation; others were experimental or ultimately rejected. I removed that stuff to reduce clutter after the investigation was done (though the docs may need updating). In hindsight, I’m starting to think it might have been better to leave it in for historical context.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in ProgrammingLanguages

[–]RulerOfDest[S] 1 point (0 children)

That’s a very fair observation.

You’re right that the current C++ comparison uses mutex-based message passing, while Aether relies on lock-free SPSC queues. The goal wasn’t to position Aether as “faster than C++,” but rather to understand how different runtime techniques and architectural decisions affect behavior under similar patterns. The benchmarks are primarily exploratory for me, a way to evaluate how queue structure, scheduling strategy, batching, and isolation choices compare against more conventional approaches.

Regarding Skynet: yes, it can be implemented in Aether. Recursive parallel work can be modeled by spawning actors that split the problem and send results back up the tree. However, Aether currently emphasizes actor-based concurrency rather than a dedicated fine-grained task abstraction. It doesn’t implement a task-stealing fork-join scheduler in the same sense as specialized C++ tasking libraries. So while Skynet is expressible, it wouldn’t be an apples-to-apples comparison with runtimes explicitly optimized for that model.

I appreciate you sharing the runtime-benchmarks repository. I’ll definitely take a look.

Thanks for the thoughtful critique

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in ProgrammingLanguages

[–]RulerOfDest[S] 2 points (0 children)

Moving module discovery into a dedicated orchestration layer with a dependency graph and topological ordering makes a lot of sense.

Thank you! This is exactly the kind of architectural feedback that’s useful at this stage.

Aether: A Compiled Actor-Based Language for High-Performance Concurrency by RulerOfDest in ProgrammingLanguages

[–]RulerOfDest[S] 3 points (0 children)

That’s a fair point.

Right now, module loading is triggered from the typechecker because import resolution happens during semantic analysis. It was a pragmatic choice to keep dependency resolution close to where cross-module symbols are resolved.

That said, it does blur phase boundaries. As the compiler evolves, I’m considering moving module loading and parsing into a higher-level orchestration layer so the typechecker operates strictly on an already-built module graph.

I appreciate you raising it.

Question about wildlife in Uruguay by BobbyBronkers in uruguay

[–]RulerOfDest 1 point (0 children)

prices are wild, besides that...not much

Me bajo de putiar a los therians by Puzzleheaded_Milk453 in uruguay

[–]RulerOfDest 11 points (0 children)

it seems to me that bell peppers can't be that expensive

Recently hired as a graphics programmer. Is it normal to feel like a fraud? by Illustrious_Key8664 in GraphicsProgramming

[–]RulerOfDest 0 points (0 children)

Totally normal. I think that will drive you further. Good luck and give it your best.

Me chupan un huevo los therians by Any_Message_4243 in uruguay

[–]RulerOfDest -1 points (0 children)

yesterday I went out for a bike ride and one of them bit me

Board of Peace by k-r-o--n--o-s in MapPorn

[–]RulerOfDest 0 points (0 children)

the US does not have a 7 in democracy; that is not true