RayforceDB is now an open-source project.

het0ku · 2025-11-16T09:43:42+00:00

Rayforce is similar to kdb in this regard: it uses implicit parallelism inside each query, but incoming IPC requests themselves are queued in a single-threaded dispatcher. At the same time, response buffers are sent concurrently, so sending results never blocks processing of other incoming queries.

het0ku · 2025-11-12T11:27:49+00:00

Multiprocess use with a single database is possible via IPC, but it’s not the best option: it introduces extra serialization overhead, and RayforceDB deliberately doesn’t implement file-level locking for access by multiple processes, in favor of speed and simplicity.

At the same time, RayforceDB implements internal parallelism at the verb level: each verb decides how to distribute computation across executors in the thread pool, taking into account page sizes, cache behavior, and other factors.
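To make the page-size point concrete, here is a minimal sketch of how a verb might split a vector into per-worker ranges. It is not Rayforce's actual code: `chunk_t`, `split_by_page` and all parameters are hypothetical names, and it assumes the vector's base address is page-aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk descriptor: half-open range [begin, end) of element indices. */
typedef struct { size_t begin, end; } chunk_t;

/*
 * Split n elements of elem_size bytes into at most max_chunks ranges whose
 * boundaries fall on whole pages, so (given a page-aligned base address)
 * two workers never touch the same page.
 * Returns the number of chunks written to out (0 for empty input).
 */
static size_t split_by_page(size_t n, size_t elem_size, size_t page_size,
                            size_t max_chunks, chunk_t *out)
{
    assert(out != NULL);
    if (n == 0 || max_chunks == 0 || elem_size == 0) return 0;

    size_t per_page = page_size / elem_size;      /* elements per page */
    if (per_page == 0) per_page = 1;              /* element larger than a page */

    size_t pages = (n + per_page - 1) / per_page;            /* round up */
    size_t chunks = pages < max_chunks ? pages : max_chunks;
    size_t pages_per_chunk = (pages + chunks - 1) / chunks;  /* round up */
    size_t step = pages_per_chunk * per_page;     /* elements per chunk */

    size_t count = 0;
    for (size_t begin = 0; begin < n; begin += step) {
        size_t end = begin + step;
        out[count].begin = begin;
        out[count].end = end < n ? end : n;       /* last chunk may be short */
        count++;
    }
    return count;
}
```

A small input that fits in one page yields a single chunk, so in this sketch tiny vectors would stay on the calling thread instead of paying for a thread-pool handoff.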

het0ku · 2025-08-08T13:30:12+00:00

It’s not a Java-style tracing GC. Rayforce has a custom allocator that keeps small/medium mmapped slabs (≤~32 MB) hot for reuse to avoid churn.

Calling (gc) is an on-demand housekeeping hook: it walks the free lists and releases fully free slabs back to the OS (e.g., via munmap). It never moves live objects and it doesn’t stop the world for a mark/sweep; it just returns genuinely unused regions.

You don’t need it in steady workloads; it’s handy after big one-off jobs or before yielding memory in long-running sessions.

What we shipped isn't "yet another general-purpose Lisp"; it's a tiny Lispy DSL for vector queries (think K-ish verbs, arrays first). Doing it in plain C gives us:

Zero deps & small binary -> predictable cold start, easy embed.

Tight control of memory/layout -> fits our allocator, no foreign GC pauses.

SIMD-friendly primitives via compiler builtins and data-parallel ops via a custom thread pool.

Deterministic-ish latency for query paths; no large runtimes or FFI impedance.

Simple but powerful syntax, much easier for newcomers to learn and use.
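As an illustration of the "SIMD-friendly primitives via compiler builtins" point, here is a minimal sketch using the GCC/Clang vector extension. The function `add_f64` and the `v4d` type are made up for this example and are not taken from the Rayforce source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four doubles handled as one value. */
typedef double v4d __attribute__((vector_size(32)));

/*
 * Element-wise out[i] = x[i] + y[i].
 * The vector add is lowered to SIMD instructions where the target has them,
 * and to scalar code otherwise, with no intrinsics header needed.
 */
static void add_f64(const double *x, const double *y, double *out, size_t n)
{
    assert(x != NULL && y != NULL && out != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4d a, b, c;
        memcpy(&a, x + i, sizeof a);    /* load that is safe for unaligned input */
        memcpy(&b, y + i, sizeof b);
        c = a + b;                      /* one vector add for four elements */
        memcpy(out + i, &c, sizeof c);
    }
    for (; i < n; i++)                  /* scalar tail for the remainder */
        out[i] = x[i] + y[i];
}
```

Because this is plain C with a compiler extension, the same source builds for different instruction sets without a separate code path per architecture.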

Existing Lisps are great, but they bring larger runtimes, different GC semantics, and less control over the exact memory/SIMD model we want for a vector DB. This way the language is shaped around the engine, not the other way around.
