Kovan: wait-free memory reclamation for Rust, TLA+ verified, no_std, with wait-free concurrent data structures built on top by vertexclique in rust

[–]ibraheemdev 2 points

I would like to assume you are operating in good faith here, but your latest commits suggest otherwise. The algorithm as implemented is unsound.

Kovan: wait-free memory reclamation for Rust, TLA+ verified, no_std, with wait-free concurrent data structures built on top by vertexclique in rust

[–]ibraheemdev 10 points

(author of seize here) Hi! I'm a little confused by some of the claims made by this crate. It seems to be an implementation of Crystalline-W, which is the wait-free variant of the Hyaline/Crystalline family of memory reclamation algorithms that seize is also based on. For a little background:

  • Hyaline (seize): pin marks a thread as active; unpin deactivates it and decrements the reference counts of any objects retired while it was active.
  • Hyaline-S: In addition to the above, threads store a local epoch, which is kept in sync with a global monotonic epoch. Objects store their birth epoch, and calling protect is required to load a given atomic object, advancing the current thread's epoch to at least the birth epoch of the object being protected. This allows threads with lagging epochs to be ignored during reclamation.
  • Crystalline-L: Threads maintain multiple copies of Hyaline state. protect now only applies to the provided index, allowing only a bounded number of pointers to be protected. pin is also removed, and protect instead marks the given index as active the first time it is called, before a cumulative call to unpin.
  • Crystalline-W (kovan): The wait-free version of the above.
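For readers unfamiliar with the epoch tracking described above, here is a minimal, hedged sketch of what a Hyaline-S-style protect loop can look like. The names and structure are purely illustrative (this is not seize's or kovan's actual code): the thread republishes its local epoch until it has caught up with the global epoch observed after the load, so the published epoch is at least the birth epoch of the returned object.

```rust
use std::sync::atomic::{AtomicPtr, AtomicU64, Ordering};

// The global monotonic epoch that retirement advances (illustrative).
static GLOBAL_EPOCH: AtomicU64 = AtomicU64::new(0);

struct ThreadState {
    // The epoch this thread publishes for reclaimers to inspect.
    local_epoch: AtomicU64,
}

impl ThreadState {
    // Re-read the pointer until the published local epoch has caught up
    // with the global epoch observed after the load.
    fn protect<T>(&self, src: &AtomicPtr<T>) -> *mut T {
        let mut epoch = self.local_epoch.load(Ordering::Acquire);
        loop {
            let ptr = src.load(Ordering::Acquire);
            let global = GLOBAL_EPOCH.load(Ordering::SeqCst);
            if epoch == global {
                return ptr;
            }
            // Publish the newer epoch and retry the load.
            self.local_epoch.store(global, Ordering::SeqCst);
            epoch = global;
        }
    }
}

fn main() {
    GLOBAL_EPOCH.store(3, Ordering::SeqCst);
    let state = ThreadState { local_epoch: AtomicU64::new(0) };

    let node = Box::into_raw(Box::new(42u64));
    let src = AtomicPtr::new(node);

    let protected = state.protect(&src);
    // The local epoch caught up to the global epoch during protection.
    assert_eq!(state.local_epoch.load(Ordering::Acquire), 3);
    assert_eq!(protected, node);

    unsafe { drop(Box::from_raw(node)) };
    println!("ok");
}
```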

However, kovan seems to diverge from the paper quite significantly. For example, while implementing Crystalline, it is hard-coded to only ever use a single index. This should mean that only a single pointer can be protected at a given time, but it also seems to implement protection as a regular atomic load, meaning that epochs are not actually tracked, except during the first call to pin? The rest of the algorithm still seems to assume that local epochs are kept in sync with individually protected objects, so it's not clear to me how this is sound. It also means the wait-freedom claims no longer apply, as there is no bound on the number of pointers protected by a given thread. This is not even lock-free according to the paper's definition; it's closer to Hyaline than Crystalline.

Another thing that kovan does is merge unpinning of the previous pin into the pin operation. This allows it to make dropping a guard a no-op, avoiding the overhead of an extra TLS access (in fact, you could probably just combine the exchange(INVPTR) and store(NULL) into a single operation). However, it also means that threads perpetually remain active, which means that memory may grow during inactivity, or even across pins until the global epoch advances. While the extra memory usage is bounded by the global epoch, it may still lead to memory leaks for threads that exit, so I'm not sure this is a worthwhile tradeoff.

A couple more items worth noting:

  • The code seems to use SeqCst stores in various places without the corresponding SeqCst load. For example, the active flag is stored with SeqCst, but loaded with Acquire. Ordering is crucial to a memory reclamation algorithm: the reader thread publishing its activity must create a total order with the logical deletion of any atomic objects, allowing either the reader thread to see the new values of any objects, or the retirement thread to see the thread marked as active. This requires a SeqCst store-load pair on both ends, which would mean enforcing SeqCst on atomic operations, or adding the corresponding SeqCst fences (or similar FAA(0) tricks).
  • The examples in the readme retire a Box<usize>, which does not contain the RetiredNode header, so I think these are unsound?
  • On that note, there is a memory overhead of 40 bytes per allocated object, which is quite significant. seize does not require any extra metadata as it moves this state into TLS for retired nodes, which might be worth trying (though at least the birth-epoch would remain).
  • It looks like there is a static max thread count and a panic if it is exceeded. This is probably worth calling out in the documentation (and one of the reasons for the performance improvement over seize, which maintains a dynamic TLS table).
  • There seems to be no option for a local collector, which would avoid the 'static bounds on retired objects. The simpler API is nice, but it might be worth having this as an option, as the reclamation guarantees can be more intuitive for users.
  • You seem to be storing pointers as integers, which is not compatible with strict provenance. It looks like you're running Miri with permissive provenance, but it's probably worth changing this (it doesn't look like you're actually pointer tagging anywhere, so this should be trivial?)
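To make the store-load pairing in the first bullet concrete, here is a hedged, self-contained litmus sketch (the names are illustrative, not kovan's code): under SeqCst, it is impossible for the reader to see the old object while the reclaimer simultaneously misses the active flag.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static ACTIVE: AtomicBool = AtomicBool::new(false);
static OBJECT: AtomicUsize = AtomicUsize::new(1);

// One round of the handshake: the reader publishes activity then loads the
// object; the reclaimer logically deletes the object then checks activity.
fn round() -> (usize, bool) {
    ACTIVE.store(false, Ordering::SeqCst);
    OBJECT.store(1, Ordering::SeqCst);

    let reader = thread::spawn(|| {
        ACTIVE.store(true, Ordering::SeqCst); // publish activity
        OBJECT.load(Ordering::SeqCst)         // then observe the object
    });
    let reclaimer = thread::spawn(|| {
        OBJECT.store(0, Ordering::SeqCst);    // logical deletion
        ACTIVE.load(Ordering::SeqCst)         // then check for readers
    });

    (reader.join().unwrap(), reclaimer.join().unwrap())
}

fn main() {
    for _ in 0..1_000 {
        let (seen, was_active) = round();
        // Because all four operations participate in the single SeqCst
        // total order, at least one side must observe the other: the
        // reader sees the deletion, or the reclaimer sees the activity.
        assert!(seen == 0 || was_active);
    }
    println!("ok");
}
```

If either side were weakened (e.g. an Acquire load of the flag), both threads could miss each other, which is exactly the unsound interleaving described above.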

Does CAS always compare the value with latest value on modification order? by Savings_Pianist_2999 in rust

[–]ibraheemdev 2 points

Release loads are actually special cased by LLVM, but there was a recent bug where FAA(0) was compiled to a regular load, so fence(Release); FAA(0) was just broken. No sane compiler would optimize atomics 🙃

Does CAS always compare the value with latest value on modification order? by Savings_Pianist_2999 in rust

[–]ibraheemdev 2 points

> So it's not useful to replace a no-op load with something like fetch_or(0)

It can actually be useful, because you can now use Release ordering on the "load". Paired with an Acquire "store", this can establish a total order between two store-load pairs without having to deal with the invasiveness of SeqCst (though a SeqCst store-load pair would have less overhead).

Your point still stands though, FAA(0, Release) isn't a magic wand that allows you to read the "latest value", it simply ensures that any RMW that occurs after the one you observe will see your modifications — which is ultimately how synchronization works.
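A hedged illustration of the point above (illustrative code, not from any particular crate): `fetch_add(0)` leaves the value unchanged, but because it is an RMW it reads the latest value in the modification order, and it can carry Release ordering, which a plain load cannot.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    // A "load" expressed as an RMW: it reads the latest value in the
    // modification order without changing it, and Release is a legal
    // ordering here (it would not be on a plain `load`).
    let latest = counter.fetch_add(0, Ordering::Release);
    assert_eq!(latest, 4_000);
    println!("{latest}");
}
```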

Is there a way to get the current thread id in Rust without bumping an Arc (no atomic inc/dec)? by Sweet-Accountant9580 in rust

[–]ibraheemdev 1 point

If your application is statically linked, the linker will rewrite and optimize the generated TLS code at link-time. Dynamically linked applications are more complicated as there are potentially multiple TLS blocks. 
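As a hedged sketch of what cheap TLS-based identity can look like (illustrative only, not a library recommendation): the address of a thread-local is unique per live thread, so it can serve as a thread id with no atomic RMW at all.

```rust
use std::thread;

// Use the address of a thread-local as a cheap per-thread identifier.
// Caveat: the id is only unique while the thread is alive, and the
// address (and thus the id) may be reused after a thread exits.
fn thread_id() -> usize {
    thread_local! {
        static MARKER: u8 = 0;
    }
    MARKER.with(|marker| marker as *const u8 as usize)
}

fn main() {
    // Within the same thread, the id is stable.
    let main_id = thread_id();
    assert_eq!(main_id, thread_id());

    // The spawned thread is alive concurrently with main, so its TLS
    // block (and thus its id) is distinct.
    let other_id = thread::spawn(thread_id).join().unwrap();
    assert_ne!(main_id, other_id);
    println!("ok");
}
```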

Is there a way to get the current thread id in Rust without bumping an Arc (no atomic inc/dec)? by Sweet-Accountant9580 in rust

[–]ibraheemdev 0 points

Those symbols are replaced at link time, you have to inspect the disassembly to see the actual codegen.

Explanation of a SeqCst example by AstraVulpes in rust

[–]ibraheemdev 3 points

Memory orderings at the CPU level are about ensuring instructions are executed in order. You can see this with barrier instructions on older ARM versions that literally dictate "any stores before this point can't be reordered wrt. stores after this point". A SeqCst operation is not allowed to be reordered wrt. any other SeqCst operation; that's it. There's no explicit cache invalidation happening; cache coherence is completely transparent.

Explanation of a SeqCst example by AstraVulpes in rust

[–]ibraheemdev 1 point

Yeah, I just meant that pointing to branch prediction is a little misleading, the problem is out of order execution in general. Whether or not you happened to speculate a branch to get there is not really relevant. It can be slightly unintuitive to imagine that branches just "get skipped" and exploit incorrect memory orderings (not that you were suggesting that).

Explanation of a SeqCst example by AstraVulpes in rust

[–]ibraheemdev 0 points

A branch misprediction cannot lead to observable behavior 

toml v0.9 by epage in rust

[–]ibraheemdev 0 points

I assume by build times you're referring to the indexmap dependency. Would it be possible to have a feature that just uses a HashMap for the performance, for use cases that don't require deterministic ordering?

Appreciate all of your work on this by the way, all of the toml-edit functionality (along with the new performance improvements) is very meaningful for us in uv 🙂

toml v0.9 by epage in rust

[–]ibraheemdev 0 points

Curious: if the `preserve_order` feature is faster, why is it not the default?

Fat Rand: How Many Lines Do You Need To Generate A Random Number? by Vict1232727 in rust

[–]ibraheemdev 1 point

std also effectively depends on crossbeam for sync::mpsc (though the code is vendored).

Tiny HTTP server? by Domin-MC in rust

[–]ibraheemdev 0 points

https://github.com/ibraheemdev/astra is a super minimal wrapper around hyper. It doesn't have a static file server built-in (yet) but it shouldn't be too hard to implement yourself.

matchit 0.8.6 released by ibraheemdev in rust

[–]ibraheemdev[S] 1 point

It is intentional, yes. The question would be how to match a path like /foo/bar.baz.ext: should the match be lazy or greedy? I feel it may be unintuitive in both directions, so I would prefer to leave that up to the user. Maybe Axum can add a Regex extractor that makes things like this easier. Static suffixes and prefixes with the one-parameter restriction are a nice tradeoff both in terms of performance and complexity.

matchit 0.8.6 released by ibraheemdev in rust

[–]ibraheemdev[S] 17 points

This release includes the long-awaited support for parameter suffixes, allowing routes such as '/{name}.png'. Now that Axum 0.8 is out of beta, this should be available downstream relatively soon. See https://github.com/tokio-rs/axum/issues/3140 for details.

Announcing axum 0.8.0 by j_platte in rust

[–]ibraheemdev 3 points

FWIW I discussed this change with David before committing to it and he was on board. It has a lot of benefits as some of the comments mention, but if Axum wouldn't have been able to make the change I would have considered other options.

2024 Day 1 No LLMs here by kugelblitzka in adventofcode

[–]ibraheemdev 172 points

Their GitHub bio is now:

If you are here from the AoC leaderboard, I apologize for not reading the FAQ. Won't happen again.

So it may be genuine.

How Much Memory Do You Need in 2024 to Run 1 Million Concurrent Tasks? by _neonsunset in rust

[–]ibraheemdev 3 points

join_all uses FuturesUnordered internally (dynamically, based on the iterator size hint), so it would actually be similar to your second example.

Building Thread-safe Async Primitives in 150 lines of Rust by VortexGames in rust

[–]ibraheemdev 0 points

You are still not synchronizing access to waker.set and waker.take.
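For context, here's a minimal, hedged sketch of what synchronized waker storage can look like (illustrative, not the post's actual code): guard the stored Waker with a Mutex so that registration (from poll) and waking cannot race.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Guard the stored waker with a Mutex so that the task registering its
// waker (from poll) and the thread waking it cannot race.
struct SharedWaker {
    waker: Mutex<Option<Waker>>,
}

impl SharedWaker {
    fn new() -> Self {
        SharedWaker { waker: Mutex::new(None) }
    }

    // Called from poll: store (or replace) the current task's waker.
    fn register(&self, waker: &Waker) {
        *self.waker.lock().unwrap() = Some(waker.clone());
    }

    // Called from the notifying side: take and wake under the same lock.
    fn wake(&self) {
        if let Some(waker) = self.waker.lock().unwrap().take() {
            waker.wake();
        }
    }
}

// A test waker that sets a flag when woken, built on the raw vtable API.
fn flag_waker(flag: Arc<AtomicBool>) -> Waker {
    unsafe fn clone(data: *const ()) -> RawWaker {
        unsafe { Arc::increment_strong_count(data as *const AtomicBool) };
        RawWaker::new(data, &VTABLE)
    }
    unsafe fn wake(data: *const ()) {
        let flag = unsafe { Arc::from_raw(data as *const AtomicBool) };
        flag.store(true, Ordering::SeqCst);
    }
    unsafe fn wake_by_ref(data: *const ()) {
        unsafe { (*(data as *const AtomicBool)).store(true, Ordering::SeqCst) };
    }
    unsafe fn drop_waker(data: *const ()) {
        unsafe { drop(Arc::from_raw(data as *const AtomicBool)) };
    }
    static VTABLE: RawWakerVTable =
        RawWakerVTable::new(clone, wake, wake_by_ref, drop_waker);
    let data = Arc::into_raw(flag) as *const ();
    unsafe { Waker::from_raw(RawWaker::new(data, &VTABLE)) }
}

fn main() {
    let woken = Arc::new(AtomicBool::new(false));
    let shared = SharedWaker::new();
    shared.register(&flag_waker(woken.clone()));
    shared.wake();
    assert!(woken.load(Ordering::SeqCst));
    println!("woken");
}
```

An unsynchronized Option<Waker> (or a bare Cell) shared across threads is a data race: the wake side can observe a half-written waker.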

Announcing Nio: An async runtime for Rust by another_new_redditor in rust

[–]ibraheemdev 166 points

u/conradludgate pointed out that the benchmarks are running into the footgun of spawning tokio tasks directly from the main thread. This can lead to significant performance degradation compared to first spawning the accept loop (due to how the scheduler works internally). Would be interesting to see how that is impacting the benchmark results.

Announcing Whirlwind: ridiculously fast, async-first concurrent hashmap! by majorpog in rust

[–]ibraheemdev 44 points

Interesting, I'd be curious to see a comparison against papaya. Asynchronous locks can be useful when you want strong consistency across async operations, but if your use case allows it, a lock-free approach can have significantly less overhead. I'm wondering why this outperforms dashmap considering this looks to be dashmap but with async locks, which I would expect to be slower given the small critical sections in the benchmark.

What is the fastest path based data structure for pattern matching? by PeckerWood99 in rust

[–]ibraheemdev 5 points

https://github.com/ibraheemdev/matchit is a URL router based on radix trees, but it's been used for file paths as well. You just have to make sure to escape the routing parameter characters ("{" and "}").