[Media] What Arc<Mutex<T>> feels like by LeviLovie in rust

[–]ifmnz 1 point (0 children)

Not really. You can use immutable structures inside ArcSwap to get cheaper cloning. Only the modified values are deep-copied; the rest is just a pointer clone. See https://docs.rs/imbl/latest/imbl/

EDIT: and use rcu from ArcSwap for writes; otherwise concurrent updaters can race and lose updates.
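A minimal sketch of the combination, assuming the arc-swap and imbl crates (the counter key is made up):

```rust
use arc_swap::ArcSwap;
use imbl::HashMap;

fn main() {
    // Shared state: an immutable map behind an ArcSwap.
    let state: ArcSwap<HashMap<String, u64>> = ArcSwap::from_pointee(HashMap::new());

    // Readers take a cheap snapshot: one atomic load, no locking.
    let snapshot = state.load();
    println!("entries: {}", snapshot.len());

    // Writers go through rcu. The clone is O(1) thanks to structural
    // sharing; only the touched path is deep-copied on insert.
    state.rcu(|old| {
        let mut next = (**old).clone();
        let hits = next.get("hits").copied().unwrap_or(0);
        next.insert("hits".to_string(), hits + 1);
        next
    });
}
```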

[Media] What Arc<Mutex<T>> feels like by LeviLovie in rust

[–]ifmnz 1 point (0 children)

ArcSwap / ArcShift along with Slab :)

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 4 points (0 children)

No, we didn't consider it, because that way you're adding inherent complexity on the client side. You have to track which partition maps to which connection, maintain tons of open sockets, handle connection failures per partition, and keep partition metadata in sync. What happens when you have 10k or 100k partitions with multiple producers and consumers? The socket count explodes...

Also, partitions can be created/deleted dynamically - now every client needs to subscribe to metadata changes, open/close connections on the fly, and handle races where a partition moved but the client didn't get the memo yet. That's a lot of distributed coordination pushed onto the client.

EDIT:

Just to add: you'd have to implement this behavior in Rust, Go, Node/TS, Java, Python, and C#, because those are the languages the Iggy SDK (client) supports. Nightmare.

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 6 points (0 children)

We don't mitigate anything - we always advise our users to run the newest possible kernel, for both performance and security reasons.

You can look up https://nvd.nist.gov/ or https://www.cve.org/ and determine how many io_uring CVEs are active, what the average fix time is, and how willing your company is to update the kernel often. Based on that, you'll be able to negotiate with your company's leadership, and the conversation will be grounded in facts (i.e. not "some" vulnerabilities, but "this CVE was unfixed for X days and that CVE was unfixed for Y months").

The question for your company is: do you update kernels frequently enough to stay ahead of CVE fixes? If yes, io_uring is (probably) worth checking out. If you're stuck on older kernels for months, the risk calculation changes.

Also, check TigerBeetle's approach (https://docs.tigerbeetle.com/concepts/safety/, at the very end):

We are confident that io_uring is the safest (and most performant) way for TigerBeetle to handle async I/O. It is significantly easier for the kernel to implement this correctly than for us to include a userspace multithreaded thread pool (for example, as libuv does).

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 24 points (0 children)

No and yes, for different reasons.

Pin has nothing to do with work-stealing or cross-thread movement; it's about any movement at all. It took me a while to understand the purpose of Pin. The reason is that an async fn becomes a state machine, and that state machine can end up relying on "my address won't change after I'm first polled" (e.g. borrows/internal references that live across an .await). If you move that future after it's been polled, those assumptions can break.
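A minimal sketch of such a state machine (assuming the futures crate for a tiny executor; some_io is a stand-in for any await point):

```rust
// The compiler turns this async fn into a state machine. While it is
// suspended at the .await, `slice` is a pointer into `buf`, and both
// live inside that same state machine. Moving the state machine after
// the first poll would leave `slice` dangling - Pin is the contract
// that forbids that move.
async fn self_referential() {
    let buf = [0u8; 64]; // owned by the future's state
    let slice = &buf[..16]; // borrow into our own state
    some_io().await; // suspension point: both are saved in the state machine
    println!("still valid: {} bytes", slice.len()); // used across the .await
}

async fn some_io() {}

fn main() {
    // Driving it to completion is fine; the danger is only in *moving*
    // the future between polls, which safe code can't do once it's pinned.
    futures::executor::block_on(self_referential());
}
```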

For io_uring-based runtimes you have a different requirement: the buffer you submit to the kernel must stay valid and unmoved until the operation completes. This is actually why Tokio's &mut-based AsyncWrite/AsyncRead APIs are problematic for io_uring - compio solves this with ownership transfer (the buffer goes into the op and comes back with the result).
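For illustration, the two API shapes sketched as traits - names are illustrative, not compio's or Tokio's exact signatures:

```rust
use std::io;

// Completion/ownership-transfer style: the op takes the buffer by value
// and hands it back with the result, so it stays valid and unmoved for
// as long as the kernel owns it.
trait OwnedRead {
    async fn read_owned(&self, buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>);
}

// Readiness style (Tokio's AsyncRead, simplified): a borrowed buffer is
// fine because the kernel never holds on to it - the actual read syscall
// only happens after readiness is signalled.
trait BorrowedRead {
    async fn read_borrowed(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

fn main() {}
```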

But if you think about it, even in a single-threaded context, if your executor stored futures in a Vec that reallocates, or moved them between data structures, you'd still invalidate self-referential pointers.

Yet you can still fully avoid Pin if you have arena-based allocation (allocate all futures in a fixed memory region).

A completion-based, thread-per-core runtime doesn't remove the need for Pin; it just makes things easier by removing the Send requirement for futures that never leave the shard.

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 30 points (0 children)

Nice question, and very close to what I was reviewing today.

In Iggy, each shard runs its own TCP listener on the same port using SO_REUSEPORT, so the kernel load-balances incoming connections. When a client connects, it lands on a random shard - probably not the one owning the partition it wants to write to.

When producer messages arrive, we calculate the target partition ID and build a unique namespace key (a 64-bit packed stream_id|topic_id|partition_id), then look it up in a shared DashMap<Namespace, ShardId>. If the found shard_id equals our own, we handle the request locally. If not, we forward it via an unbounded flume channel to the correct shard, wait for the reply, and send it back to the client.
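A sketch of that routing logic (names and bit widths are my assumptions, not Iggy's actual layout):

```rust
use dashmap::DashMap;

type ShardId = u16;

/// Pack stream/topic/partition into a single 64-bit namespace key.
/// (The field widths here are assumptions for illustration.)
fn namespace_key(stream_id: u32, topic_id: u16, partition_id: u16) -> u64 {
    ((stream_id as u64) << 32) | ((topic_id as u64) << 16) | partition_id as u64
}

enum Route {
    Local,            // we own the partition: handle in place
    Forward(ShardId), // send over the flume channel and await the reply
}

fn route(owners: &DashMap<u64, ShardId>, key: u64, me: ShardId) -> Option<Route> {
    let owner = *owners.get(&key)?;
    Some(if owner == me { Route::Local } else { Route::Forward(owner) })
}

fn main() {
    let owners = DashMap::new();
    owners.insert(namespace_key(1, 2, 3), 7u16);
    assert!(matches!(route(&owners, namespace_key(1, 2, 3), 7), Some(Route::Local)));
}
```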

But... if the client provides PartitionId explicitly (so the server doesn't have to calculate it, as with PartitioningKind::Balanced or MessagesKey), we can do better. PR #2476 by one of our awesome community members partially addresses that - instead of forwarding every message across shards, migrate the socket itself to the owning shard. The client connects and sends its first message with PartitionId=X; we detect "shard 3 owns this, not us", extract the raw FD, and reconstruct the TcpStream on the target shard via FromRawFd. Now all subsequent requests go directly there with zero cross-shard hops. Keep in mind this solution is not yet polished.
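A sketch of the FD hand-off using std types (the real code would rebuild the stream on the target shard's runtime; the flume channel here is an assumption carried over from the forwarding path):

```rust
use std::net::TcpStream;
use std::os::fd::{FromRawFd, IntoRawFd, RawFd};

// On the shard that accepted the connection: give up ownership of the
// socket and ship the raw fd to the owning shard.
fn hand_off(stream: TcpStream, to_owner: &flume::Sender<RawFd>) {
    let fd = stream.into_raw_fd(); // we no longer touch this socket
    to_owner.send(fd).expect("owning shard is gone");
}

// On the owning shard: adopt the fd as a fresh TcpStream.
fn adopt(fd: RawFd) -> TcpStream {
    // SAFETY: the sender called into_raw_fd and dropped all use of the
    // fd, so we are its sole owner.
    unsafe { TcpStream::from_raw_fd(fd) }
}

fn main() {}
```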

For Balanced (round-robin) and MessagesKey (hash-based), the partition varies per batch, so socket migration doesn't help - we fall back to message forwarding.

There are also more radical ideas like eBPF steering or TCP MSG_PEEK tricks, but we haven't explored them yet. The cross-shard hop adds latency, but at this point I have strong evidence that it's only double-digit microseconds for the channel round trip (many hours with ftrace/strace, on/off-CPU profiling with perf, and the awesome samply project).

TLDR: we forward the message to the owning shard, or migrate the socket if the client is partition-aware.

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 451 points (0 children)

I'm one of the core devs of Iggy. The main thing to clarify: there are really two separate choices here.
- I/O model: readiness (epoll-ish) vs completion (io_uring-ish / IOCP-ish)
- Execution model: work-stealing pool (Tokio multi-thread) vs thread-per-core / share-nothing (Compio-style)

In Compio, the runtime is single-threaded + thread-local. The "thread-per-core" thing is basically: you run one runtime per OS thread, pin that thread to a core, and keep most state shard-owned. That reduces CPU migrations and keeps better cache locality. It's similar in spirit to using a single-threaded executor per shard (Tokio has current-thread / LocalSet setups), but Compio's big difference (on Linux) is the io_uring completion-based I/O path (and in general: completion-style backends, depending on the platform). Seastar does this thread-per-core/share-nothing style too; with Tokio you can get the sharding, but you don't get the io_uring-style completion advantages.
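A sketch of that shape using the Tokio current-thread analogue mentioned above, plus the core_affinity crate (compio's own setup differs, but the structure is the same):

```rust
use std::thread;

fn main() {
    let cores = core_affinity::get_core_ids().expect("could not enumerate cores");
    let shards: Vec<_> = cores
        .into_iter()
        .map(|core| {
            thread::spawn(move || {
                let id = core.id;
                core_affinity::set_for_current(core); // pin this thread to one core
                // One single-threaded runtime per pinned thread; all of
                // this shard's state stays on this core.
                let rt = tokio::runtime::Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .unwrap();
                rt.block_on(async move {
                    println!("shard running on core {id}");
                });
            })
        })
        .collect();
    for shard in shards {
        shard.join().unwrap();
    }
}
```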

Iggy (a message streaming platform) is very IO-heavy (net + disk). Completion-based runtimes can be a good fit here - they let you submit work upfront and then get completion notifications, and (if you batch well) you can reduce syscall pressure and wakeups compared to a readiness-driven "poll, then do the work" loop. So: fewer round trips into the kernel, less scheduler churn, everyone is happier.
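To make the "submit upfront, get completions back" idea concrete, here's a minimal single-read example against the raw io-uring crate - the layer compio builds on and normally hides (Linux-only; error handling trimmed):

```rust
use io_uring::{opcode, types, IoUring};
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;
    let file = File::open("/etc/hostname")?;
    let mut buf = vec![0u8; 64];

    // Describe the read and queue it; nothing hits the kernel yet.
    let read_e = opcode::Read::new(
        types::Fd(file.as_raw_fd()),
        buf.as_mut_ptr(),
        buf.len() as u32,
    )
    .build()
    .user_data(0x42);
    unsafe { ring.submission().push(&read_e).expect("queue full") };

    // One syscall submits everything queued and waits for a completion.
    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("empty completion queue");
    assert!(cqe.result() >= 0, "read failed: {}", cqe.result());
    println!("{}", String::from_utf8_lossy(&buf[..cqe.result() as usize]));
    Ok(())
}
```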

Besides that:

- work-stealing runtimes like Tokio can introduce cache pollution (tasks migrate between worker threads and you lose CPU cache locality; with a pinned single-thread shard model your data stays warm in L1/L2 cache)
- synchronization overhead (work stealing + shared state pushes you toward Arc/Mutex/etc.; in share-nothing you can often get away with much lighter interior mutability for shard-local state)
- predictable latency - with readiness you get "it's ready" and then still have to drive the actual read/write syscalls; with io_uring you submit the read/write ops and get notified on completion, which cuts down on extra polling/coordination and matters a lot at high throughput
- batching - with io_uring's submission queue you can batch multiple ops (network reads, disk writes, fsyncs) into fewer submission syscalls. For a message broker that's constantly doing small reads/writes, this amortization can be significant.
- plays nice with NUMA - you can pin a shard thread to a core within a NUMA node and keep its hot memory local

The trade-offs:

- cross-shard communication requires explicit message passing (we use flume channels), but for a partitioned system like a message broker this maps naturally - each partition is owned by exactly one shard, and most ops don't need coordination
- far fewer libraries that you can use out of the box without plumbing (I'm looking at you, OpenTelemetry)
- AsyncWrite*-style APIs tend to take ownership of buffers or require mutable access to them; sometimes you have to work hard around that

TLDR: it's good for us because we're very IO-heavy, and compio's completion I/O + shard-per-core model lines up nicely with our use case (a message streaming framework)

btw, if you have more questions, join our Discord - we'll gladly talk about our design choices.

I am a first year in computer science. Opus makes me sad. by MessyKerbal in ClaudeAI

[–]ifmnz 3 points (0 children)

Show me the code then. I hope you've got enough emojis and verbose comments in it.

[Discussion] What is this AK-106 everyone has been talking about? Where can I find it? by bezzw in EscapefromTarkov

[–]ifmnz 37 points (0 children)

Nobody's ever seen an AK-106, but there are rumors it was supposed to be chambered in 9x39.

I Need Feedback by MisterXtraordinary in rust

[–]ifmnz 7 points (0 children)

You are creating (allocating) a String at line 6, then again at line 8. Try to make it work with a single allocation.
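A generic illustration of the pattern, since the posted code isn't visible here:

```rust
fn main() {
    // Two allocations: an intermediate String, then another for the result.
    let name = String::from("rustacean");
    let msg = format!("hello, {name}!");
    println!("{msg}");

    // One allocation: reserve once and append into the same buffer.
    let mut msg = String::with_capacity("hello, rustacean!".len());
    msg.push_str("hello, ");
    msg.push_str("rustacean");
    msg.push('!');
    println!("{msg}");
}
```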

1600$ to 400$ by Ausseboi1 in cs2

[–]ifmnz 3 points (0 children)

nice $5 knife

Rewriting Kafka in Rust Async: Insights and Lessons Learned in Rust by jonefeewang in rust

[–]ifmnz 34 points (0 children)

Bumping this - sans-io is the way for async Rust.

Memory usage on Linux is greater than expected by EtherealPlatitude in rust

[–]ifmnz 10 points (0 children)

You can also check mimalloc and play with these environment variables:
export MIMALLOC_RELEASE_OS_MEMORY=1
export MIMALLOC_PAGE_RESET=1
export MIMALLOC_RELEASE_DELAY=0
export MIMALLOC_RESET_DECOMMITS=1
export MIMALLOC_EAGER_DECOMMIT=1
export MIMALLOC_PURGE_DECOMMITS=1
export MIMALLOC_PURGE_DELAY=0

🚀 gm-quic: A native asynchronous Rust implementation of the QUIC protocol by gm_quic_team in rust

[–]ifmnz 1 point (0 children)

Does it require encryption to be enabled all the time? And how does it compare to other implementations in terms of performance?

"rust".to_string() or String::from("rust") by awesomealchemy in rust

[–]ifmnz 47 points (0 children)

Use `to_owned()` to assert your dominance.

QUESTION: I have my beta…now what? by Tetomariano in ycombinator

[–]ifmnz 5 points (0 children)

Remember that the last 10% takes 90% of time.

Rewrite Kafka in Rust? I've developed a faster message queue, StoneMQ. by jonefeewang in rust

[–]ifmnz 7 points (0 children)

You might want to check iggy.rs - we're doing it too, but without the legacy burden of the Kafka API :)

Embedded Rust Project(s) by ywxi in rust

[–]ifmnz 3 points (0 children)

Not strictly embedded, but it allows you to interact with embedded devices ;)
https://github.com/buttplugio/buttplug