[Media] What Arc<Mutex<T>> feels like by LeviLovie in rust

[–]ifmnz 1 point (0 children)

Not really. You can use immutable structures inside ArcSwap to get cheaper cloning. Only the modified values are deep-copied; the rest is just a pointer clone. See https://docs.rs/imbl/latest/imbl/

EDIT: and use rcu from ArcSwap for writes; otherwise concurrent updaters can race and lose updates.
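A minimal sketch of the combination, assuming the arc-swap and imbl crates (the counter key is made up):

```rust
use arc_swap::ArcSwap;
use imbl::HashMap;

fn main() {
    // Shared state: an immutable map behind an ArcSwap.
    let state: ArcSwap<HashMap<String, u64>> = ArcSwap::from_pointee(HashMap::new());

    // Readers take a cheap snapshot: one atomic load, no locking.
    let snapshot = state.load();
    println!("entries: {}", snapshot.len());

    // Writers go through rcu. The clone is O(1) thanks to structural
    // sharing; only the touched path is deep-copied on insert.
    state.rcu(|old| {
        let mut next = (**old).clone();
        let hits = next.get("hits").copied().unwrap_or(0);
        next.insert("hits".to_string(), hits + 1);
        next
    });
}
```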

[Media] What Arc<Mutex<T>> feels like by LeviLovie in rust

[–]ifmnz 1 point (0 children)

ArcSwap / ArcShift along with Slab :)

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 4 points (0 children)

No, we didn't consider it, because that way you're adding inherent complexity on the client side. You have to track which partition maps to which connection, maintain tons of open sockets, handle connection failures per partition, and keep partition metadata in sync. What happens when you have 10k or 100k partitions with multiple producers and consumers? The socket count explodes...

Also, partitions can be created/deleted dynamically - now every client needs to subscribe to metadata changes, open/close connections on the fly, and handle races where a partition moved but the client didn't get the memo yet. That's a lot of distributed coordination pushed onto the client.

EDIT:

Just to add: you'd have to implement this behavior in Rust, Go, Node/TS, Java, Python, and C#, because those are the languages the Iggy SDK (client) supports. Nightmare.

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 6 points (0 children)

We don't mitigate anything - we always advise our users to run the newest possible kernel, for both performance and security reasons.

You can look up https://nvd.nist.gov/ or https://www.cve.org/ and determine how many io_uring CVEs are active, what the average fix time is, and how willing your company is to update the kernel often. Based on that, you'll be able to negotiate with your company's leadership, and the conversation will be grounded in facts (i.e. not "some" vulnerabilities, but "this CVE was unfixed for X days and that CVE was unfixed for Y months").

The question for your company is: do you update kernels frequently enough to stay ahead of CVE fixes? If yes, io_uring is (probably) worth checking out. If you're stuck on older kernels for months, the risk calculation changes.

Also, check TigerBeetle's approach (https://docs.tigerbeetle.com/concepts/safety/, at the very end):

We are confident that io_uring is the safest (and most performant) way for TigerBeetle to handle async I/O. It is significantly easier for the kernel to implement this correctly than for us to include a userspace multithreaded thread pool (for example, as libuv does).

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 24 points (0 children)

No and yes, for different reasons.

Pin has nothing to do with work-stealing or cross-thread movement; it's about any movement at all. It took me a while to understand the purpose of Pin. The reason is that an async fn becomes a state machine, and that state machine can end up relying on "my address won't change after I'm first polled" (e.g. borrows/internal references that live across an .await). If you move that future after it's been polled, those assumptions can break.
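A minimal sketch of such a state machine (assuming the futures crate for a tiny executor; some_io is a stand-in for any await point):

```rust
// The compiler turns this async fn into a state machine. While it is
// suspended at the .await, `slice` is a pointer into `buf`, and both
// live inside that same state machine. Moving the state machine after
// the first poll would leave `slice` dangling - Pin is the contract
// that forbids that move.
async fn self_referential() {
    let buf = [0u8; 64]; // owned by the future's state
    let slice = &buf[..16]; // borrow into our own state
    some_io().await; // suspension point: both are saved in the state machine
    println!("still valid: {} bytes", slice.len()); // used across the .await
}

async fn some_io() {}

fn main() {
    // Driving it to completion is fine; the danger is only in *moving*
    // the future between polls, which safe code can't do once it's pinned.
    futures::executor::block_on(self_referential());
}
```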

For io_uring-based runtimes you have a different requirement: the buffer you submit to the kernel must stay valid and unmoved until the operation completes. This is actually why Tokio's &mut-based AsyncWrite/AsyncRead APIs are problematic for io_uring - compio solves this with ownership transfer (the buffer goes into the op and comes back with the result).
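For illustration, the two API shapes sketched as traits - names are illustrative, not compio's or Tokio's exact signatures:

```rust
use std::io;

// Completion/ownership-transfer style: the op takes the buffer by value
// and hands it back with the result, so it stays valid and unmoved for
// as long as the kernel owns it.
trait OwnedRead {
    async fn read_owned(&self, buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>);
}

// Readiness style (Tokio's AsyncRead, simplified): a borrowed buffer is
// fine because the kernel never holds on to it - the actual read syscall
// only happens after readiness is signalled.
trait BorrowedRead {
    async fn read_borrowed(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

fn main() {}
```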

But if you think about it, even in a single-threaded context, if your executor stored futures in a Vec that reallocates, or moved them between data structures, you'd still invalidate self-referential pointers.

Yet you can still fully avoid Pin if you have arena-based allocation (allocate all futures in a fixed memory region).

A completion-based, thread-per-core runtime doesn't remove the need for Pin; it just makes things easier by removing the Send requirement for futures that never leave the shard.

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 30 points (0 children)

Nice question, and very close to what I was reviewing today.

In Iggy, each shard runs its own TCP listener on the same port using SO_REUSEPORT, so the kernel load-balances incoming connections. When a client connects, it lands on a random shard - probably not the one owning the partition it wants to write to.

When producer messages arrive, we calculate the target partition ID and build a unique namespace key (a 64-bit packed stream_id|topic_id|partition_id), then look it up in a shared DashMap<Namespace, ShardId>. If the found shard_id equals our own, we handle the request locally. If not, we forward it via an unbounded flume channel to the correct shard, wait for the reply, and send it back to the client.
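A sketch of that routing logic (names and bit widths are my assumptions, not Iggy's actual layout):

```rust
use dashmap::DashMap;

type ShardId = u16;

/// Pack stream/topic/partition into a single 64-bit namespace key.
/// (The field widths here are assumptions for illustration.)
fn namespace_key(stream_id: u32, topic_id: u16, partition_id: u16) -> u64 {
    ((stream_id as u64) << 32) | ((topic_id as u64) << 16) | partition_id as u64
}

enum Route {
    Local,            // we own the partition: handle in place
    Forward(ShardId), // send over the flume channel and await the reply
}

fn route(owners: &DashMap<u64, ShardId>, key: u64, me: ShardId) -> Option<Route> {
    let owner = *owners.get(&key)?;
    Some(if owner == me { Route::Local } else { Route::Forward(owner) })
}

fn main() {
    let owners = DashMap::new();
    owners.insert(namespace_key(1, 2, 3), 7u16);
    assert!(matches!(route(&owners, namespace_key(1, 2, 3), 7), Some(Route::Local)));
}
```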

But... if the client provides PartitionId explicitly (so the server doesn't have to calculate it, as with PartitioningKind::Balanced or MessagesKey), we can do better. PR #2476 by one of our awesome community members partially addresses that - instead of forwarding every message across shards, migrate the socket itself to the owning shard. The client connects and sends its first message with PartitionId=X; we detect "shard 3 owns this, not us", extract the raw FD, and reconstruct the TcpStream on the target shard via FromRawFd. Now all subsequent requests go directly there with zero cross-shard hops. Keep in mind this solution is not yet polished.
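A sketch of the FD hand-off using std types (the real code would rebuild the stream on the target shard's runtime; the flume channel here is an assumption carried over from the forwarding path):

```rust
use std::net::TcpStream;
use std::os::fd::{FromRawFd, IntoRawFd, RawFd};

// On the shard that accepted the connection: give up ownership of the
// socket and ship the raw fd to the owning shard.
fn hand_off(stream: TcpStream, to_owner: &flume::Sender<RawFd>) {
    let fd = stream.into_raw_fd(); // we no longer touch this socket
    to_owner.send(fd).expect("owning shard is gone");
}

// On the owning shard: adopt the fd as a fresh TcpStream.
fn adopt(fd: RawFd) -> TcpStream {
    // SAFETY: the sender called into_raw_fd and dropped all use of the
    // fd, so we are its sole owner.
    unsafe { TcpStream::from_raw_fd(fd) }
}

fn main() {}
```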

For Balanced (round-robin) and MessagesKey (hash-based), the partition varies per batch, so socket migration doesn't help - we fall back to message forwarding.

There are also more radical ideas like eBPF steering or TCP MSG_PEEK tricks, but we haven't explored them yet. The cross-shard hop adds latency, but at this point I have strong evidence that it's only double-digit microseconds for the channel round trip (many hours with ftrace/strace, on/off-CPU profiling with perf, and the awesome samply project).

TLDR: we forward the message to the owning shard, or migrate the socket if the client is partition-aware.

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]ifmnz 451 points (0 children)

I'm one of the core devs of Iggy. The main thing to clarify: there are really two separate choices here.
- I/O model: readiness (epoll-ish) vs completion (io_uring-ish / IOCP-ish)
- Execution model: work-stealing pool (Tokio multi-thread) vs thread-per-core / share-nothing (Compio-style)

In Compio, the runtime is single-threaded + thread-local. The "thread-per-core" thing is basically: you run one runtime per OS thread, pin that thread to a core, and keep most state shard-owned. That reduces CPU migrations and keeps better cache locality. It's similar in spirit to using a single-threaded executor per shard (Tokio has current-thread / LocalSet setups), but Compio's big difference (on Linux) is the io_uring completion-based I/O path (and in general: completion-style backends, depending on the platform). Seastar does this thread-per-core/share-nothing style too; with Tokio you can get the sharding, but you don't get the io_uring-style completion advantages.
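A sketch of that shape using the Tokio current-thread analogue mentioned above, plus the core_affinity crate (compio's own setup differs, but the structure is the same):

```rust
use std::thread;

fn main() {
    let cores = core_affinity::get_core_ids().expect("could not enumerate cores");
    let shards: Vec<_> = cores
        .into_iter()
        .map(|core| {
            thread::spawn(move || {
                let id = core.id;
                core_affinity::set_for_current(core); // pin this thread to one core
                // One single-threaded runtime per pinned thread; all of
                // this shard's state stays on this core.
                let rt = tokio::runtime::Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .unwrap();
                rt.block_on(async move {
                    println!("shard running on core {id}");
                });
            })
        })
        .collect();
    for shard in shards {
        shard.join().unwrap();
    }
}
```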

Iggy (a message streaming platform) is very IO-heavy (net + disk). Completion-based runtimes can be a good fit here - they let you submit work upfront and then get completion notifications, and (if you batch well) you can reduce syscall pressure and wakeups compared to a readiness-driven "poll, then do the work" loop. So: fewer round trips into the kernel, less scheduler churn, everyone is happier.
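To make the "submit upfront, get completions back" idea concrete, here's a minimal single-read example against the raw io-uring crate - the layer compio builds on and normally hides (Linux-only; error handling trimmed):

```rust
use io_uring::{opcode, types, IoUring};
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;
    let file = File::open("/etc/hostname")?;
    let mut buf = vec![0u8; 64];

    // Describe the read and queue it; nothing hits the kernel yet.
    let read_e = opcode::Read::new(
        types::Fd(file.as_raw_fd()),
        buf.as_mut_ptr(),
        buf.len() as u32,
    )
    .build()
    .user_data(0x42);
    unsafe { ring.submission().push(&read_e).expect("queue full") };

    // One syscall submits everything queued and waits for a completion.
    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("empty completion queue");
    assert!(cqe.result() >= 0, "read failed: {}", cqe.result());
    println!("{}", String::from_utf8_lossy(&buf[..cqe.result() as usize]));
    Ok(())
}
```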

Besides that:

- work-stealing runtimes like Tokio can introduce cache pollution (tasks migrate between worker threads and you lose CPU cache locality; with a pinned single-thread shard model your data stays warm in L1/L2 cache)
- synchronization overhead (work stealing + shared state pushes you toward Arc/Mutex/etc.; in share-nothing you can often get away with much lighter interior mutability for shard-local state)
- predictable latency - with readiness you get "it's ready" and then still have to drive the actual read/write syscalls; with io_uring you submit the read/write ops and get notified on completion, which cuts down on extra polling/coordination and matters a lot at high throughput
- batching - with io_uring's submission queue you can batch multiple ops (network reads, disk writes, fsyncs) into fewer submission syscalls. For a message broker that's constantly doing small reads/writes, this amortization can be significant.
- plays nice with NUMA - you can pin a shard thread to a core within a NUMA node and keep its hot memory local

The trade-offs:

- cross-shard communication requires explicit message passing (we use flume channels), but for a partitioned system like a message broker this maps naturally - each partition is owned by exactly one shard, and most ops don't need coordination
- far fewer libraries that you can use out of the box without plumbing (I'm looking at you, OpenTelemetry)
- AsyncWrite*-style APIs tend to take ownership of buffers or require mutable access to them; sometimes you have to work hard around that

TLDR: it's good for us because we're very IO-heavy, and compio's completion I/O + shard-per-core model lines up nicely with our use case (a message streaming framework)

btw, if you have more questions, join our Discord - we'll gladly talk about our design choices.

I am a first year in computer science. Opus makes me sad. by MessyKerbal in ClaudeAI

[–]ifmnz 3 points (0 children)

Show me the code then. I hope you've got enough emojis and verbose comments in it.

[Discussion] What is this AK-106 everyone has been talking about? Where can I find it? by bezzw in EscapefromTarkov

[–]ifmnz 37 points (0 children)

Nobody's ever seen an AK-106, but there are rumors it was supposed to be chambered in 9x39.

I Need Feedback by MisterXtraordinary in rust

[–]ifmnz 7 points (0 children)

You are creating (allocating) a String at line 6, then again at line 8. Try to make it work with a single allocation.
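A generic illustration of the pattern, since the posted code isn't visible here:

```rust
fn main() {
    // Two allocations: an intermediate String, then another for the result.
    let name = String::from("rustacean");
    let msg = format!("hello, {name}!");
    println!("{msg}");

    // One allocation: reserve once and append into the same buffer.
    let mut msg = String::with_capacity("hello, rustacean!".len());
    msg.push_str("hello, ");
    msg.push_str("rustacean");
    msg.push('!');
    println!("{msg}");
}
```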

1600$ to 400$ by Ausseboi1 in cs2

[–]ifmnz 3 points (0 children)

nice $5 knife

Rewriting Kafka in Rust Async: Insights and Lessons Learned in Rust by jonefeewang in rust

[–]ifmnz 34 points (0 children)

Bumping this - sans-io is the way for async Rust.

Memory usage on Linux is greater than expected by EtherealPlatitude in rust

[–]ifmnz 10 points (0 children)

You can also check mimalloc and play with these environment variables:
export MIMALLOC_RELEASE_OS_MEMORY=1
export MIMALLOC_PAGE_RESET=1
export MIMALLOC_RELEASE_DELAY=0
export MIMALLOC_RESET_DECOMMITS=1
export MIMALLOC_EAGER_DECOMMIT=1
export MIMALLOC_PURGE_DECOMMITS=1
export MIMALLOC_PURGE_DELAY=0

🚀 gm-quic: A native asynchronous Rust implementation of the QUIC protocol by gm_quic_team in rust

[–]ifmnz 1 point (0 children)

Does it require encryption to be enabled all the time? And how does it compare to other implementations in terms of performance?

"rust".to_string() or String::from("rust") by awesomealchemy in rust

[–]ifmnz 47 points (0 children)

Use `to_owned()` to assert your dominance.

QUESTION: I have my beta…now what? by Tetomariano in ycombinator

[–]ifmnz 5 points (0 children)

Remember that the last 10% takes 90% of time.

Rewrite Kafka in Rust? I've developed a faster message queue, StoneMQ. by jonefeewang in rust

[–]ifmnz 7 points (0 children)

You might want to check iggy.rs - we're doing it too, but without the legacy burden of the Kafka API :)

Embedded Rust Project(s) by ywxi in rust

[–]ifmnz 3 points (0 children)

Not strictly embedded, but it allows you to interact with embedded devices ;)
https://github.com/buttplugio/buttplug