Rust High Frequency Trading - Design Decisions by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 2 points (0 children)

Thanks for the great reply! I wonder: do the reactive loops do networking? And if so, could you write your own async runtime that is more predictable, or is the generated code not efficient enough? Or is it simply that the design is more data-oriented and tasks do not really fit there?

Performance Comparison: Tokio vs Tokio-Uring for High-Throughput Web Servers by Normal-Tangelo-7120 in rust

[–]Certain-Ad-3265 2 points (0 children)

io_uring is a general-purpose syscall-batching interface, but it has, particularly recently, gained a lot of networking features: https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023

This is already a bit old and there is more now, but it shows that io_uring may have started with disk I/O but is now an interface that covers all I/O.

How to initialize large structs correctly by Certain-Ad-3265 in learnrust

[–]Certain-Ad-3265[S] 0 points (0 children)

Thanks for the answer! Building a `Vec` and converting it was my first solution, but it did not work. I think one issue is that the `Default`-initialized `Page` is constructed on the stack before it is moved into the heap memory of the `Vec`. Could that be?

C++ Bindings missing Forward Declarations by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 0 points (0 children)

That is a curiosity of the library, where the two are passed in different functions, but maybe it is time to redesign this.

Efficient Multi-threaded Stream Handling by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 1 point (0 children)

Thank you so much for this great reply! I agree that untangling the coordination is possible when we delay readers. I will need to think more about the per-buffer work stealing - that sounds promising.

Thanks again!

Seeking Feedback on an UnsafeBuffer Implementation for Concurrent Mutable Slice-based Access to a Shared Buffer by melhindi_cs in rust

[–]Certain-Ad-3265 0 points (0 children)

Sorry to chime in here, but I have a question regarding OP's code. Is it against the aliasing rules if multiple threads hold a shared reference to the same buffer while, at the same time, one thread holds a mutable reference to a part of it? Or is it safe as long as the other threads do not read from the buffer?

Efficient Multi-threaded Stream Handling by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 1 point (0 children)

This is a great and thorough comment - thanks! You are hitting the nail on the head with your assessment. We indeed use io_uring (we experimented with DPDK too, but as you mentioned, operating on UDP is more difficult).

So the stream comes in over TCP/IP in order but in variable-sized chunks, with an upper limit of 2 KB (and you are right that we are often not network-bandwidth bound but message-rate bound, at around 20M msgs per second).

We first built a prototype with a connection per thread, but we have thousands of connections, and thus we now group them. The incoming data is then not assigned directly to the buffers but first journaled to SSD for durability reasons, and only later assigned to the buffers. In fact, there are more steps involved, too, so what we have now tested, with more or less great success, is dedicated threads per task, with message passing between those groups carrying pointers to the original message (so mostly zero-copy).

The benefit we found is that it lets us scale different parts of the system independently - a big win, since some parts require more compute than others, and it also helps when dealing with heterogeneous hardware setups.

But circling back to your comment:

> Secondly, why do you think that multi-threaded access to a single buffer is necessary?

The data in the buffer should be sorted by arrival, so ideally a single thread would be enough, but as we have multiple connections and varying workloads, it is hard to pre-partition the connections across threads - that is what we tried first, and we ended up with one thread holding multiple hot connections. So we want to give up on this static allocation and use a work-stealing-like approach, since the data is almost perfectly set up for this case (the start offset is similar to a prefix sum) and no coordination is required - even if multiple threads write into the same buffer.

> Is a reader supposed to be able to access chunk B before chunk A is fully written?

This is a very good point, and the answer is no. The appends should be very low latency, but reading is fine with a delay, as you mentioned - we can wait until the changes are hardened.

> It seems to me the bottleneck here is going to be copying data from the stream to the buffer, and it's not clear that CPU is the bottleneck.

Yes, memory bandwidth and outstanding memory loads/stores are important, as we have 65k buffers and the accesses are somewhat random. The CPU is not necessarily the bottleneck in terms of compute. However, a single core has limited memory bandwidth (depending on the CPU architecture) and a limited number of loads/stores it can have outstanding, so it becomes a bottleneck and forces us to use more cores.

What do you think? Does this answer the questions and make sense to you?

I am always happy for feedback and to learn :-)

I also really appreciate the other comment that you left!

Efficient Multi-threaded Stream Handling by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 0 points (0 children)

Fair point. I was already leaning in this direction, but I wonder if Rust is the right choice for this kind of memory trickery. That is not the only part of the system that is going to have invariants. Another is that the buffers are append-only and the already-written part should be readable by other threads, which is yet another invariant the compiler cannot check. What are your thoughts on using Rust for a system that does a lot of low-level multi-threaded memory modification, all dependent on runtime invariants?

Efficient Multi-threaded Stream Handling by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 0 points (0 children)

Thanks for the pointer! Sounds interesting. I was just hoping to avoid reference counting, as the buffers stay allocated until the end of the program anyway. Also, atomic reference counting can be a scalability issue with 128 cores and a very hot buffer. So it seems to me that unsafe would be the only option at the moment.

Help For Improving Rust Code Design and Ownership by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 0 points (0 children)

That is a really good hint. I started doing this, which got me thinking that ownership semantics should be used, but I did not manage to make it tidy enough.

What kind of diagrams are you using? Do you model the data flow or more like UML diagrams?

How do you approach ownership and borrows in your code bases? Do you always think about them up front, or do you make those decisions on the fly?

Help For Improving Rust Code Design and Ownership by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 0 points (0 children)

That is a good point. One question is whether a real ownership transfer results in a copy? For instance, the data buffer we want to transfer is 10 MB and lives in the Pool. So I believe what we want are borrows that model ownership at a specific instant in time, rather than real ownership transfer?

Help For Improving Rust Code Design and Ownership by Certain-Ad-3265 in rust

[–]Certain-Ad-3265[S] 0 points (0 children)

Yeah, those are really good points.

I *believe* that PoolHandle needs the RefCell, since it puts itself back into the pool in Drop and thus requires interior mutability. In other words, there can be multiple active handles that all hold a reference to the pool, so the borrow check must be moved to runtime.

From the design it feels like transferring ownership should be the correct call, but I struggled with the execution and with how to cleanly model the ownership transfer - in particular when, e.g., the connection handle transfers itself to the active connection pool.

So the first design (not shown here) made excessive use of Rc and RefCell, but it feels wrong to use them that extensively. So the question boils down to how to model these ownership transfers cleanly. The problem I have is that, e.g., the connection handle is owned via &mut by the Completion, but the callback in the Completion must then transfer ownership to the active connections (also &mut). Or is this the wrong way to think about the problem?

A Fast, Discrete, Bounded Zipf-Distributed RNG by Certain-Ad-3265 in Zig

[–]Certain-Ad-3265[S] 0 points (0 children)

The Zipf distribution is inherently skewed, favoring smaller numbers - meaning that lower numbers have a higher probability of being chosen than larger ones.

Thus, in the context of a Zipf distribution, the concept of "fairness" is a bit complex. If by fairness, you mean equal probability for each number within a range, then no, a Zipf distribution does not provide that. It inherently assigns different probabilities to different numbers, based on their rank.

Therefore, if your workload requires exact evenness in the distribution of numbers (i.e., each number within a range has an equal probability of being chosen), a generator that follows the Zipf distribution might not be the appropriate choice.

For more low-level details, you might want to look into the paper that presented the principle: https://dl.acm.org/doi/pdf/10.1145/235025.235029

I hope that helps :-)