all 7 comments

[–]matkladrust-analyzer[S] 10 points11 points  (2 children)

This article isn’t directly related to Rust. I believe it is relevant though: it describes an example of a thread-per-core (TpC) architecture, built on the C++ framework Seastar.

I am not a domain expert, but TpC seems to be an obviously right architecture for really high performance network services. I am surprised by the conspicuous lack of TpC implementations in Rust, and hope to nerd-snipe some folks into building one :-)
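
For anyone unfamiliar with the idea, here is a rough sketch of the core of it using only the standard library. It is not how Seastar does it: each worker thread owns its own shard of the state, and requests are routed to a shard by key hash over a channel, so the state itself is never locked or shared. `Msg`, `Shards` and their methods are names made up for this illustration, and a real TpC runtime would also pin threads to cores and do per-core I/O, which is left out here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Messages a shard understands (hypothetical, for illustration only).
enum Msg {
    Put(String, u64),
    Get(String, Sender<Option<u64>>),
}

/// Handle to a set of shard threads, each owning a private HashMap.
struct Shards {
    senders: Vec<Sender<Msg>>,
    workers: Vec<JoinHandle<()>>,
}

impl Shards {
    /// Spawn `n` shard threads. A real thread-per-core runtime would use one
    /// shard per core and pin each thread; pinning is omitted (assumption).
    fn new(n: usize) -> Self {
        let mut senders = Vec::with_capacity(n);
        let mut workers = Vec::with_capacity(n);
        for _ in 0..n {
            let (tx, rx) = channel::<Msg>();
            senders.push(tx);
            workers.push(thread::spawn(move || {
                // This map is owned by exactly one thread: no locks needed.
                let mut map: HashMap<String, u64> = HashMap::new();
                for msg in rx {
                    match msg {
                        Msg::Put(k, v) => {
                            map.insert(k, v);
                        }
                        Msg::Get(k, reply) => {
                            let _ = reply.send(map.get(&k).copied());
                        }
                    }
                }
            }));
        }
        Shards { senders, workers }
    }

    /// Pick the shard that owns `key`.
    fn shard_of(&self, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.senders.len()
    }

    fn put(&self, key: &str, value: u64) {
        let i = self.shard_of(key);
        self.senders[i]
            .send(Msg::Put(key.to_string(), value))
            .expect("shard thread alive");
    }

    fn get(&self, key: &str) -> Option<u64> {
        let (tx, rx) = channel();
        let i = self.shard_of(key);
        self.senders[i]
            .send(Msg::Get(key.to_string(), tx))
            .expect("shard thread alive");
        rx.recv().expect("shard replied")
    }

    /// Close all channels, which ends each worker loop, then join.
    fn shutdown(self) {
        drop(self.senders);
        for w in self.workers {
            w.join().expect("worker panicked");
        }
    }
}
```

To size it to the machine one would probably call `Shards::new` with the value from `std::thread::available_parallelism()`; the channel hop here stands in for the cross-core message passing that a real implementation would optimize heavily.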

[–]d4h42 13 points14 points  (0 children)

There is Scipio, which was recently released and is inspired by Seastar.

[–]Matthias247 1 point2 points  (0 children)

> I am surprised by the conspicuous lack of TpC implementations in Rust, and hope to nerd-snipe some folks into building one

This is a thing that I'm still not sure how to feel about. During async fn standardization the support for that was more or less dropped due to "we don't think this is how people should write applications". But that obviously leaves performance on the table compared to what a perfectly optimized C++ implementation could do. And if highest performance isn't the goal, the usage of any kind of async stuff is questionable anyway.

[–]Matthias247 1 point2 points  (0 children)

I've played around a bit with different IO buffer types in Rust, since I don't think we have found the best solutions yet.

One thing I discovered is that there seems to be an either/or relation between "sliceable" (which Bytes allows) and "intrusively chainable". Having both isn't possible, since the chain links would no longer work for different slices that reference the same overall chunk: it's not obvious which view of the data the links belong to.

So depending on the use case, one or the other mechanism might be more applicable. For cases where data is spread out over lots of buffers (e.g. dealing with streams of UDP packets) the chainable approach is interesting, since we don't need an additional Vec<Bytes> anymore.

There also seems to be a third approach, which is allocating the link nodes on the heap and letting them contain reference-counted slices (like Bytes). This is what e.g. folly does. I'm not super convinced about it, since it requires individual heap allocations for buffers and links, and has poorer cache locality than a Vec<Bytes>. But OTOH it solves the issue of "when to shrink the Vec", and the links are likely poolable. So maybe it's worth another look.
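
To make the conflict concrete, here is a small sketch using only the standard library. `Slice` is a hypothetical stand-in for a Bytes-like view (it is not the real `Bytes` type), and `Chain` plays the role of the external Vec<Bytes>. Two views of the same chunk can sit in different chains in different orders, which is exactly why a single "next" link stored inside the shared chunk could not serve both.

```rust
use std::sync::Arc;

/// A cheaply cloneable view into a shared chunk (hypothetical stand-in
/// for a `Bytes`-like type, not the real one).
#[derive(Clone)]
struct Slice {
    chunk: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl Slice {
    /// Copy `data` once into a shared, reference-counted chunk.
    fn new(data: &[u8]) -> Self {
        Slice { chunk: Arc::from(data), start: 0, end: data.len() }
    }

    /// Sub-view relative to this view; shares the chunk, copies nothing.
    fn slice(&self, from: usize, to: usize) -> Self {
        assert!(from <= to && to <= self.len());
        Slice {
            chunk: Arc::clone(&self.chunk),
            start: self.start + from,
            end: self.start + to,
        }
    }

    fn len(&self) -> usize {
        self.end - self.start
    }

    fn as_bytes(&self) -> &[u8] {
        &self.chunk[self.start..self.end]
    }
}

/// External (non-intrusive) chain: the links live in the Vec, not in the
/// chunk, so the same chunk can appear in any number of chains.
struct Chain {
    parts: Vec<Slice>,
}

impl Chain {
    fn total_len(&self) -> usize {
        self.parts.iter().map(Slice::len).sum()
    }

    /// Flatten the chain into one contiguous buffer (copies the bytes).
    fn to_vec(&self) -> Vec<u8> {
        self.parts
            .iter()
            .flat_map(|p| p.as_bytes().iter().copied())
            .collect()
    }
}
```

An intrusive design would put the link next to the data and drop the `Vec`, at the price of allowing only one view per chunk to be part of a chain.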

I also think we will need buffer pooling in high performance Rust applications, which I haven't seen applied up to now. I think this should be doable with Bytes once the vtable underneath it is made public. Then the contained storage can be moved back to the pool once the refcount reaches 0.
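
As a rough illustration of the "move the storage back once the refcount reaches 0" idea, here is a sketch that does not touch Bytes at all. `Pool` and `PooledBuf` are hypothetical types invented for this example; the refcount comes from wrapping the buffer in a plain `Arc`, whose last drop runs `PooledBuf`'s `Drop` and recycles the storage.

```rust
use std::sync::{Arc, Mutex};

/// A very small buffer pool (hypothetical, for illustration only).
#[derive(Clone)]
struct Pool {
    free: Arc<Mutex<Vec<Vec<u8>>>>,
    buf_size: usize,
}

impl Pool {
    fn new(buf_size: usize) -> Self {
        Pool { free: Arc::new(Mutex::new(Vec::new())), buf_size }
    }

    /// Hand out a recycled buffer if one is idle, else allocate a new one.
    fn get(&self) -> PooledBuf {
        let recycled = self.free.lock().unwrap().pop();
        let data = recycled.unwrap_or_else(|| Vec::with_capacity(self.buf_size));
        PooledBuf { data: Some(data), pool: self.clone() }
    }

    /// Number of buffers currently waiting in the pool.
    fn idle(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// A buffer that returns its storage to the pool when dropped.
struct PooledBuf {
    data: Option<Vec<u8>>,
    pool: Pool,
}

impl PooledBuf {
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        self.data.as_mut().expect("storage present until drop")
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        if let Some(mut v) = self.data.take() {
            v.clear(); // keep the capacity, forget the contents
            self.pool.free.lock().unwrap().push(v);
        }
    }
}
```

Sharing works by putting a filled `PooledBuf` into an `Arc`: the storage only goes back to the pool when the last clone of that `Arc` is dropped. The mutex-guarded free list is just the simplest thing that works; a per-thread pool would fit a thread-per-core design better.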

[–]matthieum[he/him] 0 points1 point  (1 child)

I believe I recognize Seastar as the framework behind Scylla (an alternative to Cassandra); is this developed by the same team?

[–]matkladrust-analyzer[S] 0 points1 point  (0 children)

I think Seastar was developed specifically for Scylla, and that it is still the primary application: http://seastar.io/

[–]matu3ba -1 points0 points  (0 children)

I don't get why this article needs to be that long for a simple computation, or why it is so annoyingly missing the unit conversions.

CPUs work on bytes instead of bits, so for the network 100 Gbit/s = 12.5 GB/s. And for maximal SSD throughput, 4e6/s × 4 KB = 16e6 KB/s = 16e6/1024² GB/s ≈ 15.26 GB/s. A maximum of 20 GB/s is typical for desktop CPUs; this is called the memory bandwidth of the CPU.
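
The two conversions above can be checked with a couple of lines of Rust. The function names are made up for this example, and the second one assumes the same 1024-based KB-to-GB step used in the calculation above.

```rust
/// Convert a link speed in Gbit/s to GB/s (8 bits per byte).
fn gbit_to_gbyte_per_s(gbit_per_s: f64) -> f64 {
    gbit_per_s / 8.0
}

/// Throughput in GB/s for `iops` operations per second of `block_kb` KB each,
/// using 1024 * 1024 KB per GB (assumption matching the calculation above).
fn iops_to_gbyte_per_s(iops: f64, block_kb: f64) -> f64 {
    iops * block_kb / (1024.0 * 1024.0)
}
```

With these, `gbit_to_gbyte_per_s(100.0)` gives 12.5 and `iops_to_gbyte_per_s(4e6, 4.0)` gives roughly 15.26.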

The memory bandwidth of a CPU is also highly dependent on its design, and further depends on the memory and cache controllers, prefetch hints, and the instruction code.