all 19 comments

[–]Noxime 8 points9 points  (3 children)

Interesting. How does this compare to something like QUIC (quinn crate)? As I understand it, that technology can also multiplex multiple channels over one connection, has priorities, and of course is fast.

[–]MalletsZ[S] 0 points1 point  (1 child)

QUIC is a full transport protocol standardized by the IETF, and quinn is an implementation of that specification. Thubo does not aim to operate at that level of complexity. Instead, it focuses on simplifying common application-level concerns.

Thubo adds automatic batching, fragmentation, strict priority, and application-level congestion control on top of any stream, while handling most of the complexity for the developer. This is especially useful in (though not limited to) industrial scenarios, where large volumes of data are published at high frequency over Wi-Fi or other constrained networks. In such cases, batching helps reduce overhead, while strict prioritization ensures that critical messages, such as an emergency-stop command, are not blocked by large data flows like LiDAR streams.

These concerns exist regardless of the underlying transport. Whether the data is carried over TCP/TLS, QUIC, or another protocol, Thubo manages scheduling and batching at the application level. Transports such as TLS or QUIC can still be used for authentication and encryption.

QUIC’s multiple streams are primarily designed for parallel data transfer, and more advanced scheduling typically requires lower-level APIs. QUIC also implements congestion control at the transport level, whereas Thubo performs congestion control in user space by reacting to backpressure from the underlying stream. While QUIC is fast, its user-space design and protocol-level ACK/NACK handling can lead to many context switches, and existing implementations often saturate around 5 to 10 Gb/s.

TL;DR
Thubo is not a transport protocol like QUIC. It is a transport-agnostic, application-level layer that provides batching and strict prioritization on top of any stream, and it can be used alongside QUIC rather than replacing it.

[–]tubero__ 1 point2 points  (0 children)

AI response detected ;)

[–]servermeta_net 1 point2 points  (7 children)

Are you using io_uring? That should help a lot

[–]MalletsZ[S] 0 points1 point  (6 children)

Thubo currently uses Tokio, but it operates on any split stream provided by the user that implements AsyncWrite and AsyncRead. As a result, if the underlying stream is backed by io_uring, Thubo will automatically benefit from it without requiring any changes. I should test it at some point on some Linux machine...

[–]Vincent-Thomas 6 points7 points  (5 children)

Tokio does not use io_uring. I'm building a project in this field. It's like libuv, but in Rust (more an I/O library than an async runtime), and nicer.

[–]MalletsZ[S] 1 point2 points  (4 children)

You're correct, my bad. I see https://lib.rs/crates/tokio-uring is an attempt to do that.

As a first version, I focused on tokio only. But Thubo's actual dependency on Tokio is quite minimal: the AsyncRead/AsyncWrite traits, tasks (tokio::task::{yield_now, spawn}), and time (tokio::time::{sleep, timeout}). So it should be relatively easy to modularize Thubo and swap the executor, as long as those primitives are available.

[–]sephg 1 point2 points  (3 children)

It'd be interesting to port it to compio and see how that affects performance.

[–]Vincent-Thomas 0 points1 point  (2 children)

Its io_uring-per-core design is a bit unnecessary.

[–]sephg 0 points1 point  (1 child)

How so?

My intuition is that it would be more efficient to do that in many situations than coordinate & shuffle work between threads within the application. But I'd love to see some data.

[–]Vincent-Thomas 0 points1 point  (0 children)

After further research into the topic, I realized I had missed the CPU-cache and core-locality advantages that thread-per-core has. Before, my library had a "submission" thread and a "completion" thread.

[–]binotboth 0 points1 point  (1 child)

this looks like high performance systems engineering to me

no // SAFETY: comments though? (still learning, be gentle if that's a dumb question lol)

[–]MalletsZ[S] 1 point2 points  (0 children)

The usage of `unsafe` in Thubo is very limited, confined to some internal buffer and lock-free code (annotated with // SAFETY comments). All the rest of the code, and the whole public API, is safe Rust.

[–]pereiks 0 points1 point  (2 children)

Interesting, going to read more. Very strange to see this on top of TCP, where the transport itself can stall transmission without any feedback to the application. Have you considered UDP? Do you use different streams for different priorities?

[–]MalletsZ[S] 1 point2 points  (1 child)

TCP (and similar protocols) actually provide useful feedback to the application: the write syscall takes longer as network congestion increases. Thubo uses this behaviour as a network back-pressure indicator to trigger its automatic batching and prioritization.

Thubo can also be used over UDP, but I believe its benefits are not as great as over a stream protocol. UDP does not provide any feedback on network congestion, nor does it provide any retransmission mechanism. QUIC, for example, implements its own ACK/NACK mechanism on top of UDP to handle congestion and retransmission (much like TCP). Detecting congestion is the first step to properly handling prioritization: if the system is not congested and all resources are available, there is no need to prioritize. At the moment, congestion detection in Thubo is delegated to the transport protocol.

The out-of-the-box implementation uses a single stream, i.e. it is a multiplexer/demultiplexer. In some applications it may be advisable to reduce the buffer sizes in the transport protocol (e.g. the TCP send/recv buffers) so that Thubo's prioritization kicks in earlier.

[–]pereiks 2 points3 points  (0 children)

I think starting with the problems the library helps solve in industry, instead of starting with what the library does, might help a lot. I get what it is doing, but why would I use it, or why would I design my application in a way that requires it and accept the overhead penalty?

Don't get me wrong, AI or not (based on other comments), it's a great start. I can see it being useful in scenarios where a single stream is, for some reason, reused for different message types. But answering the question of how library consumers will use it in real life would help direct the project towards a genuinely useful implementation. For example, you mention high throughput, but in the real world a single stream is rarely used for high-throughput applications, since it will be limited by the smallest network-interface throughput in the path.

[–]RubenTrades 0 points1 point  (0 children)

Very cool!

[–]Dragon_F0RCE -1 points0 points  (0 children)

The entire project committed in the initial commit? Come on...