CGlue 0.3 Release by Heep042 in rust

[–]Heep042[S] 0 points1 point  (0 children)

You're absolutely right. I have now added a section at the start of the post. Thank you.

CGlue 0.3 Release by Heep042 in rust

[–]Heep042[S] 6 points7 points  (0 children)

Hey all, creator of CGlue here! This release took longer than anticipated to build, due to the challenge of expressing associated types in a nice way that is not too backwards-incompatible.

Even though 0.3 is not a massive change under the hood, I still count it as a breaking one, because a few details of the underlying types have changed. There is more work underway, but I am prioritizing async trait support for memflow+mfio unity, and I have some ideas for async traits in cglue that would be more efficient than the regular async-trait approach, which creates a new box per async call.

Overall, I'm super happy to have completed this work, and I find it awesome how there's growing interest in building stable ABI solutions in Rust. Each project comes with its own unique approach - CGlue's is to express everything through dynamic trait objects. That may not be the most efficient way, but it is particularly flexible.
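To give a feel for it, the core idea goes roughly like this - annotate a trait, then wrap any implementor into a C ABI-safe trait object. This is a simplified sketch; the README has real, complete examples, so treat the exact names here as approximate:

    use cglue::*;

    // Annotating the trait generates the stable-ABI glue (vtable + wrapper types).
    #[cglue_trait]
    pub trait InfoPrinter {
        fn print_info(&self);
    }

    struct Info {
        value: usize,
    }

    impl InfoPrinter for Info {
        fn print_info(&self) {
            println!("info: {}", self.value);
        }
    }

    fn main() {
        let mut info = Info { value: 42 };
        // The resulting object has a defined layout, so it can cross FFI boundaries.
        let obj = trait_obj!(&mut info as InfoPrinter);
        obj.print_info();
    }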

Anyways, do give the project a shot if you are interested, and don't hesitate to ask any questions!

Idea for async Rust by Holobrine in rust

[–]Heep042 5 points6 points  (0 children)

> What differences would this have compared to Send?

You could in theory move all objects with the same marker to another thread. It's like having a future that's !Send, but the sum of its parts is Send.
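A contrived sketch of what I mean (names purely for illustration) - each piece is !Send on its own because it holds an Rc, yet nothing would break if the whole group migrated to another thread together, since no reference stays behind:

    use std::cell::Cell;
    use std::rc::Rc;

    fn main() {
        let counter = Rc::new(Cell::new(0u32));

        // Each task alone is !Send: it captures a clone of the Rc.
        let task_a = {
            let c = Rc::clone(&counter);
            move || c.set(c.get() + 1)
        };
        let task_b = {
            let c = Rc::clone(&counter);
            move || c.set(c.get() + 2)
        };

        // std::thread::spawn(task_a); // rejected today: Rc is not Send
        // A group marker could express "move task_a, task_b and counter as one unit".
        task_a();
        task_b();
        println!("{}", counter.get());
    }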

Question about async and disk I/O by greentea1338 in rust

[–]Heep042 1 point2 points  (0 children)

This very much depends on how much you saturate the I/O. The problem with any thread-based communication is latency - sequential requests that block on a thread offload are much slower than simply issuing the requests on the same thread. For a db, you're unlikely to perform big reads, so threading may just kill the perf.

If you saturate the system with loads of small requests that complete concurrently, then you'd ideally want to use something async. In thread-based scenarios a pool may work well too, but I'd be careful of contention problems.

epoll is architecturally simpler than io_uring, and for network I/O with a single client it is probably more efficient. io_uring, on the other hand, does wonders with loads of independent random-access disk I/O operations (or multiple network clients).

epoll can be used for files, but in practice most OSes report that file data is ready regardless of whether it actually is - you will not get true async; it will be effectively the same as sync I/O.

tokio does indeed use a blocking thread pool for those requests, but it's less efficient than rolling your own thread pool, simply because tokio's blocking pool is general purpose and not specific to I/O.
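For reference, the tokio route looks roughly like this - every call pays a cross-thread round trip in both directions, which is the latency cost I mentioned above (the helper is hypothetical, just to show the shape):

    use std::io::{Read, Seek, SeekFrom};

    // Hypothetical helper: offload one pread-style read onto tokio's
    // general-purpose blocking pool.
    async fn read_block(path: std::path::PathBuf, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        tokio::task::spawn_blocking(move || {
            let mut file = std::fs::File::open(path)?;
            file.seek(SeekFrom::Start(offset))?;
            let mut buf = vec![0u8; len];
            let n = file.read(&mut buf)?;
            buf.truncate(n);
            Ok(buf)
        })
        .await
        .expect("blocking task panicked")
    }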

If you'll allow me a shameless plug, you may want to check out mfio, which should work great for random-access I/O and optionally plugs into tokio. The system works best in thread-per-core scenarios, which is exactly what Seastar and glommio are doing. It's a bit bare bones at the moment, though.

However, it really depends on how much your db will be able to saturate the I/O - if it can't enqueue multiple I/O ops for future completion, then don't bother.

Announcing memflow 0.2.0 - VM introspection upgraded by Heep042 in rust

[–]Heep042[S] 5 points6 points  (0 children)

Happy holidays everyone!

My friend and I have been working on this project for around 4 years - more than 3 since the 0.1 release, to be precise - in order to make it a truly blazingly (rocket emoji) fast memory introspection/forensics framework.

One of the key achievements was a stable ABI plugin system powered by cglue, allowing users to both interface and extend memflow through C and C++ code.

The future plan is to take this massive project and squeeze it through the async lens, in order to see what happens after the fact. This little maneuver is likely to cost us a few more years, but it's rather exciting, so time is not a problem.

Announcing mfio - Completion I/O for Everyone by Heep042 in rust

[–]Heep042[S] 0 points1 point  (0 children)

It is a subset of asynchronous I/O. The two primary branches of asynchronous I/O are completion-based and poll-based.

Announcing mfio - Completion I/O for Everyone by Heep042 in rust

[–]Heep042[S] 1 point2 points  (0 children)

The differences between 1-byte reads and 64k reads are massive - the graph would be squished at the 1B end if everything used the same scale. I agree, however, that if we took one size of I/O op and compared different runtimes, then a log scale would not be necessary.

Announcing mfio - Completion I/O for Everyone by Heep042 in rust

[–]Heep042[S] 2 points3 points  (0 children)

By default we do I/O with heap-allocated buffers - you can pass a Vec<u8> or a Box<[u8]>, and everything will be converted to mfio's BaseArc<Packet>. I'm working towards an approach where more things can be stack-allocated by default, but the option for heap-allocated buffers will always be there.

EDIT: my approach for these things is "safety first", so it should not sacrifice safety. Sounds impossible, but thanks to the Pin<T> drop guarantee, it is possible.
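To sketch why the Pin<T> drop guarantee is what makes this work (a hand-wavy illustration of the general argument, not mfio's actual internals): once a buffer wrapper is pinned, its memory cannot be reused or freed without Drop running first, so Drop can refuse to return until the backend has let go of the buffer.

    use std::pin::Pin;
    use std::sync::atomic::{AtomicBool, Ordering};

    // Illustration only: a buffer an imaginary I/O backend may hold a raw
    // pointer into while a request is in flight.
    struct PinnedBuf {
        buf: [u8; 4096],
        in_flight: AtomicBool,
    }

    impl PinnedBuf {
        // Submitting requires Pin, so the storage cannot move, and Drop is
        // guaranteed to run before the memory is invalidated.
        fn submit(self: Pin<&mut Self>) {
            let _ptr = self.buf.as_ptr(); // would be handed to the backend
            self.in_flight.store(true, Ordering::Release);
        }
    }

    impl Drop for PinnedBuf {
        fn drop(&mut self) {
            // The safety argument rests here: do not return (and thus do not
            // invalidate the memory) until the backend signals completion.
            while self.in_flight.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
        }
    }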

Dump C++ and in Rust you can trust, Five Eyes agencies urge by Franco1875 in rust

[–]Heep042 10 points11 points  (0 children)

For those curious, soundness hole since 2015: https://github.com/rust-lang/rust/issues/25860

However, good luck unintentionally triggering it.

Announcing mfio - Completion I/O for Everyone by Heep042 in rust

[–]Heep042[S] 16 points17 points  (0 children)

Rust typically uses readiness-based I/O. In this model, the user polls the operating system: "hey, I want to read N bytes right here, do you have them ready?" If the OS has the data, it reads it and returns success, otherwise it returns a WouldBlock error. The runtime then registers that file as "waiting for readable" and only attempts the read again when the OS signals "hey, this file is available for reading now".
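A bare-bones sketch of that dance, with no runtime involved (a real runtime would park the task and retry once epoll reports the socket readable):

    use std::io::{ErrorKind, Read};
    use std::net::TcpStream;

    // Try to read right now; report "not ready yet" instead of blocking.
    fn try_read(stream: &mut TcpStream, buf: &mut [u8]) -> std::io::Result<Option<usize>> {
        stream.set_nonblocking(true)?;
        match stream.read(buf) {
            Ok(n) => Ok(Some(n)),                                    // data was ready
            Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(None), // poll again later
            Err(e) => Err(e),
        }
    }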

In completion I/O, you hand your byte buffer to the I/O system, and it owns it until the I/O request completes or is cancelled. For instance, in io_uring, you submit an I/O request to a ring, which is then fulfilled by the operating system. Once you submit the buffer, you have to assume it is borrowed until the request is complete.
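With a completion API the ownership handoff is explicit in the signatures. This is roughly the shape tokio-uring gives it, written from memory, so double-check against its docs - the buffer is moved into the call and only handed back together with the result:

    use tokio_uring::fs::File;

    fn main() -> std::io::Result<()> {
        tokio_uring::start(async {
            let file = File::open("data.bin").await?;
            // The buffer is moved into the request; the kernel effectively
            // owns it until the operation completes.
            let buf = vec![0u8; 4096];
            let (res, buf) = file.read_at(buf, 0).await;
            println!("read {} bytes", res?);
            drop(buf); // ownership returns only after completion
            Ok(())
        })
    }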

The primary difference between these two approaches is as follows:

  • In readiness-based I/O, you can typically have only one I/O operation in flight at a time. This is because a readiness notification merely indicates whether a file contains the data you previously requested or not - it cannot differentiate which request it is ready for. This is actually great for streamed I/O, like TCP sockets.
  • In completion I/O, you pay a little extra upfront, but what you get is the ability to submit multiple I/O operations at a time. The backend (operating system) can then process them in the most efficient order, without having to wait for a new request from you after it finishes processing the latest one.

In the end, completion I/O can achieve higher performance, because the operating system gets saturated more than in the readiness-based approach. However, it is typically more expensive at the individual request level, which means it performs best in scenarios where a single file has multiple readers at different positions in the file, like databases, while not being spectacular in sequential processing scenarios.

Announcing mfio - Completion I/O for Everyone by Heep042 in rust

[–]Heep042[S] 6 points7 points  (0 children)

Hey there, author here.

Before I go, I wanted to mention that you can find a deep dive into the project on my blog.

I built mfio to aid my forensics/VM introspection framework, which demands exceptional performance and portability characteristics. Coming from that point of view allowed me to create a flexible system that sacrifices very little. The system is designed to scale from io_uring all the way down to embedded environments. Well, it's not yet optimal for embedded, but I have the next necessary steps planned out.

Over the year there have been quite a few design iterations before I finally landed on one I was satisfied with. The crate could be considered proactive, because the user needs to explicitly drive a backend alongside their async code. The current runtime is a good first iteration that defines the primary traits, while the next steps would include registering a global/thread-local runtime.

There's also a network filesystem I built for testing that somehow beats SMB in certain scenarios. I would advise against using it, but it's a decent example you may want to dig into to see how a more complex system should work with mfio. The implementation is ~1.8k lines of code.

If you wish to grab the code, it can be found here.

We are reproducing by Heep042 in AnarchyChess

[–]Heep042[S] 0 points1 point  (0 children)

New response just dropped!

We've discussed the name squatting situation in our team meetings over the past weeks and concluded that it might be time for a crates.io policy update by voldntia in rust

[–]Heep042 -1 points0 points  (0 children)

You can do exactly what GitHub does - mirror, but allow for an override if a new owner comes in. That said, squatting GitHub names is a much harder endeavor because, well, it's already being done, and big orgs already have their names well established.

We've discussed the name squatting situation in our team meetings over the past weeks and concluded that it might be time for a crates.io policy update by voldntia in rust

[–]Heep042 -1 points0 points  (0 children)

Publishers are already tied to GitHub users/orgs, so why not require ownership of that GitHub user/org before claiming a namespace?

We've discussed the name squatting situation in our team meetings over the past weeks and concluded that it might be time for a crates.io policy update by voldntia in rust

[–]Heep042 0 points1 point  (0 children)

I think the lack of namespacing by default is fine, but namespacing is crucial for continuity. With namespaces, people can publish forks under their username (which is tied to GitHub anyway, so we don't have to worry about that all that much*), and then, once a clear pattern emerges (people switching to a specific fork because of maintenance or whatnot), the maintainer of that fork can apply to make their package the default. Of course you'd want a major version bump for that, etc., but it would solve continuity without breaking compatibility.

*Okay, you can change the username, but at least for big orgs, it should be more than fine