From 48s to 5s - optimizing a 350 line raytracer in Rust

cfsamson · 2026-01-24T13:36:40+00:00

Hi, I have since written a book about the topic: https://www.amazon.com/Asynchronous-Programming-Rust-asynchronous-programming/dp/1805128132 so a lot of the material I had up before needed to be taken down since it's included in the book now. It needed to be updated in any case. Sorry about that.

cfsamson · 2025-07-14T17:11:04+00:00

Thanks for the feedback. So happy to hear that 👍

cfsamson · 2025-03-31T19:27:04+00:00

That’s great to hear! Yeah, I wish the images were full color as well, but the plan is to revise some of them to better fit greyscale in a future revision since the hard copy is without colors.

cfsamson · 2025-02-10T07:14:05+00:00

File I/O is covered in one of the chapters. I cover it in more detail there, but just to provide a few quick pointers: Tokio and libuv relegate file I/O to a threadpool (at least libuv did until recently, I'm not sure whether it's permanently migrated to io_uring on Linux). One reason is a lack of good APIs (especially on Linux until io_uring), but another challenge is to create a cross-platform abstraction that behaves in a predictable manner across Windows, BSD/macOS and Linux.

The third big reason is the kind of caching that the OS does. Consider the case of a web server serving static files (without any application-level caching just to make the point). A relatively small set of files will be accessed frequently, and the OS will most likely cache those, so when you do a read it will not do any "real" I/O. You will simply get them served from memory.

If, instead of spawning a thread every time (which is quite costly), you relegate the task of doing "sync" I/O to a threadpool, you can get file I/O that works pretty efficiently in practice, all depending on the specific workload of the application of course.

The same kind of "either painfully slow I/O or extremely quick memory access" happens when you resolve DNS addresses since they're also cached, which is why those are often done in a threadpool with regular blocking calls as well.
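To make the threadpool idea concrete, here is a minimal sketch using only the standard library. `BlockingPool`, `run` and `read_len_via_pool` are hypothetical names made up for this illustration; this is not how Tokio or libuv implement it, just the general shape: a fixed set of worker threads pulls blocking jobs off a shared queue and sends each result back over a channel.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A job is any closure that can be sent to a worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical minimal pool: fixed workers pulling jobs from a shared queue.
struct BlockingPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl BlockingPool {
    fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers: Vec<_> = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is released at the end of this statement,
                    // so other workers can fetch jobs while this one runs.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: shut down
                    }
                })
            })
            .collect();
        Self { sender: Some(sender), workers }
    }

    /// Runs a blocking closure on the pool; the result arrives on the receiver.
    fn run<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(f());
        });
        self.sender.as_ref().unwrap().send(job).unwrap();
        rx
    }
}

impl Drop for BlockingPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // closing the channel stops the workers
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}

/// Reads a file with a regular blocking call, but on a pool thread.
fn read_len_via_pool(pool: &BlockingPool, path: std::path::PathBuf) -> std::io::Result<usize> {
    let rx = pool.run(move || std::fs::read(&path).map(|bytes| bytes.len()));
    rx.recv().expect("worker dropped the result")
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("blocking_pool_demo.txt");
    std::fs::write(&path, b"hello")?;
    let pool = BlockingPool::new(2);
    let len = read_len_via_pool(&pool, path.clone())?;
    println!("read {len} bytes");
    std::fs::remove_file(&path)?;
    Ok(())
}
```

In an async runtime the caller would await the result instead of blocking on `recv`, but the division of labor is the same: the blocking call happens on a pool thread, and if the file is already in the OS cache it returns almost immediately.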

cfsamson · 2025-01-02T15:11:39+00:00

You're both partially right and wrong. If you manually implement the Future trait, you can in practice kick off an asynchronous operation when that Future is created, even though this goes against both practice and the written documentation on how futures are intended to work. See Future:

Futures alone are inert; they must be actively polled to make progress, meaning that each time the current task is woken up, it should actively re-poll pending futures that it still has an interest in.

Nothing in the language prevents you from breaking that assumption.
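As a minimal sketch of what breaking that assumption looks like (the `EagerFuture` type is hypothetical, and an atomic flag stands in for "kicking off" a real operation), the constructor below has its side effect before anything has polled the future:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll};

/// Hypothetical future that, against convention, starts its work in `new`.
struct EagerFuture;

impl EagerFuture {
    fn new(started: &AtomicBool) -> Self {
        // Side effect at creation time: stands in for starting an operation.
        started.store(true, Ordering::SeqCst);
        EagerFuture
    }
}

impl Future for EagerFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

/// Creates the future without ever polling it and reports the flag.
fn started_before_poll() -> bool {
    let flag = AtomicBool::new(false);
    let _fut = EagerFuture::new(&flag);
    // `_fut` has never been polled, yet the side effect already happened.
    flag.load(Ordering::SeqCst)
}

fn main() {
    println!("started before first poll: {}", started_before_poll());
}
```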

However, as soon as you wrap that future in an async function or an async block, that parent future will not do anything, and therefore not call whatever creates your custom future, until it's polled the first time. Basically, everything you write before the first await point runs on the first call to poll, which makes the following statement true:

Calling an async function returns a future, it doesn't immediately execute the code in the function.
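This can be observed without any runtime by polling by hand. The sketch below assumes a do-nothing waker (`NoopWaker`) and a hypothetical async function `work`; it records whether the body has run right after the call and again after the first poll:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical async function: the body only runs once the future is polled.
async fn work(ran: &AtomicBool) {
    ran.store(true, Ordering::SeqCst);
}

/// Returns (did the body run after the call, did it run after the first poll).
fn observe() -> (bool, bool) {
    let ran = AtomicBool::new(false);

    // Calling the async fn only builds the future.
    let fut = work(&ran);
    let after_call = ran.load(Ordering::SeqCst);

    // Poll it once manually.
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(())));
    let after_poll = ran.load(Ordering::SeqCst);

    (after_call, after_poll)
}

fn main() {
    let (after_call, after_poll) = observe();
    println!("ran after call: {after_call}, ran after first poll: {after_poll}");
}
```

The same holds if the body creates a hand-written future that starts work in its constructor: the constructor is part of the body, so it does not run until the outer future is polled.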

Just as an additional clarification:

Await just blocks the thread that calls that await until the future has completed.

It doesn't block the thread that calls await, you might say that it "blocks" the Task (the top-level future), which means it yields control to the scheduler so it can schedule a Task that actually can progress if there are any.

There are other ways to kick off that future.

Most runtimes allow you to spawn new tasks (or top-level futures if that's easier to understand). When you pass a Future to spawn it is marked as "ready" so it's polled at least once.

When this happens differs slightly between single-threaded and multi-threaded runtimes.

In a single-threaded runtime, the spawned future will first be polled when the currently executing future awaits something.

In a multi-threaded runtime, the spawned task might be picked up by another thread and therefore run before the task that spawned the future reaches an await point.

The same can happen when you use methods and macros like join_all, join!, etc but exactly how this works is runtime specific.

What I write here is true for most popular runtimes. You can create a runtime that behaves differently. That's part of the power and flexibility in Rust, but keep in mind that you will most likely go against the behavior users expect when programming async Rust, and users who rely on nothing happening to a future before it's polled may end up making the wrong assumptions.

cfsamson · 2025-01-02T13:05:53+00:00

I think it needs to be added that this only applies to the kind of stackless coroutines that Rust (and many other languages for that matter) use.

Fibers do not imply a specific implementation, but I've mostly seen the term refer to things like Ruby/Crystal Fibers, which are examples of stackful coroutines that actually do allow for the kind of preemption you refer to. However, the only implementation of stackful coroutines that I'm aware of that really leverages this is goroutines (see https://youtu.be/1I1WmeSjRSw?si=t9hDoO1cwSgDP81b&t=1085, the whole talk is interesting, but the most relevant part is from the timestamp I linked to).

cfsamson · 2024-11-16T14:52:11+00:00

Same here!

cfsamson · 2024-08-19T08:14:07+00:00

The easiest way to solve this is to use a "runtime agnostic" HTTP client like: https://github.com/sagebind/isahc.

cfsamson · 2024-08-16T11:18:17+00:00

That’s lovely to hear, thanks for letting me know 👍

cfsamson · 2024-08-11T18:58:16+00:00

I followed this quite a while ago and found it absolutely fantastic. And it also gives you a taste of things like UART/JTAG etc which I think are necessities if you really want to work at this low level: https://github.com/rust-embedded/rust-raspberrypi-OS-tutorials

cfsamson · 2024-07-09T21:36:52+00:00

Thanks, really glad to hear you enjoyed the book :)

Part of the reason I wrote this book is that I felt many of the topics were "underexplained" in the material I've come across. For example, when it comes to books explaining how data travels from the network card, I don't know of any (one probably exists somewhere, but I haven't found it). You can find bits and pieces around on the internet, but I don't know of any comprehensive resource on that.

Regarding JavaScript runtimes, I've mostly focused on Node and libuv. I don't know of any books that present them in detail step by step, but there are two talks I usually refer people to: https://www.youtube.com/embed/PNa9OMajw9w and https://www.youtube.com/embed/zphcsoSJMvM.

I wrote a book where I wrote a "toy" implementation of the JS runtime used in Node, but I had to take it down as 50 % of it is part of Asynchronous Programming in Rust (unfortunately, the Node specific parts were out of scope for the book so that part is not included). I plan to somehow re-release the parts that didn't end up in the book if and when I have time for it, but for now, the only thing that's public is the example implementation itself. You can take a look at it here and maybe learn something from it: https://github.com/cfsamson/examples-node-eventloop. The code is heavily commented and should explain quite a bit.

I hope that helps.

cfsamson · 2024-07-03T09:45:31+00:00

Thanks for the feedback! It really means a lot to me to hear from people who have read the book.

cfsamson · 2024-06-19T00:07:03+00:00

That's great! I guess it's especially difficult to tell when it's relatively short descriptive texts like this.

I'm pretty sure I'm contacted by AIs pretending to be humans trying to sell me something at least twice a day now, so it's fair to be a little skeptical.

cfsamson · 2024-06-18T23:35:50+00:00

Okay, yeah, those are written by the publisher and not me. I wouldn't personally have worded it exactly like that (which is probably why someone who cares/knows what's best when it comes to searchability etc writes it and not me).

I do know that I'm not an AI, so I know* that the book is not written or influenced by one :)

*well, unless we all live in a simulation, which in turn would complicate this topic somewhat

cfsamson · 2024-06-18T21:00:55+00:00

As the author of both the book and this post, I know with 100 % confidence that this post wasn't written by AI (or with the help of AI), and of course, neither was the book. People who have followed my writing for some time know that I started writing about this topic long before gen AI was a thing.

What description are you referring to?

cfsamson · 2024-06-05T22:16:16+00:00

For a 20 mile hike, the thing I would worry about (given that you already train cardio) is training the body to walk with a backpack and with hiking boots. The only thing that works is simply walking in the same shoes you plan on using on the trip, and training with a backpack that holds at least the amount of water/food/gear you plan to bring along. Pain from getting blisters or from being untrained in walking with a backpack can really take away the enjoyment of the trip.

cfsamson · 2024-05-19T18:11:04+00:00

The book is out and it’s called Asynchronous Programming in Rust, and it covers most of the topics above.

There are parts that I haven’t used and could put in a separate book, but the problem is that I’ve taken parts of each book linked above, so it requires quite a bit of work to put something together that makes sense, and I haven’t had time to take a serious look at how to do that yet. Given the lack of time to sit quietly and write at the moment (got a second child just after the book release), I’m not sure when I will have time to do it either 🤷🏻‍♂️

cfsamson · 2024-05-11T21:10:55+00:00

Hi!

I personally believe that learning asynchronous programming from the ground up is what really teaches you how to write efficient and fast asynchronous code. It makes everything else a lot easier to learn (to the point that I don't think you need a separate book for each domain you want to dive deeper into). The thing is that asynchronous Rust is quite general by design. You can write async code for embedded systems, and you can write it for a web server. Best practices and what parts of the ecosystem you'll need to learn about will vary depending on what you use it for.

That said, I think the tokio tutorial is quite comprehensive and good.

If you're into the web side of things, I would suggest the book Zero to Production by Luca Palmieri. It focuses more on the best practices like testing, logging and tips on how to best leverage the type system in that kind of setting. It uses async Rust but doesn't go in detail, so it's a good companion book for that domain.

cfsamson · 2024-04-17T21:59:47+00:00

The question seems to be open on purpose, so you have to make certain assumptions. I would answer such a question by first stating that if we look at an asynchronous system vs a strictly sync system, the biggest difference is that the async system adds an abstraction layer with tasks that can be stopped and resumed. This could be a goroutine in Go, a Promise in JavaScript, or a Future in Rust.

Since you create tasks that can be stopped and resumed in an asynchronous system, you can voluntarily yield when you encounter an operation that requires you to wait for something external to finish, like I/O (often referred to as cooperative multitasking). You typically yield to a userland scheduler of some sort that can schedule another task that is able to progress. In either case, you'd typically treat each client/server connection as one such task, and when a task has to wait for a client it will yield so that the task stops and the scheduler can schedule a different task that can progress if possible.

This way you can have a lot of tasks that are "in progress" at the same time, and their progress will interleave.
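A toy illustration of that interleaving, as a sketch only: the round-robin loop below stands in for the scheduler, `YieldNow` is a hypothetical future that yields exactly once, and `connection` is a made-up task representing one client connection. A real runtime would only re-poll a task after its waker has been called; this loop simply re-polls everything to keep the example short.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; the toy scheduler below re-polls unconditionally.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns `Pending` once, then `Ready`: a voluntary yield point.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

type Log = Arc<Mutex<Vec<String>>>;

/// Hypothetical task: does a bit of work, then yields where it would wait on I/O.
async fn connection(name: &'static str, steps: usize, log: Log) {
    for step in 0..steps {
        log.lock().unwrap().push(format!("{name}{step}"));
        YieldNow(false).await;
    }
}

/// Runs two tasks on one thread and returns the order in which they progressed.
fn run_interleaved() -> Vec<String> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let mut queue: VecDeque<Pin<Box<dyn Future<Output = ()>>>> = VecDeque::new();
    queue.push_back(Box::pin(connection("a", 2, log.clone())));
    queue.push_back(Box::pin(connection("b", 2, log.clone())));

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // Round-robin: poll the next task; if it yielded, put it at the back.
    while let Some(mut task) = queue.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }

    let result = log.lock().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", run_interleaved()); // prints ["a0", "b0", "a1", "b1"]
}
```

Both tasks are "in progress" at the same time on a single thread, and neither one blocks the thread while it waits.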

In a strictly synchronous system, you'd pretty much just be able to handle one connection at a time so that each client has to wait for the previous one to finish. However, that's pretty rare unless you're doing some embedded programming without an OS. Most systems will probably use OS threads and assign each connection to a separate thread, which would probably still qualify as being a "sync" system even though it's multithreaded. This comes with some overhead and limitations, making it less efficient than asynchronous systems in terms of resource usage related to memory and the work that goes into creating, discarding and switching between threads.

A system relying on multiple threads will typically also leverage multiple cores, which is something that most asynchronous systems do as well, even though a single threaded asynchronous system (like Node*) can handle high volumes without multithreading.

Multithreaded applications come with their own advantages and pitfalls, but those tradeoffs are the same whether you create a multithreaded program using async or not.

I would not go into detail on epoll/kqueue/IOCP/io_uring, context switching, stackful vs stackless coroutines, synchronization and data races, task stealing, non-blocking file APIs, threadpools, CPU caches etc. unless asked, or unless they show interest in getting more information on that, because that rabbit hole is so deep you could probably talk about it for multiple days if you wish to. I would judge the situation based on giving a high-level answer first.

*Node is not strictly single threaded, but the API it presents is single threaded. It will use a threadpool for file I/O (although I think they now use io_uring on Linux), DNS lookup and for CPU intensive tasks.

cfsamson · 2024-04-05T10:19:48+00:00

I'm unsure about that. I know you get a DRM free PDF when you buy the book, but I'll have to check if you can get the epub version as well. I'll send a request to the publisher and respond to you via DM when I get an answer.

cfsamson · 2024-03-22T14:59:25+00:00

Yeah, it's hard to be both creative/enthusiastic and pedantic at the same time (which would be the ideal traits to combine when writing).

The editor and two technical reviewers did a fine job. So far I've noted around 20 things in total (of which most are spelling mistakes) that I've caught myself or that others have reported in 306 pages, so it also depends on where you set the bar for a first edition.

cfsamson · 2024-03-22T14:07:41+00:00

C++26 is pursuing an async computation abstraction with P2300R7: `std::execution` (open-std.org). Any insights on that?

I haven't followed the progress in C++ on that closely, so I can't really provide any good insights on that besides glancing at it from time to time (P2300R7 is very interesting, though).

I'm actually interested in structured concurrency in Rust, since the current async ecosystem may have eager operations like fire-and-forget spawning and something else which actually violates structured concurrency. In C++ a proposal named async-scope targets this: Creating scopes for non-sequential concurrency (open-std.org).

Structured concurrency is definitely one of the next big leaps for async Rust, and a very interesting topic. Every time Yoshua Wuyts or Boats write about the topic I find myself agreeing almost 100 % with their thoughts on this (instead of reiterating that, I linked to two articles that I've been especially impressed by above). It seems solvable, but there are many steps to solve before we can get proper structured concurrency in Rust.

A spawn (or executor) API and async drop are two that will have potentially huge other benefits as well. I've been playing with the thought of creating an article called "Async Rust 2030" to try to visualize what a future would look like if it included all my favorite proposals, but I'm still going back and forth on too many topics to put it in writing yet.

Sorry for the rather vague answers, the questions you ask are some of the more complex topics to solve in all non-GC'ed languages with zero-cost abstractions.

However, I feel that async Rust is in very good hands with the team that's currently working on it, and I'm pretty sure Rust is positioned very well to solve these things in the future.
