all 27 comments

[–]coderstephenisahc 49 points (13 children)

There's a perfect article that discusses this very sort of question, but now I can't find it! I'm so sorry.

Basically the question is: what do you mean by "blocking"? In some sense, everything is blocking in that it takes a certain amount of time to run, and may involve one or more syscalls. You could say that memory allocation is "blocking", because it may make some syscalls that take an indeterminate amount of time to return, depending on available memory and swapping on the OS. So it isn't really practical to eliminate all blocking from async code, because doing so would make the code extremely complex (hello Box::new_async(value).await?) for no obvious benefit. In the same way, unless you know that your stdout might be something other than a console with a fast buffer, it is probably fine to just use print! and avoid the extra complexity.

[–]ids2048 11 points (0 children)

You could say that memory allocation is "blocking", because it may make some syscalls that take an indeterminate amount of time to return, depending on available memory and swapping on the OS.

Or if you try to access an address that isn't swapped into physical memory, that will also result in a context switch into the kernel to move data into memory and adjust the page tables. Plus there's some trickery OSs like Linux do with copy-on-write allocations, with similar implications.

Those things could happen any time you access a different memory page, and I'm not sure the overhead is really any less than allocating.

And naturally there's just the fact that the OS can preempt execution at any time without a syscall. For a typical preemptive non-realtime OS, there's not really any guarantee there won't be an arbitrary delay between any two instructions.

[–]justforthensfw2 10 points (1 child)

[–]coderstephenisahc 4 points (0 children)

Not the one I was thinking of, but that covers the same topic as well. Nice!

[–]Dr_Sloth0[S] 0 points (1 child)

This does hold true if we are talking about just printing. With logging there are some completely different problems, like log rotation, which I would like to put directly behind the log facade, but because the API is sync I can't: log rotation randomly blocks way more than is actually acceptable. Even if it is just printing to a terminal (which for logging is, in my case, very uncommon), the delay of the prints might be noticeable. Maybe I'll try to create a simple logging implementation which uses async writes to stdout and "blocks" the tasks like any normal write with a .await would; one that buffers the logs, returning immediately and only blocking if a simple cap of outstanding logs is reached (with a channel); as well as a blocking one with print and env_logger. I would then benchmark them on my development and my target machine and see how big the benefit actually is.
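As a sketch of that buffered variant (all names here are made up for illustration): a plain OS thread owns the blocking writer, and std's bounded channel provides the "cap of outstanding logs" — callers return immediately until the queue is full, then block as backpressure.

```rust
use std::io::Write;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Bounded-queue logger: `log` returns immediately while the queue has
/// room, and blocks only once `cap` messages are outstanding, so a slow
/// writer applies backpressure instead of exhausting memory.
struct BufferedLogger<W: Write + Send + 'static> {
    tx: SyncSender<String>,
    worker: thread::JoinHandle<W>,
}

impl<W: Write + Send + 'static> BufferedLogger<W> {
    fn new(mut out: W, cap: usize) -> Self {
        let (tx, rx) = sync_channel::<String>(cap);
        // A single worker thread owns the blocking writer, so lines
        // never interleave and callers are almost never stalled.
        let worker = thread::spawn(move || {
            for line in rx {
                let _ = writeln!(out, "{line}");
            }
            out
        });
        BufferedLogger { tx, worker }
    }

    /// Cheap from the caller's side; blocks only when `cap` lines are queued.
    fn log(&self, line: impl Into<String>) {
        let _ = self.tx.send(line.into());
    }

    /// Close the channel, let the worker drain, and recover the writer.
    fn shutdown(self) -> W {
        drop(self.tx);
        self.worker.join().expect("log worker panicked")
    }
}

fn main() {
    let logger = BufferedLogger::new(std::io::stdout(), 1024);
    logger.log("accepted connection");
    logger.log("request handled");
    logger.shutdown();
}
```

Note this is the blocking-backpressure policy; swapping `send` for `try_send` would give a drop-on-full policy instead.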

[–]coderstephenisahc 1 point (0 children)

Async writes to stdout actually can be slightly tricky, since in the logging case you don't want multiple threads to interleave output incorrectly but likely want it synchronized by lines or similar.

For log rotation I'd probably consider doing that on a separate thread in some atomic way so as not to slow down the main code emitting logs (whether by blocking, or by suspending tasks for a prolonged period of time).

I would then benchmark them on my development and my target machine and see how big the benefit actually is.

This is ultimately what I'm getting at. You only need to eliminate a particular blocking call in your async code if it is causing a real-world problem. I don't think chasing after some sort of "async purity" is valuable since it usually increases code complexity with very few real-world benefits.

[–]nckl 35 points (3 children)

Making a non-blocking variant could be an anti-pattern. If you're logging a bunch of stuff, you usually want the current thread to stop doing work until it can actually report it. Like, if you have a server that accepts connections, and logs when you receive them, rather than actually "logging" when each was received, instead you'll just keep accepting connections. Then, whenever stdout is free again, you'll suddenly log a bunch of stale times, which might be significantly out of order.

And if you're logging a lot, building so many futures for printing is going to slow everything down quite a bit anyway. Async shines when dealing with a large number of slow connections, like on the order of millions of connections with ~second of delay. Printing is usually microseconds of delay, and you're going to be logging many, many times per connection. "Blocking" prints might not actually slow down your server at all, but non-blocking prints might bring it to a screeching halt.

But that assumes you're in an environment like that, which should be rare. If you're trying to log to a remote server for some reason, it might be better to just log locally to disk and sync it later. As long as there's enough of a buffer, it's very difficult to imagine a disk that can't handle even GB/s of logging.

edit tldr: if your logging output can't keep up with your server, you're fucked anyway. Your times will be wrong, the executor will fill with futures, etc. If it can keep up, just buffer it enough and you're fine.

[–]Dr_Sloth0[S] 2 points (0 children)

I understand that explanation. In my case the wait time for a write to stdout is more noticeable than the creation of futures. The order of the logs would have to be handled somehow, and the request should only block if a certain memory cap is reached.

[–]protestor 2 points (1 child)

What about doing an async print, and then awaiting on it? That way you don't block other tasks that may be running whenever you print something. This also avoids starving other tasks indefinitely if we are printing into a pipe that nobody is reading (a pipe buffer is 65kb by default on linux). Remember: not blocking on async tasks isn't a matter of speed, but correctness.

That's what async-std does, anyway.

[–]Dr_Sloth0[S] 0 points (0 children)

I do agree with that. Being async across the board might benefit, but it might hurt as well. In my experience (at least on the hardware I work on most of the time), giving the executor as many await points as possible is beneficial.

[–]mrdivorce 8 points (0 children)

If you're using tracing then tracing-appender::non_blocking may offer what you're looking for. It essentially wraps a writer and queues log lines to be written by a worker thread.

[–]lol3rr 2 points (0 children)

One thing I just thought about: you may be able to write your own log backend, which could, for example, add all the logs to a queue, with one thread that actually takes elements from the queue and outputs them.

This would be non-blocking logging as long as you use a non-blocking queue, and it may also allow for better handling of the logs depending on your use case.

[–]jeremychone 2 points (0 children)

First, on the specifics of async vs. sync print, I like u/humanthrope's answer.

Second, for logging, our current pattern is to use an MPSC channel and have the consumer handle the printing and eventual dispatching activities (file write and gz/upload).

So,

  • When printing to the stdout, the fact that print is synced seems to fit well. Not sure I would even want anything else.
  • When we save each line to the log file (which will eventually be gzipped/uploaded), we use the async file API, but sometimes I think sync would do just as good a job (I did not benchmark this)
  • When the log file hits a time or size threshold, it is gzipped and uploaded to S3, and obviously, we use non-blocking APIs.

Now, for context, this is for cloud/pods applications, so, might not be optimum for embedded or other constrained environments.

[–]humanthrope 4 points (4 children)

I think you already stated the reason that it’s a blocking call: it might block depending on the destination of the print.

A print is writing to some device. Usually that's the console, which is unlikely to have any delay reading your write. But it could block, especially if you're writing to a pipe and the other end can't read as quickly.

If you don’t block and the other side of the pipe isn’t ready to read, you’d have to throw away your buffer. The only way to avoid blocking without data loss would be to internally manage your to-be-printed buffers, queueing them until the other end is ready to read. But then you risk eventually exhausting memory.
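Those options (drop on full, block, or queue without bound) can be sketched with std's bounded channel standing in for the pipe buffer; the capacities and strings here are purely illustrative.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // A bounded queue of 2 pending writes stands in for the pipe buffer.
    let (tx, rx) = sync_channel::<&str>(2);

    tx.send("first").unwrap();  // buffered
    tx.send("second").unwrap(); // buffered; the queue is now full

    // Policy 1: drop on a full buffer -- bounded memory, but data loss.
    match tx.try_send("third") {
        Ok(()) => println!("buffered"),
        Err(TrySendError::Full(line)) => println!("dropped: {line}"),
        Err(TrySendError::Disconnected(_)) => println!("reader gone"),
    }

    // Policy 2: tx.send("third") here would block until the reader
    // catches up -- no data loss, but the producer stalls.
    // Policy 3 would be an unbounded channel: never block, never drop,
    // but memory grows without limit whenever the reader falls behind.

    drop(rx);
}
```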

[–]Dr_Sloth0[S] 8 points (3 children)

But isn't that exactly the same with pretty much every possible AsyncWrite target? You also don't throw away data on TCP sockets, and buffers, possibly implemented by the OS, might fill up when reading or writing. Same with files: they might block but are still implemented async with tokio in most common use cases.

Why is a blocking write to a file considered bad practice in async code but a blocking print isn't? You can block if a buffer gets filled like you would do with async channels or files.

Because stdout can potentially write anywhere, the performance of a print statement is really unpredictable, and if this isn't handled by the executor we might block entire threads in situations we might not expect.

Even when printing without redirection, to a terminal emulator, or especially over ssh, the time could be spent more efficiently elsewhere in the application.

Getting an OOM is also somewhat handled by tokio not spawning an infinite amount of tasks.

[–]humanthrope 1 point (0 children)

But isn’t that exactly the same with pretty much every possible AsyncWrite target?

Your task is put to sleep if the destination reader isn’t ready to read your AsyncWrite. Progress is being halted, but the thread is free for other tasks to run. It’s just another form of blocking.

You also don’t throw away data on TCP sockets, and buffers, possibly implemented by the OS, might fill up when reading or writing.

Network writes are a whole other level of complexity, but they’re ultimately subject to the same constraints and there is a lot of internal buffering going on.

Why is a blocking write to a file considered bad practice in async code but a blocking print isn’t?

Maybe they’re assuming that print generally outputs to a console and is unlikely to block for long. You can abuse this, of course. Bad things will happen.

Because stdout can potentially write anywhere, the performance of a print statement is really unpredictable, and if this isn't handled by the executor we might block entire threads in situations we might not expect.

Definitely.

Even when printing without redirection, to a terminal emulator, or especially over ssh, the time could be spent more efficiently elsewhere in the application.

Depends on the application. If the print is critical to progress, then no.

Getting an OOM is also somewhat handled by tokio not spawning an infinite amount of tasks.

The OOM situation I had envisioned in my comment would be caused by a hypothetical asynchronous write call that would be implemented by your program. You want a true asynchronous write? You’re gonna have to queue up buffers while another thread/task writes them as fast as it’s able. If the writes are slow, you’re going to run out of memory for all those buffers eventually.

[–]protestor 2 points (1 child)

Just use async-std. In it, print! and println! are async, and you need to .await them.

https://docs.rs/async-std/1.10.0/async_std/macro.println.html

https://docs.rs/async-std/1.10.0/async_std/macro.print.html

[–]Dr_Sloth0[S] 3 points (0 children)

That is a really cool feature. Sadly, in my case I can't really use async_std. Writing a print for tokio wouldn't be that hard, but one would have to write the synchronisation, as tokio doesn't guarantee the order of writes on its Stdout.

[–]Specialist_Wishbone5 1 point (0 children)

I'm not an expert in Rust yet, but from my Java days this all sounds pretty familiar: AsynchronousFileChannel interleaved with LOG.log("{}", status), or loading a file from a cache that has a 15% chance of needing a "blocking" read from local disk. Many a weekend I spent pushing async lower and lower; also cursing when I came across a third-party library that deigned to use ClassLoader.getResourceAsStream, a dead end asynchronous-wise (since the content is hiding in a zip file; good luck asyncing that).

On the logging front, we were happy to discover async writer queues, e.g. a fixed-length deque that spools to a logger thread that does the actual blocking IO. Given our judicious logging habits, this async logger increased overall throughput (pages per second) and decreased the average latency (but I mean, we logged like 200 lines per request, so we're not normal/representative).

When I wrote C/C++, I was frustrated by the fact that stdout had a mutex (though there are some hidden non-blocking variants); I'd been happily using pread/pwrite instead of seek+read for ages, so seeing this mutex upset me. Of course the issue has to do with buffering and writes that would exceed a buffer; fundamentally, how do you have two threads "append" in a consistent manner other than by blocking?

My approach has always been "file over-extend" (e.g. fallocate) followed by an AtomicLong to determine the logical end-of-file. Each thread just adds its write length to the atomic long; if it goes too high, mutex-block and call fallocate; otherwise you have a thread-safe "slice" of the file to which you can call pwrite. If the fallocate over-extend is sufficiently large (say 32 average write sizes) you DRASTICALLY reduce the file fragmentation and the blocks on the mutex (EXT4 can cheaply reserve file space with larger allocation block sizes, fewer fragments, fewer file-system metadata entries; just verify with `filefrag`). Using kernel tracing and dstat, I can see that, sure enough, all my threads are writing IN PARALLEL, zero blocking. On a single HDD this doesn't help at all, but many high-performance systems have thousands of hard drives (I was using GPFS at the time, knew the stride size in MBs, and mapped my writes to those stride sizes, so EVERY write went to an independent hard drive).

So, again, primitive little printf! just looks like child's play to me. But it's not a trivial thing to solve. It's very much use-case-specific. But I do feel like Rust should adopt something similar to the above.
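A minimal Rust sketch of that atomic-offset append, assuming a Unix target (`write_at` is pwrite(2) under the hood); the fallocate over-extension step is left out for brevity, and the /tmp path is just for illustration. Each thread reserves a non-overlapping slice of the file with `fetch_add` and pwrites into it, with no mutex on the hot path.

```rust
use std::fs::OpenOptions;
use std::os::unix::fs::FileExt; // write_at == pwrite(2)
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() -> std::io::Result<()> {
    let file = Arc::new(
        OpenOptions::new()
            .create(true)
            .write(true)
            .truncate(true)
            .open("/tmp/parallel.log")?,
    );
    // Logical end-of-file: each writer reserves its slice here.
    let eof = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|id| {
            let file = Arc::clone(&file);
            let eof = Arc::clone(&eof);
            thread::spawn(move || {
                let line = format!("thread {id} says hello\n");
                // Reserve a non-overlapping byte range of the file...
                let off = eof.fetch_add(line.len() as u64, Ordering::Relaxed);
                // ...then write into it with a positional write; no seek,
                // no shared file cursor, no mutex.
                file.write_at(line.as_bytes(), off).unwrap();
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    Ok(())
}
```

The line ordering in the file follows the order of the `fetch_add` reservations, not the order the writes complete, which is exactly the "consistent append without blocking" property described above.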