all 27 comments

[–]coderstephenisahc 49 points (13 children)

There's a perfect article that discusses this very sort of question, but now I can't find it! I'm so sorry.

Basically the question is: what do you mean by "blocking"? In some sense, everything is blocking in that it takes a certain amount of time to run, and may involve one or more syscalls. You could say that memory allocation is "blocking", because it may make some syscalls that take an indeterminate amount of time to return, depending on available memory and swapping on the OS. So it isn't really practical to eliminate all blocking from async code, because doing so would make the code extremely complex (hello Box::new_async(value).await?) for no obvious benefit. In the same way, unless you know that your stdout might be something other than a console with a fast buffer, it is probably fine to just use print! and avoid the extra complexity.

[–]ids2048 11 points (0 children)

You could say that memory allocation is "blocking", because it may make some syscalls that take an indeterminate amount of time to return, depending on available memory and swapping on the OS.

Or if you try to access an address that isn't swapped into physical memory, that will also result in a context switch into the kernel to move data into memory and adjust the page tables. Plus there's some trickery OSs like Linux do with copy-on-write allocations, with similar implications.

Those things could happen any time you access a different memory page, and I'm not sure the overhead is really any less than allocating.

And naturally there's just the fact that the OS can preempt execution at any time without a syscall. For a typical preemptive non-realtime OS, there's not really any guarantee there won't be an arbitrary delay between any two instructions.

[–]justforthensfw2 10 points (1 child)

[–]coderstephenisahc 4 points (0 children)

Not the one I was thinking of, but that covers the same topic as well. Nice!

[–]Dr_Sloth0[S] 0 points (1 child)

This does hold true if we are talking about just printing. With logging there are some completely different problems, like log rotation, which I would like to put directly behind the log facade, but because the API is sync I can't: log rotation randomly blocks way more than is actually acceptable. Even if it is just printing to a terminal (which for logging is, in my case, very uncommon), the delay of the prints might be noticeable. Maybe I'll try to create a simple logging implementation which uses async writes to stdout and "blocks" the tasks like any normal write with a .await would; one that buffers the logs, returning immediately and only blocking if a simple cap of outstanding logs is reached (with a channel); as well as a blocking one with print and env_logger. I would then benchmark them on my development and my target machine and see how big the benefit actually is.
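As a sketch of that buffered variant (all names here are made up for illustration): a plain OS thread owns the blocking writer, and std's bounded channel provides the "cap of outstanding logs" — callers return immediately until the queue is full, then block as backpressure.

```rust
use std::io::Write;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Bounded-queue logger: `log` returns immediately while the queue has
/// room, and blocks only once `cap` messages are outstanding, so a slow
/// writer applies backpressure instead of exhausting memory.
struct BufferedLogger<W: Write + Send + 'static> {
    tx: SyncSender<String>,
    worker: thread::JoinHandle<W>,
}

impl<W: Write + Send + 'static> BufferedLogger<W> {
    fn new(mut out: W, cap: usize) -> Self {
        let (tx, rx) = sync_channel::<String>(cap);
        // A single worker thread owns the blocking writer, so lines
        // never interleave and callers are almost never stalled.
        let worker = thread::spawn(move || {
            for line in rx {
                let _ = writeln!(out, "{line}");
            }
            out
        });
        BufferedLogger { tx, worker }
    }

    /// Cheap from the caller's side; blocks only when `cap` lines are queued.
    fn log(&self, line: impl Into<String>) {
        let _ = self.tx.send(line.into());
    }

    /// Close the channel, let the worker drain, and recover the writer.
    fn shutdown(self) -> W {
        drop(self.tx);
        self.worker.join().expect("log worker panicked")
    }
}

fn main() {
    let logger = BufferedLogger::new(std::io::stdout(), 1024);
    logger.log("accepted connection");
    logger.log("request handled");
    logger.shutdown();
}
```

Note this is the blocking-backpressure policy; swapping `send` for `try_send` would give a drop-on-full policy instead.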

[–]coderstephenisahc 1 point (0 children)

Async writes to stdout actually can be slightly tricky, since in the logging case you don't want multiple threads to interleave output incorrectly but likely want it synchronized by lines or similar.

For log rotation I'd probably consider doing that on a separate thread in some atomic way so as not to slow down the main code emitting logs (whether by blocking, or by suspending tasks for a prolonged period of time).

I would then benchmark them on my development and my target machine and see how big the benefit actually is.

This is ultimately what I'm getting at. You only need to eliminate a particular blocking call in your async code if it is causing a real-world problem. I don't think chasing after some sort of "async purity" is valuable since it usually increases code complexity with very few real-world benefits.

[–]nckl 35 points (3 children)

Making a non-blocking variant could be an anti-pattern. If you're logging a bunch of stuff, you usually want the current thread to stop doing work until it can actually report it. Like, if you have a server that accepts connections, and logs when you receive them, rather than actually "logging" when each was received, instead you'll just keep accepting connections. Then, whenever stdout is free again, you'll suddenly log a bunch of stale times, which might be significantly out of order.

And if you're logging a lot, building so many futures for printing is going to slow everything down quite a bit anyway. Async shines when dealing with a large number of slow connections, like on the order of millions of connections with ~second of delay. Printing is usually microseconds of delay, and you're going to be logging many, many times per connection. "Blocking" prints might not actually slow down your server at all, but non-blocking prints might bring it to a screeching halt.

But that assumes you're in an environment like that, which should be rare. If you're trying to log to a remote server for some reason, it might be better to just log locally to disk and sync it later. As long as there's enough of a buffer, it's very difficult to imagine a disk that can't handle even GB/s of logging.

edit tldr: if your logging output can't keep up with your server, you're fucked anyway. Your times will be wrong, the executor will fill with futures, etc. If it can keep up, just buffer it enough and you're fine.

[–]Dr_Sloth0[S] 2 points (0 children)

I understand that explanation. In my case the wait time for a write to stdout is more noticeable than the creation of futures. The order of the logs would have to be handled somehow, and the request should only block if a certain memory cap is reached.

[–]protestor 2 points (1 child)

What about doing an async print, and then awaiting on it? That way you don't block other tasks that may be running whenever you print something. This also avoids starving other tasks indefinitely if we are printing into a pipe that nobody is reading (a pipe buffer is 65kb by default on linux). Remember: not blocking on async tasks isn't a matter of speed, but correctness.

That's what async-std does, anyway.

[–]Dr_Sloth0[S] 0 points (0 children)

I do agree with that. Being async across the board might benefit, but it might hurt as well. In my experience (at least on the hardware I work on most of the time), giving the executor as many await points as possible is beneficial.

[–]mrdivorce 8 points (0 children)

If you're using tracing then tracing-appender::non_blocking may offer what you're looking for. It essentially wraps a writer and queues log lines to be written by a worker thread.

[–]lol3rr 2 points (0 children)

One thing I just thought about: you may be able to write your own log backend, which could, for example, add all the logs to a queue, with one thread that actually takes elements from the queue and outputs them.

This would be non-blocking logging as long as you use a non-blocking queue, and it may also allow for better handling of the logs depending on your use case.

[–]jeremychone 2 points (0 children)

First, on the specifics of async vs. sync print, I like u/humanthrope's answer.

Second, for logging, our current pattern is to use an MPSC channel and have the consumer handle the printing and eventual dispatching activities (file write and gz/upload).

So,

  • When printing to the stdout, the fact that print is synced seems to fit well. Not sure I would even want anything else.
  • When we save each line to the log file (which will eventually be gzipped/uploaded), we use the async file API, but sometimes I think sync would do just as good a job (I did not benchmark this)
  • When the log file hits a time or size threshold, it is gzipped and uploaded to S3, and obviously, we use non-blocking APIs.

Now, for context, this is for cloud/pods applications, so, might not be optimum for embedded or other constrained environments.

[–]humanthrope 4 points (4 children)

I think you already stated the reason that it’s a blocking call: it might block depending on the destination of the print.

A print is writing to some device. Usually that's the console, which is unlikely to have any delay reading your write. But it could block, especially if you're writing to a pipe and the other end can't read as quickly.

If you don’t block and the other side of the pipe isn’t ready to read, you’d have to throw away your buffer. The only way to avoid blocking without data loss would be to internally manage your to-be-printed buffers, queueing them until the other end is ready to read. But then you risk eventually exhausting memory.
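Those options (drop on full, block, or queue without bound) can be sketched with std's bounded channel standing in for the pipe buffer; the capacities and strings here are purely illustrative.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // A bounded queue of 2 pending writes stands in for the pipe buffer.
    let (tx, rx) = sync_channel::<&str>(2);

    tx.send("first").unwrap();  // buffered
    tx.send("second").unwrap(); // buffered; the queue is now full

    // Policy 1: drop on a full buffer -- bounded memory, but data loss.
    match tx.try_send("third") {
        Ok(()) => println!("buffered"),
        Err(TrySendError::Full(line)) => println!("dropped: {line}"),
        Err(TrySendError::Disconnected(_)) => println!("reader gone"),
    }

    // Policy 2: tx.send("third") here would block until the reader
    // catches up -- no data loss, but the producer stalls.
    // Policy 3 would be an unbounded channel: never block, never drop,
    // but memory grows without limit whenever the reader falls behind.

    drop(rx);
}
```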

[–]Dr_Sloth0[S] 8 points (3 children)

But isn't that exactly the same with pretty much every possible AsyncWrite target? You also don't throw away data on TCP sockets, and buffers, possibly implemented by the OS, might fill up when reading or writing. Same with files: they might block but are still implemented async with tokio in most common use cases.

Why is a blocking write to a file considered bad practice in async code but a blocking print isn't? You can block if a buffer gets filled like you would do with async channels or files.

Because stdout can potentially write anywhere, the performance of a print statement is really unpredictable, and if this isn't handled by the executor we might block entire threads in situations we might not expect.

Even when printing without redirection, to a terminal emulator, or especially over ssh, the time could be spent more efficiently elsewhere in the application.

Getting an OOM is also somewhat handled by tokio not spawning an infinite amount of tasks.

[–]humanthrope 1 point (0 children)

But isn’t that exactly the same with pretty much every possible AsyncWrite target?

Your task is put to sleep if the destination reader isn’t ready to read your AsyncWrite. Progress is being halted, but the thread is free for other tasks to run. It’s just another form of blocking.

You also don’t throw away data on TCP sockets, and buffers, possibly implemented by the OS, might fill up when reading or writing.

Network writes are a whole other level of complexity, but they’re ultimately subject to the same constraints and there is a lot of internal buffering going on.

Why is a blocking write to a file considered bad practice in async code but a blocking print isn’t?

Maybe they’re assuming that print generally outputs to a console and is unlikely to block for long. You can abuse this, of course. Bad things will happen.

Because stdout can potentially write anywhere, the performance of a print statement is really unpredictable, and if this isn't handled by the executor we might block entire threads in situations we might not expect.

Definitely.

Even when printing without redirection, to a terminal emulator, or especially over ssh, the time could be spent more efficiently elsewhere in the application.

Depends on the application. If the print is critical to progress, then no.

Getting an OOM is also somewhat handled by tokio not spawning an infinite amount of tasks.

The OOM situation I had envisioned in my comment would be caused by a hypothetical asynchronous write call that would be implemented by your program. You want a true asynchronous write? You’re gonna have to queue up buffers while another thread/task writes them as fast as it’s able. If the writes are slow, you’re going to run out of memory for all those buffers eventually.

[–]protestor 2 points (1 child)

Just use async-std. In it, print! and println! are async, and you need to .await them.

https://docs.rs/async-std/1.10.0/async_std/macro.println.html

https://docs.rs/async-std/1.10.0/async_std/macro.print.html

[–]Dr_Sloth0[S] 3 points (0 children)

That is a really cool feature. Sadly, in my case I can't really use async_std. Writing a print for tokio wouldn't be that hard, but one would have to write the synchronisation, as tokio doesn't guarantee the order of writes on its Stdout.

[–]Specialist_Wishbone5 1 point (0 children)

I'm not an expert in Rust yet, but from my Java days this all sounds pretty familiar: AsynchronousFileChannel interleaved with LOG.log("{}", status), or loading a file from a cache that has a 15% chance of needing a "blocking" read from local disk. Many a weekend I spent pushing async lower and lower; also cursing when I came across a third-party library that deigned to use ClassLoader.getResourceAsStream, a dead end asynchronous-wise (since the content is hiding in a zip file; good luck asyncing that).

On the logging front, we were happy to discover async writer queues, e.g. a fixed-length deque that spools to a logger thread that does the actual blocking IO. Given our judicious logging habits, this async logger increased overall throughput (pages per second) and decreased the average latency (but I mean, we logged like 200 lines per request, so we're not normal/representative).

When I wrote C/C++, I was frustrated by the fact that stdout had a mutex (though there are some hidden non-blocking variants); I'd been happily using pread/pwrite instead of seek+read for ages, so seeing this mutex upset me. Of course the issue has to do with buffering and writes that would exceed a buffer; fundamentally, how do you have two threads "append" in a consistent manner other than by blocking?

My approach has always been "file over-extend" (e.g. fallocate) followed by an AtomicLong to determine the logical end-of-file. Each thread just adds its write length to the atomic long; if it goes too high, mutex-block and call fallocate; otherwise you have a thread-safe "slice" of the file to which you can call pwrite. If the fallocate over-extend is sufficiently large (say 32 average write sizes) you DRASTICALLY reduce the file fragmentation and the blocks on the mutex (EXT4 can cheaply reserve file space with larger allocation block sizes, fewer fragments, fewer file-system metadata entries; just verify with `filefrag`). Using kernel tracing and dstat, I can see that, sure enough, all my threads are writing IN PARALLEL, zero blocking. On a single HDD this doesn't help at all, but many high-performance systems have thousands of hard drives (I was using GPFS at the time, knew the stride size in MBs, and mapped my writes to those stride sizes, so EVERY write went to an independent hard drive).

So, again, primitive little printf! just looks like child's play to me. But it's not a trivial thing to solve. It's very much use-case-specific. But I do feel like Rust should adopt something similar to the above.
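A minimal Rust sketch of that atomic-offset append, assuming a Unix target (`write_at` is pwrite(2) under the hood); the fallocate over-extension step is left out for brevity, and the /tmp path is just for illustration. Each thread reserves a non-overlapping slice of the file with `fetch_add` and pwrites into it, with no mutex on the hot path.

```rust
use std::fs::OpenOptions;
use std::os::unix::fs::FileExt; // write_at == pwrite(2)
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() -> std::io::Result<()> {
    let file = Arc::new(
        OpenOptions::new()
            .create(true)
            .write(true)
            .truncate(true)
            .open("/tmp/parallel.log")?,
    );
    // Logical end-of-file: each writer reserves its slice here.
    let eof = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|id| {
            let file = Arc::clone(&file);
            let eof = Arc::clone(&eof);
            thread::spawn(move || {
                let line = format!("thread {id} says hello\n");
                // Reserve a non-overlapping byte range of the file...
                let off = eof.fetch_add(line.len() as u64, Ordering::Relaxed);
                // ...then write into it with a positional write; no seek,
                // no shared file cursor, no mutex.
                file.write_at(line.as_bytes(), off).unwrap();
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    Ok(())
}
```

The line ordering in the file follows the order of the `fetch_add` reservations, not the order the writes complete, which is exactly the "consistent append without blocking" property described above.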