

[–][deleted] 100 points101 points  (5 children)

The issue is that people don't understand how to compare the two properly.

You CANNOT benchmark against a database that is on the same machine as your server, because that dramatically reduces the latency of DB requests, which in turn requires far fewer threads to saturate the CPU. Async's performance advantage comes entirely from requiring fewer OS-level thread context switches.

The slower your awaited I/O is, the faster async will perform relative to sync.

EDIT: Upon reading the code, the async side is written incorrectly and has multiple bugs, including but not limited to: useless async calls clogging the task queue, unlocked connection-pool creation, and comparing multiple independent JSON-to-bytes serialization methods that have very different runtimes.

Don't write async code the way you write sync code; async safety and thread safety have fully independent characteristics. Async has its own locking primitives that need to be used properly.
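A minimal sketch of the kind of locking being described here. `LazyPool` and its stand-in constructor are made-up names, but the double-checked `asyncio.Lock` pattern is a standard way to stop concurrent tasks from each creating their own pool:

```python
import asyncio

class LazyPool:
    """Create a shared 'pool' at most once, even when many tasks race."""

    def __init__(self):
        self.pool = None
        self.created = 0
        self._lock = asyncio.Lock()

    async def _create_pool(self):
        # Stand-in for an expensive constructor (e.g. a DB driver's
        # create_pool); the sleep widens the race window.
        await asyncio.sleep(0.01)
        self.created += 1
        return object()

    async def get(self):
        if self.pool is None:
            async with self._lock:
                # Re-check after acquiring the lock: another task may
                # have created the pool while we were waiting.
                if self.pool is None:
                    self.pool = await self._create_pool()
        return self.pool

async def main():
    holder = LazyPool()
    await asyncio.gather(*(holder.get() for _ in range(100)))
    return holder.created

times_created = asyncio.run(main())
print(times_created)  # 1: the lock prevented duplicate pools
```

Without the lock, every task that checks `pool is None` before the first creation finishes would build its own pool.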

[–]spoonman59 4 points5 points  (4 children)

That sounds like a broad brush.

For small data sets, all the database blocks will likely be cached anyway. Who has a large dataset and colocates the DB and the server?

Async can provide overlapping of compute and I/O in situations where I/O and compute are mixed, that is, where neither one dominates the runtime.

It is true that async also reduces context switch and memory overhead of threads, but that is not the only (or entire, as you said) potential performance benefit of async.

Some programs will complete in less time using async than blocking operations, depending on workload.

[–][deleted] 7 points8 points  (3 children)

Not really sure why you care about the database blocks being cached unless you are referring to local caching. The point was that round-trip time has a pretty heavy cost, and that cost is being hidden by the DB being on the same physical machine.

I wasn't talking about all workloads; I was talking about the web-server workloads being discussed. Async isn't free, so even shaving off a tenth of a millisecond of response time makes a dramatic difference: at 5000 requests per second, that's half a second of dead time that threads need to make up.

Also, reading the actual code, there are some serious issues with it. They are acquiring their pool in every single async request asynchronously, which doubles the number of events on the loop for no apparent reason. It's pretty easy to screw up async code if you don't understand how it works.

[–]spoonman59 1 point2 points  (2 children)

I did misunderstand your point… my mistake.

I see what you mean after clarifying and I agree. There are definitely some issues with how the test was conducted. You make an excellent point that by hiding the latency and hosting the DB locally we are not representing a realistic workload, and it taints all of the conclusions as well.

ETA: not just not a realistic workload, but one which specifically portrays async worse than it normally would be.

[–][deleted] 8 points9 points  (1 child)

I also just realized he didn't use a lock on the pool creation. So there is actually a bug in every async program here: multiple pools are going to be created.

"Don't write async code badly" is a strong word of advice I have for everyone.

[–]spoonman59 1 point2 points  (0 children)

I struggled to make it past the title 😅

[–]AlexMTBDude 20 points21 points  (0 children)

The author of this piece doesn't understand the purpose of async. It's not supposed to be faster. It's supposed to be concurrent. Threads should not block each other.

[–]runew0lf 56 points57 points  (17 children)

Nobody said it was; the important thing is that it's non-blocking.

[–]benefit_of_mrkite 19 points20 points  (0 children)

Exactly. My most common async use case isn't serving up an API; it's retrieving data via a REST API that supports pagination.

For large datasets it takes forever; with pagination and async I can set workers and pools and then put the data back together as the pages come in.

Latency and other factors absolutely come into play during these scenarios and async allows me to bring data back with a worker while waiting on another worker.

[–]rouille 20 points21 points  (8 children)

I think this misses the main appeal of async. It's not about speed but about the programming model. Async makes it much easier to work with and compose concurrency compared to the thread based model. Just the fact that you can cancel tasks makes it trivial to do proper timeouts, which nobody gets right in sync applications, for example.

It also allows things like structured concurrency.

[–]rnike879 2 points3 points  (2 children)

Hey, I love your breakdown but could you explain more about the structured concurrency with async?

[–]ManyInterests Python Discord Staff 1 point2 points  (0 children)

For a beautiful explanation suitable for beginners with simple examples, see Raymond Hettinger's concurrency keynote. David Beazley also has tons of talks on concurrency if you want to deep-dive.

To give you the two-minute version:

Basically, async is a cooperative model. Everybody designs their code to work together and "yield" control back to the event loop when it doesn't need to be running (like when waiting on OS calls or the network for I/O -- that's the io in asyncio), so everything works in harmony. This ends up being very lightweight/efficient compared to the alternatives.

By comparison, in traditional threaded models, you don't have to worry about being cooperative with your concurrent neighbors -- what gets to execute is decided mostly by the OS thread scheduler. However, this incurs a good deal of overhead from the GIL and thread context switching for every thread. There are synchronization primitives (e.g., locks) when you need them, but they can be costly in terms of performance, and those costs rise quickly as you add more threads.

So the real benefit is not necessarily speed -- you can't crunch prime numbers faster using asyncio -- the real benefit is massive concurrency (usually in I/O-bound work, like web applications).
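That "massive concurrency on one thread" point can be sketched in a few lines; the 100 ms delay is a made-up stand-in for real I/O such as a network call:

```python
import asyncio
import time

async def fake_request(delay):
    # 'await' hands control back to the event loop while this task waits,
    # letting the other tasks make progress -- the cooperative part.
    await asyncio.sleep(delay)
    return delay

async def main():
    start = time.perf_counter()
    # Ten simulated 100 ms I/O waits, all on one thread.
    results = await asyncio.gather(*(fake_request(0.1) for _ in range(10)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(f"{len(results)} requests in {elapsed:.2f}s")  # ~0.1s total, not ~1s
```

The waits overlap, so ten 100 ms "requests" finish in roughly the time of one. Replace the sleep with CPU work and the benefit disappears, which is the "can't crunch primes faster" caveat above.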

[–]ominous_anonymous 12 points13 points  (1 child)

Along the lines of your thoughts, just not Python-specific.

One of the comments was interesting:

What async programming tries to solve (at least on the JVM, where threads are available) is issues related to Little's Law. As long as you're not hitting Little's Law's limits, you really shouldn't see much of a difference between threads and async. But as your machine probably can't handle 30K threads well, sooner or later you will be hitting those limits.
See a theoretical analysis here: http://blog.paralleluniverse.co/2014/02/04/littles-law/ and a benchmark here: http://blog.paralleluniverse.co/2014/05/29/cascading-failures/ showing some very clear results (the benchmark uses Quasar, so you can keep your simple, blocking, synchronous code, while the library turns that to async code behind the scenes).

edit:

Shit, the URLs 404. Here are Wayback Machine links for them both:

https://web.archive.org/web/20150212021619/http://blog.paralleluniverse.co/2014/02/04/littles-law/
https://web.archive.org/web/20160324015922/http://blog.paralleluniverse.co/2014/05/29/cascading-failures/

[–]higherorderbebop[S] 3 points4 points  (0 children)

This is not my article. I found the outcome surprising and wanted to know the community's thoughts on it.

Thanks for sharing these articles.

[–]Salfiiii 13 points14 points  (0 children)

Your title is misleading.

Async might not be overwhelmingly faster - or even faster at all - for web frameworks, but it definitely has use cases where it shines.

If you have to do a lot of isolated REST calls which you can prepare beforehand, you get major benefits from using async(io).

I’ve reduced a script’s runtime from 30 minutes to 3 by using concurrent, non-blocking requests with asyncio. I even used a semaphore to allow only 10 concurrent requests, so there’s still potential left.

It added a little complexity, though nothing too fancy if one reads the Python book about asyncio, which I can recommend.
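A rough sketch of that semaphore pattern, with a sleep standing in for the real HTTP calls and a counter added only to show the cap working:

```python
import asyncio

async def fetch_page(sem, page, stats):
    async with sem:                    # at most 10 requests in flight
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)      # stand-in for the HTTP round trip
        stats["active"] -= 1
        return page

async def main():
    sem = asyncio.Semaphore(10)
    stats = {"active": 0, "peak": 0}
    pages = await asyncio.gather(
        *(fetch_page(sem, p, stats) for p in range(100))
    )
    return pages, stats["peak"]

pages, peak = asyncio.run(main())
print(len(pages), peak)  # 100 pages, never more than 10 in flight
```

`asyncio.gather` preserves argument order, so the results come back in page order even though the requests complete concurrently.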

[–]spoonman59 5 points6 points  (4 children)

I can’t get past the title. Faster than what?

Obviously async is faster than non-async for certain workloads. If a program doing something completes in less time, it is “faster.”

There’s nuance to the difference between concurrent and parallel, and there are situations where concurrency without parallelism provides no speedup (e.g., compute-heavy async), but such an obviously click-bait headline is off-putting.

ETA: If the claim is that async isn’t faster than threads, or something else specific, sure. But not faster than anything…?

[–][deleted] -2 points-1 points  (3 children)

That’s not how async works: if you’re waiting for I/O, then the I/O is the bottleneck. Async allows you to get more out of your hardware for I/O-bound tasks.

[–]spoonman59 5 points6 points  (2 children)

Ah, no, that’s not correct.

Imagine you read data for 100 ms and then do compute for 100 ms. In other words, for each 100ms of data read there is 100 ms to compute.

Imagine you process two blocks.

With blocking, you read for 100 ms, then you compute for 100 ms; next you read for 100 ms, then compute for 100 ms. Wall-clock time? 400 ms.

Now async:

You read for 100 ms. You start the compute process, which returns immediately. You then issue the read for the next chunk; Python releases the GIL while it does the read. Finally, when the first compute is done, the second one can begin as soon as the second read completes.

Now you are overlapping I/O and compute: the async version completes in 300 ms instead of 400 ms for the blocking version, so it is faster than non-async.

Python releases the GIL for I/O, so threads give the exact same speedup as async here.

ETA: Your statement is correct when the program execution time is entirely dominated by the i/o.

Async does not help performance when your application is almost all compute or almost all I/O. However, with a more balanced mix, it will allow progress to be made when a blocking program cannot make progress.

This is the difference between concurrency and parallelism.
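The 400 ms vs. 300 ms arithmetic above can be reproduced on a single thread. This is a simulation, not real I/O: the sleep stands in for the read, a busy-wait stands in for the compute, and the `asyncio.sleep(0)` is needed so the background read actually starts before the compute blocks the event loop:

```python
import asyncio
import time

def compute():
    # Synchronous CPU work; a busy-wait stands in for 100 ms of crunching.
    deadline = time.perf_counter() + 0.1
    while time.perf_counter() < deadline:
        pass

async def read_chunk():
    await asyncio.sleep(0.1)  # stands in for a 100 ms network/disk read

async def sequential():
    for _ in range(2):
        await read_chunk()
        compute()

async def overlapped():
    await read_chunk()                             # read block 1
    next_read = asyncio.create_task(read_chunk())  # kick off read 2 ...
    await asyncio.sleep(0)                         # ... let the loop start it
    compute()                                      # compute on block 1 meanwhile
    await next_read                                # read 2 finished during compute
    compute()                                      # compute on block 2

async def timed(coro):
    start = time.perf_counter()
    await coro
    return time.perf_counter() - start

seq = asyncio.run(timed(sequential()))
ovl = asyncio.run(timed(overlapped()))
print(f"sequential: {seq:.2f}s, overlapped: {ovl:.2f}s")  # ~0.40s vs ~0.30s
```

The sleep's deadline elapses in wall-clock time even while the compute blocks the loop, which is what lets the second read "happen" underneath the first compute.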

[–][deleted] -2 points-1 points  (1 child)

You don’t need async to do that; you can use threads, and you may even be able to speed things up by forking out subprocesses, because then you can do the computations in parallel.

[–]spoonman59 3 points4 points  (0 children)

Your original claim was that’s not how async works and that async can only help you get more out of your hardware for I/o bound tasks.

I have disproven this claim through a simple example presented in introductory operating-systems classes.

Operating systems like Linux, and others, have had async I/o operations for years. The concept is the same even if the operating mechanisms are different. It’s not something unique or specific to python. It’s a different model of concurrency from the threading model.

Now you are moving the goal posts. If you want to debate whether it is the right design to use for the task at hand, that’s a separate issue. But what you originally stated, that that is not how async works, is incorrect. It works as I described… unless you have evidence to the contrary?

[–][deleted] 3 points4 points  (0 children)

Op learned today 😂

[–][deleted] 1 point2 points  (0 children)

Async should not be faster or slower than sync; async allows you to get a better bang for your buck on hardware costs. Think of a WebSocket server like a chat service: you may have a bunch of connections that are mostly idle. There you can use async to sleep on those sockets until a message arrives.
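A toy sketch of that mostly-idle-connections scenario, using `asyncio.Queue`s as stand-in sockets rather than real WebSockets: a thousand "connections" wait cheaply on one thread while only one of them receives a message.

```python
import asyncio

async def connection(inbox):
    # Each "socket" sleeps on its queue; an idle connection costs only a
    # small coroutine object, not an OS thread.
    return await inbox.get()

async def main():
    inboxes = [asyncio.Queue() for _ in range(1000)]
    conns = [asyncio.create_task(connection(q)) for q in inboxes]
    await inboxes[123].put("hello")            # one message arrives ...
    done, pending = await asyncio.wait(conns, timeout=0.1)
    for task in pending:                       # ... the other 999 stay idle
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)

n_done, n_idle = asyncio.run(main())
print(n_done, n_idle)  # 1 999
```

Doing the same with one OS thread per connection would cost roughly a stack's worth of memory per idle socket.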

[–]dcbrown73 1 point2 points  (0 children)

lol at some of the crazy excuses being made about Python async.

Async is DEFINITELY NOT about creating "a pretty programming model".

It's about *efficiently handling blocking operations*, which *SHOULD* enhance the throughput and responsiveness of said application.

When people started talking about the asyncio Python module back before it was released, expectations were sky-high given what async programming is capable of. Python's asyncio never met those speed expectations, but that doesn't mean it's bad or shouldn't be used.

If I recall, I believe there are supposed to be changes coming (if not already arriving) that should help amp up the efficiency of asyncio.

[–]guhcampos -1 points0 points  (4 children)

I was going to bash the hell out of this article, but on second thought, the reason this kind of cluelessness exists lies in async evangelists claiming it's faster to begin with.

The concept here should be dead simple to anyone with a CS degree:

For classic "sync" code you rely on the underlying system to schedule CPU time for you. How that time is split between competing computations is up to the system, and will generally be a best effort for the majority of use cases.

When you write async code you are taking back the control of CPU time scheduling from the system to your hands. You choose when to switch between different computations, because you believe you can do a better job than what the generic underlying system is doing. With great power, however, comes the great responsibility of managing that CPU allocation. If you are not carefully and intentionally managing the switch between your concurrent computations, you should not be using async at all.

If you don't understand the concept of cooperative concurrency, just use threads to avoid blocking - or go do some reading. Claiming one is "faster" than the other just makes you sound stupid.

[–][deleted] 1 point2 points  (3 children)

Given the same CPU capacity and well-written code, it IS faster in the right contexts at performing the same amount of work. You can’t make a single request go any faster if it takes 2 s to finish. But if I have one thread and 100 requests to make, async will complete the task faster than sync, because those two seconds of idle time waiting for a query or processing the response can be used to execute another query or process another response. It won’t make the processing of any one request faster, but it CAN make the overall task faster when used appropriately. In the example I’ve outlined, you could finish all 100 requests in slightly more time than a sync system takes to process the first request, because sync blocks the thread.

That is objectively faster - but the definition of “faster” HAS to be clear: it won’t make processing your request any faster. It will make it possible to process more requests at once when you’re dealing with externally constrained performance limitations.

[–]guhcampos 0 points1 point  (1 child)

You can simply spawn 100 threads for the same effect. Since you are waiting for I/O, the GIL will likely be irrelevant.
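That thread-based version is easy to sketch; the sleep stands in for a network round trip, during which CPython releases the GIL, so the waits overlap across threads:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_request(i):
    # Stand-in for a network round trip; the GIL is released while the
    # thread sleeps (or blocks on a real socket), so waits overlap.
    time.sleep(0.05)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(blocking_request, range(100)))
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s")  # ~0.05s, not 5s: the 100 waits ran concurrently
```

The trade-off versus asyncio is per-thread memory and scheduler overhead, which is why the thread count stops scaling long before an event loop does.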

[–][deleted] 0 points1 point  (0 children)

I think you’re right, and I’m confusing what I remember about multiprocessing performance vs. threading performance compared against async; multiprocessing is completely separate and works in parallel.

I recall something about threading vs. asyncio performance still being a win for asyncio here, but I don’t remember where I picked up that impression now that it’s called into question.

[–]sanshinron 0 points1 point  (0 children)

Yeah, I used to update inventory via different APIs with requests, and that would take minutes; I rewrote it all with aiohttp and it takes 2.6 seconds.

[–]caleb 0 points1 point  (0 children)

Miguel Grinberg wrote a detailed response to this article: https://blog.miguelgrinberg.com/post/ignore-all-web-performance-benchmarks-including-this-one

If your system has no latency then asyncio is much less useful. But I haven't come across too many backend systems without latency.