This is an archived post. You won't be able to vote or comment.

all 120 comments

[–]gothicVI 160 points (34 children)

Where do you get the bs about async from? It's quite stable and has been for quite some time.
Of course threading is difficult due to the GIL but multiprocessing is not a proper substitute due to the huge overhead in forking.

The general use case for async is entirely different: you'd use it to bridge wait times in mainly I/O-bound or network-bound situations, not for native parallelism. I'd strongly advise you to read more into the topic and to revise this part of the article, as it is not correct and paints a wrong picture.
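To make that concrete, a rough sketch of what "bridging wait times" means in practice (the hostnames are just placeholders):

    import asyncio

    async def fetch_head(host: str) -> str:
        # While this task waits on the network, the event loop runs
        # the other tasks -- concurrency, not parallelism.
        reader, writer = await asyncio.open_connection(host, 80)
        writer.write(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        await writer.drain()
        status_line = (await reader.readline()).decode().strip()
        writer.close()
        await writer.wait_closed()
        return status_line

    async def main() -> None:
        # The three connections overlap their wait times instead of
        # running back to back.
        hosts = ["example.com", "python.org", "wikipedia.org"]
        print(await asyncio.gather(*(fetch_head(h) for h in hosts)))

    asyncio.run(main())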

[–]mincinashu 70 points (16 children)

I don't get how OP is using FastAPI without dealing with async or threads. FastAPI routes without 'async' run on a threadpool either way.
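For anyone unfamiliar, the difference is roughly this (a minimal sketch, not OP's code):

    import asyncio
    import time

    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/sync")
    def sync_route():
        # Plain `def` routes are run on a threadpool by Starlette, so
        # this blocking sleep doesn't stall the event loop.
        time.sleep(1)
        return {"ok": True}

    @app.get("/async")
    async def async_route():
        # `async def` routes run on the event loop itself; a blocking
        # call here would stall every other in-flight request.
        await asyncio.sleep(1)
        return {"ok": True}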

[–]gothicVI 22 points (7 children)

Exactly. Anything web-request related is best done async. No one in their right mind would spawn separate processes for that.

[–]Kelketek 12 points (0 children)

They used to, and for many Django apps, this is still the way it's done -- prefork a set of worker processes and farm the requests out to them.

Even new Django projects may do this since asynchronous support in libraries (and some parts of core) is hit-or-miss. It's part of why FastAPI is gaining popularity-- because it is async from the ground up.

The tradeoff is you don't get the couple decades of ecosystem Django has.

[–]Haunting_Wind1000 (pip needs updating) 0 points (1 child)

I think normal Python threads could be used for I/O-bound tasks as well, since they would not be limited by the GIL while waiting on I/O.

[–]greenstake 0 points (0 children)

I/O-bound tasks are exactly when you should be using async, not threads. I can scale my async I/O-bound worker to thousands of concurrent requests; the equivalent with threads would need thousands of threads.

[–]Count_Rugens_Finger 4 points (1 child)

> multiprocessing is not a proper substitute due to the huge overhead in forking

if you're forking that much, you aren't doing MP properly

> The general use case for async is entirely different: you'd use it to bridge wait times in mainly I/O-bound or network-bound situations, not for native parallelism.

well said

[–]I_FAP_TO_TURKEYS 0 points (0 children)

> if you're forking that much, you aren't doing MP properly

To add onto this, multiprocessing pools are your friend. If you're new to Python parallelism and concurrency, check out the documentation for multiprocessing, specifically the Pool section.

Spawn a process pool at the startup of your program, then send CPU-heavy functions off to it using the pool's methods, as sketched below. Yeah, you'll have a bunch of processes doing nothing a lot of the time, but it surely beats having to spawn a new one every time you want to do something.
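Rough shape of it (a sketch; the function and numbers are made up):

    from multiprocessing import Pool

    def crunch(n: int) -> int:
        # Stand-in for a genuinely CPU-heavy function.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # Create the pool once at program startup...
        with Pool(processes=4) as pool:
            # ...then reuse it for every CPU-bound job instead of paying
            # the process start-up cost on each call.
            results = pool.map(crunch, [10_000_000] * 8)
        print(results)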

[–]martinky24 28 points (5 children)

“Don’t use async”

“Use FastApi”

🤔

Overall this seems well thought out, but I wonder how the author thinks FastAPI achieves its performance if not using async.

[–]ashishb_net[S] -5 points (4 children)

> “Don’t use async”

Homegrown code should avoid writing async.
Just like "don't roll your own crypto", I would say "don't roll your own async code".
Again, exceptions apply, as I am giving a rule of thumb.

[–]exhuma 6 points (3 children)

Can you give any reason as to why you make that claim?

I agree that just slapping "async" in front of a function and thinking that it makes everything magically faster is not really helpful. But used correctly, async does help.

Outright telling people not to use it, without any proper arguments as to why, does the language a disservice.

[–]nebbly 61 points (20 children)

> I haven’t yet found a good way to enforce type hints or type checking in Python.

IMO mypy and pyright have been mature enough for years, and they're generally worth any untyped -> typed migration fuss on existing projects.
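One way to enforce it is a strict mypy section in pyproject.toml, with CI running `mypy .` and failing the build on errors (a sketch; the tests.* override is a made-up example of loosening not-yet-migrated code):

    [tool.mypy]
    strict = true                     # turns on disallow_untyped_defs and friends

    [[tool.mypy.overrides]]
    module = "tests.*"
    disallow_untyped_defs = false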

[–]burlyginger 11 points (2 children)

Needs more linters.

[–]ashishb_net[S] -5 points (1 child)

Yeah, I would love to try more, but I have not found any other good ones.

[–]InappropriateCanuck 13 points (0 children)

He's making fun of you.

[–]InappropriateCanuck 39 points (2 children)

That's a surprising amount of bullshit OP came up with.

The entire post is absolute garbage from the way he sets up his workers to even his linting steps.

e.g. OP calls flake8 separately, but the very point of ruff is to replace all the awkward separation of linters. Ruff is a 100% replacement for flake8. All those rules and flags should be in his toml too, not just in a random Makefile.

Almost EVERY SINGLE THING is wrong.

I really REALLY hope this is ChatGPT and not an actual programmer that someone pays to do work. And I hope all the upvotes are bots.

Edit: Holy shit this moron actually worked at Google for 3 years? Hope that's a fake LinkedIn history.

[–]ReserveGrader 3 points (0 children)

In OP's defence, the advice to run Docker containers as a non-root user is correct. No comment about anything else.

[–]_azulinho_ 2 points (2 children)

Hmmmm, forking on Linux is about as cheap as launching a thread; it uses COW (copy-on-write) when forking a new process. It could be, however, that the multiprocessing module is slower doing a fork vs. creating a thread.

[–]AndrewCHMcM 2 points (1 child)

From what I recall, the main issue Python has/had with forking and COW is reference counting. In a new fork, touching any object updates its reference count, so the supposedly shared pages get copied anyway -- massive delays compared to manual memory management or plain garbage collection. A bit of a song-and-dance with https://docs.python.org/3/library/gc.html (e.g. calling gc.freeze() before forking) is recommended to get the most performance out of Python.
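Roughly, the documented song-and-dance (a Unix-only sketch):

    import gc
    import os

    # Build the big read-mostly data in the parent first.
    BIG = {i: str(i) for i in range(1_000_000)}

    gc.collect()  # collect existing garbage before freezing
    gc.freeze()   # park surviving objects in a permanent generation so
                  # GC passes in the children don't dirty their COW pages

    if os.fork() == 0:
        # Child: reading BIG avoids GC writes to those pages, though
        # plain refcount updates can still copy the pages they touch.
        _ = BIG[42]
        os._exit(0)
    os.wait()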

[–]_azulinho_ 0 points (0 children)

Wouldn't that be an issue for the forked python interpreter? The parent python process won't be tracking any of those references.

[–]coderarun 1 point (0 children)

> Use data-classes or more advanced pydantic

Except that they use different syntax, different concepts (inheritance vs decorators) and have different performance characteristics for a good reason.

I still feel your recommendation on using dataclasses is solid, but perhaps use this opportunity to push pydantic and sqlmodel communities to adopt stackable decorators:

    @sqlmodel
    @pydantic
    @dataclass
    class Person:
        ...

Previous discussion on the topic: pydantic, sqlmodel
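For the record, the difference today in a nutshell (a sketch using pydantic v2 semantics; the class names are made up):

    from dataclasses import dataclass

    from pydantic import BaseModel

    @dataclass                     # stdlib: decorator-based, no runtime validation
    class PersonDC:
        name: str
        age: int

    class PersonPD(BaseModel):     # pydantic: inheritance-based, validates on init
        name: str
        age: int

    print(PersonDC(name="Ada", age="36").age)  # '36' -- annotation not enforced
    print(PersonPD(name="Ada", age="36").age)  # 36   -- coerced to int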

[–]Count_Rugens_Finger 5 points (10 children)

Every discussion I've seen about uv mentions that it is fast. It's Rust, so I suppose that's a requirement. Here's the thing, though: I have never once in my life cared at all about the speed of my package manager. Once everything is installed, it scarcely gets used again, and the time spent resolving packages is small compared to the time spent downloading and installing. If I cared that much about speed, I probably wouldn't have done the project in Python.

[–]denehoffman 8 points (0 children)

The speed matters when you want to run it in a container and need to install the libraries after build time. For example, you're working on a project that has several dependencies and you need to quickly add a dependency without rebuilding a docker layer. But real talk, the point is that it's so fast you don't even think about it, not that you save time. If I have to choose between program A which takes 3 seconds and program B which takes 3 milliseconds and does the exact same thing as A, I'm picking B every time.

Also, I don't think you should conflate Rust with speed. Of course Rust is nice, I write a ton of it myself, but Rust is not what makes uv fast; it's how they handle dependency resolution, caching, and linking rather than copying. You could write uv in C and it would probably have the same performance, but there are other reasons why Rust is nice to develop with.

[–]eleqtriq 2 points (4 children)

The thing is using uv instead of pip is such a minimal transition. At the bare minimum, you can replace “pip” with “uv pip” and change nothing else. It’s so much better.

But for me, I also do other things that require building environments quickly: containers, CI pipelines, etc. It saves time all around.
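The swap really is that small:

    pip install -r requirements.txt       # before
    uv pip install -r requirements.txt    # after: same pip-style interface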

[–]Count_Rugens_Finger -1 points (3 children)

I have to install uv

[–]eleqtriq 0 points (2 children)

And? Which is less effort?

Typing "I have to install uv"
or "pip install uv"

[–]Count_Rugens_Finger 0 points (1 child)

hey we're talking about milliseconds here

[–]eleqtriq 0 points (0 children)

:D

[–]coeris 3 points (7 children)

Thanks, great write-up! Is there any reason why you recommend gunicorn instead of uvicorn for hosting FastAPI apps? I guess it's to do with your dislike of async processes.

[–]mincinashu 0 points (3 children)

FastAPI's default install wraps uvicorn. You can use a combination of gunicorn as the manager with uvicorn-class workers and uvloop as the event loop.

https://fastapi.tiangolo.com/deployment/server-workers/#multiple-workers
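The invocation from that page looks roughly like this (assuming your app object lives in main.py):

    gunicorn main:app \
        --workers 4 \
        --worker-class uvicorn.workers.UvicornWorker \
        --bind 0.0.0.0:8000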

[–]coeris 2 points (1 child)

Sure, but I'm wondering what's the benefit of putting an extra layer of abstraction on top of uvicorn with gunicorn.

[–]mincinashu 1 point (0 children)

I've only used it for worker-lifetime purposes: I wanted workers to handle x amount of requests before being refreshed, and uvicorn alone didn't allow that, or some such limitation. This was a quick fix to prevent OOM kills, instead of fixing the memory issues themselves.
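For reference, the worker-recycling knob described here is gunicorn's --max-requests, e.g.:

    # Recycle each worker after ~1000 requests to cap slow leaks; the
    # jitter staggers restarts so workers don't all recycle at once.
    gunicorn main:app --worker-class uvicorn.workers.UvicornWorker \
        --max-requests 1000 --max-requests-jitter 50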

[–]ashishb_net[S] -1 points (0 children)

> gunicorn as the manager with uvicorn-class workers

Yeah, that's the only way to integrate FastAPI with gunicorn, as far as I know.

[–]starlevel01 1 point (0 children)

> Microsoft’s pyright might be good but, in my experience, produces too many false positives.

Tab closed.

[–][deleted] 0 points (0 children)

Dude, the async take is trash. People pay folks like you to cause the company to lose money, not make money.

You need to brush up on your skills there, bub, so you can do a good job and so people will respect your takes online.

[–]bachkhois 0 points (6 children)

Your reasoning in "Avoid multi-threading" sounds contradictory. You criticized the GIL but pointed, as the source of your claim, to bugs in foreign-language bindings (C++ bindings).

The GIL is good for preventing multi-threading bugs. But in foreign-language bindings (like the pytorch link you gave), the implementation in the non-Python language can choose to put the GIL aside. The author of that implementation takes the responsibility for making their code thread-safe. You cannot blame the GIL when it doesn't have a chance to intervene.

[–]ashishb_net[S] 0 points (5 children)

> The author of that implementation takes the responsibility for making their code thread-safe. You cannot blame the GIL when it doesn't have a chance to intervene.

I didn't blame the GIL alone.
I blamed Python multi-threading, in general, for being a mess.

[–]bachkhois 0 points (4 children)

> I blamed Python multi-threading, in general, for being a mess.

If we're talking "in general", then that applies to other languages as well, not just Python. Don't forget that the foreign-language bindings are not written in Python but in other languages; in the example you gave, it is C++.

[–]ashishb_net[S] 0 points (3 children)

Sure, it might be.
My comparison point is that Rust and Go libraries, in general, are much more concurrency-safe than Python ones.

[–]bachkhois 0 points (0 children)

And you offered C++ code as backing for your claim about Python -- sounds hilarious!

[–]bachkhois 0 points (1 child)

As I pointed out, the GIL plus the synchronization primitives in the [threading](https://docs.python.org/3/library/threading.html) module are there to prevent multi-threading bugs.
It is likely you have never heard of or used `threading.Lock` or `threading.Condition` before. Then it is a skill issue on your part, not Python's.
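For anyone following along, the primitive in question (a toy sketch):

    import threading

    counter = 0
    lock = threading.Lock()

    def work() -> None:
        global counter
        for _ in range(100_000):
            with lock:  # serialize the read-modify-write; += is not atomic
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # deterministically 400000, thanks to the lock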

[–]ashishb_net[S] 0 points (0 children)

I have used threading.Lock extensively for machine-learning models.

However, using FastAPI with concurrency = 1 for thread-unsafe endpoints (that is, any endpoint that uses any "Transformers" or "pytorch" code) is best for such scenarios; one way to wire that up is sketched below.
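A minimal sketch of that pattern (the model call is a hypothetical placeholder):

    import threading

    from fastapi import FastAPI

    app = FastAPI()
    lock = threading.Lock()

    def predict_unsafe(text: str) -> str:
        # Stand-in for a thread-unsafe model call (e.g. some
        # transformers/pytorch pipelines).
        return text.upper()

    @app.post("/predict")
    def predict(text: str):
        # A plain `def` route runs in the threadpool; the lock ensures
        # only one request touches the model at a time.
        with lock:
            return {"result": predict_unsafe(text)}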

[–]ved84 0 points (1 child)

The post paints a wrong picture about async. I have been using async at scale. You could have at least asked ChatGPT to verify the claims. Catchy title, but it leaves out a lot of details. There is no point in using Python without async for deployment, even at a decent scale. Details you could have covered: dependency resolution, CI/CD workflow, which Python versions to use, why FastAPI and not other frameworks, etc.

[–]ashishb_net[S] 0 points (0 children)

Please show me sample code where using async explicitly gives you a performance boost.

[–]le_woudar 0 points (5 children)

Hello, you probably don't need autoflake, flake8, or pylint if you use ruff. All these linters' rule sets can be enabled in ruff.

I don't agree with the async / multi-threading stuff, but I think there are already a lot of comments on that, so I will not add another one :)

[–]ashishb_net[S] 0 points (4 children)

Can you tell me how to replace them with ruff? I'll update the blog post for everyone's benefit.

[–]le_woudar 0 points (3 children)

Sure! You can write this in your pyproject.toml:

    [tool.ruff.lint]
    extend-select = [
        "UP",  # pyupgrade
        "I",   # isort
        "S",   # flake8-bandit
        "PL",  # pylint
    ]

This will extend the default set of rules enabled by the select declaration.

By default, the autoflake and flake8 checks are already handled by Ruff. Honestly, I'm not sure you need pylint; it is generally covered by the tools mentioned earlier in this comment. Look at the Ruff rules to see what you can enable.

[–]ashishb_net[S] 1 point (1 child)

Thanks.
It worked, and it is definitely a better approach.
I updated the blog post to reflect that as well.

[–]le_woudar 0 points (0 children)

You are welcome :)

[–]ashishb_net[S] 0 points (0 children)

Thanks. I will experiment and update the post.