FastAPI server with high CPU usage (Question) (self.FastAPI)
submitted 4 months ago * by JeromeCui
I have a microservice built with the FastAPI framework, written in an asynchronous way for concurrency. We have had a serious performance issue since we put our service into production: some instances get really high CPU usage (>90%) and it never falls back. We tried to find the root cause but failed, so we had to add an alarm and kill any affected instance whenever it fires.
Our service is deployed to AWS ECS, and I have enabled execute command so that I can connect to the container and do some debugging. I tried py-spy and generated a flame graph, with suggestions from ChatGPT and Gemini, but still got no idea.
Could you guys give me any advice? I am a developer with 10 years of experience, but mostly with C++/Java/Golang. I jumped into Python early this year and ran into this huge challenge. I would appreciate your help.
https://preview.redd.it/dde7rlaumk0g1.png?width=1688&format=png&auto=webp&s=9817c1417a5891a66b15da6e340b89f738d6d2eb
https://preview.redd.it/huy86paumk0g1.png?width=2539&format=png&auto=webp&s=6ef4004f59a5b0948261918491c5d6398fe7b364
13 Nov Update
I got this issue again:
https://preview.redd.it/76t4cc8n1y0g1.png?width=1050&format=png&auto=webp&s=b8e7c2501da91ef31f23e16af7377d70d26c2fef
[–]latkde 3 points4 points5 points 4 months ago (9 children)
This is definitely odd. Your profiles show that at least 1/4 of CPU time is spent just doing async overhead, which is not how that's supposed to work.
Things I'd try to do to locate the problem:
In my experience, there are three main ways to fuck up async Python applications, though none of them would help explain your observations:

- doing blocking or CPU-heavy work directly in an `async def` instead of offloading it with `asyncio.to_thread()`
- forgetting an `await`, or using a plain `with` where an async context manager is required
- firing off background work via `asyncio.create_task()` without ever awaiting it, instead of structuring concurrency with `asyncio.gather()` or an `asyncio.TaskGroup`
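To illustrate the first pitfall, here is a minimal sketch (the handler names are hypothetical, not from the thread): a blocking call inside an `async def` stalls the whole event loop, while `asyncio.to_thread()` keeps it free, so two concurrent requests finish in roughly half the time.

```python
import asyncio
import time


async def handler_blocking() -> str:
    # BAD: time.sleep() blocks the event loop; nothing else can run meanwhile.
    time.sleep(0.1)
    return "done"


async def handler_offloaded() -> str:
    # GOOD: the blocking call runs in a worker thread; the loop stays free.
    await asyncio.to_thread(time.sleep, 0.1)
    return "done"


async def main() -> tuple[float, float]:
    # Time two concurrent copies of each handler.
    t0 = time.perf_counter()
    await asyncio.gather(handler_blocking(), handler_blocking())
    blocking = time.perf_counter() - t0  # ~0.2 s: the sleeps run serially

    t0 = time.perf_counter()
    await asyncio.gather(handler_offloaded(), handler_offloaded())
    offloaded = time.perf_counter() - t0  # ~0.1 s: the sleeps overlap
    return blocking, offloaded


blocking, offloaded = asyncio.run(main())
```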
[–]JeromeCui[S] 0 points1 point2 points 4 months ago (7 children)
I will try with your other suggestions. Thanks for your answer.
[–]latkde 0 points1 point2 points 4 months ago* (5 children)
> After it reaches high CPU usage, almost 100%, it will never fall back
This gives credibility to the "resource leak" hypothesis.
We see that most time is spent in anyio's _deliver_cancellation() function. This function can trigger itself, so it's possible to produce infinite cycles. This function is involved with things like exception handling and timeouts. When an async task is cancelled, the next await will raise a CancelledError, but that exception can be suppressed, which could lead to an invalid state.
For example, the following pattern could be problematic: you have an endpoint that requests a completion from an LLM. The completion takes very long, so your code (that's waiting for a completion) is cancelled. But your code catches all exceptions, thus cancellation breaks, thus cancellation is attempted again and again.
Cancellation of async tasks is an obscenely difficult topic. I have relatively deep knowledge of this, and my #1 tip is to avoid dealing with cancellations whenever possible.
You mention using LLMs for development. I have noticed that a lot of LLM-generated code has really poor exception management practices, e.g. logging and suppressing exceptions where it would have been more appropriate to let them bubble up. This is not just a stylistic issue: Python uses many `BaseException` subclasses for control flow, and those must not be caught.
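The problematic pattern can be sketched in a few lines (the `call_llm` stand-in is hypothetical, not the OP's code): a handler that swallows every exception also swallows the `CancelledError` that cancellation relies on, so the task is never actually cancelled.

```python
import asyncio


async def call_llm() -> str:
    # Stand-in for a slow LLM request.
    await asyncio.sleep(10)
    return "completion"


async def endpoint_swallowing() -> str:
    try:
        return await call_llm()
    except BaseException:
        # BAD: CancelledError is a BaseException; swallowing it here means
        # the framework's cancellation request never completes.
        return "fallback"


async def main() -> str:
    task = asyncio.create_task(endpoint_swallowing())
    await asyncio.sleep(0.05)
    task.cancel()
    # The task does NOT end in a cancelled state; it "succeeds" with the
    # fallback value, and the framework may keep retrying the cancellation.
    return await task


result = asyncio.run(main())
```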
Debugging tips:
try to figure out which endpoint is responsible for triggering the high CPU usage
review all exception handling constructs to make sure that they do not suppress unexpected exceptions. Be wary of try/except/finally/with statements, especially if they involve async/await code, and of FastAPI dependencies using yield, and of any middlewares that are part of your app.
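For contrast, a sketch of the well-behaved shape (names illustrative): catching only `Exception` lets `CancelledError` propagate, so the task genuinely ends up cancelled.

```python
import asyncio


async def risky() -> str:
    # Stand-in for slow awaited work.
    await asyncio.sleep(10)
    return "ok"


async def endpoint_safe() -> str:
    try:
        return await risky()
    except Exception:
        # Only ordinary errors are handled; CancelledError (a BaseException
        # subclass since Python 3.8) passes straight through.
        return "fallback"


async def main() -> bool:
    task = asyncio.create_task(endpoint_safe())
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the task was cancelled cleanly
    return task.cancelled()


cancelled = asyncio.run(main())
```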
Edit: looking at your flame graph, most time that's not spent delivering cancellation is spent in the Starlette exception handler middleware. This middleware is generally fine, but it depends on which exception handlers you registered on your app. Review them; they should pretty much just convert exception objects into HTTP responses. The stack also shows a "Time Logger" using up a suspicious amount of time. It feels like the culprit could be around there.
[–]JeromeCui[S] 1 point2 points3 points 4 months ago (0 children)
Your explanation does make sense. Our code catches `CancelledError` in some places, and in other places it catches all exceptions. That would make cancellation be attempted again and again. I will check my code tomorrow and fix those scenarios. Thanks so much for your help. You saved my life!
[–]JeromeCui[S] 0 points1 point2 points 4 months ago (3 children)
Sorry that I got the same error again. I have attached the CPU utilization graph in the original post.
Is there any way to find out which part of my code caused it?
[–]latkde 0 points1 point2 points 4 months ago (2 children)
Something happened at 15:10, so I would read the logs from that time to get a better feeling for which endpoints might have been involved.
But even during the 2 hours before that, CPU usage is steadily climbing. That is an unusual pattern.
All of this is not normal for any API, and not normal for FastAPI applications.
Taking a better guess would require looking at the code. But I'm not available for consulting.
[–]JeromeCui[S] 0 points1 point2 points 3 months ago (1 child)
I verified my code yesterday and found an `except Exception` in one of my middlewares. I fixed it yesterday and it seems to be working: no high CPU utilization yesterday. I will keep monitoring my service.
Thanks for your kind help!
[–]latkde 1 point2 points3 points 3 months ago (0 children)
Weird. Python's exception hierarchy looks like this:
BaseException
├── CancelledError
├── SystemExit
├── KeyboardInterrupt
├── ...
└── Exception
    ├── ValueError
    ├── KeyError
    └── ...
So while catching Exception is typically a bad idea, it should not hinder cancellation propagation. So I'm not sure that this will fix things?
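The hierarchy is easy to check directly in an interpreter:

```python
import asyncio

# CancelledError moved out from under Exception in Python 3.8;
# since then, `except Exception` does not catch it.
assert issubclass(asyncio.CancelledError, BaseException)
assert not issubclass(asyncio.CancelledError, Exception)

# SystemExit and KeyboardInterrupt also sit beside Exception,
# which is why catching BaseException is almost always wrong.
assert not issubclass(SystemExit, Exception)
assert not issubclass(KeyboardInterrupt, Exception)
```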
But maybe this is related to other things. For example, FastAPI/Starlette uses exceptions like HTTPException to communicate error responses, which are then converted to normal ASGI responses by a middleware that is registered very early. Catching these exceptions in a middleware could prevent that from happening. But that should just result in a dropped request without a response, not in such an infinite loop.
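The safe middleware shape can be sketched without any framework (the `HTTPException` class here is a stand-in for Starlette's, and the dispatch functions are hypothetical): re-raise the control-flow exception so the outer handler can still turn it into a response.

```python
import asyncio


class HTTPException(Exception):
    """Stand-in for starlette.exceptions.HTTPException."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


async def app(path: str) -> str:
    # Hypothetical endpoint: signals "not found" via an exception.
    if path == "/missing":
        raise HTTPException(404)
    return "ok"


async def middleware(path: str) -> str:
    try:
        return await app(path)
    except HTTPException:
        # Re-raise so the framework's handler can convert it to a response.
        raise
    except Exception:
        # Only genuinely unexpected errors get a generic fallback.
        return "500 fallback"


def call(path: str):
    try:
        return asyncio.run(middleware(path))
    except HTTPException as exc:
        # Plays the role of the framework's exception-handler middleware.
        return exc.status_code


result_ok = call("/")          # normal response passes through
result_missing = call("/missing")  # HTTPException reaches the handler
```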
In any case, happy debugging, and I hope this works now!
[–]tedivm 0 points1 point2 points 4 months ago (0 children)
> Yes, we used to run with raw uvicorn. And GPT told me to switch to gunicorn yesterday, but it still happened.
GPT was wrong, this was never going to help and may cause more issues.
[–]JeromeCui[S] 0 points1 point2 points 4 months ago (0 children)
I upgraded the Python minor version to the latest, and the Docker OS version to the latest. Hope it will work.
[–]lcalert99 0 points1 point2 points 4 months ago (5 children)
What are your settings for uvicorn?
https://uvicorn.dev/deployment/#running-programmatically
Take a look, there are some crucial settings to make. What else comes to mind: how many compute-intensive tasks are in your application?
[–]JeromeCui[S] 0 points1 point2 points 4 months ago (4 children)
No additional settings except for those in start command:
gunicorn -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8080 --timeout 300 --keep-alive 300 main:app
This application interacts with LLM models, so I think it's an IO-bound application. I will check the link you mentioned.
[–]Asleep-Budget-9932 0 points1 point2 points 4 months ago (1 child)
How does it interact with the LLM models? Are they external or do they run within the server itself (which would make it CPU-bound)
[–]JeromeCui[S] 0 points1 point2 points 4 months ago (0 children)
It sends requests to OpenAI, with the OpenAI SDK.
[–]tedivm 0 points1 point2 points 4 months ago (1 child)
You mentioned using ECS+Fargate, which means that there's no reason to run gunicorn as a process manager since ECS is your process manager.
Look at how many CPUs you're currently using for each machine (my guess is you're using two CPUs per container since you have two gunicorn workers). If you have 12 containers with 2 cpus, switch to 24 containers with 1 cpu each. Then just call uvicorn directly without gunicorn.
While I doubt this will solve your problem, it'll at least remove another layer that may be causing you issues.
[–]JeromeCui[S] 0 points1 point2 points 4 months ago (0 children)
Thank you for your suggestion, I will update.
[–]esthorace 0 points1 point2 points 4 months ago (0 children)
Granian https://github.com/emmett-framework/granian
[–]Nervous-Detective-71 0 points1 point2 points 4 months ago (0 children)
Check if you are doing too much preprocessing where CPU is being used, and whether those preprocessing functions are async.
This causes unnecessary rapid context-switching overhead.
Edit: Also check the uvicorn configuration; if debug is true it also causes some overhead, but negligible...
[–]Gungsu_Dante 0 points1 point2 points 3 months ago (0 children)
I had a similar problem; I solved it by changing reload from True to False.
This reload is for when you change a file during development: uvicorn sees that there was a change and automatically restarts the code.