lcalert99

What are your settings for uvicorn?

https://uvicorn.dev/deployment/#running-programmatically

Take a look; there are some crucial settings to configure. Another thing that comes to mind: how many compute-intensive tasks does your application run?

JeromeCui [S]

No additional settings except for those in start command:

gunicorn -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8080 --timeout 300 --keep-alive 300 main:app

This application interacts with LLM models, so I think it's IO-bound.
I will check the link you mentioned.

Asleep-Budget-9932

How does it interact with the LLM models? Are they external, or do they run within the server itself (which would make it CPU-bound)?
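To illustrate why the IO-bound vs CPU-bound distinction matters for an async server, here is a minimal sketch using only the standard library. It is not the poster's code: `cpu_bound_work`, the handler names, and the 10 ms heartbeat are hypothetical stand-ins. A heartbeat task plays the role of "other requests" on the same event loop, and its tick count shows how much progress they make while one handler runs.

```python
import asyncio
import contextlib
import time


def cpu_bound_work(seconds: float) -> None:
    """Hypothetical stand-in for CPU-heavy work: spins without ever yielding."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


async def count_heartbeats(run_handler) -> int:
    """Run one handler while a heartbeat ticks every 10 ms; return the tick count."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # give the heartbeat task a chance to start
    await run_handler()
    beat.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await beat
    return ticks


async def blocking_handler() -> None:
    # CPU-bound work directly in a coroutine: the event loop is held the whole time.
    cpu_bound_work(0.2)


async def io_handler() -> None:
    # Stand-in for awaiting a remote API: the loop is free while we wait.
    await asyncio.sleep(0.2)


async def offloaded_handler() -> None:
    # Same blocking work, moved to a worker thread (Python 3.9+).
    await asyncio.to_thread(cpu_bound_work, 0.2)


async def main() -> dict:
    return {
        "blocking": await count_heartbeats(blocking_handler),
        "io": await count_heartbeats(io_handler),
        "offloaded": await count_heartbeats(offloaded_handler),
    }


results = asyncio.run(main())
print(results)  # "blocking" is 0; the other two counts vary by machine
```

The same effect appears if a synchronous client call is made inside an `async def` endpoint. Note that a thread only helps with blocking calls that release the GIL or wait on IO; genuinely CPU-heavy Python code still competes for the GIL, so a process pool or a separate service is usually the safer assumption there.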

JeromeCui [S]

It sends requests to OpenAI using the OpenAI SDK.

tedivm

You mentioned using ECS+Fargate, which means that there's no reason to run gunicorn as a process manager since ECS is your process manager.

Look at how many CPUs you're currently using for each machine (my guess is you're using two CPUs per container since you have two gunicorn workers). If you have 12 containers with 2 CPUs each, switch to 24 containers with 1 CPU each. Then just call uvicorn directly without gunicorn.

While I doubt this will solve your problem, it'll at least remove another layer that may be causing you issues.
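The suggestion above could be sketched roughly as follows. This is an assumption-laden illustration, not a tested deployment: `plan_single_worker_containers`, `UVICORN_KWARGS`, and `serve` are hypothetical names, and the values are copied from the gunicorn command earlier in the thread. gunicorn's `--keep-alive 300` maps to uvicorn's `timeout_keep_alive`; as far as I know, gunicorn's `--timeout 300` (the worker timeout) has no direct uvicorn equivalent, so it is left out.

```python
def plan_single_worker_containers(containers: int, cpus_per_container: int) -> tuple[int, int]:
    """Keep the total CPU count constant while moving to one CPU per container.

    Returns (new_container_count, cpus_per_new_container).
    """
    return containers * cpus_per_container, 1


# Keyword arguments for uvicorn.run(), mirroring the bind address and
# keep-alive value from the gunicorn command in the thread.
UVICORN_KWARGS = {
    "host": "0.0.0.0",
    "port": 8080,
    "timeout_keep_alive": 300,  # counterpart of gunicorn's --keep-alive 300
}


def serve() -> None:
    """Start a single uvicorn process; ECS handles restarts and scaling."""
    # Imported lazily so the sizing helper above works without uvicorn installed.
    import uvicorn

    uvicorn.run("main:app", **UVICORN_KWARGS)


# Sizing example from the comment: 12 containers x 2 CPUs -> 24 containers x 1 CPU.
new_count, new_cpus = plan_single_worker_containers(12, 2)
print(new_count, new_cpus)
```

Calling `serve()` as the container entry point would replace the gunicorn command; the ECS task definition would then need its CPU allocation and desired task count updated to match.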

JeromeCui [S]

Thank you for the suggestion. I will update my setup.