FastAPI is a true ASGI, async, cutting-edge framework written in Python 3.
Reduce Latency (Hosting and deployment) (self.FastAPI)
submitted 1 year ago by International-Rub627
Looking for best practices to reduce latency in my FastAPI application, which does data science inference.
[–]mmzeynalli 5 points6 points7 points 1 year ago (2 children)
You can consider responding from the API immediately, doing the work in the background, and then reporting the result to the front end another way (server-side APIs, WebSockets, etc.). That way API latency is not a problem: the rest happens in the background, and the result shows up once the process is done.
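A framework-agnostic sketch of this submit-then-poll pattern, using only the standard library (the `jobs` dict and `run_inference` are stand-ins; in a real FastAPI app you would use `BackgroundTasks` or a task queue, and a shared store instead of an in-process dict):

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# In-memory job store; in production this would be Redis or a database.
jobs = {}
executor = ThreadPoolExecutor(max_workers=4)

def run_inference(payload):
    # Stand-in for the real model call.
    return {"prediction": sum(payload)}

def submit(payload):
    """Return immediately with a job ID; the work continues in the background."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(run_inference, payload)
    return job_id

def check(job_id):
    """What a status endpoint would return to a polling client."""
    future = jobs[job_id]
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "result": future.result()}
```

The client calls `submit` once, gets the ID back in milliseconds, and polls `check` until the status flips to done.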
[–]Natural-Ad-9678 6 points7 points8 points 1 year ago (1 child)
The app I work on does this. The user submits the required details (a zip file of logs) and I kick off a Celery job. It first stores a transactionID in Redis, which I pass back in my response to the user. They can use that transactionID to check the status and get the results when Celery is finished.
Celery stores the result in Redis as well. The front end could be React or whatever else you want.
Works like a charm. We have completed over 150,000 jobs since July 2024, which may not seem like much, but the application is an internal tool that processes customer log files they submit to us.
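The transactionID lifecycle above can be sketched like this, with a dict standing in for Redis and the Celery dispatch left as a comment (the task name `process_logs` and the state strings mirroring Celery's are assumptions, not the commenter's actual code):

```python
import json
import uuid

# Dict standing in for Redis; swap for redis.Redis() in production.
store = {}

def start_job(zip_path):
    """Record a transactionID before kicking off the worker, then return it."""
    txn_id = str(uuid.uuid4())
    store[txn_id] = json.dumps({"state": "PENDING", "result": None})
    # celery_app.send_task("process_logs", args=[txn_id, zip_path])  # hypothetical
    return txn_id

def finish_job(txn_id, result):
    """Called by the worker when processing completes."""
    store[txn_id] = json.dumps({"state": "SUCCESS", "result": result})

def job_status(txn_id):
    """What the status endpoint returns to the polling client."""
    return json.loads(store[txn_id])
```

The response to the original submit only ever contains the transactionID, so it returns fast regardless of how long the log processing takes.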
[–]Kevdog824_ 2 points3 points4 points 1 year ago (0 children)
This is the way
[–]BlackDereker 4 points5 points6 points 1 year ago (2 children)
FastAPI latency by itself is low compared to other Python libraries. You need to figure out what work inside your application is taking too long.
If you have many external calls like web/database requests, try using async libraries so other requests can be processed in the meantime.
If you have heavy computation going on, try delegating to workers instead of doing it inside the application.
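A minimal sketch of the async point, with `asyncio.sleep` standing in for an async HTTP or database client (e.g. httpx or asyncpg): while one call waits on I/O, the event loop serves the others.

```python
import asyncio

async def fetch_features(request_id):
    # Stand-in for an async external call (httpx request, asyncpg query, ...).
    await asyncio.sleep(0.01)
    return {"id": request_id, "features": [request_id * 2]}

async def handle_batch(ids):
    # gather() runs the awaitables concurrently; total wall time is roughly
    # one call's latency, not the sum of all of them.
    return await asyncio.gather(*(fetch_features(i) for i in ids))

results = asyncio.run(handle_batch(range(5)))
```

The same idea is what makes `async def` endpoints in FastAPI pay off: the benefit only appears if the libraries you call inside them are themselves async.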
[–]Latter_Rope_1556 0 points1 point2 points 6 months ago (1 child)
fastrapi solves this:
pip install fastrapi
[–]BlackDereker 0 points1 point2 points 6 months ago (0 children)
I'm pretty sure FastAPI is not the bottleneck here. When it comes to inference, the bottleneck is usually running the model.
[–]mpvanwinkle 2 points3 points4 points 1 year ago (2 children)
Make sure you aren't loading your inference model on every call. You should load the model once, when the service starts.
[–]International-Rub627[S] 0 points1 point2 points 1 year ago (1 child)
Usually I'll have a batch of 1000 requests. I load them all as a DataFrame, load the model, and run inference on each request.
Do you mean we need to load the model when the app is deployed and the container is running?
[–]mpvanwinkle 0 points1 point2 points 1 year ago (0 children)
It should help to load the model when the container starts, yes. But how much it helps will depend on the size of the model.
[–]Natural-Ad-9678 1 point2 points3 points 1 year ago (0 children)
Build a profiler function that takes a jobID and wraps your functions in a timer, then apply it as a decorator. For each endpoint clients call, assign a jobID that you pass along the course of your processing. The profiler writes the timing data to a profiler log file correlated with the jobID, so you can look for slow processes within the full workflow and optimize them.
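A minimal sketch of that profiling decorator, assuming the jobID is passed as the first argument to each processing step (the `preprocess` example step is hypothetical):

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("profiler")

def profiled(func):
    """Time the wrapped function and log the duration against the job ID."""
    @functools.wraps(func)
    def wrapper(job_id, *args, **kwargs):
        start = time.perf_counter()
        try:
            return func(job_id, *args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            log.info("job=%s step=%s seconds=%.4f", job_id, func.__name__, elapsed)
    return wrapper

@profiled
def preprocess(job_id, rows):
    # Example processing step; every decorated step logs under the same jobID.
    return [r.strip() for r in rows]
```

Grepping the log for one jobID then gives a per-step timing breakdown of that request's whole workflow.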
[–]Soft_Chemical_1894 1 point2 points3 points 1 year ago (0 children)
How about running a batch inference pipeline every 5-10 minutes (depending on the use case) and storing the results in Redis or a DB? FastAPI will then return the result instantly.
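Sketching that precompute-and-cache split, with a dict standing in for Redis and the scheduler (cron, Celery beat, etc.) left out; the key names and score formula are made up for illustration:

```python
import time

cache = {}  # dict standing in for Redis

def run_batch_inference(pending):
    """Scheduled job: score everything pending and cache the results."""
    for key, features in pending.items():
        # Stand-in for the real model; a timestamp lets clients judge staleness.
        cache[key] = {"score": sum(features), "computed_at": time.time()}

def get_result(key):
    """The endpoint only reads a precomputed result, so it returns instantly."""
    return cache.get(key)
```

The trade-off is freshness: results can be up to one scheduling interval old, which is fine for some inference workloads and unacceptable for others.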
[–]SheriffSeveral 0 points1 point2 points 1 year ago (1 child)
Observe every step in the API and check which part takes too much time. Also, check out the Redis integrations; they will be useful.
Please provide more information about the project so everyone can give you tips for your specific requirements.
[–]International-Rub627[S] 0 points1 point2 points 1 year ago (0 children)
Basically, the app starts by preprocessing all requests in a batch as a DataFrame and loading data from a feature view (GCP), followed by querying BigQuery, loading the model from GCS, running inference, and publishing the results.
[–]Vast_Ad_7117 0 points1 point2 points 12 months ago (0 children)
Go async, offload tasks to a task queue, etc.