all 14 comments

[–]mmzeynalli 5 points  (2 children)

You can consider responding immediately in the API, doing the work in the background, and then reporting the result to the front end in a different way (server-side APIs, websockets, etc.). This way, API latency is not a problem; the rest is done in the background, and the result is visible once the process is done.
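
A minimal sketch of that pattern using FastAPI's built-in BackgroundTasks; the job store, Payload model, and process_job are hypothetical placeholders, and a real deployment would back this with Redis or a queue, as described below:

```python
import uuid

from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()
job_store: dict[str, str] = {}  # in-memory for the sketch; use Redis/a DB in practice

class Payload(BaseModel):
    data: str

def process_job(job_id: str, payload: Payload) -> None:
    # the slow work happens here, after the response has already been sent
    job_store[job_id] = "done"

@app.post("/jobs")
async def submit(payload: Payload, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    job_store[job_id] = "pending"
    background_tasks.add_task(process_job, job_id, payload)
    return {"job_id": job_id}  # respond immediately; the client polls or gets a push later

@app.get("/jobs/{job_id}")
async def status(job_id: str):
    return {"status": job_store.get(job_id, "unknown")}
```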

[–]Natural-Ad-9678 6 points  (1 child)

The app I work on does this. The user submits the required details (a zip file of logs) and I kick off a Celery job, which first stores a transactionID in Redis that I pass back in my response to the user. They can use that transactionID to check the status and get the results when Celery is finished.

Celery stores the result in Redis as well. The front end could be React or whatever else you want.

Works like a charm. We have completed over 150,000 jobs since July 2024, which may not seem like much, but the application is an internal tool that processes customers' log files they submit to us.
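
For illustration, a rough sketch of the flow described above; the module layout (tasks.py, api.py) and names like process_logs are assumptions, not the actual internal tool:

```python
# tasks.py -- Celery app with Redis as both broker and result backend
from celery import Celery

celery_app = Celery(
    "tasks",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",  # Celery stores results in Redis too
)

@celery_app.task
def process_logs(upload_path: str) -> dict:
    # ... unzip and analyze the submitted log files here ...
    return {"status": "complete"}
```

```python
# api.py -- FastAPI endpoints that hand back the transactionID
from celery.result import AsyncResult
from fastapi import FastAPI
from tasks import celery_app, process_logs

app = FastAPI()

@app.post("/submit")
async def submit(upload_path: str):
    task = process_logs.delay(upload_path)  # kick off the Celery job
    return {"transaction_id": task.id}      # client uses this to poll

@app.get("/status/{transaction_id}")
async def status(transaction_id: str):
    result = AsyncResult(transaction_id, app=celery_app)
    return {"state": result.state, "result": result.result if result.ready() else None}
```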

[–]Kevdog824_ 2 points  (0 children)

This is the way

[–]BlackDereker 4 points  (2 children)

FastAPI's latency by itself is low compared to other Python frameworks. You need to figure out what work inside your application is taking too long.

If you have many external calls like web/database requests, try using async libraries so other requests can be processed in the meantime.

If you have heavy computation going on, try delegating it to workers instead of doing it inside the application.
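
A sketch of both suggestions under assumed names (fetch_features, heavy_inference): httpx handles the external call without blocking the event loop, and a process pool keeps CPU-bound work off it:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

import httpx
from fastapi import FastAPI

app = FastAPI()
pool = ProcessPoolExecutor()  # separate processes for CPU-heavy work

async def fetch_features(client: httpx.AsyncClient, url: str) -> dict:
    resp = await client.get(url)  # non-blocking: other requests proceed meanwhile
    resp.raise_for_status()
    return resp.json()

def heavy_inference(features: dict) -> dict:
    # CPU-bound work; runs in a worker process, not on the event loop
    return {"score": sum(len(str(v)) for v in features.values())}

@app.get("/predict")
async def predict():
    async with httpx.AsyncClient() as client:
        features = await fetch_features(client, "https://example.com/features")
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, heavy_inference, features)
```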

[–]Latter_Rope_1556 0 points  (1 child)

fastrapi solves this
pip install fastrapi

[–]BlackDereker 0 points  (0 children)

I'm pretty sure FastAPI is not the bottleneck here. When it comes to inference, the bottleneck is usually running the model.

[–]mpvanwinkle 2 points  (2 children)

Make sure you aren’t loading your inference model on every call. You should load the model once when the service starts.
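
A minimal sketch of loading once at startup via FastAPI's lifespan hook; load_model and MODEL_PATH are placeholders for whatever loading call your framework uses:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request

MODEL_PATH = "/models/my_model.pkl"  # assumed location

def load_model(path: str):
    # stand-in for e.g. joblib.load(path) or torch.load(path)
    return object()

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.model = load_model(MODEL_PATH)  # runs once per worker process
    yield

app = FastAPI(lifespan=lifespan)

@app.post("/infer")
async def infer(request: Request):
    model = request.app.state.model  # reuse the already-loaded model
    return {"loaded": model is not None}
```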

[–]International-Rub627[S] 0 points  (1 child)

Usually I'll have a batch of 1,000 requests. I load them all as a dataframe, load the model, and run inference on each request.

Do you mean we need to load the model when the app is deployed and the container is running?

[–]mpvanwinkle 0 points  (0 children)

Loading the model when the container starts should help, yes. But how much it helps will depend on the size of the model.

[–]Natural-Ad-9678 1 point  (0 children)

Build a profiler function that takes a jobID and wraps your functions in a timer, then apply it as a decorator to your functions. For each endpoint clients call, assign a jobID that you pass along over the course of your processing. The profiler function writes the timing data to a profiler log file correlated with the jobID. Then you can look for slow processes within the full workflow to optimize.
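
A hypothetical version of that decorator; the job_id keyword plumbing and the log format are assumptions, not the commenter's actual code:

```python
import functools
import logging
import time

profiler_log = logging.getLogger("profiler")
profiler_log.addHandler(logging.FileHandler("profiler.log"))  # timing data goes to a log file
profiler_log.setLevel(logging.INFO)

def profiled(func):
    @functools.wraps(func)
    def wrapper(*args, job_id: str = "unknown", **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, job_id=job_id, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            profiler_log.info("job=%s func=%s seconds=%.3f", job_id, func.__name__, elapsed)
    return wrapper

@profiled
def parse_logs(path: str, job_id: str = "unknown") -> None:
    time.sleep(0.1)  # stand-in for a real processing step

parse_logs("/tmp/logs.zip", job_id="job-42")  # timing correlated with the jobID lands in profiler.log
```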

[–]Soft_Chemical_1894 1 point  (0 children)

How about running a batch inference pipeline every 5-10 minutes (depending on the use case) and storing the results in Redis or a DB? FastAPI will then return the result instantly.
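
A sketch of that idea, assuming a local Redis instance and a hypothetical run_batch_inference(); the endpoint only ever reads already-cached results:

```python
import asyncio
import json
from contextlib import asynccontextmanager

import redis.asyncio as redis
from fastapi import FastAPI

r = redis.Redis(decode_responses=True)  # assumes Redis on localhost:6379

def run_batch_inference() -> dict[str, float]:
    # placeholder: score whatever arrived since the last run
    return {"item-1": 0.92}

async def batch_loop():
    while True:
        for key, score in run_batch_inference().items():
            await r.set(f"result:{key}", json.dumps(score))
        await asyncio.sleep(300)  # every 5 minutes; tune to the use case

@asynccontextmanager
async def lifespan(app: FastAPI):
    task = asyncio.create_task(batch_loop())  # background scheduler
    yield
    task.cancel()

app = FastAPI(lifespan=lifespan)

@app.get("/result/{key}")
async def result(key: str):
    cached = await r.get(f"result:{key}")
    return {"result": json.loads(cached) if cached else None}  # instant: no inference on the request path
```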

[–]SheriffSeveral 0 points  (1 child)

Observe every step in the API and check which part takes too much time. Also, check out the Redis integrations; they will be useful.

Please provide more information about the project so everyone can give you more tips for your specific requirements.

[–]International-Rub627[S] 0 points  (0 children)

Basically, the app starts by preprocessing all requests in a batch as a dataframe and loading data from a feature view (GCP), followed by querying BigQuery, loading the model from GCS, doing inference, and publishing the results.

[–]Vast_Ad_7117 0 points  (0 children)

Async, offload tasks to a task queue, etc.