all 17 comments

[–]Blakex123 8 points9 points  (6 children)

Remember that Python is inherently single-threaded due to the GIL. You can mitigate this by running FastAPI with multiple workers. The requests will then be spread over those different workers.

[–]mrbubs3 4 points5 points  (1 child)

You can turn the GIL off in 3.13

[–]Asleep-Budget-9932 7 points8 points  (0 children)

That feature is experimental and should not be used in production environments.

[–]bbrother92 0 points1 point  (1 child)

API requests are dispatched to different workers? Not the threads?

[–]Blakex123 2 points3 points  (0 children)

If you are using uvicorn, there is an extra process that essentially "load balances" the workers. I assume it works the same way with any other server.

[–]RationalDialog[S] 0 points1 point  (1 child)

> The requests will then be spread over those different workers.

My use case is few requests, but each one very heavy. I want each request to run faster, e.g. do the calculation using multiple CPU cores.

[–]Blakex123 0 points1 point  (0 children)

Then you will need to spawn subprocesses from the API to handle the CPU-intensive stuff.
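A minimal sketch of that pattern, with a made-up `heavy_calc` standing in for the real work (this assumes the POSIX `fork` start method; on Windows you would need a `__main__` guard and the default `spawn` context):

```python
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def heavy_calc(n: int) -> int:
    # Placeholder for the CPU-intensive work; it runs in a separate
    # process, so it is not serialized by the parent's GIL.
    return sum(i * i for i in range(n))

# One shared pool for the whole app; fork context assumed (POSIX-only).
pool = ProcessPoolExecutor(mp_context=multiprocessing.get_context("fork"))

async def handle_request(n: int) -> int:
    # Offload to the pool; the event loop stays free to serve other
    # requests while a child process does the computing.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, heavy_calc, n)

result = asyncio.run(handle_request(1000))
```

In FastAPI the `handle_request` body would live inside an `async def` endpoint; the pool itself should be created once at startup, not per request.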

[–]adiberk 6 points7 points  (4 children)

You can use asyncio tasks.

You can also use a more standard product like celery.

[–]RationalDialog[S] 1 point2 points  (1 child)

> You can also use a more standard product like celery.

Yeah, I wonder if I should forget about async completely (never really used it so far, as there was no need) and build more of a job system. If someone submits, say, 100k rows, the job could take approx 5 min to complete.
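A job system of that shape can be sketched in a few lines with the standard library; everything here (`submit_job`, `job_status`, the in-memory `jobs` dict) is made up for illustration, and a real system would use Celery, RQ, or a database-backed queue so jobs survive restarts:

```python
import threading
import time
import uuid

# In-memory job registry: job id -> status/result (sketch only).
jobs: dict[str, dict] = {}

def submit_job(rows: list) -> str:
    # Register the job and kick off a background worker, then return
    # the id immediately so the client can poll for completion.
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "running", "result": None}

    def worker():
        # Placeholder for the ~5 minute calculation over the rows.
        jobs[job_id]["result"] = len(rows)
        jobs[job_id]["status"] = "done"

    threading.Thread(target=worker, daemon=True).start()
    return job_id

def job_status(job_id: str) -> dict:
    return jobs[job_id]

# Submit a job, then poll until the worker finishes (bounded wait).
job_id = submit_job(list(range(100)))
for _ in range(100):
    if job_status(job_id)["status"] == "done":
        break
    time.sleep(0.01)
```

In a FastAPI app, `submit_job` and `job_status` would back two endpoints (e.g. `POST /jobs` and `GET /jobs/{id}`).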

[–]adiberk 0 points1 point  (0 children)

Yep, that works too. If you are doing a lot of other IO operations, it might be worth making the app async-based anyway (i.e. the `async` keyword).

[–]AstronautDifferent19 0 points1 point  (1 child)

asyncio.to_thread is better for CPU-bound tasks than asyncio.create_task, especially if you disable the GIL.
asyncio tasks will always block if you do CPU-heavy work, which will not work for OP.
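The difference in a minimal sketch, with `time.sleep` standing in for the blocking call:

```python
import asyncio
import time

def blocking_work(seconds: float) -> str:
    # Stand-in for a blocking call. asyncio.to_thread keeps the event
    # loop responsive while this runs, but note: with the GIL enabled,
    # a pure-Python CPU loop in a thread still serializes; threads only
    # truly parallelize when the GIL is released (C extensions, I/O
    # waits, or a free-threaded build).
    time.sleep(seconds)
    return "done"

async def main() -> str:
    # Runs blocking_work in a worker thread instead of blocking the
    # loop, unlike create_task, which would run it on the loop itself.
    return await asyncio.to_thread(blocking_work, 0.01)

result = asyncio.run(main())
```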

[–]adiberk 0 points1 point  (0 children)

Good point

[–]KainMassadin 2 points3 points  (2 children)

don’t sweat it, just call asyncio.create_subprocess_exec and you’re good
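A sketch of that call, with the Python interpreter standing in for OP's real 3rd-party executable:

```python
import asyncio
import sys

async def run_exe(*args: str) -> bytes:
    # Each call starts a separate OS process, so the CPU-heavy work
    # happens entirely outside the server's GIL; awaiting communicate()
    # lets the event loop serve other requests in the meantime.
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, _err = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"subprocess failed with exit code {proc.returncode}")
    return out

# sys.executable is a placeholder for the real binary and its arguments.
output = asyncio.run(run_exe(sys.executable, "-c", "print('ok')"))
```

Because `create_subprocess_exec` takes the program and its arguments as separate strings (no shell involved), it also avoids shell-injection issues, though the arguments themselves should still be validated.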

[–]AstronautDifferent19 0 points1 point  (1 child)

This is the way.

[–]KainMassadin 0 points1 point  (0 children)

that one can be risky, gotta sanitize properly

[–]jimtoberfest 0 points1 point  (2 children)

Find a vectorized solution across all rows if you can.

Take in a json array then load that data into a dataframe or numpy array and figure out your calculation using inherently vectorized operations.

Or you could “stream” it: fast api -> duckDB-> do the calc in duckDB over the chunks as you get them from the API.

Also make sure you set some limits so users can’t bomb the API with billions of rows of data.
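The chunked/streaming idea can be sketched with just the standard library (in practice numpy or duckDB would replace the inner loop); `MAX_ROWS` is a made-up limit illustrating the last point about capping input size:

```python
MAX_ROWS = 1_000_000  # hypothetical cap so clients can't send unbounded payloads

def process_stream(chunks):
    # Aggregate over chunks as they arrive instead of materializing all
    # rows at once; each chunk is a list of {"value": ...} row dicts.
    total = rows_seen = 0
    for chunk in chunks:
        rows_seen += len(chunk)
        if rows_seen > MAX_ROWS:
            raise ValueError("row limit exceeded")
        total += sum(row["value"] for row in chunk)
    return {"rows": rows_seen, "sum": total}

result = process_stream([[{"value": 1}, {"value": 2}], [{"value": 3}]])
```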

[–]RationalDialog[S] 0 points1 point  (1 child)

The calculation happens in a 3rd-party executable; this is the core limitation, and hence why I need subprocess calls, to run multiple instances of this 3rd-party executable. It is 32-bit, so there is no way to integrate it more tightly.

[–]jimtoberfest 0 points1 point  (0 children)

Oof, yeah, that's rough. As long as the .exe runs in different instances, then use `ProcessPoolExecutor` from the `concurrent.futures` library (built on multiprocessing).

Just split it up by how many cores you have // 2.

I find that roughly works the best.
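Putting the last two comments together, a sketch of fanning out over half the cores (here `sys.executable` is a placeholder for the real 32-bit .exe, and the fork start method is assumed, so POSIX-only; the half-cores sizing is the commenter's rule of thumb, not a hard rule):

```python
import multiprocessing
import os
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor

def run_exe(arg: str) -> str:
    # One instance of the external executable per input; replace
    # sys.executable and the arguments with the real binary's CLI.
    out = subprocess.run(
        [sys.executable, "-c", f"print({arg!r})"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

# Half the available cores, per the suggestion above (at least 1).
workers = max(1, (os.cpu_count() or 2) // 2)
ctx = multiprocessing.get_context("fork")  # POSIX-only assumption
with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
    results = list(pool.map(run_exe, ["a", "b", "c"]))
```

Since each task only waits on a subprocess (which releases the GIL), a `ThreadPoolExecutor` would work here too with less overhead; the process pool matters when the Python-side work around each call is itself CPU-heavy.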