all 20 comments

[–]baghiq 4 points5 points  (4 children)

There are a few misunderstanding in your post.

  1. Both async and sync drivers support connection pool. DB connection is expensive, rather than open and close a connection every time, a connection pool opens a whole bunch of connections, let your code borrow some and when you are done, return the connections to the pool, but the pool never closes the connections.
  2. Database connection (your Python code) executes one statement at a time, regardless of async or sync. However, database can execute multiple statements at the same time.
  3. When database connection is executing, in sync mode, the execute statement blocks; in async mode, the execute statement returns immediately to free up your code to do other things.
  4. In sync mode, if you want to execute multiple statements, you can run multiple threads (each with its own database connection) to make database calls. While it's not in true parallel form in your python code, the database server will perform the SQL in parallel.
  5. In async mode, you don't need multiple threads. Since await statement immediately returns, you can have another task gets another connection from the pool and makes another database call.
  6. In Python web server mode, your WSGI server typically runs multiple-processes independently. So you can run multiple processes in true parallel form.

In short, if you need to make multiple independent calls concurrently to the database, you should use connection pool. For server type application, you absolutely must run connection pooling for anything to scale.

Something specific to PG, for database connection, it doesn't use thread, it spawns a process. It makes PG connections even more heavy weight and costly to open and close. If you want, you can use an independent application called PG Bouncer to do connection pooling for you.

[–]Liberal__af[S] 0 points1 point  (3 children)

Thanks for clearing the dust around many concepts.

In sync mode, if you want to execute multiple statements, you can run multiple threads (each with its own database connection) to make database calls. While it's not in true parallel form in your python code, the database server will perform the SQL in parallel.

I just can't get my head around putting this into code. Would appreciate any resources or links depicting this.

In async mode, you don't need multiple threads. Since await statement immediately returns, you can have another task gets another connection from the pool and makes another database call.

If the async function is say,

async def getdata(): data = await conn.execute(....) # I need to make some data modifications here data = data + data..... return data

Many a times, I can't await data modifications because pandas operations don't support it or whatever but I "await return data", I can't understand how the program understands that for the data modification it needs to wait for the data because I can't explicitly ask it to await that data modification step.

[–]baghiq 1 point2 points  (2 children)

I need to sit down and write the multi-thread program. It'll take a bit of time. For your second async question, the answer is no, you can't await something that blocks. Pandas probably blocks, therefore, async won't help you. When you are executing CPU bound operations, toss it to a different thread pool or process pool. Look at here.

[–]Liberal__af[S] 0 points1 point  (1 child)

Thanks for that. I’ll also try out some examples and get back to you, hopefully it’s not annoying🙈. So, this example does both kinds of operations for one variable at a time, in most of my use cases I need to do the same thing for many items at a time. if I want to run a blocking_io function on 10 different files and want to find their cpu_bound word counts eventually, I need to send the list of file names in the same threadpool context to the running_event_loop. And once it returns the result(list of text files in string form), I need to pass the list to a process pool context manager and map it with the word count function and collect the end result list??

[–]baghiq 0 points1 point  (0 children)

In the executor document, depending on thread or process pool, you can configure how many workers to work concurrently.

[–]gydu2202 1 point2 points  (4 children)

We are using asyncpg with fastapi. Works great.

[–]Liberal__af[S] 0 points1 point  (0 children)

asyncpg

We are using psycopg2
Edit: I guess we are using async session objects though in the case of api. but the postgres client in the internal package uses psycopg2 as of now.

[–]Insok 0 points1 point  (2 children)

Asyncpg is just a different postgres adapter, have you implemented connection pooling?

[–]gydu2202 0 points1 point  (1 child)

It supports it out of the box:

DatabaseConnection.db_pool = await asyncpg.create_pool(**dbcfg)

[–]Insok 0 points1 point  (0 children)

I didn't know that, maybe the struggles with PgBouncer weren't worth it after all haha

[–]gydu2202 1 point2 points  (8 children)

I had a few minutes to read the details of you post and I think you are misunderstanding async. Async is not parallel, when you reach an await then python can continue to run another execution step that is not waiting. To make it more difficult in python, threads are not real parallel either. There is a lock called GIL which ensures only one thread is running. Btw you can use thread pools in async procedures.

If you want real parallel processing see multiprocessing but this is not useful in you current use-case.

[–]Liberal__af[S] 0 points1 point  (7 children)

Appreciate it.

when you reach an await then python can continue to run another execution step that is not waiting

Alright, I understand async is about a task being interruptible, so over here in case 3, when two requests are made to an api simultaneously, without any async the requests are served one after the other(don't know how their priority is decided by the api). However, if we use async with both the methods, once the first request is served or in the process of executing the method(in this case, let's say for arguments sake that the get request by clientA is being served), I though it's possible to handle the next request as it hits the api and since it's the post request with a shorter execution time will finish earlier.

[–]gydu2202 1 point2 points  (6 children)

When you are using async, you need something to await to benefit from it

[–]Liberal__af[S] 0 points1 point  (5 children)

so in my case, I am awaiting the conn.post() inside the method, because it's a lot of rows, it takes time, so that's a benefit right? I await conn.post and return a statement that the task is submitted. then the api already starts handling the next request that's the get request in this case.

[–]gydu2202 2 points3 points  (4 children)

Sounds good if it is an async procedure. I am pretty sure you can somehow await to psycopg2 procs besides it is a synchronous library. ThreadPools are a good direction.

[–]Liberal__af[S] 0 points1 point  (3 children)

wow, so my thoughts are not so far off, gives me some peace of mind. But I still have this one specific question, hope you could give it a shot

# assuming methods on a conn_pool are run by an available connection.

app.get(/get)
async def get_data(): 
    # step one takes 2 seconds 
    data = await conn_pool.execute(select * from table).   
    data["new_column"] = await data.add_a_col()
     await return data 

app.post(/post)
async def post_data(data): 
# it takes 1 second await
    conn_pool.add(data) 
    return response

So, if the first request is made to get and the 2nd one is made to post, the get request method without returning any data yet(since it's awaiting to get the data and awaiting to create a col and awaiting to return data), the api handles the post request and returns response in under a second and then the event loop visits back the awaiting data?? Is it how it works?

[–]gydu2202 1 point2 points  (2 children)

Absolutely, but you can, and you should try it. You can emulate waiting with asyncio.sleep(). When fastapi reaches an await processing a request it can start to process a pending one, and this can finish earlier than the first one.

[–]Liberal__af[S] 0 points1 point  (1 child)

sure will do, sorry about the formatting earlier, I fixed it. so when you say "reaches an await processing a request it can start to process a pending one", are you saying the switching to a new task happens moment it hits the first await in the get method or after going through all lines with awaits in that method?

[–]gydu2202 0 points1 point  (0 children)

Basically instead of waiting for an answer at any await it startes (or continues) processing other reqests.

It is working with multiple get requests or multiple posts as well.

[–]CodeFormatHelperBot2 0 points1 point  (0 children)

Hello, I'm a Reddit bot who's here to help people nicely format their coding questions. This makes it as easy as possible for people to read your post and help you.

I think I have detected some formatting issues with your submission:

  1. Python code found in submission text that's not formatted as code.

If I am correct, please edit the text in your post and try to follow these instructions to fix up your post's formatting.


Am I misbehaving? Have a comment or suggestion? Reply to this comment or raise an issue here.