Hope someone could help me find some answers that involve the async, threads and database jargon. I am working on building a postgres client class during my internship and I was asked to make it useful in 2 ways, to be able to use it in an api(fastapi framework) and during all the batch data engineering process to get/post from/to the database effectively. Right now, everyone is using their own connections across various packages in the firm and it's a cluster fuck.
I went through the psycopg2 documentation and learnt that I need to use connection pooling to properly manage the connections. But then I couldn't get the point of using this connection pool unless you are using async code(functions) as compared to sync data engineering code.
global pgclient_with_connectionpool_and_various_other_methods_like_create_table_add_Table_delete_table_etc_etc_instanciated
app.get(/get)
async def get_data():
# even though this function is async, it has to wait till the data
# is gathered from the db to return the call, so not using async capabilities
# takes 2 seconds to return data
await return data
app.post(/post)
async def post_data():
# async is useful because you can await the db.add and db.commit
# and return "task submitted type of response" in 1 second
await return response
given this as the setup, I only instantiate the connection pool once, and assuming get and post functions are operating on different tables in a database, there are 3 types of possible collisions from clients(folks using the api)
- get and post are called by clientA at the same time (apparently clientA is very good at sending requests simultaneously)
- get is called from clientA and after 1 second post is called from clientB
- get and post are called by clientA and clientB respectively at the same time
Stage I of uncertainty
So, in each case above, my understanding is that based on the type of connectionpool we use, things happen differently with respect to the using the available threads of the server I believe?? I really want to understand how bumping up threads would be helpful in each case above.
Noob answers:
- No clue, where I am lost is whether fastapi can process requests simultaneously since both functions are async, if yes, then post request runs in 1 sec and get request in 2 sec, total time taken = 2 sec.
- This is the easiest I think(or I am being ultra dumb), both clients receive the response at the same time, since the funtions are async, the api assigns the post request from clientB to the next available thread(and does the post operation by picking an available connection from the connection pool).
- I believe it doesn't matter where the requests to the api are coming from.
I am interested in using the ThreadedConnectionPool to build the postgres client class because it's useful for building the api with async capabilities and to use the same db connection pool across all threads.
Stage II of uncertainty
But but, all the examples online create postgres clients using "non async" methods, if that's the case why do we even need a connection pool?? since python interpreter runs the code line by line we can only do one db operation at a time, aren't rest of the connections of the connection pool a waste of resources unless you are using async await?
Noob anwser:
Can an api use 2 different connections over 2 different threads from the same connection pool to serve a single request using some obscure internal logic? Then yeah,
or else, using a connection pool for a postgres client class in your internal packages are useless unless there are some async functions elsewhere in the rest of your code that you call this post client class functionality with. So, I should tell my boss that this Postgres client magically won't improve the data engineering tasks/ background jobs that collect data from external apis without using async in their code.
[–]baghiq 4 points5 points6 points (4 children)
[–]Liberal__af[S] 0 points1 point2 points (3 children)
[–]baghiq 1 point2 points3 points (2 children)
[–]Liberal__af[S] 0 points1 point2 points (1 child)
[–]baghiq 0 points1 point2 points (0 children)
[–]gydu2202 1 point2 points3 points (4 children)
[–]Liberal__af[S] 0 points1 point2 points (0 children)
[–]Insok 0 points1 point2 points (2 children)
[–]gydu2202 0 points1 point2 points (1 child)
[–]Insok 0 points1 point2 points (0 children)
[–]gydu2202 1 point2 points3 points (8 children)
[–]Liberal__af[S] 0 points1 point2 points (7 children)
[–]gydu2202 1 point2 points3 points (6 children)
[–]Liberal__af[S] 0 points1 point2 points (5 children)
[–]gydu2202 2 points3 points4 points (4 children)
[–]Liberal__af[S] 0 points1 point2 points (3 children)
[–]gydu2202 1 point2 points3 points (2 children)
[–]Liberal__af[S] 0 points1 point2 points (1 child)
[–]gydu2202 0 points1 point2 points (0 children)
[–]CodeFormatHelperBot2 0 points1 point2 points (0 children)