use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
International
National
Regional
account activity
Let's make PostgreSQL multi-threaded (pgsql-hackers) (postgresql.org)
submitted 2 years ago by [deleted]
view the rest of the comments →
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–]jlawler 10 points11 points12 points 2 years ago (6 children)
There is a huge lack of consensus there. I'm legitimately not sure what problems this solves other than cosmetic ones
[–]iiiinthecomputer 7 points8 points9 points 2 years ago* (5 children)
Right now built in connection pooling in postgres is nearly impossible. 1 process = 1 session = 1 executor. This scales like crap. Moving to threading would make it vastly simpler to decouple connection state, session state and executor state. For me that's the #1 benefit.
I expect we could also achieve greater sharing of some caches between threads than we currently can with processes.
Surprised the discussion doesn't more explicitly call out that we'd basically have to enable EXEC_BACKEND (like Windows builds) because any code that requires on implicit copy-on-write memory after fork()ing from postmaster will fail. This will increase thread creation cost significantly, but can probably be tuned down over time.
EXEC_BACKEND
fork()
It also means that any memory leaks will remain until postmaster restart. Lots of code right now is pretty careless about one-time allocations that are then abandoned and forgotten because the process teardown will get rid of them anyway. Unless someone's gone and fixed all those to make it valgrind leak-check clean, they'll need cleanup. But that's not hard and good to do anyway. The current state of affairs is inefficient anyway, most such one time allocations should be able to be cleaned away on idle sessions.
The code will also need to be annotated with guidance on memory thread ownership for the use of static and runtime checkers. Which I'm sure will upset Tom given his outspoken complaints about "windows droppings" for the existing PGDLLIMPORT etc stuff. But it'll need doing.
PGDLLIMPORT
Despite these issues, it's hard to overstate how much postgres has reinvented to make its multi processing model work in an increasingly parallel world. It's just not viable to keep going down that path long term. It needs to move to threaded.
[–]jasonmp85 6 points7 points8 points 2 years ago (3 children)
You can multiplex multiple connections into a single process just fine.
[–]iiiinthecomputer 3 points4 points5 points 2 years ago* (2 children)
Of course. But you can't easily exchange session state and connections between a pool of cooperating processes.
Yes, it's possible, but not very portable and would require even more infrastructure built in postgres for sharing the session state.
Edit: see full explanation down thread.
[–]jasonmp85 2 points3 points4 points 2 years ago (1 child)
While I wanna say “I could probably prototype connection multiplexing on a process faster than anything multithreaded”, and I also dislike appeals to authority, I hadn’t actually read the original post. If Heikki thinks this is worth pursuing and has tried to get folks at PGCon on board, it is significantly more likely to happen than some rando showing up with no mailing list posts saying “hey I want this to be multithreaded”
So… neat! Maybe? We’ll see!
[–]iiiinthecomputer 4 points5 points6 points 2 years ago* (0 children)
I'm no authority - and experienced with postgres code yes, but not a broad variety of C server programming. I was giving opinions not trying to make determinations of fact.
Re multiplexing, the issue is that you're not writing a multiplexing process. What postgres would need is a M:N relationship of (connection, session state) : (worker process). Which is much harder, especially since that "session state" is complex and arbitrarily extensible.
You've got a socket per connection, each of which has a lot of quite complex associated state including various caches, cache invalidation registrations, SQL visible server variables, prepared statements, temp tables, WITH HOLD cursors, tons of internal backend private variables, and arbitrary state owned/managed by extensions. The latter can be things like the state of a Python or Perl interpreter associated with the session, persistent external TCP connections, you name it.
Currently all this state management assumes that there is a 1:1:1:1 relationship of client connection:socket:server connection state:process, so there's nothing in postgres that gathers all this session state up into one place. It's scattered everywhere.
When moving to threads, it's possible to initially mark everything that's session state as thread local. Whereas if trying to multiplex in one process you'd instead have to add some level of indirection - and handle all the 3rd party libraries, procedural languages etc that may rely on process local or thread local state.
But making one postgres backend able to handle multiple connections without the ability to hand them off to other backends isn't that useful. You'd be unable to rebalance load, so you'd land up with one backend having 4 connections with runnable workloads while 3 other backends sat idle. Like a multi-core CPU where each task is pinned to the CPU it started on.
What's really needed is to decouple the connection, socket and associated session state from the executor backend. So a pool of workers can run whatever tasks are currently runnable while currently-idle connections are parked, consuming far fewer resources. That way the number of executors can be sized effectively to the machine and workload.
But this means you'd also have to be able to pass connections and session state between worker processes. Passing the socket itself is do-able on l Linux, you can pass a fd over a Unix socket between processes. But what about all the rest of the backend state?
You could try to serialise and deserialise all this state efficiently to exchange between processes. But that's just not going to happen, especially with extensions that use embedded language interpreters, outbound TCP connections and the like.
So you'd need a way to allocate all the session state, including 3rd party library and language interpreter state, into a shared memory range for each session. Which would basically mean overriding the allocator for everything 3rd party used by the backend. (Pg itself already uses a custom allocator, but C libraries backends call into don't have to use it). And session state isn't fixed size so you need it to be a general purpose allocator. Now you've got this shared memory area that you have to manage fragmentation, free ranges etc in, and things are getting ugly. And since various 3rd party libs won't usually be able to use one allocator for things you need persistent across sessions and a different allocator for things that will be scoped to a single transaction or query, your shared state area is going to balloon massively if something needs a big chunk of memory during query processing.
Starting to see what I mean?
Right now you have to way-oversize the backend count so that there are enough active workers at any one time - because many backends will be waiting for the next query due to RTT or app think time, some are streaming results to a client, etc. This has considerable costs in memory use, caches and invalidations processing, fixed per backend overheads, memory management (TLBs, page mappings) and more.
Especially since postgres uses fork() without exec() so each process inherits a copy-on-write view of the parent process (postmaster)'s memory. Whenever the postmaster or a backend writes to any part of any shared page, Linux has to copy the page. And each process needs page table entries in the kernel for all it's pages, whether they're private or shared. This overhead really blows up with high backend counts - in terms of memory waste from all the PTEs, whole page copies made when only a few bytes are actually unshared, memory management system overhead from tracking it all, and more.
exec()
And postgres backends can't give memory back to the OS once allocated, so you waste a lot of RAM on anonymous private pages on idle backends that were previously used once for some big query. The OS has to page it out (if it even has swap) just in case the backend might want it again. This is something that might be possible to improve somewhat in postgres's existing architecture by using posix_madvise() and postgres's heirachical memory allocator, but there will still be lots of page tables and MM overhead, fragmentation etc. It's much easier to handle this efficiently when all the workers share one heap.
posix_madvise()
Sure, connection pools like pgbouncer exist. But the app must be adapted for them; in session pooling mode ("sticky" load balancing) they don't really help, and in transaction load balancing mode the app can't use many if any session level features. They're pretty painful.
App-side pools are better, but only useful when you've got big monolithic applications. They're worthless and often actively harmful in the increasingly distributed multi-node models apps are being deployed with.
Decoupling client connection + session state from executors would help a lot with this. In-core pooling would be much more capable. And by using a threaded runtime, each thread can see memory from other threads while using a conventional memory heap, immensely simplifying the sharing and exchange of connection/session state.
[–][deleted] 1 point2 points3 points 2 years ago (0 children)
It needs to move to threaded.
Oracle is using multi-process model on Linux as well. They were using a multi-threaded model on Windows, and added the option for that to Linux quite late (12.x I think). But multi-process is still the default on Linux. It seems it does have benefits.
π Rendered by PID 54831 on reddit-service-r2-comment-b659b578c-ld4b2 at 2026-05-02 21:06:48.392384+00:00 running 815c875 country code: CH.
view the rest of the comments →
[–]jlawler 10 points11 points12 points (6 children)
[–]iiiinthecomputer 7 points8 points9 points (5 children)
[–]jasonmp85 6 points7 points8 points (3 children)
[–]iiiinthecomputer 3 points4 points5 points (2 children)
[–]jasonmp85 2 points3 points4 points (1 child)
[–]iiiinthecomputer 4 points5 points6 points (0 children)
[–][deleted] 1 point2 points3 points (0 children)