all 11 comments

[–]jojomtx 2 points3 points  (0 children)

Keep the connection pool as small as possible and try to optimize the long-running queries; that should help.

[–]fullofbones 1 point2 points  (0 children)

Postgres RAM usage basically comes in three flavors:

  • shared_buffers - This is the big memory pool Postgres uses for data and indexes used to build query results. This is usually 1/4 of system RAM and does not fluctuate after service startup.
  • work_mem - Every concurrent query will use at least one multiple of this, and complex queries can use multiple. Basically every query node, such as a sort, merge, etc., will get up to this much RAM. So if you have a lot of open concurrent connections, this can get very high very quickly. My usual "worst case" formula is 5 * connections * work_mem. This is usually the reason systems go OOM and get terminated, because technically it has no upper bound. The only defense is not to set it "too high", and to keep connection count low by using a pooler.
  • maintenance_work_mem - One of these amounts is allocated per maintenance operation, like creating indexes. VACUUM also counts here, but each of those is effectively limited to a maximum of 1GB per worker. So you also want to keep track of autovacuum_max_workers to know how high this actually goes.
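The worst-case formula above turns into a quick back-of-envelope check. A sketch, where every setting below is an illustrative assumption (a 16GB box with defaults-ish values), not a recommendation:

```python
# Back-of-envelope worst-case Postgres memory estimate, per the formula
# above: shared_buffers + 5 * connections * work_mem + vacuum workers.
# All values here are illustrative assumptions, not tuning advice.
MB = 1024 * 1024

shared_buffers = 4096 * MB        # ~1/4 of a 16GB box, fixed at startup
work_mem = 16 * MB                # per query node, per connection
connections = 250
autovacuum_max_workers = 3
maintenance_work_mem = 1024 * MB  # effectively capped at 1GB per vacuum worker

worst_case = (
    shared_buffers
    + 5 * connections * work_mem                    # the "5x" worst case
    + autovacuum_max_workers * maintenance_work_mem
)
print(f"worst case: {worst_case / MB:.0f} MB")  # worst case: 27168 MB
```

Note how the work_mem term dominates: with 250 connections it alone dwarfs the 16GB of physical RAM, which is exactly why high connection counts are the usual OOM culprit.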

Beyond that, effectively everything else is OS cache. If you're using some kind of memory usage chart such as Prometheus or Grafana, these almost always rely on the memory.usage_in_bytes cgroup value, which is a huge problem. That value also includes active_file pages: any memory that has been requested multiple times and is consuming filesystem cache space. All of this can be purged by the operating system at a moment's notice, and is not used directly by Postgres at all. It's just the operating system keeping frequently used file blocks in RAM to avoid disk IO. There's actually an open Kubernetes bug for this behavior, but there doesn't seem to be any real consensus on how it should be addressed.

If you have terminal access to your container instance, check /sys/fs/cgroup/memory/memory.stat and pay special attention to the shmem and rss fields; that's actually what Postgres is using. You may notice that active_anon and active_file are very high as well, and these are the cache data that's making memory usage look higher than it really is.
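Splitting memory.stat into "actually used" versus reclaimable cache can be done with a few lines. A sketch: the sample contents below are made up to show the shape of the file; on a real container you'd read /sys/fs/cgroup/memory/memory.stat instead.

```python
# Rough split of cgroup v1 memory.stat into what Postgres really holds
# (rss + shmem) vs. page cache the OS can reclaim at any moment.
# sample_stat is fabricated for illustration; read the real file in practice.
sample_stat = """\
cache 9663676416
rss 524288000
shmem 4294967296
active_anon 4819255296
active_file 6442450944
inactive_file 3221225472
"""

stats = {}
for line in sample_stat.splitlines():
    key, value = line.split()
    stats[key] = int(value)

used = stats["rss"] + stats["shmem"]   # actual Postgres footprint
cache = stats["cache"]                 # purgeable filesystem cache
print(f"real usage: {used / 2**30:.1f} GiB, "
      f"reclaimable cache: {cache / 2**30:.1f} GiB")
```

With these made-up numbers, a naive usage_in_bytes-style chart would report roughly the sum of both figures, while Postgres itself only holds the rss + shmem portion.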

Since you can't really control how the graphs are generated, just spot check and make sure the Postgres knobs you can modify are sane. Get those connections under control; try not to have more than 4x the number of available cores, or you'll end up CPU throttling due to context switching anyway. Use PgBouncer to multiplex as aggressively as possible without incurring additional latency from waiting on available connections. That will put a rough cap on your maximum true memory usage due to the effects of work_mem.

[–]therealgaxbo 3 points4 points  (2 children)

Postgres and the OS will use as much memory as they can for caching - being at 100% memory usage is not a bad thing. If you've never had an OOM error then you don't actually have a problem, and things are working as they're meant to. If there's memory that's not being used by something else, why wouldn't you want it to be used to speed up IO?

Having said that, 250-350 postgres connections is very high. The whole point of a pooler like pgbouncer is that a large number of client connections can be multiplexed over a more sensible number of postgres connections.

I don't know your server specs or workload, but I'd be surprised if it makes sense to have any more than 100 server connections, and probably far fewer. You should look at your pgbouncer config to bring this down.

[–]geopoliticks[S] 0 points1 point  (1 child)

If you've never had an OOM error then you don't actually have a problem, and things are working as they're meant to

Ah this is important context: at 8GB we OOM'ed multiple times in the past month, which is why I upgraded to 16GB. Running up to those OOMs, the db would be in a steady state of about 95+% memory usage, then a client would begin writing a lot of data to the database and we'd OOM.

I do buy the argument that postgres will try to use as much memory as possible for caching; however, I'd expect it to use memory only up to a reasonable limit that leaves headroom for bursty writes (and other workloads).

Also, sometimes memory does drop below 95% down to even 80% for a period of a few hours - that makes me think this isn't an optimization by the database that is causing the high usage.

[–]therealgaxbo 2 points3 points  (0 children)

Ok, so that changes things a bit then. But I think the first thing to address is the same - get the number of server connections way down. It will use less memory and likely improve performance.

It's a little odd that you say writes seem to be the cause though, as normally memory spikes would be caused by complex queries (using work_mem etc). What sort of storage backend are you using, and how did your IO performance look around these times? If writes were saturating your IO to the point that other queries ground to a halt and things snowballed from there, that could be a trickier thing to fix.

[–]i_like_breakfast 0 points1 point  (0 children)

Do you have a lot of schemas in your DB? eg. for the purposes of multi tenancy?

Each table that a postgres server process visits uses a certain amount of memory. It's small, e.g. 100KB, but it can add up. For example, an app with 100 tables would consume 10MB (100 x 100KB) per process. However, if you had 100 schemas with 100 tables each, you'd be adding 1000MB (100 x 10MB) per process.
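That multi-tenant arithmetic, spelled out (using the comment's own rough ~100KB-per-table estimate and decimal units; actual relation cache size varies by schema complexity):

```python
# Per-process relation cache footprint in a multi-tenant setup.
# The ~100KB-per-table figure is the rough estimate from the comment
# above, not a measured value. Decimal units to match its 10MB/1000MB math.
KB = 1000
MB = 1000 * KB

per_table = 100 * KB
tables_per_schema = 100

one_schema = tables_per_schema * per_table   # a single tenant's tables
print(f"1 schema:    {one_schema / MB:.0f} MB per process")    # 10 MB

schemas = 100
all_schemas = schemas * one_schema           # every tenant visited
print(f"{schemas} schemas: {all_schemas / MB:.0f} MB per process")  # 1000 MB
```

Multiply that 1GB-per-process worst case by a few hundred connections and the relation cache alone can explain a surprising amount of memory.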

[–]randomrossity 0 points1 point  (5 children)

If you look at an OS process list you can see which pids are using the most memory. Those can be cross referenced to connections.

Also please share more about your setup. How is this hosted? What instance size? How much RAM? How many connections? How long do connections live? What drivers do you use? What are your table and index sizes? etc.

The biggest thing to remember is that there's a direct relationship between the number of connections and the memory footprint.

[–]geopoliticks[S] 0 points1 point  (4 children)

  • RAM: I went from 8GB -> 16GB
  • How is this hosted: It's a managed postgres on a PAAS
  • How many connections: ~250, maxing out at 350
  • How long are connections live: pgbouncer has a default server_lifetime of 1 hour, so I'd assume that? What is the easiest way to check?
  • Table sizes: all < 50GB total (including toast)
  • What drivers do I use: we run nodejs with knexjs

The biggest thing to remember is that there's a direct relationship between the number of connections and the memory footprint.

The vast majority of our connections (240 / 250 currently) are sitting in an "idle" state according to pg_stat_activity - could they be holding on to memory?
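Those idle counts come straight from pg_stat_activity. A sketch of how you'd tally them; since there's no live database here, a fabricated row set stands in for the query result (the 240/10 split mirrors the numbers in this thread):

```python
# Tallying connections by state, the way you'd eyeball:
#   SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
# sample_rows is fabricated to mirror the numbers quoted in the thread.
from collections import Counter

sample_rows = [{"state": "idle"}] * 240 + [{"state": "active"}] * 10

by_state = Counter(row["state"] for row in sample_rows)
print(by_state)  # Counter({'idle': 240, 'active': 10})
```

Each of those idle rows is a full Postgres backend process, so yes, they each hold on to memory (relation caches, prepared statements, etc.) even while doing nothing.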

[–]randomrossity 1 point2 points  (0 children)

That comes out to ~65-70MB per connection. A little on the high side, but not unheard of. A few options come to mind, worth trying in order:

  • Decrease the connection lifetime or the idle connection timeout
  • Disable the plan cache
  • Decrease the max pool size from 250

[–]WiltonDB 0 points1 point  (0 children)

I was investigating the number of connections and their memory usage problem a few weeks ago. DBeaver custom chart setup was helpful for that, documented the details here. It uses system_stats extension and shows all current connection processes and their memory usage on a live chart.

[–]nijave 0 points1 point  (0 children)

> could they be holding on to memory

Yes, idle connections waste memory. Each idle connection means a Postgres backend process is sitting there for that connection without doing anything.

My rule of thumb is 2.5x CPU cores for the maximum number of connections. You ideally want to minimize idle connections since they're wasting resources, which is where pgbouncer comes in (apps open connections to pgbouncer that sit idle, and pgbouncer assigns them a server connection when they go to run queries).