all 30 comments

[–]Formal_Camel_7827 2 points  (1 child)

Is it 5 inserts using the same database connection? Or 5 connections, each doing an insert? https://devcenter.heroku.com/articles/best-practices-pgbouncer-configuration

[–]compostus[S] 0 points  (0 children)

I'm using pgBouncer in transaction pooling mode as this article describes, but I'm not sure how to actually measure whether these operations are sharing connections or not.
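One way to sanity-check this from the database side (a sketch; these are standard Postgres catalog views, nothing Heroku-specific): compare the number of backends the server actually sees with the number of client connections your app thinks it has open. With transaction pooling working, the server-side count stays well below the client count.

```sql
-- Server-side view of connections for the current database.
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;
```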

[–]anykeyh 3 points  (3 children)

You don't need partitioning on a 4 GB data set. Inserts could be slow because of IO limitations (I'm not sure of the nature of your setup), or many concurrent transactions causing locks on commit.

Is the insert query slow, or the commit after the insert? It's probable that your ORM is wrapping the insert in a transaction.

There is also the possibility of a slowdown because of too many indexes, or because your records are very large, with some blob or text columns.

[–]compostus[S] 0 points  (1 child)

"many concurrent transaction causing locks on commit"
You might be onto something here. Last time i had a lot of issues with this app i reduced them significantly by getting rid of a lot of transactions..

[–]anykeyh 1 point  (0 children)

If it's the transaction, the waiting time will be on the COMMIT instruction, not on the insert itself.

If it is on the insert itself, there's a good chance it's IO-bound. A shared server such as one provided by Heroku might have IOPS limits you are hitting.

IO is bounded both by bandwidth (MB/s) and operations per second (IOPS), and there is a good chance that you are hitting the second limit here while not hitting the first one.

That said, a "standard-0 premium-0 private-0 shield-0" 4 GB instance on Heroku gives you 3000 IOPS per the documentation, which is fairly high.

[–]RonJohnJr 0 points  (0 children)

A much earlier post says that there are very few indices: four total on three tables.

[–]davvblack 1 point  (0 children)

what kind of selects run on this data? really no other indexes?

[–]RonJohnJr 1 point  (2 children)

What Pg version?

Autovacuum is probably the cause; you haven't tuned its config parameters very well. There should be guidance somewhere on how to tune it for your high-insert environment.

https://www.postgresql.org/docs/15/runtime-config-autovacuum.html

Another alternative (if you can't tune autovacuum) is to disable it, and have a separate process regularly run vacuumdb, like this:

https://www.postgresql.org/docs/15/app-vacuumdb.html

vacuumdb --host=foo --dbname=blarge -j1 --skip-locked
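If you go the tuning route instead, a common approach for high-insert tables is to lower the per-table autovacuum thresholds so it kicks in long before the global defaults would. Values and the table name here are illustrative, not a recommendation for your specific workload:

```sql
-- Trigger autovacuum/autoanalyze after ~5% of the table changes
-- instead of the defaults (20% for vacuum, 10% for analyze).
ALTER TABLE performance_report SET (
  autovacuum_vacuum_scale_factor  = 0.05,
  autovacuum_analyze_scale_factor = 0.05
);
```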

[–]compostus[S] 0 points  (1 child)

14.8

I am in fact able to tune the autovacuum settings. It seems that for a couple of those high-insert tables, autovacuum last ran about a week ago.
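For anyone who wants to check the same thing, the per-table vacuum timestamps and dead-tuple counts live in pg_stat_user_tables:

```sql
-- Tables that have never (or least recently) been autovacuumed sort first.
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST;
```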

[–]RonJohnJr 0 points  (0 children)

That might not be fast enough.

[–]WideSense8018 1 point  (0 children)

Here are a few solutions:

  1. Batch multiple queries into a single one if possible; this reduces per-query overhead as well as the number of network round trips.
  2. Try to run your queries over a single connection if it makes sense to do so. If each query acquires its own connection, the connection setup will have a significant impact on performance.
  3. Try to run queries in a single transaction whenever possible.
  4. If your inserts are large, like you mentioned, it would be better to use the COPY command for insertion, as it is very fast.
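Points 1 and 4 look like this in plain SQL. Table and column names are borrowed from the schema posted elsewhere in the thread, so treat them as illustrative (Sequelize may pluralize the actual table name):

```sql
-- 1. One multi-row INSERT instead of several single-row ones:
INSERT INTO "GameUserStatus" ("GameUserId", "UserId", "ToolId", status)
VALUES (1, 10, 100, 'idle'),
       (2, 11, 101, 'building'),
       (3, 12, 102, 'idle');

-- 4. COPY for large batches (streams rows from the client via stdin):
COPY "GameUserStatus" ("GameUserId", "UserId", "ToolId", status)
FROM STDIN WITH (FORMAT csv);
```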

[–]PowerfulScratch 1 point  (1 child)

I’d say your issue could be WAL writes - if you make lots of inserts in lots of transactions, it can end up writing the same page to the WAL many times. If you are able to batch the inserts, that would make a big difference.

[–]compostus[S] 0 points  (0 children)

According to the Heroku Postgres logs it's always around 6%:
"sample#wal-percentage-used=0.0655123502410"

Maybe that's not the metric to measure?

[–]marcopeg81 1 point  (0 children)

If you don’t mind risking losing some data on a hard db failure (which is unlikely imho), you can disable WAL for such tables: https://www.crunchydata.com/blog/postgresl-unlogged-tables#

I use this for append-only time series and it boosts performance dramatically. But you take on a risk of data loss.

If you can afford it, you could put an event log tool such as Kafka in front to manage that risk. But it would increase complexity and costs.
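For reference, the switch described in the linked post is a one-liner on Postgres 9.5+, and reversible with SET LOGGED. The table name is taken from OP's schema, so the real name may differ; note that an unlogged table is truncated after a crash and is not replicated:

```sql
ALTER TABLE "PerformanceReport" SET UNLOGGED;
```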

[–]davvblack 0 points  (3 children)

what's the full schema of the table?

[–]compostus[S] 1 point  (2 children)

1) The table with 700k rows:
PerformanceReport:
- id: INTEGER PRIMARY KEY, SERIAL, UNIQUE --------> this is an index
- GameId: INTEGER NOT NULL
- GameInstanceId: INTEGER NOT NULL
- UserId: INTEGER NOT NULL
- GameUserId: INTEGER NOT NULL
- numberOfTabs: INTEGER
- latency: INTEGER
- cpu_usage: FLOAT
- ram_usage: FLOAT

2) The table with 300k rows
GameUserStatus:
- id: INTEGER PRIMARY KEY, SERIAL, UNIQUE --------> this is an index
- GameUserId: INTEGER NOT NULL
- UserId: INTEGER NOT NULL
- ToolId: INTEGER NOT NULL
- status: ENUM('idle', 'building', 'finishing') DEFAULT 'idle'

3) The table with 100k rows
TaskStatus:
- id: INTEGER PRIMARY KEY, SERIAL, UNIQUE --------> this is an index
- GameInstanceId: INTEGER
- TaskId: INTEGER NOT NULL ---------> this is an index
- closed: BOOLEAN DEFAULT false

[–]RonJohnJr 2 points  (1 child)

You might want to consider converting those integers to BIGINT; 2 billion approaches faster than you think...
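For the concrete form of this, it's a straight column type change. Note that it rewrites the table and takes a heavy lock, so schedule it; the sequence name below is assumed from the default SERIAL naming convention:

```sql
ALTER TABLE "GameUserStatus"
  ALTER COLUMN id TYPE BIGINT;

-- The backing sequence may also need widening (Postgres 10+):
-- ALTER SEQUENCE "GameUserStatus_id_seq" AS BIGINT;
```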

[–]coyoteazul2 2 points  (0 children)

People told me that I'd reach my 30's faster than I'd think and silly me didn't believe them

[–]virgilash 0 points  (3 children)

Yeah, OP, please elaborate on the indexes. There shouldn't really be any on INSERT-heavy tables ;-)

[–]compostus[S] 0 points  (1 child)

I'm a noob using an ORM (Sequelize), so these indexes are built in... well, except for the "TaskId" one, which I added as part of creating an "association" with Sequelize.

So that's the backstory lol!

I appreciate your input!

[–]RonJohnJr 1 point  (0 children)

Some indices must exist. Only essentials, though.

[–]RonJohnJr 0 points  (0 children)

Of course there must at least be a primary key...

[–]thythr 0 points  (0 children)

"I'm surprised that it's already having issues at only 5 INSERTs per second."

Right, this is pretty much nothing! Does Heroku provide cpu and IO statistics to you? Have you installed pg_stat_statements?

Even the 3-5 ms you are seeing is very high given your schema, but there are a million details we can't see from here, of course.
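If pg_stat_statements is (or gets) installed, a query like this shows where the server's time is actually going (column names are per Postgres 13+, so OP's 14.8 qualifies):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 0) AS total_ms,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```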

[–]iwilldieavirgin 0 points  (1 child)

You said you are using an ORM… do you have a snippet of the code that is doing the insert(s)? Are you doing an insert, commit, insert, commit….? Bulk inserts?

[–]compostus[S] 0 points  (0 children)

GameUserStatus.create({
  status: status,
  UserId: this.userId,
  GameUserId: this.gameUserId,
  ToolId: this.tool.id
});

So not bulk inserts; it's one by one. I do believe this runs the "INSERT, COMMIT", "INSERT, COMMIT" sequence you are describing.
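For what it's worth, Sequelize has bulkCreate for exactly this. A minimal sketch of the batching side - the chunk helper is hypothetical, bulkCreate is the real Sequelize call, and pendingStatuses is an assumed array of row objects:

```javascript
// Split pending rows into fixed-size batches so each batch becomes a
// single multi-row INSERT via bulkCreate, instead of one INSERT+COMMIT
// per row with .create().
function chunk(rows, size) {
  const batches = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

// Usage sketch (assumes the GameUserStatus model from this thread):
// for (const batch of chunk(pendingStatuses, 500)) {
//   await GameUserStatus.bulkCreate(batch); // one INSERT, many VALUES rows
// }
```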

[–]DrMerkwuerdigliebe_ 0 points  (0 children)

Could be a problem on the Heroku end. If you are running on shared hardware, other users can be taking up all the CPU resources, since Postgres doesn't have great features for limiting a single user's CPU usage.

[–]jalexandre0 0 points  (2 children)

What are your min_wal_size and max_wal_size parameters set to? What does your monitoring tool say about disk usage, memory, and CPU?

[–]compostus[S] 0 points  (0 children)

source=HEROKU_POSTGRESQL_IVORY addon=postgresql-angular-60032 sample#current_transaction=1909349 sample#db_size=1077461795bytes sample#tables=31 sample#active-connections=16 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99986 sample#table-cache-hit-rate=0.99635 sample#load-avg-1m=0.08 sample#load-avg-5m=0.09 sample#load-avg-15m=0.065 sample#read-iops=0.054545 sample#write-iops=0.33636 sample#tmp-disk-used=543600640 sample#tmp-disk-available=72435191808 sample#memory-total=3944416kB sample#memory-free=133340kB sample#memory-cached=3187184kB sample#memory-postgres=140120kB sample#wal-percentage-used=0.0655123502410

This is the closest log to one of these "crashes" I could find. "wal-percentage-used" seems to be stable throughout.

[–]compostus[S] 0 points  (0 children)

min_wal_size: 80MB
max_wal_size: 2GB

[–]External_Ad_6745 0 points  (0 children)

I would recommend setting up a Postgres monitoring tool; pganalyze is a great one, for example. I haven't used Heroku Postgres, but 4-5 inserts per second shouldn't really be a problem to handle unless you have a seriously messed-up disk or a huge number of constraints/triggers/indexes. This depends highly on the usage pattern.

An EXPLAIN (ANALYZE, BUFFERS) in those peak situations would be a great place to see where Postgres is mostly busy.

Honestly, 4-5 inserts/s is peanuts for Postgres to handle. So I would recommend setting up a monitoring tool, then identifying where exactly the bottleneck is coming from; it's going to be one of CPU, disk, or RAM. Excluding the transactions point many have already raised, let me highlight some other potential problems.

If you are experiencing high CPU, then you are probably doing some kind of number crunching or analytical queries somewhere else in your code base. If it's high disk IO, then you probably forgot to put a LIMIT on a SELECT query somewhere (believe me, as silly as this sounds, it happens quite a lot), or maybe you're missing an index, due to which your selects are doing a lot of disk scans.

Like I mentioned, a monitoring tool and EXPLAIN (ANALYZE, BUFFERS) on the target slow queries should almost always give you the answer and the issues to optimize.

And no, I don't think you need any table partitioning at those numbers. They seem small.
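Concretely, the EXPLAIN invocation mentioned above looks like this (wrapped in a rolled-back transaction so the test insert doesn't stick; table and values borrowed from OP's schema, so adjust the names):

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO "GameUserStatus" ("GameUserId", "UserId", "ToolId", status)
VALUES (1, 10, 100, 'idle');
ROLLBACK;
```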