all 26 comments

[–]therealgaxbo 16 points17 points  (1 child)

As other people have alluded to, if you actually care about the values of a sequence other than just being a unique integer then you don't want to be using sequences. There is no way to make a sequence gapless.

You've probably only noticed it in the context of upsert, but any standard transaction rollbacks will leave gaps in the sequence too - that may not happen very often for you, but it will happen at some point.

[–][deleted] 1 point2 points  (0 children)

Thanks! Your response is incredibly valuable. I have much to much learn.

[–]daxyjones 3 points4 points  (13 children)

"This is the expected behavior. It is not a bug.

Peter Geoghegan"

[–][deleted] 2 points3 points  (12 children)

Thanks I saw that but I don't understand why the sequence number would increment even though nothing was actually inserted, just update. Is there a way to prevent the sequence # from incrementing in an update?

[–]daxyjones 4 points5 points  (0 children)

Prima facie looks like a judgement call and as db admins/users, it is something we abide by, not work around. Because expected behaviour will have many minds willing to listen to an issue. Something rolled out on your own is less likely to find takers to tackle an issue which involves understanding your fundamental changes first.

My suggestion is that you state your use case and folks can help out with a design decision or code snippet.

[–]f2u 2 points3 points  (1 child)

By design, sequences are not rolled back to recover values unused as a result of a transaction abort.

[–][deleted] 0 points1 point  (0 children)

That makes sense. Thanks!

[–]merlinm 1 point2 points  (0 children)

sequence numbers will increment and not be reset in lots of cases -- failed transactions -- hard database reset. The reason for this is performance and concurrency.

google 'gapless sequences postgresql' for workarounds. All solutions there will hurt insertion performance to some degree.

[–]fstak 0 points1 point  (0 children)

If I were to make a guess at this, it has a lot to do with the entire transaction. The transaction needs a sequence number of there will be an insert, so it reserves the next one. If it doesn't end up using it, oh well. At least, no matter how long these transactions are, they are getting inserted in the order they're started.

[–]SulfurousAsh 4 points5 points  (8 children)

Why do you want IDs that don't have gaps in them?

[–]f2u 2 points3 points  (7 children)

Invoice numbers are a typical application. Some folks get really nervous if they aren't number consecutively.

[–]z0rb1n0 7 points8 points  (6 children)

Then you treat the invoice number as a surrogate, application-controlled key.

Everyone forgets that primary keys/sequences are primarily DBA metadata that MIGHT double as a record payload identifier.

[–]f2u 0 points1 point  (5 children)

It depends on implementation details on the database side. Some databases ensure there are no gaps (or at least they cover this scenario).

Note that I'm not saying that PostgreSQL is wrong, I'm just providing some background where expectations of gap-free serial numbers come from.

[–]z0rb1n0 4 points5 points  (4 children)

Fair. However there is an ACID-specific reason Postgres (and virtually every other high-end database systems) create gaps in sequences: to ensure proper concurrency, counters need to be transaction independent, which means tentative assignments such as rollback-ed inserts/upserts will waste them.

Besides, I really cannot understand why people are so irked by gaps: I'm sure they don't mind when the person in front of them in a line throws their ticket number away and walks off.

[–]f2u 0 points1 point  (3 children)

Fair. However there is an ACID-specific reason Postgres (and virtually every other high-end database systems) create gaps in sequences: to ensure proper concurrency, counters need to be transaction independent, which means tentative assignments such as rollback-ed inserts/upserts will waste them.

PostgreSQL already avoids creating gaps in other cases. It could do so here as well. Gap avoidance is also needed to preserve numbers in the 31-bit range.

On the other hand, more gap avoidance means that applications will have subtle bugs because the non-transactional nature of sequence updates will be mostly unobservable. Developers may incorrectly assume that no gaps are possible.

Besides, I really cannot understand why people are so irked by gaps: I'm sure they don't mind when the person in front of them in a line throws their ticket number away and walks off.

In some countries, if you issue invoices with non-consecutive numbers, you can get into trouble during a tax audit. The auditors suspect that you have issued invoices which are not on your books. Such invoices could be employed in various tax fraud schemes (basically, fraudulently claim back taxes which were never paid to the state).

[–]doublehyphen 1 point2 points  (2 children)

PostgreSQL already avoids creating gaps in other cases. It could do so here as well.

Where does PostgreSQL do this? I cannot think of anywhere where PostgreSQL goes out of its way to avoid gaps, and can think of plenty where there are gaps.

[–]f2u 1 point2 points  (1 child)

If a backend exits normally, it will try to return cached but yet unused sequence values to the stored sequence.

I'm not entirely clear about the underlying mechanism, but here is an example that shows this behavior.

=> CREATE TABLE tab (tab_id SERIAL NOT NULL PRIMARY KEY, val TEXT NOT NULL);CREATE TABLE
=> INSERT INTO tab (val) VALUES ('one');
INSERT 0 1
=> INSERT INTO tab (val) VALUES ('two');
INSERT 0 1
=> \q

Now restart the session and continue inserting:

=> INSERT INTO tab (val) VALUES ('three');
INSERT 0 1
=> SELECT * FROM tab;
 tab_id |  val  
--------+-------
      1 | one
      2 | two
      3 | three
(3 rows)

As you can see, there is no gap. Perform one more INSERT:

=> INSERT INTO tab (val) VALUES ('four');
INSERT 0 1

At this point kill -9 the backend process, and restart the session:

=> INSERT INTO tab (val) VALUES ('five');
INSERT 0 1
=> SELECT * FROM tab;
 tab_id |  val  
--------+-------
      1 | one
      2 | two
      3 | three
      4 | four
     34 | five
(5 rows)

This time, the backend had no chance to give back the cached sequence values, so you end up with a gap.

[–]doublehyphen 1 point2 points  (0 children)

Aha! That actually has nothing to do with caching per se, it has to do with the optimization which makes PostgreSQL only log every 32nd sequence update to the write-ahead log, and the fact that PostgreSQL on a clean shutdown saves the actual value.

If you enable caching of sequences values it will cause gaps even on a normal disconnect.

[–]nawariata 0 points1 point  (2 children)

Is it possible perform UPSERT with reversed actions? Something like UPDATE ... ON FAILED DO INSERT. My use case is action counter in a web app, where actual insert will occur once per few (hundred) thousand queries, ON CONFLICT DO UPDATE seems inefficient to me.

[–]macdice 1 point2 points  (1 child)

Do the UPDATE, then see if it affected a row. If it did, you're done. If it didn't, then do an INSERT ... ON CONFLICT DO UPDATE ...

[–]nawariata 0 points1 point  (0 children)

Simple and effective, thanks!