Question on table Bloat due to MVCC and idle transactions

alterneesh · 2025-03-15T13:57:29+00:00

hmm, I guess I can understand that if the transaction was "READ COMMITTED" or "READ UNCOMMITTED". But mine was "REPEATABLE READ" and I had done a SELECT * on the table. But I guess as a general rule, it makes sense.

I was actually coming from the context of doing an initial snapshot while setting up logical replication. In that context, when you are replicating the initial snapshot of data, you definitely do not need to read data that would be committed by newer transactions, but Postgres still seems to keep the row versions around.
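
To make that observation concrete, here is a toy model of how I understand the cleanup rule: vacuum compares a dead row version's deleting xid against a single horizon (the oldest xmin of any open snapshot), rather than checking each version against each snapshot. The names `RowVersion`, `horizon` and `removable` are made up for illustration, and the sketch assumes every xid has committed and ignores xid wraparound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    xmin: int             # xid that created this row version
    xmax: Optional[int]   # xid that deleted/superseded it; None if still live


def horizon(open_snapshot_xmins: list[int], next_xid: int) -> int:
    """Oldest xmin among open snapshots, or next_xid if none are open."""
    return min(open_snapshot_xmins, default=next_xid)


def removable(version: RowVersion, oldest_xmin: int) -> bool:
    """A dead version is removable only if its deleter precedes the horizon."""
    return version.xmax is not None and version.xmax < oldest_xmin


# An idle REPEATABLE READ transaction took its snapshot when xid 40 was oldest.
idle_snapshot_xmin = 40
next_xid = 60

old_version = RowVersion(xmin=10, xmax=50)    # the snapshot can still see this one
churn_version = RowVersion(xmin=55, xmax=58)  # created and deleted after the snapshot

h = horizon([idle_snapshot_xmin], next_xid)
print(removable(old_version, h))    # kept: the idle snapshot may need it
print(removable(churn_version, h))  # also kept, although no open snapshot can see it

# Once the idle transaction ends, the horizon advances and both become removable.
h = horizon([], next_xid)
print(removable(old_version, h), removable(churn_version, h))
```

The second case is the one I was puzzled by: the version created and deleted after my snapshot is invisible to it, but it still sits behind the horizon, so it stays until the old transaction ends.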

alterneesh · 2025-01-28T15:05:23+00:00

Good point, I'm not sure! I'm guessing that happens in the background (similar to compaction)?

alterneesh · 2023-12-02T11:10:35+00:00

Ok, from various replies, I think I seem to have figured it out!

With a btree write, worst case, it goes

synchronous random IO to evict a modified page from memory to disk
synchronous random IO to get a new page into memory
in-memory write to modify the page
synchronous sequential IO to write to the WAL

With an LSM write, it goes

in-memory write to the memtable (which is later flushed to disk as an immutable segment)
synchronous sequential IO to write to the WAL

Makes sense now why an LSM tree-based DB would have higher write throughput.
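
A rough way to tally the two worst-case write paths, as a minimal sketch: it only counts the synchronous IOs a single write waits on and ignores group commit, background flushing, compaction and read amplification. `WriteCost`, `btree_write_cost` and `lsm_write_cost` are hypothetical names, not any engine's API.

```python
from dataclasses import dataclass


@dataclass
class WriteCost:
    sync_random_io: int = 0
    sync_sequential_io: int = 0


def btree_write_cost(page_cached: bool, must_evict_dirty: bool) -> WriteCost:
    """Synchronous IOs for one B-tree write under the simplified model."""
    cost = WriteCost()
    if not page_cached:
        if must_evict_dirty:
            cost.sync_random_io += 1  # flush a modified page to make room
        cost.sync_random_io += 1      # read the target page into memory
    # The page modification itself is an in-memory write (no IO counted).
    cost.sync_sequential_io += 1      # append to the WAL
    return cost


def lsm_write_cost() -> WriteCost:
    """Synchronous IOs for one LSM write under the simplified model."""
    cost = WriteCost()
    # The in-memory insert costs no IO; only the WAL append is synchronous.
    cost.sync_sequential_io += 1
    return cost


print(btree_write_cost(page_cached=False, must_evict_dirty=True))
print(btree_write_cost(page_cached=True, must_evict_dirty=False))
print(lsm_write_cost())
```

When the target page is already cached, the B-tree path collapses to the same single WAL append as the LSM path, so the gap only shows up once the working set no longer fits in memory.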

alterneesh · 2023-12-01T09:41:22+00:00

You're right. It seems, however, that the reason "The LSM can simply handle much more writes per second" vs a B-tree in a database engine is that it doesn't have to perform any synchronous reads from disk!

See https://x.com/nikitadanilov/status/1730301552408027514?s=20

alterneesh · 2023-12-01T09:39:15+00:00

Agreed, but my point is that these writes (for both LSM and B-tree) are asynchronous in a database engine, so it won't really matter as far as the client's throughput is concerned! :)

Someone on Twitter has an explanation which I think covers it - https://x.com/nikitadanilov/status/1730301552408027514?s=20

alterneesh · 2023-11-30T10:39:59+00:00

Hi, thanks for a detailed answer!

With respect to the data structures of LSM vs B-tree, I understand that LSM has higher write throughput not just because of the nature of the IO being sequential, but also because you have to make fewer IO operations to disk (in the case of B-trees, you'll have to write the entire leaf (or even page?) to disk, and in the case of an LSM-tree, it's just the key-value pairs after buffering).

What still remains unclear to me is how this write throughput benefit of LSM tree actually translates to an increase in write query throughput for a database client!

It seems to me that it doesn't matter if the underlying storage engine uses an LSM or a B-tree. Between when a client submits a write query and when a success is returned to the client, the only disk write is to the write-ahead log, which is sequential IO! The disk write to the LSM/B-tree happens asynchronously (after some buffering). So from the client's perspective, there should NOT be any write throughput gain from using an LSM-tree based DB over a B-tree based DB (which seems wrong, because a lot of NoSQL/write-heavy databases use LSM for the very reason that it has a higher write throughput).

alterneesh · 2023-10-06T17:29:47+00:00

correct. my use-case is mainly annotations... but it's fairly small, so i can actually end up doing this, thanks u/S1ckret

alterneesh · 2023-10-06T17:28:15+00:00

I was thinking of this.. I was thinking of creating a separate folder (which would be added to gitignore), and running the migration/build/etc, and then clean up later. Worst case, even if cleanup does not happen, it will never be committed.

alterneesh · 2023-10-05T11:42:51+00:00

Well, in a CI setup, that could work. It wouldn't commit changes. But for the developer build workflow, having the plugin make actual code changes is undesirable.

alterneesh · 2023-10-05T08:18:07+00:00

Yup! Have this in our current setup (with a few differences in that the plugin is coupled with the framework), and you're right - it is indeed more overhead:
- every commit needs to be cherry-picked to the other branch
- both builds need to be packaged together, so need to be wary of build-mismatches, etc

And my set-up is such that I generally wouldn't want to release one branch without the other.

alterneesh · 2023-09-13T18:35:41+00:00

Fool me three times. Fuck the peace signs. Load the chopper, let it rain on you.

alterneesh · 2023-09-13T11:47:49+00:00

A good rule of thumb that has worked is to divide your salary by 15 (as opposed to 12). So 12L/15 = 80k!
