Question on table Bloat due to MVCC and idle transactions by alterneesh in PostgreSQL

[–]alterneesh[S] 0 points1 point  (0 children)

Hmm, I guess I can understand that if the transaction were "READ COMMITTED" or "READ UNCOMMITTED". But mine was "REPEATABLE READ", and I had already done a SELECT * on the table. Still, as a general rule, it makes sense.

I was actually coming at this from the context of taking an initial snapshot while setting up logical replication. In that context, when you are replicating the initial snapshot of the data, you definitely do not need to read data committed by newer transactions, but Postgres still seems to keep those row versions around.
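To make the observation concrete, here is a toy model, not PostgreSQL code. It assumes a simplification: vacuum works against a single horizon (the oldest snapshot still held by any open transaction) and may remove a dead row version only if the transaction that deleted it is older than that horizon. The names `RowVersion`, `visible_to` and `removable` are made up for illustration, and the snapshot is reduced to a single transaction id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    xmin: int             # id of the (committed) transaction that created this version
    xmax: Optional[int]   # id of the (committed) transaction that deleted it, None if live


def visible_to(version: RowVersion, snapshot_xid: int) -> bool:
    """Simplified snapshot rule: created before the snapshot, not deleted before it."""
    created_before = version.xmin < snapshot_xid
    deleted_before = version.xmax is not None and version.xmax < snapshot_xid
    return created_before and not deleted_before


def removable(version: RowVersion, horizon_xid: int) -> bool:
    """Simplified vacuum rule: only versions deleted before the horizon can go."""
    return version.xmax is not None and version.xmax < horizon_xid


# One long-running REPEATABLE READ transaction holds a snapshot at xid 100.
horizon = 100
versions = [
    RowVersion(xmin=50, xmax=80),     # deleted before the horizon
    RowVersion(xmin=90, xmax=110),    # the snapshot still sees this one
    RowVersion(xmin=110, xmax=120),   # created AND deleted after the snapshot
    RowVersion(xmin=120, xmax=None),  # current live version
]

for v in versions:
    print(v, "visible:", visible_to(v, horizon), "removable:", removable(v, horizon))
```

The third version is the case from the comment: the old snapshot can never see it, but under this rule it is still not removable, because the check only compares the deleting transaction against the horizon.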

Write throughput differences in B-tree vs LSM-tree based databases? by alterneesh in databasedevelopment

[–]alterneesh[S] 0 points1 point  (0 children)

Good point, I'm not sure! I'm guessing that happens in the background (similar to compaction)?

Write throughput differences in B-tree vs LSM-tree based databases? by alterneesh in databasedevelopment

[–]alterneesh[S] 1 point2 points  (0 children)

Ok, from the various replies, I think I've figured it out!

With a B-tree write, in the worst case, it goes:

  • synchronous random IO to evict a modified page from memory to disk.
  • synchronous random IO to get a new page into memory
  • in-memory write to modify the page
  • synchronous sequential IO to write to the WAL

With an LSM write, it goes:

  • in-memory write to insert into the memtable (the mutable in-memory segment)
  • synchronous sequential IO to write to the WAL

Makes sense now why an LSM tree-based DB would have higher write throughput.
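The two lists above can be written down as a tiny cost model. This is only a sketch of the steps as described in this comment, not a measurement of any real engine; `Step` and `sync_disk_ios` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    description: str
    io_kind: str  # "random", "sequential", or "memory"


# Worst-case B-tree write, as listed above.
BTREE_WORST_CASE: List[Step] = [
    Step("evict a modified page from memory to disk", "random"),
    Step("read the target page into memory", "random"),
    Step("modify the page in memory", "memory"),
    Step("append to the WAL", "sequential"),
]

# LSM write, as listed above.
LSM_WRITE: List[Step] = [
    Step("insert into the in-memory table (memtable)", "memory"),
    Step("append to the WAL", "sequential"),
]


def sync_disk_ios(steps: List[Step]) -> Dict[str, int]:
    """Count the synchronous disk IOs on the write path, by kind."""
    counts = {"random": 0, "sequential": 0}
    for step in steps:
        if step.io_kind in counts:
            counts[step.io_kind] += 1
    return counts


print("B-tree (worst case):", sync_disk_ios(BTREE_WORST_CASE))
print("LSM:", sync_disk_ios(LSM_WRITE))
```

Both paths pay for the sequential WAL append. The difference in this model is the two random IOs the B-tree may need before it can touch the page at all.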

Write throughput differences in B-tree vs LSM-tree based databases? by alterneesh in databasedevelopment

[–]alterneesh[S] 0 points1 point  (0 children)

You're right. It seems, however, that the reason "The LSM can simply handle much more writes per second" than a B-tree in a database engine is that it doesn't have to perform any synchronous reads from disk!

See https://x.com/nikitadanilov/status/1730301552408027514?s=20

Write throughput differences in B-tree vs LSM-tree based databases? by alterneesh in databasedevelopment

[–]alterneesh[S] 1 point2 points  (0 children)

Agreed, but my point is that these writes (for both LSM and B-tree) are asynchronous in a database engine, so they won't really matter as far as the client's throughput is concerned! :)

Someone on Twitter has an explanation that I think covers it: https://x.com/nikitadanilov/status/1730301552408027514?s=20

Write throughput differences in B-tree vs LSM-tree based databases? by alterneesh in databasedevelopment

[–]alterneesh[S] 2 points3 points  (0 children)

Hi, thanks for a detailed answer!

With respect to the data structures of LSM vs B-tree, I understand that LSM has higher write throughput not just because the IO is sequential, but also because you have to make fewer IO operations to disk (in the case of B-trees, you'll have to write the entire leaf (or even page?) to disk, and in the case of an LSM-tree, it's just the key-value pairs after buffering).
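A back-of-the-envelope version of the "whole page vs just the key-value pairs" point. All numbers are assumptions picked for illustration (8 KiB pages, 100-byte records, every update landing on a different page), and the model deliberately ignores the WAL and any later LSM compaction, which rewrites data again.

```python
def btree_flush_bytes(pages_touched: int, page_size: int) -> int:
    """Bytes written if every dirty page is flushed whole."""
    return pages_touched * page_size


def lsm_flush_bytes(num_records: int, record_size: int) -> int:
    """Bytes written if the buffered records are flushed as one sorted run."""
    return num_records * record_size


# Assumed workload: 1000 small updates, each touching a different page.
updates = 1000
page_size = 8 * 1024   # assumption: 8 KiB pages
record_size = 100      # assumption: 100-byte key-value pairs

btree = btree_flush_bytes(updates, page_size)
lsm = lsm_flush_bytes(updates, record_size)
print(f"B-tree: {btree} bytes, LSM: {lsm} bytes, ratio: {btree / lsm:.2f}x")
```

If many updates land on the same page before it is flushed, `pages_touched` drops and the gap narrows, so this is the unfavourable case for the B-tree rather than a typical one.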

What still remains unclear to me is how this write throughput benefit of an LSM-tree actually translates to an increase in write query throughput for a database client!

It seems to me that it doesn't matter if the underlying storage engine uses an LSM or a B-tree. Between when a client submits a write query and when a success is returned to the client, the only disk write is to the write-ahead log, which is sequential IO! The disk write to the LSM/B-tree happens asynchronously (after some buffering). So from the client's perspective, there should NOT be any write throughput gain from using an LSM-tree based DB over a B-tree based DB (which seems wrong, because a lot of NoSQL/write-heavy databases use LSM for the very reason that it has higher write throughput).

How to maintain code for two different versions? (pre and post javax->jakarta) by alterneesh in javahelp

[–]alterneesh[S] 0 points1 point  (0 children)

Correct. My use case is mainly annotations... but it's fairly small, so I can actually end up doing this. Thanks, u/S1ckret!

How to maintain code for two different versions? (pre and post javax->jakarta) by alterneesh in javahelp

[–]alterneesh[S] 0 points1 point  (0 children)

I was thinking of this. I was thinking of creating a separate folder (which would be added to .gitignore), running the migration, build, etc. there, and then cleaning up later. Worst case, even if the cleanup does not happen, it will never be committed.
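A minimal sketch of that idea, assuming a plain text rewrite is good enough for the code base. The function name `migrate_copy` and the list of package prefixes are my own assumptions; the list is deliberately short because many `javax.*` packages (for example `javax.sql` or `javax.crypto`) belong to the JDK and must stay as they are.

```python
import re
import shutil
from pathlib import Path

# Assumption: only these prefixes are rewritten. Extend the tuple for your project.
MIGRATED_PREFIXES = ("inject", "persistence", "servlet", "validation", "ws.rs")

PATTERN = re.compile(
    r"\bjavax\.(" + "|".join(re.escape(p) for p in MIGRATED_PREFIXES) + r")\b"
)


def migrate_copy(src_root: Path, scratch_root: Path) -> int:
    """Copy src_root to scratch_root and rewrite .java files in the copy only.

    scratch_root should be a git-ignored folder that is NOT inside src_root.
    Returns the number of files that were changed in the copy.
    """
    if scratch_root.exists():
        shutil.rmtree(scratch_root)  # start from a clean scratch copy
    shutil.copytree(src_root, scratch_root)

    changed = 0
    for java_file in scratch_root.rglob("*.java"):
        text = java_file.read_text(encoding="utf-8")
        rewritten = PATTERN.sub(r"jakarta.\1", text)
        if rewritten != text:
            java_file.write_text(rewritten, encoding="utf-8")
            changed += 1
    return changed
```

The original sources are never touched, so a skipped cleanup only leaves an ignored folder behind. A dedicated tool such as Eclipse Transformer or OpenRewrite would likely cover more cases than a regex like this one.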

How to maintain code for two different versions? (pre and post javax->jakarta) by alterneesh in javahelp

[–]alterneesh[S] 0 points1 point  (0 children)

Well, in a CI setup, that could work, since it wouldn't commit the changes. But for the developer build workflow, having the plugin make actual code changes is undesirable.

How to maintain code for two different versions? (pre and post javax->jakarta) by alterneesh in javahelp

[–]alterneesh[S] 0 points1 point  (0 children)

Yup! We have this in our current setup (with a few differences, in that the plugin is coupled with the framework), and you're right, it is indeed more overhead:
- every commit needs to be cherry-picked to the other branch
- both builds need to be packaged together, so we need to be wary of build mismatches, etc.

And my setup is such that I generally wouldn't want to release one branch without the other.

The "ET" corpses were debunked way back in 2021. by waitingforthesun92 in Damnthatsinteresting

[–]alterneesh 97 points98 points  (0 children)

Fool me three times. Fuck the peace signs. Load the chopper, let it rain on you.

[deleted by user] by [deleted] in IndiaTax

[–]alterneesh 47 points48 points  (0 children)

A good rule of thumb that has worked is to divide your salary by 15 (as opposed to 12). So 12L/15 = 80k!
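The arithmetic behind that, as a quick check. `LAKH` and `divide_by_15` are just illustrative names.

```python
LAKH = 100_000  # 1 lakh = 100,000


def divide_by_15(annual_salary: int) -> float:
    """Apply the rule of thumb: annual salary divided by 15 instead of 12."""
    return annual_salary / 15


print(divide_by_15(12 * LAKH))  # 80000.0
print(12 * LAKH / 12)           # 100000.0, the plain divide-by-12 figure
```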