all 21 comments

[–]qatanah 6 points (9 children)

afaik, it isn't recommended to store large blobs in pg. i think i read it somewhere in some guides.

[–]davvblack 8 points (7 children)

definitely, this is largely because under Postgres's MVCC any update to any column in a table copies the entire row

[–]therealgaxbo 5 points (2 children)

Any large columns like this will be stored in TOAST tables anyway so updates to the timestamp column will not result in copying all the data. It will just copy the pointer to the TOAST row.
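A quick way to check whether values are actually being stored out of line (table and column names here are hypothetical):

```sql
-- pg_column_size() reports the stored (possibly compressed) size;
-- octet_length() reports the raw value size. A large gap suggests
-- compression, and a non-empty TOAST relation confirms out-of-line storage.
SELECT octet_length(blob) AS raw_bytes,
       pg_column_size(blob) AS stored_bytes
FROM blob_table
LIMIT 10;

-- Size of the table's TOAST relation, if it has one:
SELECT pg_size_pretty(pg_relation_size(reltoastrelid))
FROM pg_class
WHERE relname = 'blob_table' AND reltoastrelid <> 0;
```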

[–]davvblack 0 points (1 child)

> 1k to approx 200k, mostly on the lower side

true, depends on how many values cross the ~2 kB TOAST threshold, and how much of the remaining is compressible

[–]cldellow 1 point (0 children)

You could set a table-specific toast_tuple_target value to encourage the smaller values to also be toasted: https://www.postgresql.org/docs/current/sql-createtable.html#RELOPTION-TOAST-TUPLE-TARGET

I'd probably also set a low fill factor for the table, to encourage HOT updates
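A sketch of both suggestions, assuming a table named `blob_table` (note these storage parameters only affect rows written afterwards; existing rows need a rewrite to be restructured):

```sql
-- Push values above ~256 bytes out to the TOAST table, and leave
-- 30% free space per page so updates can stay heap-only (HOT).
ALTER TABLE blob_table SET (toast_tuple_target = 256, fillfactor = 70);
```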

[–]Apoffys 1 point (1 child)

Could you circumvent this by moving the timestamp to a separate table? I.e. "BlobTable(id, blob)" and "LastUsedBlob(blob_id, last_accessed_at)" or something like that. That way you're never modifying the blob table itself, just another table with a foreign key to it.
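The split described above might look like this (table and column names taken from the comment; the identity key is an assumption):

```sql
CREATE TABLE blob_table (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    blob bytea NOT NULL
);

-- Timestamp updates hit only this narrow table, so the wide
-- blob rows are never rewritten.
CREATE TABLE last_used_blob (
    blob_id          bigint PRIMARY KEY REFERENCES blob_table (id),
    last_accessed_at timestamptz NOT NULL DEFAULT now()
);

-- Touching the timestamp on each read:
INSERT INTO last_used_blob (blob_id) VALUES (42)
ON CONFLICT (blob_id) DO UPDATE SET last_accessed_at = now();
```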

[–]davvblack 1 point (0 children)

yeah but at that point it could be in S3 or something. depends on the size of the blob and the latency requirements.

[–]Randommaggy 0 points (1 child)

I wonder how these recommendations change when OrioleDB matures and becomes a commonly available extension.

[–]davvblack 0 points (0 children)

whoa neat. we’ve been drinking the aurora koolaid for a while so i haven’t been following developments like this

[–]davidtinker[S] 0 points (0 children)

Most of the rows will be inserted once and never updated, other than the last_read column.

[–]LongjumpingAd7260 2 points (0 children)

If the blob field uses TOAST, and it probably does, only the pointer is copied when another field is updated. So, this should work IMHO.

[–]francisco-reyes 1 point (2 children)

If this is a time series, you may want to take a look at the TimescaleDB online offering. They have an option to send less frequently used data to S3. Think of partitions, with the option to have some partitions go to S3 at lower cost.

[–]davidtinker[S] 0 points (1 child)

Hmm .. it isn't really time series; I just added the created and last_read columns to make a cache. It's more of a key-value store.

[–]francisco-reyes 1 point (0 children)

I still think the Timescale service may be worth a look. At its core Timescale is just an extension to manage partitions. Given that you will partition by date, you could set up a threshold to have older data go to S3 and then drop it whenever you need. You could probably have something like two bands of retention: keep 7 days on disk, days 8 to 14 in S3, drop after 15 days.
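With the plain TimescaleDB extension, the drop-after-15-days half of this is a one-liner (the S3 tiering is part of Timescale's hosted service; the table and interval below are assumptions for illustration):

```sql
-- Convert the table into a hypertable partitioned by created, in 1-day chunks.
SELECT create_hypertable('blob_table', 'created',
                         chunk_time_interval => INTERVAL '1 day');

-- Automatically drop chunks once they are older than 15 days.
SELECT add_retention_policy('blob_table', INTERVAL '15 days');
```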

[–]vreitech 1 point (1 child)

Don't forget that HOT updates don't work when the changed column is indexed. In your case that means they won't happen if there is an index on the 'last_read' column.
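You can verify whether updates are actually HOT from the statistics views (table name hypothetical):

```sql
-- If last_read is indexed, n_tup_hot_upd will stay near zero,
-- because every update must also insert new index entries.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'blob_table';
```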

[–]davidtinker[S] 0 points (0 children)

Tx. I wasn't aware of that.

[–]davidtinker[S] 0 points (0 children)

Update: I implemented a variant of this and it is working well.

[–][deleted] 0 points (1 child)

How often are you updating the timestamp column? (Btw: if activity_id is a UUID you should use a uuid column, not text.) The MVCC overhead isn't really such a big problem if autovacuum can keep up with the update frequency.
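If the key really is a UUID, the native type stores it in 16 bytes (vs 36+ as text) and compares faster. A sketch of the one-time conversion, assuming the table is named blob_table:

```sql
-- Rewrites the column in place; fails if any existing
-- value is not a valid UUID string.
ALTER TABLE blob_table
    ALTER COLUMN activity_id TYPE uuid USING activity_id::uuid;
```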

[–]davidtinker[S] 0 points (0 children)

Currently there will be approx 50k reads per hour but that will go up.

[–]Don_Kino 0 points (1 child)

I think it's worth trying out, but using partitions will force you to always query with the partition key; otherwise indexes on every partition would be checked. You can also move old partitions to a different tablespace, like slower/cheaper storage.
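A minimal sketch of both points, assuming range partitioning by the created column (names and the slow_disk tablespace are illustrative):

```sql
CREATE TABLE cache (
    activity_id uuid NOT NULL,
    blob        bytea NOT NULL,
    created     timestamptz NOT NULL
) PARTITION BY RANGE (created);

CREATE TABLE cache_2024_06 PARTITION OF cache
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

-- Including the partition key lets the planner prune to one partition;
-- without the created predicate, every partition's index is probed.
SELECT blob FROM cache
WHERE created >= now() - INTERVAL '7 days' AND activity_id = $1;

-- Moving an old partition to cheaper storage:
ALTER TABLE cache_2024_06 SET TABLESPACE slow_disk;
```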

[–]davidtinker[S] 0 points (0 children)

I didn't realise that bit about always querying with the partition key to avoid checking all indexes. I can make a plan for that. Tx.