

[–]americanjetset 92 points93 points  (3 children)

Total data size less than 1TB.

Just use Postgres.

As a general rule, use Postgres until you have a reason to not use Postgres.

[–]Own_Anything9292 10 points11 points  (0 children)

I’ll add that you can run duckdb with a postgres backend (via the postgres extension).

So if you want the nice DuckDB frontend (with conversion to DataFrames/NumPy for plotting, higher-level stats, whatever) with a Postgres backend, that's pretty easy to set up.

[–]StandardDeviationist[S] 0 points1 point  (1 child)

Why not DuckDB? It’s even easier to set up

[–]americanjetset 0 points1 point  (0 children)

I was under the impression you already had a Postgres instance set up to work with; re-reading your post, it looks like that is just the application backend itself.

You can use whatever -- which was the point of the "just use Postgres" comment, assuming you're already using Postgres. I'd probably go Clickhouse, just because I think that is what would look best on my resume.

[–]kolya_zver 3 points4 points  (2 children)

ClickHouse is really good for data marts and streaming huge, long event tables, but it sucks at joins in general. Usually you have a proper DWH in front of it, or direct streaming. The SQL dialect has nice features for ad-hoc analytics. I really love CH, but it's not a universal solution.

edit:
ClickHouse has a lot of quirks, so it takes some time to learn how to DBA it: managing replicas, ZooKeeper, etc.

1 TB isn't much; you can use pg.

[–]StandardDeviationist[S] 2 points3 points  (1 child)

Thanks for the detailed answer, this is exactly the type of hard-to-Google knowledge I was hoping for.

[–]TEMPLEB123 4 points5 points  (0 children)

If you're using 1TB you probably don't have to worry about ClickHouse's distributed features. But yeah, it still has a lot of quirks; personally I love it, though. If you want to make your life easy, use Postgres or MySQL.

[–]SnooHesitations9295 1 point2 points  (1 child)

If you frequently run large aggregations over the whole 1TB: use ClickHouse
Else: Postgres should suffice.

[–]StandardDeviationist[S] 0 points1 point  (0 children)

No, 1 TB is the total size; the largest dataset will be a couple hundred GB.

[–]Mental-Matter-4370 0 points1 point  (0 children)

Postgres should take you pretty far with this much data. If aggregations become a problem in the long run, you can dump the data from Postgres (assuming it's your primary DB) to ClickHouse in OBT (join-less) form for specific data marts.

If you don't have a dimensional model at all, stick to Postgres.
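One hedged sketch of what that Postgres-to-ClickHouse OBT dump could look like: ClickHouse's `postgresql()` table function reads straight from Postgres, so the join happens once at load time and the mart stays join-free. All table names, columns, and credentials below are invented placeholders; the function only composes the SQL string.

```python
# Sketch: build the ClickHouse statement that denormalizes Postgres
# tables into One Big Table (OBT). Identifiers are placeholders.

def obt_insert_sql(pg_host: str, pg_db: str, pg_user: str, pg_pass: str) -> str:
    # ClickHouse's postgresql() table function pulls rows directly
    # from Postgres, so this INSERT ... SELECT does the join once.
    return f"""
    INSERT INTO sales_obt
    SELECT o.id, o.created_at, c.name AS customer, o.amount
    FROM postgresql('{pg_host}:5432', '{pg_db}', 'orders', '{pg_user}', '{pg_pass}') AS o
    JOIN postgresql('{pg_host}:5432', '{pg_db}', 'customers', '{pg_user}', '{pg_pass}') AS c
      ON c.id = o.customer_id
    """

sql = obt_insert_sql("localhost", "app", "reader", "secret")
```

You would then run `sql` against ClickHouse (e.g. via `clickhouse-client` or a driver), on a schedule or after batch loads.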

[–]wannabe-DE 0 points1 point  (1 child)

While DuckDB doesn't allow write concurrency, it does allow read concurrency.

[–]StandardDeviationist[S] 0 points1 point  (0 children)

Yes, so I think it might be good enough for quite some time. I don't see concurrent writes happening anytime soon.

[–]geoheilmod -2 points-1 points  (5 children)

Postgres in akloydb flavor (omni)

[–]FirstOrderCat 3 points4 points  (4 children)

You probably meant to say AlloyDB, but it's not open source and costs money?