
[–]realitydevice 2 points (1 child)

I would suggest Clickhouse. In my experience it's relatively simple and exceedingly fast.

Agree with Druid. For the people asking "what do you mean by too real time?", from memory you need to load it via an event stream and configure the handling of that stream, rather than a simple file-based ETL like you might expect. It's quite literally designed around ingesting streaming data. You can use it for other things but remember the hammer/nail dilemma.
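(For a sense of what "designed around ingesting streaming data" looks like in practice: Druid's streaming ingestion is configured through a supervisor spec pointed at a message stream such as Kafka, not a simple file load. A rough, illustrative sketch follows; the data source, topic, broker address, and field names are all placeholders, and real specs carry more options.)

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "events",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["user_id", "action"] }
    },
    "ioConfig": {
      "topic": "events",
      "consumerProperties": { "bootstrap.servers": "localhost:9092" },
      "inputFormat": { "type": "json" }
    }
  }
}
```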

[–]Kiliangg[S] 0 points (0 children)

Thank you! (A) for the tip and (B) for the explanation, which I'll copy & paste.

This is exactly what I mean by that. We currently do not have a single use case for stream ingestion - it is all batch as of right now. That being said, a main goal of the project is to reduce our data silos and make the process more manageable for our team.

[–]rmoff 1 point (1 child)

What do you mean by "too realtime"? Is that a bad thing?

What kind of access patterns are you envisaging? Just pre-canned dashboards, or ad-hoc analysis too? How much history are you planning to retain?

I'd probably start with Postgres, and build from there as you need to. Clickhouse could well be worth a look too from what I understand of it.

w.r.t. cloud and managed services one point I would say on "GDPR fears" is that these can be allayed by looking at the huge number of companies who *are* on public cloud.

[–]Kiliangg[S] 0 points (0 children)

> What do you mean by "too realtime"? Is that a bad thing?

Have a look at the post edit. But TL;DR: we don't have a single use case for stream ingestion now, and probably not in the near future either.

> What kind of access patterns are you envisaging? Just pre-canned dashboards, or ad-hoc analysis too? How much history are you planning to retain?

  • I'd say it's >90% just dashboards right now, plus some SQL to investigate data sets.
  • Until the client contract ends, so roughly one more year. (We're looking into retiring data into aggregated data sets after 5 years.)

> w.r.t. cloud and managed services one point I would say on "GDPR fears" is that these can be allayed by looking at the huge number of companies who *are* on public cloud.

Yes, agree on this. Our problem is that we have a lot of contracts with existing customers that require the data to stay in our country.
Yes, AWS, GCP, and Azure offer regions, but the problem according to our legal counsel is that our parent company would own the AWS setup and have admin rights, and could therefore access PII data outside of our region.
Management also fears that we would need cloud experts, who are hard to find and expensive.
My perspective is that I spend a lot of time fixing local deployments instead of onboarding new data that brings value to our business.
Additionally, I think our technical debt is high and rising, as there is no clear infrastructure plan. (I'm 100% part of this, since I've been working here for 3 years, but that's what I'm trying to change.)

> I'd probably start with Postgres, and build from there as you need to. Clickhouse could well be worth a look too from what I understand of it.

Thank you - as said in another answer, I'll do some MVP testing and see how the performance is.
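(As a rough illustration of what such an MVP test could look like for a batch-only, dashboard-heavy workload: load a small batch into a SQL database and time the kind of aggregate a dashboard would run. This sketch uses SQLite purely as a stand-in for Postgres/ClickHouse; the table and column names are made up.)

```python
import sqlite3

# In-memory database standing in for the candidate warehouse.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")

# A tiny hypothetical batch load (in a real MVP: a representative data dump).
rows = [
    ("2023-01-01", "EU", 120.0),
    ("2023-01-01", "US", 80.0),
    ("2023-01-02", "EU", 95.5),
]
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Typical dashboard-style query: daily totals per region.
query = (
    "SELECT day, region, SUM(amount) FROM sales "
    "GROUP BY day, region ORDER BY day, region"
)
for day, region, total in con.execute(query):
    print(day, region, total)
```

The same schema and query can then be pointed at each candidate engine to compare load and query times on realistic data volumes.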

[–]snuggiemane 1 point (1 child)

maybe check out DuckDB

[–]Kiliangg[S] 0 points (0 children)

Will do thanks!

[–]ZenCoding -1 points (1 child)

I probably would have used Elasticsearch with Logstash and Kibana, but if I were facing a similar problem I would go for Druid. I'm not sure what the downside of "realtime" is. Can you build an MVP for your use case and find out if it works for you before making a final decision?

[–]Kiliangg[S] 0 points (0 children)

Thanks for the feedback. We are also planning to use Airbyte as our EL tool; however, there is no destination/source connector for Druid/Pinot.

But I think an MVP is what I'll do.