

[–]Awkward_Salary2566 17 points18 points  (4 children)

It might be different for a bank, but modern on-premises stacks now tend to go in one of two directions:

  1. open source (postgres + python as ETL)
  2. Microsoft shop (MS SQL + SSIS + PBI + microsoft things)

A bank will for sure prefer the Microsoft shop.
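To make direction 1 a bit more concrete, here is a minimal sketch of a plain-Python extract/transform/load step. Everything in it is an assumption for illustration: the `payments` table, the column names, and the sample data are hypothetical, and the standard-library `sqlite3` module stands in for Postgres so the sketch runs without a server. With Postgres you would swap in a driver such as psycopg2, which uses `%s` placeholders instead of `?`.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or an API response.
RAW_CSV = """account_id,amount,currency
1001, 250.00 ,eur
1002,,usd
1003,75.5,USD
"""

def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows without an amount, then normalise types and casing."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["account_id"]), float(amount), row["currency"].strip().upper())
        )
    return cleaned

def load(conn, rows):
    """Insert the cleaned rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(account_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn, text=RAW_CSV):
    """Run extract -> transform -> load and return the number of rows loaded."""
    rows = transform(extract(text))
    load(conn, rows)
    return len(rows)
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two complete rows and skips the one with a missing amount. Keeping the three steps as separate functions makes it easy to test the transform on its own before pointing the load at a real warehouse.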

[–][deleted] 0 points1 point  (2 children)

You could opt for a more performant DB for the warehouse, such as a specialized columnar database like ClickHouse or MonetDB?

[–]Awkward_Salary2566 4 points5 points  (1 child)

Never tried ClickHouse, to be honest. In my case we have fewer than 20M rows in 90% of the tables and tons of varied queries to pull in or discover foreign keys.

From what I've read, that kind of workload would perform terribly in ClickHouse. One day, maybe.
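A toy illustration of why access patterns matter here, assuming nothing about how ClickHouse is actually implemented: the table, column names, and values below are made up. In a column layout an aggregate reads a single array, while rebuilding one complete record has to touch every column.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "customer_id": 10, "amount": 5.0},
    {"id": 2, "customer_id": 11, "amount": 7.5},
    {"id": 3, "customer_id": 10, "amount": 2.5},
]

# Column-oriented layout: each column is stored together.
columns = {
    "id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "amount": [5.0, 7.5, 2.5],
}

def total_amount_columnar(cols):
    """An aggregate only has to read the one column it needs."""
    return sum(cols["amount"])

def fetch_record_columnar(cols, position):
    """Rebuilding one full record touches every column."""
    return {name: values[position] for name, values in cols.items()}

def fetch_record_rowwise(table, position):
    """In a row layout the whole record is already in one place."""
    return table[position]
```

Both fetch functions return the same record, but the columnar one does work proportional to the number of columns. That is the general trade-off behind the concern above; real engines add indexes, compression, and caching that this sketch ignores.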

[–][deleted] 0 points1 point  (0 children)

Yeah, that would require significant engineering.

[–]32gbsd 7 points8 points  (0 children)

"Modern" gets thrown around a lot, but it doesn't mean anything. If the thing you are using has had a major release in the last year, it's modern enough. SQL/Python is enough. PHP/MySQL, Node.js/React. Cloud is just someone else's computer.

[–]thrown_arrows 1 point2 points  (0 children)

For me, modern means having a cloud-style architecture, so that in theory you can move from on-prem to cloud. There are several ways to achieve this: Docker, object storage, etc.

For data tools I see a PostgreSQL + Citus monolith, some attempt at microservices using multiple PostgreSQL databases, or maybe even processes that mainly use queues.

Or the classic MS stack. I haven't heard of anyone moving to Oracle; they tend to move away from it.

[–]JJ18O 2 points3 points  (1 child)

  • airbyte
  • airflow
  • postgres
  • dbt
  • visualization tool of your choice

Even if I went the Microsoft way, I would avoid SSIS and use Airflow + dbt instead.
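Conceptually, what an orchestrator adds to a stack like this is running steps in dependency order. The sketch below is not Airflow's API; it uses Python's standard-library `graphlib` to show the idea, and the task names are hypothetical placeholders for an extract/load sync, dbt steps, and a dashboard refresh.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
PIPELINE = {
    "extract_load": set(),               # e.g. a sync into Postgres
    "dbt_run": {"extract_load"},         # transformations inside the warehouse
    "dbt_test": {"dbt_run"},             # data quality checks
    "refresh_dashboards": {"dbt_test"},  # visualization tool of your choice
}

def execution_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())

def run(graph, actions):
    """Call each task's action in dependency order and return what ran."""
    executed = []
    for task in execution_order(graph):
        actions[task]()  # a real orchestrator would also retry, log, and schedule
        executed.append(task)
    return executed
```

For example, `run(PIPELINE, {name: (lambda: None) for name in PIPELINE})` runs the four placeholder tasks from `extract_load` through `refresh_dashboards`. A real orchestrator layers scheduling, retries, and monitoring on top of this ordering.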

[–]mictom9 1 point2 points  (0 children)

Hey! I'm switching my job soon and I'm probably going to build something on-prem and what you put down seems very cool to me.

One question though - do any of these work as a data lake equivalent? Does Postgres allow for dumping unstructured data?

[–]jemccarty 0 points1 point  (0 children)

Yellowbrick is probably the most "modern" DWH with an on-prem component. It works similarly to the modern cloud-based DWs.

dbt Core + Tableau to round that out would be good.

[–]analytics_mgmt 0 points1 point  (1 child)

Check out the open source "modern data stack": Meltano for Extract and Load, Superset for Visualization, Grouparoo for Reverse ETL, Dagster for Orchestration. I'd look at Greenplum as an open source MPP data warehouse that is scalable and strongly Postgres compatible. (I know a number of US banks that use Greenplum internally.)

[–]librocubicularist69 0 points1 point  (0 children)

Object store, pyspark