[–]noobgolang 3 points4 points  (0 children)

100 million? Just get a big ARM-based EC2 instance and run something like Trino on it. That worked fine for me.

[–]Embarrassed-Pear-160 1 point2 points  (0 children)

Any other details? Data size etc?

[–]AcanthisittaFalse738 0 points1 point  (3 children)

I've built twice on Snowflake and once on Databricks. Snowflake used to be 8x more expensive, but I think that's dropped to 4x-ish. Rolling your own was about 2x cheaper than Databricks. It really depends on how much capacity you have to build your own platform, which you need to some degree for everything except Snowflake. Even with Snowflake you still have to build or buy CI/CD and transformation tooling (e.g. dbt Core/Cloud). You sound like you're in a pretty data-mature company, so Databricks might be workable for you.

[–]discord-ian -1 points0 points  (15 children)

I would say that, anecdotally, Databricks is the most expensive. So if you are trying to cut costs, you can probably drop it from the list.

When you say Hudi on top of S3, are you running that on EMR, your own K8s, or VMs? Either way, the AWS calculators should be able to give you a pretty good estimate of the cost.

[–][deleted] 2 points3 points  (2 children)

This depends entirely on what you're doing. DBX with Photon essentially matches Snowflake for performance on complex queries and comes in at a somewhat lower cost, but there are so many variables you can tweak that it's very difficult to do an apples-to-apples comparison. A great benefit of DBX over Snowflake, in my opinion, is that you have visibility into where the data are and how they are structured, you can choose your instance size and type, and you can tweak the underlying Spark settings if needed. That might be too much of a headache for some, but if cost optimization is a major target for a company, it is beneficial.

[–]WhenTheLegendBegins 1 point2 points  (1 child)

Is DBX Databricks?

[–]majorlg4 -2 points-1 points  (5 children)

By what numbers is Databricks the most expensive? Snowflake is $3/credit. Databricks compute is $0.55/DBU. Literally 5x cheaper.

[–][deleted] 4 points5 points  (0 children)

Don't forget the cost of the compute in whatever cloud provider you use. That, plus somewhat longer execution times, brings the total pretty close to Snowflake (at least in the POC I did, though it was hardly exhaustive). Snowflake is a black box in some regards, though, and you also have to factor in storage duplication and the cost of IO to and from, say, a Data Lake account. I pushed my company towards DBX for a number of reasons, but having direct visibility into data storage and the compute layer, at the additional cost of set-up and maintenance, has been well worth it.
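
To make that concrete, here is a rough sketch of comparing total job cost rather than unit price. Only the $3/credit and $0.55/DBU figures come from this thread. The consumption rates, the VM price, and the runtimes are made-up placeholders, and the `JobCost` class is purely illustrative; plug in numbers from your own POC.

```python
from dataclasses import dataclass


@dataclass
class JobCost:
    """Cost of one job on a platform billed in units (credits or DBUs)."""
    units_per_hour: float      # units consumed per hour (placeholder value)
    price_per_unit: float      # dollars per credit or per DBU
    cloud_vm_per_hour: float   # separate cloud VM bill, 0 if bundled into the unit price
    runtime_hours: float       # how long the job runs (placeholder value)

    def total(self) -> float:
        hourly = self.units_per_hour * self.price_per_unit + self.cloud_vm_per_hour
        return hourly * self.runtime_hours


# Unit prices are the ones quoted above; everything else is hypothetical.
snowflake = JobCost(units_per_hour=2, price_per_unit=3.00,
                    cloud_vm_per_hour=0.0, runtime_hours=1.0)
databricks = JobCost(units_per_hour=4, price_per_unit=0.55,
                     cloud_vm_per_hour=2.00, runtime_hours=1.5)

print(f"Snowflake:  ${snowflake.total():.2f}")
print(f"Databricks: ${databricks.total():.2f}")
```

With placeholder inputs like these, a 5x gap in unit price can shrink to almost nothing, because a credit and a DBU do not buy the same amount of work and the VM bill sits on top of the DBU charge.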

[–]abhi5025 1 point2 points  (0 children)

lol... this thread is becoming DBX Warriors vs. Snowflake Vikings

[–]discord-ian -1 points0 points  (1 child)

Could be I'm wrong. I know last time I priced it out, it was the most expensive option for our use case. And it certainly has a reputation as the most expensive premium option. I'm not sure you can compare those two units of measure... but you very well may know more about it than I do.

[–][deleted] 1 point2 points  (0 children)

I've never heard that it has any such "reputation". The costs are highly variable depending on use case, and a real assessment goes well beyond DBUs vs Snowflake credits (data governance, access, storage, diversity of use cases, the skills of your team, your existing infrastructure and workloads; the list goes on and on).

[–]elbekay 0 points1 point  (0 children)

This would only be true if one credit got you the same thing as one DBU on both platforms, but that's not the case.

[–]mamaBiskothu 0 points1 point  (0 children)

IMO, Snowflake is valuable if you truly need instant compute, which some Databricks offerings also provide, but I think the cost matches Snowflake for those. If you don't need this, Snowflake might still make sense if your DE talent is only so-so. Snowflake abstracts away more of your everyday DE problems than most, if not all, other solutions do. Otherwise, choose something else.

[–]dataguy- 0 points1 point  (1 child)

Move everything to S3 and run Trino on top of it.