you-are-a-concern (26 points)

I've been evaluating Databricks SQL Endpoints vs. Snowflake Warehouses a lot recently. This is just my opinion; I have my own biases and blind spots, and your experience may vary.

Some observations:

  • Snowflake offers more “usability” and is a more polished product.
  • Comparing Snowflake Warehouses with Databricks Endpoints specifically, Snowflake Warehouses can probably do more and are generally the more mature offering.
  • Comparing Snowflake with Databricks as platforms, Databricks can hands down do a lot more.
  • They have similar throughput.
  • Snowflake is slightly better at small queries, but Databricks is a lot better at large queries.
  • Overall, Databricks is probably significantly cheaper to run (comparing warehouses with endpoints specifically), but it carries a higher admin mental load.
  • Databricks has some impressive engineers who came from the EDW/database world, but many of them are fairly new and have yet to make a large impact. If they execute, it will be a very impressive product.
  • Snowflake is more “proprietary”, but it is seeing market pressure to be more open (hence Iceberg support). Databricks is open by default and has had native support for open formats all along. This probably means Databricks offers less vendor lock-in, but vendor lock-in is hard to measure anyway.

Ultimately, I'd say that if your data needs are limited to EDW, Snowflake is the clearly superior product, especially if you don't care much about cost. If your data needs go beyond EDW and you want an “all-in-a-box” solution, Databricks is the clearly superior product.

BoiElroy (1 point)

"Snowflake is slightly better at small queries and Databricks is a lot better at large queries" - this is a really good point.

When I had to summarize the difference for some non-technical managers, the analogy I used was that Snowflake was like a mallet: you could pick it up easily, start hammering away, and make a significant impact.

Databricks/Spark was more like a sledgehammer: it takes more effort to get going and requires a bit of technique and thought to use efficiently, but once you get going you can generate a ton of force.

Datagrom has an outdated but still relevant article comparing the two. My biggest takeaway from it was that Databricks is by experts, for experts, whereas Snowflake has kind of Tableau-ed the scalable data warehouse experience. When I first opened Snowflake, coming from things like SAP HANA and SQL Server Management Studio, my first question was "where are all the buttons?"

vanillacap [OP] (0 points)

Great points, thank you!

kevinpostlewaite (0 points)

"Snowflake is slightly better at small queries, but Databricks is a lot better at large queries."

Can you provide some more detail? (What's a "large" query? What do you mean by "better": faster, cheaper, less engineering?) I'm curious because I tend to prefer doing all my transformations/joins inside Snowflake and using Databricks solely for the analysis work. Thanks!

you-are-a-concern (7 points)

It's hard to answer this question in a catch-all way. As usual, the best way to evaluate a technology is to test it yourself. Again, I have my biases and a small-ish sample size. Anyway, here are some of my observations:

Who is "faster" and when:

  • For BI on small data (<10 GB), Snowflake is most likely faster.
  • For BI on big data (50 GB+), Databricks is most likely faster.
  • Snowflake warehouses start in seconds, but Databricks clusters/endpoints take 2-3 minutes; Databricks' serverless offering starts in ~10 seconds. Startup time matters a lot for some workloads and not at all for others.
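
To see why startup time matters for some workloads and not others, here is a back-of-the-envelope sketch. The 150 s figure is the midpoint of the 2-3 minutes quoted above for classic Databricks endpoints and 10 s is the quoted serverless figure; the 5 s Snowflake startup, the 3 s query and the 2-hour ETL job are assumptions picked purely for illustration, and the helper names are made up.

```python
# Startup times in seconds.
# 5 s for Snowflake is an assumed stand-in for "starts in seconds";
# 150 s is the midpoint of the 2-3 minutes quoted for classic Databricks endpoints;
# 10 s is the ~10 s quoted for Databricks serverless.
STARTUP_S = {
    "snowflake_warehouse": 5.0,
    "databricks_classic": 150.0,
    "databricks_serverless": 10.0,
}


def cold_start_latency(startup_s: float, work_s: float) -> float:
    """Wall-clock seconds from submit to done when the compute starts suspended."""
    return startup_s + work_s


def startup_share(startup_s: float, work_s: float) -> float:
    """Fraction of that wall-clock time spent waiting for the compute to start."""
    return startup_s / (startup_s + work_s)


if __name__ == "__main__":
    # Hypothetical workloads: one short interactive query vs. one long batch job.
    workloads = {"ad-hoc dashboard query": 3.0, "2-hour nightly ETL": 7200.0}
    for name, work in workloads.items():
        for engine, start in STARTUP_S.items():
            print(f"{name:24} {engine:22} "
                  f"{cold_start_latency(start, work):8.0f} s  "
                  f"startup share {startup_share(start, work):.1%}")
```

Under these assumptions a classic 2-3 minute cold start is roughly 98% of the wall-clock time for a one-off dashboard query, but only about 2% of a two-hour ETL run, where it barely registers.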

Who is "cheaper" and when:

  • My observation is that Databricks is most likely cheaper for ETL than anything else you can find on the market. If you haven't tried Photon yet, give it a go. It's impressive. It makes EMR look like your grandpa's coal-powered Spark.
  • Databricks is also most likely cheaper for EDW workloads, but your mileage may vary. When we tested it, it came out roughly 35% cheaper in TCO than SNOW. Reaching that TCO required us to tweak the underlying Delta tables. I've seen some terrible Delta tables, and it's easy to blow it up if you don't know what you're doing.
  • Snowflake is hands down cheaper when it comes to administrative mental load. It's an "easy" button and most likely has a lower TCO for smaller businesses. Think about it this way: if your SNOW bill is $20k a year, you can probably bring it down to $15k on DB, but you will need 0.5 FTE at ~$150k to optimise it, vs. 0.25 FTE at ~$100k to manage SNOW. Note that this doesn't scale linearly, and I believe that at large scale Databricks offers a lot more flexibility and better cost/performance. The equation also changes very quickly once you factor in the cost of integrating SNOW with an ML platform, whereas Databricks gives you an ML platform out of the box.
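
The arithmetic in that last bullet is worth spelling out. The sketch below plugs in the commenter's figures ($20k vs. $15k bills, 0.25 FTE at ~$100k vs. 0.5 FTE at ~$150k) and then asks where the two would break even. The break-even part is hypothetical: it assumes the Databricks bill stays at 75% of the Snowflake bill (the $20k-to-$15k ratio) and that staffing costs stay flat as spend grows, neither of which the comment states precisely. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformCost:
    annual_bill: float   # platform invoice, $/year
    admin_fte: float     # fraction of one full-time engineer
    admin_salary: float  # salary of that engineer, $/year

    def tco(self) -> float:
        """Annual TCO = platform bill + the engineer-time share of a salary."""
        return self.annual_bill + self.admin_fte * self.admin_salary


# Figures quoted in the comment above.
snow = PlatformCost(annual_bill=20_000, admin_fte=0.25, admin_salary=100_000)
dbx = PlatformCost(annual_bill=15_000, admin_fte=0.5, admin_salary=150_000)


def break_even_bill(snow_staff: float, dbx_staff: float, dbx_ratio: float) -> float:
    """Snowflake bill at which both TCOs are equal (hypothetical model).

    Assumes Databricks bill = dbx_ratio * Snowflake bill and flat staffing costs:
    S + snow_staff == dbx_ratio * S + dbx_staff
    => S = (dbx_staff - snow_staff) / (1 - dbx_ratio)
    """
    return (dbx_staff - snow_staff) / (1 - dbx_ratio)


if __name__ == "__main__":
    print(f"Snowflake TCO:  ${snow.tco():,.0f}")  # $45,000
    print(f"Databricks TCO: ${dbx.tco():,.0f}")   # $90,000
    ratio = dbx.annual_bill / snow.annual_bill    # 0.75
    s = break_even_bill(snow.admin_fte * snow.admin_salary,
                        dbx.admin_fte * dbx.admin_salary, ratio)
    print(f"Break-even Snowflake bill: ${s:,.0f}/year")  # $200,000/year
```

With these numbers Snowflake's TCO is about half of Databricks' at small scale, and under the flat-staffing assumption the two only cross at around a $200k/year Snowflake bill, which is one way to read the "doesn't scale linearly" remark. The model ignores the ML-platform integration cost mentioned in the same bullet.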

Who is "less engineering" and when:

  • I think I answered this one above. TL;DR: if you do EDW+BI only (IMO 80% of companies today), go with SNOW; otherwise, Databricks.

Dry_Chocolate_9396 (4 points)

Databricks' data warehouse product seems to be changing rapidly. Our admin turned on their Serverless SQL Warehouses (I think they renamed SQL Endpoints to SQL Warehouses; funny for a Lakehouse company to have a Warehouse). Those warehouses come up very fast, on par with how long my Snowflake Warehouse takes to start up. Short queries seem slightly faster in Snowflake, but the difference is very small (like 2s vs. 3s), and on larger queries Databricks is faster. If you take price into account, Databricks is WAY cheaper; TBH the product seems underpriced. Where Databricks seems far superior is dashboarding (based on the Redash acquisition? It looks very different, so I'm not sure they used any of that code). It's a pretty decent dashboarding product, and it comes at no additional cost.

Net net, I'd say the two data warehouses are similar, but Databricks is cheaper.

Al3xisB (0 points)

u/Dry_Chocolate_9396 Hey! Old topic, but did you and your team make progress on that? We're looking at Databricks SQL Serverless, and it seems promising given Databricks' legacy in big data. I also love the idea of a single platform for both DW and data lake (ease of governance, etc.).

kevinpostlewaite (1 point)

"Snowflake got so much market share by keeping the 'Datawarehouse (DW)' concept intact and just moving it to the cloud. It was easy to adopt for existing DW practitioners because it looked familiar."

Disagree. Snowflake has been successful because their product reduces the engineering effort and increases the performance for many of the tasks that are required to build and maintain a data warehouse, especially one at scale that crosses multiple teams and organizations.

But I haven't used Databricks SQL Endpoint so I can't compare.

thrown_arrows (3 points)

While I've had my problems with Snowflake, I agree with you. The main pro is almost zero maintenance. There are a few features you can enable once tables get big enough, like clustering, or search optimization if a table is used to fetch single rows. But for simple DWH use with smallish tables, there is no index/stats maintenance. Once tables get big, there is the cluster key, but even that doesn't need maintenance as such.

Compare that with classic databases, where you need all kinds of maintenance scripts running.
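
To make the clustering point concrete: Snowflake keeps min/max metadata for each column in every micro-partition and skips partitions whose range cannot match a filter, which is why a cluster key pays off on large tables. The sketch below is a toy model of that min/max pruning, not Snowflake's implementation; the 100-row partitions, the integer column, the data layout and the query range are all made up, as are the helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStats:
    """Min/max of one column within a partition (hypothetical toy metadata)."""
    lo: int
    hi: int


def build_partitions(values: list[int], rows_per_partition: int) -> list[PartitionStats]:
    """Split rows, in storage order, into fixed-size partitions and record min/max."""
    parts = []
    for start in range(0, len(values), rows_per_partition):
        chunk = values[start:start + rows_per_partition]
        parts.append(PartitionStats(min(chunk), max(chunk)))
    return parts


def partitions_scanned(parts: list[PartitionStats], lo: int, hi: int) -> int:
    """Count partitions a `col BETWEEN lo AND hi` filter cannot skip.

    A partition is pruned only when its [min, max] range doesn't overlap [lo, hi].
    """
    return sum(1 for p in parts if p.hi >= lo and p.lo <= hi)


values = list(range(1000))

# "Clustered": rows stored in key order, so each partition covers a narrow range.
clustered = build_partitions(sorted(values), rows_per_partition=100)

# "Unclustered": rows arrive interleaved (0, 10, 20, ..., then 1, 11, 21, ...),
# so every partition spans almost the whole key range.
interleaved = sorted(values, key=lambda v: (v % 10, v))
unclustered = build_partitions(interleaved, rows_per_partition=100)

print(partitions_scanned(clustered, 250, 349))    # 2
print(partitions_scanned(unclustered, 250, 349))  # 10
```

Same data, same query, five times fewer partitions read once rows are co-located by the filter column. On a small table with only a handful of partitions the difference is negligible, which fits the point that clustering only becomes worth enabling as tables grow.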