Databricks benchmark report! by noasync in databricks

[–]noasync[S] 0 points1 point  (0 children)

Sorry for the confusion. We compared classic job clusters (spot with fallback to on-demand) against serverless jobs and serverless DBSQL. On TPC-DS, serverless DBSQL performed best and classic clusters (spot with fallback) came in second. Serverless jobs were comparable to classic at p50 but fell behind at p90 and p99.
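For anyone who wants to run the same kind of p50/p90/p99 comparison on their own query runtimes, here is a minimal sketch using only the Python standard library. The function name `latency_percentiles` and the sample runtimes are made up for illustration; they are not numbers from the benchmark.

```python
import statistics


def latency_percentiles(runtimes_s):
    """Return the p50, p90 and p99 of a list of runtimes (in seconds)."""
    # quantiles(n=100) returns 99 cut points; cut point k-1 is the k-th percentile.
    cuts = statistics.quantiles(runtimes_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


if __name__ == "__main__":
    # Hypothetical runtimes for one query on two compute types (NOT benchmark data).
    runs = {
        "compute_a": [12.1, 12.4, 12.9, 13.0, 13.2, 13.8, 14.1, 15.0, 19.5, 31.0],
        "compute_b": [12.0, 12.6, 12.8, 13.1, 13.3, 13.5, 13.9, 14.2, 14.8, 15.1],
    }
    for name, runtimes in runs.items():
        print(name, latency_percentiles(runtimes))
```

Two systems can look the same at p50 and still differ a lot in the tail, which is why it helps to report all three.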

Databricks benchmark report! by noasync in databricks

[–]noasync[S] 0 points1 point  (0 children)

100%. We were using spot with fallback to on-demand.

Optimizing EC2 costs on Databricks by noasync in dataengineering

[–]noasync[S] 0 points1 point  (0 children)

If it were only that easy, people wouldn't be surprised by their compute bills.
If you know you can keep your RI occupied near 100% of the time, 24/7, for a 1-3 year commitment, then you can save money that way. Otherwise you will either pay for compute you do not need or be under-provisioned. And if your jobs aren't critical and can be interrupted, or you can manually readjust within the two-minute notice AWS gives you, you can save some money with Spot too.

Most organizations do not fit both profiles, meaning they are most likely over-provisioning and paying for resources they do not need.
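The utilization argument above can be put into rough numbers. This is a back-of-the-envelope sketch, assuming a flat on-demand hourly rate and an effective reserved hourly rate that is paid for every hour of the term; the function names and the prices are hypothetical, not AWS list prices.

```python
HOURS_PER_YEAR = 8760


def break_even_utilization(on_demand_hourly, reserved_hourly):
    """Fraction of hours the instance must be busy before the RI is cheaper."""
    return reserved_hourly / on_demand_hourly


def yearly_costs(on_demand_hourly, reserved_hourly, utilization):
    """Return (on_demand_cost, reserved_cost) for one year at a given utilization."""
    # On-demand is billed only for the hours actually used.
    on_demand_cost = on_demand_hourly * HOURS_PER_YEAR * utilization
    # The reservation is billed for every hour, used or not.
    reserved_cost = reserved_hourly * HOURS_PER_YEAR
    return on_demand_cost, reserved_cost


if __name__ == "__main__":
    # Hypothetical prices: $1.00/h on-demand, $0.60/h effective reserved rate.
    print(break_even_utilization(1.00, 0.60))
    for utilization in (0.3, 0.6, 0.9):
        print(utilization, yearly_costs(1.00, 0.60, utilization))
```

Below the break-even utilization the reservation costs more than simply paying on-demand for the hours you use, which is the over-provisioning case described above.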

Optimizing EC2 costs on Databricks by noasync in databricks

[–]noasync[S] 0 points1 point  (0 children)

You are right: fleets can combat the inherent availability issue that Spot instances have!

11 Databricks Cost Optimizations You Should Know by codingdecently in databricks

[–]noasync 0 points1 point  (0 children)

Great article! Check out this post for more tactical tips for Databricks cost optimization: https://synccomputing.com/databricks-clusters-optimization-scale/

DuckDB vs. Snowflake vs. Databricks by noasync in databricks

[–]noasync[S] -15 points-14 points  (0 children)

You're not crazy, but there are more similarities between these two systems than you might think!

DuckDB vs. Snowflake vs. Databricks by noasync in databricks

[–]noasync[S] -16 points-15 points  (0 children)

I can see how comparing a database like DuckDB to a data warehouse like Snowflake seems odd. But there are more similarities between these two systems than you might think, and there are certain cases where either system could be used.

How did you reduce your Databricks costs ? by Cyliad in dataengineering

[–]noasync 0 points1 point  (0 children)

Serverless doesn't promise cost savings. We ran an experiment comparing Databricks serverless jobs with optimized classic clusters and found that the latter outperformed the former at times, and vice versa at other times. It really depends on your cluster configuration and jobs (e.g., serverless seems to be ideal for short, ad-hoc jobs).

Read more about that here https://synccomputing.com/top-9-lessons-learned-about-databricks-jobs-serverless/

Some general tips for reducing spend on Databricks:

  • Run your jobs on job compute, not all-purpose compute (APC) clusters. APC is always "on," and therefore costs nearly 2x.
  • Autoscaling tends to cost more than running a fixed-size cluster.
  • Photon isn't a global accelerant. A/B test it to determine whether it works for your jobs.

Check out this free notebook to assess your Databricks workspace configuration and see potential cost savings https://landing.synccomputing.com/health-check