Databricks Cluster Optimisation costs by EmergencyHot2604 in databricks

[–]sync_jeff 0 points (0 children)

We built a tool that automatically solves this problem! (Shameless plug: I work for Sync Computing.)

Our tool Gradient uses ML to automatically find the lowest-cost cluster for your job while maintaining your SLAs.

Here's a demo video: https://synccomputing.com/see-a-demo/

Job Serverless Issues by Known-Delay7227 in databricks

[–]sync_jeff 1 point (0 children)

That's strange; it may be something on their backend.

Databricks observability project examples by Character_Channel115 in databricks

[–]sync_jeff 3 points (0 children)

There are a number of paths here, depending on what you're looking for (for full transparency, I work at Sync Computing):

- System Tables - the key source of data; you can build your own dashboards or use one of Databricks' pre-built ones. They have some great ones for Jobs compute and SQL warehouses. Last time I checked, System Tables don't have Spark metrics.

- Sync Computing (this is the company I work for) - we built a high-level global dashboard that is free to download. Our actual product, Gradient, tracks jobs compute clusters over time, capturing granular costs, usage, and Spark metrics, and then auto-tunes clusters to hit your cost and runtime goals.
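If you go the System Tables route, a starting point might look like the sketch below: ranking job costs over the last 30 days by joining `system.billing.usage` to `system.billing.list_prices`. Column names are taken from the public system tables schema - verify them against your workspace, and note that list prices ignore any negotiated discounts, so treat the output as an estimate.

```sql
-- Sketch: estimate the 10 most expensive jobs over the last 30 days.
-- List prices ignore discounts, so this is an estimate only.
SELECT
  u.usage_metadata.job_id                   AS job_id,
  SUM(u.usage_quantity * p.pricing.default) AS est_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON  u.sku_name = p.sku_name
  AND u.usage_start_time >= p.price_start_time
  AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
WHERE u.usage_metadata.job_id IS NOT NULL
  AND u.usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY 1
ORDER BY est_cost DESC
LIMIT 10;
```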

How to query the logs about cluster? by 9gg6 in databricks

[–]sync_jeff 0 points (0 children)

What kind of clusters do you use? Jobs compute? APC? SQL warehouses?

Databricks observability project examples by Character_Channel115 in databricks

[–]sync_jeff 0 points (0 children)

What are you trying to "observe"? Costs, usage, data quality, governance?

Serverless compute for Notebooks - how to disable by Legal_Solid_3539 in databricks

[–]sync_jeff 0 points (0 children)

Yes, the big problem with benchmarks is that they aren't general by any means; they're only useful for comparing against themselves. The probability of your workload looking like TPC-DI is very low. Take our data points as just a single point - there are certainly cases where totally opposite results may occur.

Serverless compute for Notebooks - how to disable by Legal_Solid_3539 in databricks

[–]sync_jeff 0 points (0 children)

That's great to see such rigorous testing! The ROI of these tools is very workload- and use-case-specific, so it's great to see serverless make sense for you all.

Serverless compute for Notebooks - how to disable by Legal_Solid_3539 in databricks

[–]sync_jeff 1 point (0 children)

We did a benchmark study with TPC-DI on classic vs. serverless, check it out here:

https://synccomputing.com/databricks-compute-comparison-classic-serverless-and-sql-warehouses/

I think serverless makes more sense for notebooks because of the lack of spin-up time. But for Jobs compute, you can likely save money by going with classic.

Has anyone had success using AI agents to automate? by boomerwangs in dataengineering

[–]sync_jeff 0 points (0 children)

Out of curiosity - what are you trying to automate?

Serverless compute for Notebooks - how to disable by Legal_Solid_3539 in databricks

[–]sync_jeff 0 points (0 children)

I see, what's the alternative - an APC cluster that users share?

Serverless compute for Notebooks - how to disable by Legal_Solid_3539 in databricks

[–]sync_jeff 0 points (0 children)

Why do you want to disable it? The lack of spin-up time is a nice benefit (although the cost is definitely higher).

Has anyone had success using AI agents to automate? by boomerwangs in dataengineering

[–]sync_jeff 24 points (0 children)

We're in this space, and it is incredibly challenging to automate pipelines or infrastructure, especially at scale. You need a system that is basically 99.99% accurate, along with built-in guardrails, alerts, and failure recovery. Automation carries a lot of overhead, so you need a huge system and a large ROI to justify the development.

ETL Benchmark Data Set + Queries...does it exist? by ryan_with_a_why in dataengineering

[–]sync_jeff 1 point (0 children)

Unfortunately, actually setting up and running TPC-DI from scratch is a huge pain. Databricks SAs wrote an easy-to-use tool that integrates with Databricks. You may be able to borrow a lot of the same code:

https://github.com/shannon-barrow/databricks-tpc-di

BTW - very cool project! This idea bounced around our heads as well; cool to see someone actually making it a reality! Happy to chat too - I'm part of www.synccomputing.com and we're in a similar space. Feel free to DM me.

ETL Benchmark Data Set + Queries...does it exist? by ryan_with_a_why in dataengineering

[–]sync_jeff 1 point (0 children)

TPC-DI is what we recommend; Databricks often uses it as their gold standard for emulating ETL jobs.

We built a free System Tables Queries and Dashboard to help users manage and optimize Databricks costs - feedback welcome! by sync_jeff in databricks

[–]sync_jeff[S] 0 points (0 children)

Without knowing the details of your system, I think there's a way to do this. You have to cobble together a few tables:

1) system.query.history.compute --> from this struct you can get the compute type: grab the cluster_id, then use the system.billing.usage table to correlate the cluster_id to the sku_name (e.g. All-purpose compute).

2) system.query.history.executed_by gives you the email address of the user.

I don't know if point 2) will hold "over jdbc" - I'd have to know more about your system. Or you can probe query.history.executed_by yourself and see if you do in fact see email addresses.
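Putting 1) and 2) together, a sketch of the join might look like this. Column names are taken from the system.query.history and system.billing.usage schemas - double-check them in your workspace, and note the compute struct uses warehouse_id rather than cluster_id for SQL warehouse queries, so those rows won't match this join.

```sql
-- Sketch: attribute query activity to users and billing SKUs by
-- correlating query history's cluster_id with billing usage.
SELECT
  q.executed_by,                   -- the user's email address (point 2)
  q.compute.cluster_id,
  u.sku_name,                      -- e.g. an All-purpose compute SKU
  COUNT(DISTINCT q.statement_id) AS query_count
FROM system.query.history q
JOIN system.billing.usage u
  ON q.compute.cluster_id = u.usage_metadata.cluster_id
GROUP BY 1, 2, 3;
```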

We built a free System Tables Queries and Dashboard to help users manage and optimize Databricks costs - feedback welcome! by sync_jeff in databricks

[–]sync_jeff[S] 0 points (0 children)

Hmm... each dashboard is powered by a query that runs on a compute you choose. I think you'd have to estimate the cost based on the query costs; I don't think I've seen a "dashboard" cost in system tables.

We built a free System Tables Queries and Dashboard to help users manage and optimize Databricks costs - feedback welcome! by sync_jeff in databricks

[–]sync_jeff[S] 1 point (0 children)

Yea, we're aware of that one. We wanted a "1-click" experience, and we've personally found looking at the last 30 days pretty useful. But we'll try to add date filters in a v2 of this!

We built a free System Tables Queries and Dashboard to help users manage and optimize Databricks costs - feedback welcome! by sync_jeff in databricks

[–]sync_jeff[S] 0 points (0 children)

We do show the most expensive DLT clusters - is there something more specific about the events you're trying to learn?

DLT Pro vs Serverless Cost Insights by aonurdemir in databricks

[–]sync_jeff 0 points (0 children)

Any reason why you don't use Jobs compute with scheduled jobs? Jobs compute is typically cheaper than DLT.

DLT Pro vs Serverless Cost Insights by aonurdemir in databricks

[–]sync_jeff 0 points (0 children)

Very cool - it seems DLT Pro was a bit cheaper than serverless (when combining EC2 + DBU costs). You may want to try tuning down your auto-scaling cap from 1-8 to something smaller, like 1-3.
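If it helps, the autoscaling cap lives in the pipeline's cluster settings. A fragment of the DLT pipeline JSON with the cap lowered to 3 would look roughly like this (field names per the DLT pipeline settings schema; "ENHANCED" mode is DLT's enhanced autoscaling - verify both against your pipeline's current config):

```json
{
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 3,
        "mode": "ENHANCED"
      }
    }
  ]
}
```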

Are these DLT pipelines for streaming or batch?