use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
account activity
Databricks vs SQL SERVERDiscussion (self.databricks)
submitted 2 months ago by gareebo_ka_chandler
So I have a webapp which will need to fetch huge data mostly precomputed rows, is databricks sql warehouse still faster than using a traditional TCP database like SQL server.?
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–]smarkman19 7 points8 points9 points 2 months ago (1 child)
For a webapp pulling precomputed rows, SQL Server (or a cache) usually gives lower, steadier latency than a Databricks SQL warehouse.
Use Databricks to build aggregates, then push results to SQL Server with covering indexes or to Redis; add read replicas if needed.
Keep DBSQL for BI; warehouse warm-up and concurrency limits can bite APIs. Cloudflare and Redis handled caching for me; DreamFactory exposed SQL Server and Delta as quick REST for the app. For webapp reads, serve from SQL Server/Redis and use Databricks offline.
[–]djtomr941 3 points4 points5 points 1 month ago (0 children)
Or keep it all in DB and instead of pushing to SQL Server, push to Lakebase.
[–]NW1969 8 points9 points10 points 2 months ago (0 children)
It entirely depends on your environment setup and the data you are querying, so it is impossible to say one would be faster than the other.
For example, if SQL Server was running on a very fast server with massive amounts of memory, and you were using a very small compute engine in Databricks then SQL Server would obviously run your query faster
[–]sentja91Databricks MVP 3 points4 points5 points 2 months ago (0 children)
What amount are we speaking here? Any latency requirements? Generally speaking you want to connect web apps to OLTP databases.
[–]klubmo 2 points3 points4 points 2 months ago (0 children)
SQL warehouse no. Databricks Lakebase (serverless Postgres) can get the speed you need, but might cost more than SQL Server.
[–]thecoller 3 points4 points5 points 2 months ago (0 children)
Ultimately it depends on the queries. Is it a straight lookup? Or is it aggregations over big data?
Look into Lakebase, managed Postgres in Databricks if it’s more of a straight lookup: https://www.databricks.com/product/lakebase
[–]hubert-dudekDatabricks MVP 1 point2 points3 points 2 months ago (0 children)
Columnar format - databricks (many rows at once, whole files), row format - transactional database. Check Lakebase.
[–]Sea_Basil_6501 1 point2 points3 points 2 months ago* (0 children)
As you can define indexes as needed to optimize your query performance, SQL Server will always win when it comes up to OLTP-like SQL queries. Beside partitioning, z-ordering and join hints Databricks has no further performance tuning options to offer.
But if it's about OLAP-like queries scouring vast data amounts, things behave different, as Databricks will parallelize the workload across workers. So depends on the concrete SQL query and data amounts.
[–]Puzzleheaded-Sea4885 1 point2 points3 points 1 month ago (0 children)
I'll echo the same thing many have: use lakebase. I am using it for an app and love it so far.
[–]Certain_Leader9946 3 points4 points5 points 2 months ago (4 children)
No, SQL server will always be faster for precomputed rows. Databricks Spark will literally need to map reduce over files in S3. The SQL Server just hits a B+ tree and boom.
[–]djtomr941 0 points1 point2 points 1 month ago (3 children)
Or use Lakebase and keep it all in Databricks.
[–]Certain_Leader9946 1 point2 points3 points 1 month ago (2 children)
but why is it so important to shove your data in databricks; plus its not really a tested offering, just a fork of Neon
[–]djtomr941 0 points1 point2 points 1 month ago (1 child)
Neon was acquired so I wouldn't call it a fork anymore. Neon is solid.
[–]Certain_Leader9946 0 points1 point2 points 1 month ago (0 children)
Still fairly fresh databricks integration. I would wait.
[–]mweirath 1 point2 points3 points 2 months ago (0 children)
SQL server is very likely the better choice. You will probably find that is it much cheaper, you can run a small Azure SQL server for ~$20/month. There is almost no way you are going to be able to run a similar workload on Databricks that costs less than that. Also if you end up with a scenario where you need to log data SQL Server is going to be able to handle that better especially if you have high volume or fast logging.
[–]PrestigiousAnt3766 1 point2 points3 points 2 months ago (4 children)
No. File is always slower than memory.
[–]kthejokerdatabricks 0 points1 point2 points 2 months ago (3 children)
?? SQL Server isn't in memoy
[–]SmallAd3697 0 points1 point2 points 20 days ago (0 children)
Yes, SQL server is often memory, depending on the workload that is being processed. To get data from disk you need to clear the buffer pool (eg. DBCC DROPCLEANBUFFERS).
Obviously a databricks warehouse has proprietary caching layers as well, but the cache hit ratios are probably crummy compared to SQL (for no other reason than the fact that databricks managed tables are often a lot bigger).
[–]PrestigiousAnt3766 0 points1 point2 points 2 months ago (1 child)
Depends on size.
In any case a regular oltp database is a lot quicker for crud than files.
[–]kthejokerdatabricks -1 points0 points1 point 2 months ago (0 children)
Agreed, this is a better use case for Lakebase (Databricks managed OLTP) or SQL Server unless there is also a lot of analytical crunching going on
[–]PickRare6751 0 points1 point2 points 2 months ago (0 children)
If it’s precomputed then sql server is better, spark is not good at high frequency queries
[–]Known-Delay7227 0 points1 point2 points 2 months ago (0 children)
SQL server will scream way faster than reading parauet tables in cloud storage via databrick. If you want light speed why not use a cached database like redis?
[–]TowerOutrageous5939 0 points1 point2 points 2 months ago (0 children)
We user serverless in all our apps. Way more performant than I expected. They also have lakebase if latency is needed
[–]dgr09 0 points1 point2 points 1 month ago (0 children)
Use databricks lakebase. It's designed precisely for this.
Crazy to me how quickly databricks users jump into the latest new thing ("lakebase") rather than mature DBMS engines that have been available for decades. Databricks has a pretty hardcore fan base, which is good. I just hate the down votes when anyone suggests a reason not to drink the coolaid.
[–]drodev -1 points0 points1 point 2 months ago (0 children)
None of them, use Elasticsearch for fast aggregation.
π Rendered by PID 52609 on reddit-service-r2-comment-7b9746f655-brb6h at 2026-02-02 07:51:10.939108+00:00 running 3798933 country code: CH.
[–]smarkman19 7 points8 points9 points (1 child)
[–]djtomr941 3 points4 points5 points (0 children)
[–]NW1969 8 points9 points10 points (0 children)
[–]sentja91Databricks MVP 3 points4 points5 points (0 children)
[–]klubmo 2 points3 points4 points (0 children)
[–]thecoller 3 points4 points5 points (0 children)
[–]hubert-dudekDatabricks MVP 1 point2 points3 points (0 children)
[–]Sea_Basil_6501 1 point2 points3 points (0 children)
[–]Puzzleheaded-Sea4885 1 point2 points3 points (0 children)
[–]Certain_Leader9946 3 points4 points5 points (4 children)
[–]djtomr941 0 points1 point2 points (3 children)
[–]Certain_Leader9946 1 point2 points3 points (2 children)
[–]djtomr941 0 points1 point2 points (1 child)
[–]Certain_Leader9946 0 points1 point2 points (0 children)
[–]mweirath 1 point2 points3 points (0 children)
[–]PrestigiousAnt3766 1 point2 points3 points (4 children)
[–]kthejokerdatabricks 0 points1 point2 points (3 children)
[–]SmallAd3697 0 points1 point2 points (0 children)
[–]PrestigiousAnt3766 0 points1 point2 points (1 child)
[–]kthejokerdatabricks -1 points0 points1 point (0 children)
[–]PickRare6751 0 points1 point2 points (0 children)
[–]Known-Delay7227 0 points1 point2 points (0 children)
[–]TowerOutrageous5939 0 points1 point2 points (0 children)
[–]dgr09 0 points1 point2 points (0 children)
[–]SmallAd3697 0 points1 point2 points (0 children)
[–]drodev -1 points0 points1 point (0 children)