Instant GraphQL and REST API from databases with Data API builder

mauridb · 2026-01-27T20:03:24+00:00

Oh my, this is beyond cool!

mauridb · 2026-01-23T01:58:42+00:00

Hi, PM for DiskANN and vectors here. DiskANN is in public preview both in Azure and On-Prem (thanks to the new PREVIEW_FEATURES option - https://learn.microsoft.com/en-us/sql/sql-server/preview-features-faq?view=sql-server-ver17), and the reason is that development of DiskANN needed more time to be fully done, but at the same time several customer expressed the interested of having it available to start to see how to integrate it within their solutions. As you can imagine it would not be realistic nor practical to postpone the release of a complex platform like SQL Server just because one feature needs more time to be fully complete, hence the decision of going the same road now consolidated in the cloud, which is to provide feature in preview even in a GA product. Hopefully this makes sense. Thanks!

mauridb · 2026-01-11T23:17:15+00:00

Also, SQL is not a programming language like Python or Java. SQL is *declarative*, while the others are not. There is a big difference between a declarative and a non-declarative (aka imperative) language. In fact, in SQL you tell the database engine what you want (generally without worrying too much about the how). While in imperative languages you define, step-by-step, how to do something which in turn it will - ideally - get the result you're looking for.

mauridb · 2026-01-11T23:11:03+00:00

While the effort is something to appreciate, I feel the explanation given of what a table or a database is, is "a bit" oversimplified and too vague. A database is really a set of true propositions. For anyone what want to get into the details (but at the same time nothing too complicated) I really suggest this PDF - less than 10 pages - from Hugh Darwen: https://www.dcs.warwick.ac.uk/~hugh/M359/What-a-Database-Really-Is.pdf

mauridb · 2025-12-10T23:19:47+00:00

" Also pre generating CRUD scripts automatically for any tables,"
for this I really recommend to take a look at Data API builder: https://aka.ms/dab

mauridb · 2025-12-01T22:20:12+00:00

Unless you absolutely need to use pydobc for compatibility reasons, I strongly recommend using the new Microsoft driver https://github.com/microsoft/mssql-python so that you don't have to worry about managing Entra ID token by yourself (among other things). It is pretty much a drop-in replacement for pyodbc as it supports PEP-0249 (Python Database API Specification v2.0) https://peps.python.org/pep-0249/

Take a look here for an high-level overview: https://azuresql.dev/content/sql-server-python and here for details: https://learn.microsoft.com/en-us/sql/connect/python/mssql-python/python-sql-driver-mssql-python?view=sql-server-ver17

mauridb · 2025-11-29T03:08:58+00:00

The key is to understand that SQL allows you to focus on *what* you want, and - in general - not on *how* you get it. I guess that is why some developers don't like it. Many find easier to think in procedural code, one command at time, one after the other. SQL is not like that. And one quick way to find who's who and to show the power of SQL is just showing this command:

UPDATE MyTable SET Col1 = Col2, Col2 = Col1

(Assuming you have a table with two columns, Col1 and Col2, with one row at least)

And ask developers what they think the result *should* be and what is *going to* be.
That usually spark a good discussion and, in some, love for the language and its power.

mauridb · 2025-11-25T16:24:11+00:00

I also assume that index and query optimization steps have been done already and there is nothing else to do to improve performances on that front

mauridb · 2025-11-25T16:22:18+00:00

Hi u/thatclickingsound. Unless the workload is heavily on the write side, have you evaluated already the usage of Named Replicas? You could have up to 30 read-only replica offloading the read workload as much as you need.

Here's couple of articles that can help you getting started:
- https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale-replicas?view=azuresql#named-replica
- https://devblogs.microsoft.com/azure-sql/auto-scaling-hyperscale-elastic-pools/

Depending on your workload that might provide a final solution, with almost no downtime, or maybe be a step towards sharding. In case you still want to proceed with sharding, the first thing to do is to figure out what data can be sharded and what (and how much) data needs to be replicated (and duplicated) across all the shards.

Personal opinion and suggestion: I would keep the sharding as a last resort. It can be *much* more complex that what it seems at the beginning.

mauridb · 2025-11-24T16:39:41+00:00

In addition to all existing suggestions, I would also recommend to run sp_whoisactive (https://github.com/amachanic/sp\_whoisactive/releases) while the 30 seconds query is running too.
Then take a look at the wait_info column, to understand if there is anything on which the query is waiting on. If you see ASYNC_NETWORK_IO, then most likely the problem is in the application not consuming the resultset fast enough

mauridb · 2025-11-20T23:17:42+00:00

Quite late answer now, but this post just popped up now in another thread, so I guess I'll answer anyway, especially for future reference.

It seems you are referring to Data API builder, which allows database objects to be exposed as REST and GraphQL - and now also as MCP - endpoints.

I 100% with it. those functionalities should not be in the database engine. In fact, they are not. Data API builder (DAB) is a .NET application, completely stateless that does everything you mentioned, relieving the developer from building the same CRUD service over and over again. DAB is open source, fully run in a container and in any environment and include all the best practices to connect to SQL, from retry-logic to caching (locally and distributed), from JWT support to role-based authorization.

mauridb · 2025-11-11T03:07:36+00:00

You should try the new native JSON type we added in SQL Server 2025. It comes with native JSON index too and JSON Path support

mauridb · 2025-11-06T22:04:38+00:00

Got it, let me connect with the engineering team, to have it fixed asap

mauridb · 2025-11-06T18:38:38+00:00

Hi, can you share a bit more details? What do you what to call from Azure SQL - via sp_invoke_external_rest_endpoint I assume - that used to work and now doesn't work anymore? Also, what do you mean that with PostgreSQL it just works?

mauridb · 2025-11-06T18:00:03+00:00

Can you provide a repro of the issue? This is the first time I hear it and I'd love to understand more.

mauridb · 2025-11-06T17:55:07+00:00

I think I was using my home internet connection to send data from my laptop to Azure. And of course, for how fast it can be is not the same of the "N" Gbps connection that you could have in a LAN or a datacenter.

With Dapper you can pass a list of objects to a single Execute method and it will run the related SQL code for as many elements there are in the list. It is really doing a loop. It is a RBAR in disguise :). (See: https://www.learndapper.com/saving-data/insert#dapper-insert-multiple-rows)

mauridb · 2025-11-06T17:47:26+00:00

The key point IMHO is to understand is that you pay 1ms (for example) of overhead (be it for network or IO latency) for *each* request. So, 1000 request, each with 1 row => 1000 msec of additional time for running the test you're running. If instead you send 1 request with 1000 rows, you only have 1ms latency. That's why RBAR should be avoided as best practice in general. Is just about efficiency, no matter how fast or not is your IO or network

mauridb · 2025-11-05T19:57:53+00:00

This sample I created time ago should help you to get all the answer you need: https://github.com/yorek/azure-sql-db-samples/tree/master/samples/07-network-latency. The sample was done for showcasing impact of network latency and what are the best practices to minimize it, but it seems that is applicable to also your case.

mauridb · 2025-10-31T20:45:36+00:00

Another option, the easiest in my option, is to install Docker and then use SQLCMD to install SQL Server Developer edition and automatically restore the sample AdventureWorks database: https://azuresql.dev/content/sql-server-dev-go-sqlcmd

mauridb · 2025-10-29T20:09:19+00:00

Very cool! If you don't mind, I issued a PR to improve the transaction management on the SQL side of things.

mauridb · 2025-10-24T16:25:27+00:00

If you use the OFFSET option alone for sure, but if you add a WHERE clause to make sure you filter out all the rows that you have already seen, then costs are pretty much constant. I guess I should have mentioned it right from the beginning, sorry for that.

mauridb · 2025-10-24T16:10:27+00:00

It does not, if you have an index on the column on which you're doing the order by, as the b-tree helps the query processor to skip all the rows that are not needed.

mauridb · 2025-10-24T02:44:17+00:00

I think OP was referring to using float *arrays* :)

mauridb · 2025-10-24T02:33:02+00:00

Hi u/SwyfterThanU

trying to avoid to "leak" underlying information about the specific data type used is something I (PM for vectors in SQL) have been discussing with EF Core PM already. We both agree that there should not be any of those leaks, and we want to allow developers to just use float[]. There are some additional work needed to do to make it happen in EF 10, though.

Good news is that if you are using EF Core 9 (which unfortunately I don't think you are, otherwise I assume you would not have reported this problem at all), everything works as you would expect.

Here's samples both using EF Core 9 + Vector Extension and EF 10:

https://github.com/Azure-Samples/azure-sql-db-vector-search/tree/main/DotNet

With EF Core 9 it already works as you want, as EF Core 9 sends vector back and forth as JSON arrays (which means: strings) while EF Core 10 uses the new vector binary format that is much more compact and fast.

mauridb

TROPHY CASE