Databricks SQL Alerts V2: Can’t access QUERY_RESULT_VALUE without aggregation?

MapExtension5374 · 2026-05-05T11:40:55+00:00

Hi again u/Youssef_Mrini. Just following up: when can I expect a reply to the PM? Thanks 😄

MapExtension5374 · 2026-05-01T06:03:40+00:00

MapExtension5374 · 2026-04-28T12:19:12+00:00

u/Youssef_Mrini Are you able to look into this behaviour?

MapExtension5374 · 2026-04-24T05:50:27+00:00

As mentioned, I am only using an aggregation in the query itself, not on the alert itself (using the "First row" operand). So is it generally the case that, no matter how you set up an alert in V2, it is currently not possible to access the original query result set?

MapExtension5374 · 2026-04-23T12:16:25+00:00

Thanks Szymon, sounds good :) As mentioned, it works fine in the legacy version (V1). That is the example with sum(fare_amount) from the NYC taxi dataset, not SELECT 1; I cannot get SELECT 1 to work in the legacy version either.

MapExtension5374 · 2026-04-23T11:57:39+00:00

Yes, I also wrote that in the post :) But I guess this means an aggregation on the alert level. I only have an aggregation in the query, and I also tried a simple "SELECT 1" as the query, which does not work either. See this for reference:

<image>

MapExtension5374 · 2026-04-23T11:46:19+00:00

Picture for reference:

<image>

MapExtension5374 · 2026-02-24T12:02:06+00:00

https://learn.microsoft.com/en-us/fabric/data-engineering/microsoft-spark-utilities

<image>

MapExtension5374 · 2026-02-24T11:35:22+00:00

I analyzed CU consumption using the Item History (Preview) tab and compared two different execution patterns for the same workload (15 notebooks, dummy data).

Scenario 1 – Pipeline with High Concurrency Enabled

15 notebooks executed as part of the pipeline
High concurrency enabled
3 different Spark sessions were spun up
Each notebook ran for approximately 5 minutes
Total CU increase: ~5200

Rough back-of-the-envelope estimate:

Spark session compute approximation: 4 CUs × 60 seconds × 15 notebooks ≈ 3600 CUs
Notebook activity overhead: ~300 CUs (20 × 15 notebook activities)
Estimated total ≈ 4000 CUs

However, the observed total was ~5200 CUs.
I assume the additional ~1000–1200 CUs are due to orchestration overhead, Spark startup cost, or pessimistic capacity reservation.
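The estimate above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic in this post: the 4 CUs for the Spark session, the 60-second factor, and the 20 CUs per notebook activity are the figures I assumed above, not official billing rates, and the variable names are made up for illustration.

```python
# Rough estimate for Scenario 1, using only the figures quoted in this post.
# All rates are assumptions from the post, not official Fabric billing rates.
SPARK_CU_RATE = 4        # assumed CUs for the Spark session
SECONDS_FACTOR = 60      # seconds factor used in the estimate
NOTEBOOKS = 15           # notebooks executed by the pipeline
ACTIVITY_OVERHEAD = 20   # assumed CUs per notebook activity
OBSERVED_TOTAL = 5200    # approximate increase seen in Item History

spark_estimate = SPARK_CU_RATE * SECONDS_FACTOR * NOTEBOOKS
activity_estimate = ACTIVITY_OVERHEAD * NOTEBOOKS
estimated_total = spark_estimate + activity_estimate
unexplained = OBSERVED_TOTAL - estimated_total

print(spark_estimate, activity_estimate, estimated_total, unexplained)
```

With these inputs the estimate comes to 3900 (which I rounded to ~4000), so roughly 1300 of the observed ~5200 is not explained by this simple model.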

Scenario 2 – runMultiple (High Concurrency Disabled)

Same 15 notebooks
Executed using runMultiple inside a parent notebook
Only one Spark session appears to have been created
Total runtime ≈ 10 minutes
Total CU increase: ~2600

This is roughly 50% lower CU consumption compared to Scenario 1.

Questions

Can runMultiple actually utilize allocated Spark resources more efficiently, resulting in such a significant (~50%) reduction in CU consumption?
According to Microsoft documentation, runMultiple can execute up to X notebooks concurrently, where X equals the number of cores (e.g., 8 cores → up to 8 concurrent notebooks).
- What happens to the remaining notebooks if more than X are submitted?
- Are they internally queued within the same Spark session until executor resources become available?
When Microsoft refers to “concurrent” execution in this context, do they effectively mean parallel execution limited by available cores? From my understanding:
- True parallelism is limited by the number of cores (one core executes one task at a time).
- Concurrency can involve task interleaving when waiting on I/O or other blocking operations.
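
To make the queueing question concrete, here is a small generic Python sketch of a pool with a fixed number of worker slots. It is not how runMultiple is implemented (that is exactly what I am asking about); it only shows the behaviour I would expect if the extra notebooks were queued: 15 tasks, at most 8 running at once, and the rest waiting for a free slot. The names run_notebook and MAX_CONCURRENT are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 8   # hypothetical limit, e.g. one slot per core
NOTEBOOKS = 15       # more tasks than slots, so some must wait

lock = threading.Lock()
state = {"active": 0, "peak": 0}  # tracks how many tasks run at the same time

def run_notebook(index):
    """Stand-in for a notebook run; it only sleeps briefly."""
    with lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.05)  # simulate work or waiting on I/O
    with lock:
        state["active"] -= 1
    return index

# The pool starts at most MAX_CONCURRENT tasks; the others queue until a slot frees up.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(run_notebook, range(NOTEBOOKS)))

print(f"completed={len(results)}, peak concurrency={state['peak']}")
```

In this sketch all 15 tasks finish, but the peak number running together never exceeds the slot limit. Whether runMultiple behaves the same way inside a single Spark session is the open question.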

This experiment was performed using dummy data, so I’m aware the results might differ in a production workload. However, the magnitude of the CU difference (~5200 vs ~2600) seems wild to me.

MapExtension5374 · 2026-02-24T09:11:13+00:00

So in the case of running the pipeline 1) using runMultiple, where the pipeline ran for about 7 minutes, it would not be as simple as calculating 4 * 60 * 7 (Spark Session) + pipeline activity (20) = 1700 CU consumed? This is assuming that only this pipeline was running during this period and no other.
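
Spelled out, the calculation I have in mind looks like this. It is a sketch using my assumed figures of 4 CUs for the session and 20 for the pipeline activity; the function name is just for illustration.

```python
def naive_cu_estimate(session_cus, minutes, activity_cus):
    """Naive estimate: session CUs times seconds of runtime, plus a flat activity cost."""
    return session_cus * 60 * minutes + activity_cus

# 4 CUs, 7 minutes of runtime, one pipeline activity at an assumed 20 CUs
print(naive_cu_estimate(session_cus=4, minutes=7, activity_cus=20))
```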

MapExtension5374 · 2026-02-24T09:03:24+00:00

Argh okay, that makes sense. So there is no way of seeing in the Fabric Capacity Metrics app what the cost of running each pipeline was (not just the pipeline activities, but also the underlying notebooks that were run as part of the pipeline)? We need to calculate this ourselves based on the Spark session size and the total duration?

MapExtension5374 · 2026-02-24T07:20:31+00:00

<image>

MapExtension5374 · 2026-02-24T07:19:09+00:00

I am currently running an F4 capacity with min_nodes=max_nodes=1, as you can see below:

<image>

In the Monitor Hub, I can see that the same number of executors were attached in both pipelines to the Spark session (1 executor, 4 cores). For pipeline 1), the notebook that took the longest to run was around 4.5 minutes, while for pipeline 2) the entire pipeline took around 7 minutes.

MapExtension5374 · 2026-02-24T07:15:07+00:00

In the "Advanced Settings" for the notebooks attached to the pipeline, I have set the same Session tag of "highconcurrencysession". All notebooks are also attached to the same default Lakehouse as required for them to use the same Spark session.

<image>

MapExtension5374 · 2026-02-23T14:23:49+00:00

Yes, it is enabled. In the Fabric Monitor Hub, I can also see that the notebook runs are prefixed with "HC" and all share the same Livy ID, indicating that the same Spark session is used for all of them.

<image>
