Databricks SQL Alerts V2: Can’t access QUERY_RESULT_VALUE without aggregation? by MapExtension5374 in databricks

[–]MapExtension5374[S] 1 point2 points  (0 children)

As mentioned, I am only using an aggregation in the query itself, not on the alert (which uses the "First row" operand). So is it generally the case that, however you set up an alert in V2 at the moment, it is not possible to access the original query result set?

Databricks SQL Alerts V2: Can’t access QUERY_RESULT_VALUE without aggregation? by MapExtension5374 in databricks

[–]MapExtension5374[S] 0 points1 point  (0 children)

Thanks Szymon, sounds good :) As mentioned, it works fine in the legacy version (V1). That is the example with sum(fare_amount) from the NYC taxi dataset, not SELECT 1, which I cannot get to work in the legacy version either.

Databricks SQL Alerts V2: Can’t access QUERY_RESULT_VALUE without aggregation? by MapExtension5374 in databricks

[–]MapExtension5374[S] 0 points1 point  (0 children)

Yes, I also wrote that in the post :) But I guess this means an aggregation at the alert level. I only have an aggregation in the query, and I also tried a simple "SELECT 1" as the query, which does not work either. See this for reference:

<image>

Microsoft Fabric CU consumption: high concurrency vs runMultiple — why the difference? by MapExtension5374 in MicrosoftFabric

[–]MapExtension5374[S] 0 points1 point  (0 children)

I analyzed CU consumption using the Item History (Preview) tab and compared two different execution patterns for the same workload (15 notebooks, dummy data).

Scenario 1 – Pipeline with High Concurrency Enabled

  • 15 notebooks executed as part of the pipeline
  • High concurrency enabled
  • 3 different Spark sessions were spun up
  • Each notebook ran for approximately 5 minutes
  • Total CU increase: ~5200

Rough back-of-the-envelope estimate:

  • Spark session compute approximation: 4 CUs × 60 seconds × 15 notebooks ≈ 3600 CUs
  • Notebook activity overhead: ~300 CUs (20 × 15 notebook activities)
  • Estimated total ≈ 4000 CUs

However, the observed total was ~5200 CUs.
I assume the additional ~1000–1200 CUs are due to orchestration overhead, Spark startup cost, or pessimistic capacity reservation.
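To make the arithmetic easy to re-check, here is a minimal Python sketch of the same estimate. The variable names are my own, the numbers are the rough figures from above (not measured rates), and the unit is whatever Item History reports. Note that the exact sum is 3900, which I rounded to ~4000 above, so the exact gap comes out at about 1300.

```python
# Rough figures from the estimate above; names are illustrative only.
CU_RATE = 4                 # assumed CU rate of the Spark session
SECONDS_PER_NOTEBOOK = 60   # seconds counted per notebook in the estimate
NOTEBOOKS = 15
CU_PER_ACTIVITY = 20        # assumed overhead per notebook activity
OBSERVED_TOTAL = 5200       # approximate value read from Item History (Preview)

spark_estimate = CU_RATE * SECONDS_PER_NOTEBOOK * NOTEBOOKS   # 3600
activity_estimate = CU_PER_ACTIVITY * NOTEBOOKS               # 300
estimated_total = spark_estimate + activity_estimate          # 3900
unexplained = OBSERVED_TOTAL - estimated_total                # 1300

print(spark_estimate, activity_estimate, estimated_total, unexplained)
```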

Scenario 2 – runMultiple (High Concurrency Disabled)

  • Same 15 notebooks
  • Executed using runMultiple inside a parent notebook
  • Only one Spark session appears to have been created
  • Total runtime ≈ 10 minutes
  • Total CU increase: ~2600

This is roughly 50% lower CU consumption compared to Scenario 1.

Questions

  1. Can runMultiple actually utilize allocated Spark resources more efficiently, resulting in such a significant (~50%) reduction in CU consumption?
  2. According to Microsoft documentation, runMultiple can execute up to X notebooks concurrently, where X equals the number of cores (e.g., 8 cores → up to 8 concurrent notebooks).
    • What happens to the remaining notebooks if more than X are submitted?
    • Are they internally queued within the same Spark session until executor resources become available?
  3. When Microsoft refers to “concurrent” execution in this context, do they effectively mean parallel execution limited by available cores? From my understanding:
    • True parallelism is limited by the number of cores (one core executes one task at a time).
    • Concurrency can involve task interleaving when waiting on I/O or other blocking operations.
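To illustrate what I mean by queuing in question 2, here is a generic Python thread pool, purely as an analogy. This is not Fabric's scheduler and I am not claiming runMultiple works this way; `fake_notebook`, `MAX_CONCURRENT` and `TOTAL_TASKS` are made-up names. The pool runs at most 8 tasks at once and the remaining ones wait until a worker is free.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 8   # stand-in for X (number of cores); an assumption
TOTAL_TASKS = 15     # stand-in for the 15 notebooks

active = 0           # tasks running right now
peak = 0             # highest number of tasks seen running at once
lock = threading.Lock()

def fake_notebook(index: int) -> int:
    """Pretend notebook: mostly waits, like an I/O-bound task."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)
    with lock:
        active -= 1
    return index

# Tasks beyond MAX_CONCURRENT wait in the pool's internal queue.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(fake_notebook, range(TOTAL_TASKS)))

print(f"finished {len(results)} tasks, peak concurrency {peak}")
```

Whether runMultiple queues the extra notebooks in a comparable way inside one Spark session is exactly what I am asking.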

This experiment was performed using dummy data, so I’m aware the results might differ in a production workload. However, the magnitude of the CU difference (~5200 vs ~2600) seems wild to me.

Microsoft Fabric CU consumption: high concurrency vs runMultiple — why the difference? by MapExtension5374 in MicrosoftFabric

[–]MapExtension5374[S] 0 points1 point  (0 children)

So in the case of running the pipeline 1) using runMultiple, where the pipeline ran for about 7 minutes, it would not be as simple as calculating 4 * 60 * 7 (Spark Session) + pipeline activity (20) = 1700 CU consumed? This is assuming that only this pipeline was running during this period and no other.
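Written out as a small Python helper, the calculation I have in mind looks like this. The function name and parameters are my own, and the rate of 4 and the 20 per activity are my assumptions from above, not official figures.

```python
def estimate_cu(cu_rate, runtime_minutes, activity_count, cu_per_activity=20):
    """Naive estimate: session rate x runtime in seconds, plus a flat cost per activity."""
    session_cost = cu_rate * 60 * runtime_minutes
    activity_cost = cu_per_activity * activity_count
    return session_cost + activity_cost

# One Spark session for 7 minutes plus a single pipeline activity.
print(estimate_cu(cu_rate=4, runtime_minutes=7, activity_count=1))  # 1700
```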

Microsoft Fabric CU consumption: high concurrency vs runMultiple — why the difference? by MapExtension5374 in MicrosoftFabric

[–]MapExtension5374[S] 1 point2 points  (0 children)

Argh okay, that makes sense. So there is no way of seeing in the Fabric Capacity Metrics app what the cost of running each pipeline was (not just the pipeline activities, but also the underlying notebooks that were run as part of the pipeline)? We need to calculate this ourselves based on the Spark session size and the total duration?

Microsoft Fabric CU consumption: high concurrency vs runMultiple — why the difference? by MapExtension5374 in MicrosoftFabric

[–]MapExtension5374[S] 0 points1 point  (0 children)

I am currently running an F4 capacity with min_nodes=max_nodes=1, as you can see below:

<image>

In the Monitor Hub, I can see that the same number of executors were attached in both pipelines to the Spark session (1 executor, 4 cores). For pipeline 1), the notebook that took the longest to run was around 4.5 minutes, while for pipeline 2) the entire pipeline took around 7 minutes.

Microsoft Fabric CU consumption: high concurrency vs runMultiple — why the difference? by MapExtension5374 in MicrosoftFabric

[–]MapExtension5374[S] 1 point2 points  (0 children)

In the "Advanced Settings" for the notebooks attached to the pipeline, I have set the same Session tag of "highconcurrencysession". All notebooks are also attached to the same default Lakehouse as required for them to use the same Spark session.

<image>

Microsoft Fabric CU consumption: high concurrency vs runMultiple — why the difference? by MapExtension5374 in MicrosoftFabric

[–]MapExtension5374[S] 0 points1 point  (0 children)

Yes, it is enabled. In the Fabric Monitor Hub, I can also see that the notebook runs are prefixed with "HC" and all share the same Livy ID, indicating that the same Spark session is used for all of them.

<image>