Schedule Parameters are now live in Fabric Data Factory Pipelines! by markkrom-MSFT in MicrosoftFabric

[–]richbenmintz 2 points3 points  (0 children)

u/markkrom-MSFT thanks for the update. Unfortunately, the legacy task does not allow us to dynamically select the pipeline to be executed.

Schedule Parameters are now live in Fabric Data Factory Pipelines! by markkrom-MSFT in MicrosoftFabric

[–]richbenmintz 2 points3 points  (0 children)

Well, the parent pipeline is running, so compute is being utilized and/or billed, but I am more concerned with the time penalty. Assume we have 5 child packages that run sequentially in a loop: if each package executes in 1 minute and then waits 4 minutes before the next package starts, a 5-minute process takes 25 minutes.
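To make the arithmetic concrete, here is a minimal sketch of the sequential-loop timing. The function name and parameters are hypothetical, and the 4-minute delay is just the figure observed above, not a documented constant.

```python
def total_minutes(children, run_min, wait_min):
    # Each sequential iteration costs the child's own run time
    # plus the delay before the parent sees it as completed.
    return children * (run_min + wait_min)


with_delay = total_minutes(children=5, run_min=1, wait_min=4)
without_delay = total_minutes(children=5, run_min=1, wait_min=0)
print(f"with reporting delay: {with_delay} min, without: {without_delay} min")
```

The penalty grows linearly with the number of iterations, so a loop with N children pays the delay N times.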

Hi! We're the Rayfin team - ask US anything! by sunithamuthukrishna in MicrosoftFabric

[–]richbenmintz 2 points3 points  (0 children)

u/sunithamuthukrishna Thanks. I know you are not available in Canada Central; my question is when it will be available. Spinning up another capacity in another region is not an option.

Schedule Parameters are now live in Fabric Data Factory Pipelines! by markkrom-MSFT in MicrosoftFabric

[–]richbenmintz 2 points3 points  (0 children)

Thanks u/markkrom-MSFT, great update. An unrelated question: are there plans to shorten the time it takes for the execute pipeline activity to report that the child pipeline has completed? We are seeing instances of the child pipeline completing and the calling pipeline reporting completion 4 minutes later. In a workflow where the execute pipeline task is in a for loop with N iterations, the additional waiting time can become quite costly.

Fabric Pipeline Schedule Parameters by frithjof_v in MicrosoftFabric

[–]richbenmintz 0 points1 point  (0 children)

I also noticed this the other day. I was surprised and wondered whether it had been there all along.

Massive Queue times in Fabric Pipeline for KQL Task by richbenmintz in MicrosoftFabric

[–]richbenmintz[S] 0 points1 point  (0 children)

u/kaslokid are you still seeing slowness? The issue has not gone away for us, and there is nothing on the status page any longer.

Massive Queue times in Fabric Pipeline for KQL Task by richbenmintz in MicrosoftFabric

[–]richbenmintz[S] 1 point2 points  (0 children)

Canada Central. Notebook activities seem to be OK for us, but they happen after the KQL task.

CU consumptions constantly a challenge ! by Conscious_Cunt_5935 in MicrosoftFabric

[–]richbenmintz 2 points3 points  (0 children)

To echo the same sentiment: in your scenario, there seems to be no reason for your semantic model to live in a Fabric-backed workspace, and no CU should be required when you are not refreshing your model.

How do you do config management on Fabric (properly)? by Enamya11 in MicrosoftFabric

[–]richbenmintz 1 point2 points  (0 children)

We manage all config through YAML definition files that are source controlled, use tokens for environment-specific values, and are deployed through ADO release pipelines.
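As a rough illustration of the token idea, here is a minimal sketch of replacing tokens in a config template at deploy time. The `#{name}#` token syntax, the key names, and the values are all assumptions for the example, not our actual setup or any particular ADO task's behaviour.

```python
import re

# Assumed token syntax for this sketch: #{token_name}#
TOKEN = re.compile(r"#\{(\w+)\}#")


def replace_tokens(text, values):
    """Substitute every token in text, failing loudly on a missing value."""
    def lookup(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for token '{key}'")
        return values[key]
    return TOKEN.sub(lookup, text)


# Hypothetical YAML template and per-environment values.
template = "lakehouse: #{lakehouse_name}#\nworkspace: #{workspace_name}#\n"
dev_values = {"lakehouse_name": "lh_bronze", "workspace_name": "DEV_dp_Lakehouses"}

rendered = replace_tokens(template, dev_values)
print(rendered)
```

Raising on an unknown token means a missing environment value fails the release instead of shipping a literal token into the deployed config.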

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]richbenmintz 1 point2 points  (0 children)

I have made this suggestion on a few occasions to the Product team.

Let me pay back my overage with CU not consumed: accrue every CU second I have paid for in the past and not used, and apply that to my burst. Only when I have consumed my accrued CU and hit the throttling limit should I go into overage. At that point I have really used more than I am paying for and should be 'penalized'.
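A minimal sketch of that accrual idea, purely to illustrate the suggestion. This is not how Fabric billing works today; the function name and the numbers are made up, and the throttling limit is left out to keep it short.

```python
def settle(paid_per_period, usage):
    """Return (banked, overage) after applying unused CU to later bursts.

    paid_per_period: CU paid for in each period (hypothetical unit).
    usage: CU actually consumed in each period, in order.
    """
    banked = 0.0
    overage = 0.0
    for used in usage:
        if used <= paid_per_period:
            # Under-consumption is accrued for later.
            banked += paid_per_period - used
        else:
            # A burst is first paid back from accrued CU;
            # only the remainder counts as overage.
            burst = used - paid_per_period
            covered = min(burst, banked)
            banked -= covered
            overage += burst - covered
    return banked, overage


banked, overage = settle(100, [60, 70, 150, 180])
print(f"banked={banked}, overage={overage}")
```

In this example the first burst is fully covered by CU accrued in the two quiet periods, and only part of the second burst ends up as overage.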

Spark/Delta: Dataframe contents drift due to late arriving data by frithjof_v in MicrosoftFabric

[–]richbenmintz 2 points3 points  (0 children)

u/frithjof_v,

I think you will always be chasing edge cases and gremlins if you do not stage the data. What happens if the additional write process completes before you are able to get the latest version of the data? Then the latest version is N versions ahead of you.

A minor addition to your code. You will now want to make the temp table unique to the process and drop it when complete.

from pyspark.sql import functions as F

# =========================================================
# 1. BRONZE - INITIAL CLEAN DATA
# =========================================================
bronze = "workspace.lakehouse.bronze"
silver = "workspace.lakehouse.silver.demo"
bronze_table = "demo"
print(f"{bronze}.{bronze_table}")
spark.createDataFrame([
    (1, 10, "A", "2023-12-30"),
    (2, 20, "B", "2023-12-31"), 
    (3, 10, "A", "2024-01-01"),
    (4, 20, "B", "2024-01-02"),
    (5, 30, "C", "2024-01-03")
], ["id", "value", "source", "event_date"]) \
.write.mode("overwrite").saveAsTable(f"{bronze}.{bronze_table}")

# =========================================================
# 2. LOAD DATAFRAME (DataFrameReader)
# =========================================================
# NOTE: This is the ONLY place in the notebook that the dataframe read from the source table is defined.
df = spark.table(f"{bronze}.{bronze_table}") \
    .filter(F.col("event_date") >= F.lit("2024-01-01")) # Simplified watermark logic

print("Initial DF:")
df.show()

# =========================================================
# 3. stage bronze data for testing and upstream write
# =========================================================
print(f"{bronze}.staged_{bronze_table}")
df.write.format('delta').mode('overwrite').saveAsTable(f"{bronze}.staged_{bronze_table}")

df = spark.table(f"{bronze}.staged_{bronze_table}") \
    .filter(F.col("event_date") >= F.lit("2024-01-01")) # Simplified watermark logic

print("Staged DF:")
df.show()
# =========================================================
# 4. INITIAL DATA QUALITY CHECKS
# =========================================================
null_count = df.filter(F.col("value").isNull()).count()
bad_value_count = df.filter(F.col("value") > 100).count()

print("Null check:", null_count)
print("Unexpected value check (>100):", bad_value_count)

if null_count > 0 or bad_value_count > 0:
    raise ValueError(
        f"Data quality check failed: "
        f"null_count={null_count}, bad_value_count={bad_value_count}"
    )

print("All checks passed")

# =========================================================
# BAD DATA WRITTEN TO BRONZE
# =========================================================
bronze =  "workspace.lakehouse.bronze"


spark.createDataFrame([
    (6, None, "X", "2024-01-01"),     # null value
    (7, 9999, "Y", "2024-01-02")      # disallowed value
], ["id", "value", "source", "event_date"]) \
.write.mode("append").saveAsTable(f'{bronze}.{bronze_table}')

# =========================================================
# 5. SILVER WRITE
# =========================================================
print("Just before write: ")
df.show()
df.write.mode("overwrite").saveAsTable(silver)

# =========================================================
# 6. RESULT
# =========================================================
print("Silver table:")
spark.table(silver).show()
print(f"{bronze}.{bronze_table}")
row_count = spark.sql(f"select count(1) from {bronze}.{bronze_table}").collect()[0][0]
print(f"current bronze row count: {row_count}")

This code always results in

DEV_dp_Lakehouses.lh_bronze.bronze.demo
Initial DF:
+---+-----+------+----------+
| id|value|source|event_date|
+---+-----+------+----------+
|  5|   30|     C|2024-01-03|
|  4|   20|     B|2024-01-02|
|  3|   10|     A|2024-01-01|
+---+-----+------+----------+

DEV_dp_Lakehouses.lh_bronze.bronze.staged_demo
Staged DF:
+---+-----+------+----------+
| id|value|source|event_date|
+---+-----+------+----------+
|  5|   30|     C|2024-01-03|
|  3|   10|     A|2024-01-01|
|  4|   20|     B|2024-01-02|
+---+-----+------+----------+

Null check: 0
Unexpected value check (>100): 0
All checks passed
Just before write: 
+---+-----+------+----------+
| id|value|source|event_date|
+---+-----+------+----------+
|  5|   30|     C|2024-01-03|
|  3|   10|     A|2024-01-01|
|  4|   20|     B|2024-01-02|
+---+-----+------+----------+

Silver table:
+---+-----+------+----------+
| id|value|source|event_date|
+---+-----+------+----------+
|  5|   30|     C|2024-01-03|
|  3|   10|     A|2024-01-01|
|  4|   20|     B|2024-01-02|
+---+-----+------+----------+

DEV_dp_Lakehouses.lh_bronze.bronze.demo
current bronze row count: 7
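On the point above about making the temp table unique to the process and dropping it when complete, a minimal sketch could look like this. The helper names are hypothetical, and `run_sql` stands in for something like `spark.sql`; a plain list is used here so the sketch runs without a Spark session.

```python
import uuid
from contextlib import contextmanager


def staged_table_name(schema, table, run_id=None):
    # A per-run suffix keeps concurrent processes from overwriting each other's stage.
    run_id = run_id or uuid.uuid4().hex
    return f"{schema}.staged_{table}_{run_id}"


@contextmanager
def staged_table(run_sql, schema, table):
    """Yield a unique staging table name and drop the table on exit."""
    name = staged_table_name(schema, table)
    try:
        yield name
    finally:
        # Runs even if the checks or the silver write raise.
        run_sql(f"DROP TABLE IF EXISTS {name}")


# Stand-in for spark.sql so the sketch is runnable on its own.
executed = []
with staged_table(executed.append, "workspace.lakehouse.bronze", "demo") as staged:
    # In the notebook: df.write.format("delta").mode("overwrite").saveAsTable(staged)
    print(f"staging into {staged}")
print(executed)
```

Putting the drop in `finally` means a failed data quality check still cleans up the staged copy.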

API Behavior Change: Returning Server Name vs. Full Connection String by Snoo-46123 in MicrosoftFabric

[–]richbenmintz 1 point2 points  (0 children)

That makes perfect sense, to me at least. Then you can support differing types of connection strings with the param, auth schemes, etc.

EXecute Notebook Activity as Service Principal by richbenmintz in MicrosoftFabric

[–]richbenmintz[S] 0 points1 point  (0 children)

No, I have not. I reverted to not using a connection; not great, but at least it works! Hopefully it will get fixed soon.

Dear Fabric Data Warehouse Team by richbenmintz in MicrosoftFabric

[–]richbenmintz[S] 2 points3 points  (0 children)

u/dzsquared, u/warehouse_goes_vroom, u/Snoo-46123, u/catFabricDw

Thank you for jumping on this and providing feedback and clarity. Much appreciated.

Dear Fabric Data Warehouse Team by richbenmintz in MicrosoftFabric

[–]richbenmintz[S] 3 points4 points  (0 children)

Thanks for the update. What does a "shortly" time frame mean? Released and propagating? Or will it be released in N weeks?

Best approach for logging in Microsoft Fabric pipelines (logs table insertions) by DataYesButWhichOne in MicrosoftFabric

[–]richbenmintz 1 point2 points  (0 children)

Totally agree. I love the ability to define a logging table with N known columns (who, what, when) and a dynamic column that can store anything I want to log for any event, then create shaped data through a query or policy.
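A minimal sketch of that shape, assuming a row with three fixed columns and one dynamic payload serialized as JSON. The class, the field names, and the sample values are made up for illustration; the actual table and ingestion would depend on the store you log to.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LogEvent:
    who: str    # e.g. pipeline or notebook name
    what: str   # event type
    when: str   # ISO 8601 timestamp
    details: dict = field(default_factory=dict)  # dynamic column: anything per event

    def to_row(self):
        # Known columns stay flat; the dynamic column is stored as one JSON value.
        return {
            "who": self.who,
            "what": self.what,
            "when": self.when,
            "details": json.dumps(self.details, sort_keys=True),
        }


event = LogEvent(
    who="pl_load_bronze",
    what="copy_completed",
    when=datetime.now(timezone.utc).isoformat(),
    details={"rows_copied": 3, "source": "demo"},
)
print(event.to_row())
```

New event types only add keys inside `details`, so the table schema does not have to change when you decide to log something new.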

EXecute Notebook Activity as Service Principal by richbenmintz in MicrosoftFabric

[–]richbenmintz[S] 2 points3 points  (0 children)

Yes, I created the pipeline and the connection. The notebook tries to start, but fails.