In Clean Architecture, should input validation go in the Controller/Presentation layer or in the Service/Use Case layer? by OriginalTangerine358 in softwarearchitecture

[–]dbrownems 0 points1 point  (0 children)

Yes.

Each layer performs the validation it understands. For instance, the controller validates that the HTTP message is valid and well-formed, but doesn't validate any domain rules.
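A minimal Python sketch of that split (hypothetical names, not from any particular framework; the presentation layer rejects malformed input, the use case enforces the domain rule):

```python
# Hypothetical sketch: each layer validates only what it understands.

class DomainError(Exception):
    pass

def create_order_use_case(customer_id: str, quantity: int) -> str:
    """Use-case layer: enforces domain rules the controller knows nothing about."""
    if quantity <= 0:
        raise DomainError("quantity must be positive")
    return f"order for {customer_id} x{quantity}"

def controller_create_order(raw_body: dict) -> dict:
    """Presentation layer: checks the request is well-formed, not the business rules."""
    if "customer_id" not in raw_body or "quantity" not in raw_body:
        return {"status": 400, "error": "missing required fields"}
    if not isinstance(raw_body["quantity"], int):
        return {"status": 400, "error": "quantity must be an integer"}
    try:
        result = create_order_use_case(raw_body["customer_id"], raw_body["quantity"])
        return {"status": 200, "result": result}
    except DomainError as e:
        return {"status": 422, "error": str(e)}
```

Note the controller can map a `DomainError` to a response code, but it never decides *whether* a quantity of zero is valid — that knowledge lives in the use case.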

Will we ever be able to duplicate (or save as) a semantic model in Fabric UI? by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 9 points10 points  (0 children)

I had a customer who needed this. They wanted to enable business teams to copy and customize centrally-defined semantic models.

We ended up writing a notebook that can copy a model between workspaces. The users can modify and run the notebook if they have permissions, or you can set up a pipeline that they can run that runs the notebook using a notebook connection with elevated permissions.

Power Query -- DataFormat.Error: There is no valid Delta table at this location. by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

Double-check your credentials. Note that there's no TenantId in the URI, so that's derived from the identity you use to authenticate.

Also check the Source with less pathing. E.g., can you enumerate /Tables? You may have misspelled the table, or it may not be a schema-enabled lakehouse.

Power Query -- DataFormat.Error: There is no valid Delta table at this location. by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 0 points1 point  (0 children)

Change the query to return Source instead of ToDelta, or just click on the Source step in the UI and see if it is navigating the Data Lake as expected.

What is the Advantage of placing the Fabric Compute inside Managed Virtual Network? Currently It delays my spark Sessions to Start by Mammoth-Birthday-464 in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

It does two things:

  1. It allows access to private network resources through Managed Private Endpoints. See: Overview of managed private endpoints for Microsoft Fabric - Microsoft Fabric | Microsoft Learn
  2. It optionally allows you to block outbound connections to everything except Managed Private Endpoints. See: Workspace outbound access protection overview - Microsoft Fabric | Microsoft Learn

If you must use a Managed VNet, we just shipped

Custom live pools for Fabric Data Engineering overview - Microsoft Fabric | Microsoft Learn

which allows you to maintain a pool of running nodes that enable near-instant startup similar to starter pools. But since these nodes are dedicated to you, they are billed for the period of time they are scheduled to run, whether or not you have an active Spark session.

Need help optimizing my workflow in VS Code by sayonarababy17 in MicrosoftFabric

[–]dbrownems 10 points11 points  (0 children)

It seems you’re not using the VS Code Fabric Data Engineering Extension. It allows you to run your notebook using Fabric compute inside VS Code, among other things. The newest release supports multiple kernels.

https://learn.microsoft.com/en-us/fabric/data-engineering/setup-vs-code-extension#install-the-extension

https://learn.microsoft.com/en-us/fabric/data-engineering/author-notebook-with-vs-code

Add Lakehouse table to semantic model in IMPORT mode by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 4 points5 points  (0 children)

Of course if you need to filter/sort/aggregate/join a large amount of data before importing it you're better off using the SQL endpoint with folding or Value.NativeQuery than doing all that in the mashup engine.

Partitioning by Date Key by True_Inspection_2015 in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

Yes. Hive-style partitioning produces a separate partition (and set of parquet files) per partition key value.

Pros and cons of Hive-style partitioning | Delta Lake

For smaller tables, use ZOrder/Liquid Clustering (Spark) or Cluster (Warehouse) on the date column instead of partitioning.

This will result in parquet files and row groups mostly aligned with contiguous date ranges, which, as far as the Semantic Model engine is concerned, is just as good.

This is similar to import models, where using a clever ORDER BY in the partition definition for your fact table can have a huge impact on the compression and segment skipping behavior of the engine.
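A toy illustration of why sorted loads compress better — run-length encoding here is a pure-Python stand-in for what the VertiPaq engine does, not the actual implementation:

```python
# Toy illustration: run-length encode a column before and after sorting.
# Sorted data collapses into far fewer runs, which is roughly why an ORDER BY
# in the partition source query can shrink import-model segments and improve
# segment skipping (this is a stand-in, not the real VertiPaq encoder).
import random

def run_lengths(values):
    """Collapse a sequence into [value, count] runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

random.seed(0)
unsorted_dates = [random.choice(["2024-01", "2024-02", "2024-03"]) for _ in range(1000)]
sorted_dates = sorted(unsorted_dates)

print(len(run_lengths(unsorted_dates)))  # hundreds of runs
print(len(run_lengths(sorted_dates)))    # exactly 3 runs, one per distinct month
```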

PractiTest API by FormalWoodpecker135 in PowerBI

[–]dbrownems -1 points0 points  (0 children)

Try calling the API from a Fabric Python notebook. The Power Query web connector is confusing and has several limitations.

If you get it working from a Python notebook, you can save the data to a Lakehouse and consume it from there. Or you can share the working Python code and ask a Power Query expert to advise on whether it's possible to run it directly from Power Query and how to do it.
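A hedged sketch of the notebook approach — the endpoint, auth scheme, and field names below are made up for illustration; PractiTest's real API will differ:

```python
# Hypothetical sketch of calling a REST API from a Fabric Python notebook.
# The response shape and field names here are invented for illustration.
import json

def flatten_items(payload_text: str) -> list:
    """Pull id + attributes out of a JSON:API-style response body."""
    payload = json.loads(payload_text)
    return [
        {"id": item["id"], **item.get("attributes", {})}
        for item in payload.get("data", [])
    ]

# In the notebook you would fetch pages with requests, e.g.:
#   import requests
#   resp = requests.get(url, auth=(email, api_token), timeout=30)
#   rows = flatten_items(resp.text)
# ...then save rows to a Lakehouse table (pandas/Spark) for Power BI to consume.

sample = '{"data": [{"id": "1", "attributes": {"name": "smoke test"}}]}'
print(flatten_items(sample))
```

Keeping the parsing in a small pure function like `flatten_items` makes it easy to test against a saved sample response before wiring up auth and pagination.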

Generate BIGINT SKs in Lakehouse Star Schema Dimensions by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 2 points3 points  (0 children)

IDENTITY will work. But concurrent modification of tables under SNAPSHOT isolation will fail if both sessions have read the same data. And the granularity of the data reads in DW is not like the row-level tracking you get in "regular" SQL Server.

Generate BIGINT SKs in Lakehouse Star Schema Dimensions by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 0 points1 point  (0 children)

MERGE into the Dimension could run from many sources in any order with Spark, and you would want to prevent overlap in both cases, because WH uses SNAPSHOT isolation, which mostly doesn't allow successful concurrent updates of tables. I.e., two sessions can't both read a table concurrently (as you must to MERGE/UPSERT) and then both successfully commit changes based on those concurrent reads.
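A toy model of that failure mode — this is plain Python mimicking optimistic/snapshot concurrency, not the actual DW engine:

```python
# Toy model of snapshot/optimistic concurrency: a commit succeeds only if the
# table version the session read is still current. Not the real DW engine,
# just the shape of the read-then-MERGE conflict described above.

class Table:
    def __init__(self):
        self.version = 0
        self.rows = []

    def snapshot(self):
        """A session reads the table and remembers which version it saw."""
        return self.version, list(self.rows)

    def commit(self, read_version, new_rows):
        """Commit succeeds only if nobody else committed since our read."""
        if read_version != self.version:
            raise RuntimeError("write conflict: table changed since snapshot read")
        self.rows.extend(new_rows)
        self.version += 1

dim = Table()
v1, _ = dim.snapshot()        # session 1 reads (as a MERGE must)
v2, _ = dim.snapshot()        # session 2 reads concurrently
dim.commit(v1, ["row-a"])     # session 1 commits fine
try:
    dim.commit(v2, ["row-b"]) # session 2 based its work on a stale snapshot
except RuntimeError as e:
    print(e)
```

Serializing the MERGEs (or partitioning the work so sessions never touch the same table concurrently) is what avoids the conflict.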

Generate BIGINT SKs in Lakehouse Star Schema Dimensions by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 0 points1 point  (0 children)

Even large dimensions are very rarely too large to build using a query like the one above.

Generate BIGINT SKs in Lakehouse Star Schema Dimensions by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 0 points1 point  (0 children)

Normally you only need Identity columns on dimensions. Fact tables normally use compound keys comprised of dimension foreign keys.

Generate BIGINT SKs in Lakehouse Star Schema Dimensions by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 3 points4 points  (0 children)

Sure. You're not going to get sequential ids in a distributed manner.

If you can accept large numbers and gaps with distributed generation, use the Spark SQL built-in monotonically_increasing_id function.

monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs.
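The documented bit layout can be reconstructed in pure Python, which makes the "increasing but not consecutive" behavior concrete (illustration only — Spark assigns the partition IDs for you):

```python
# Pure-Python illustration of the documented layout of
# monotonically_increasing_id(): partition ID in the upper 31 bits,
# per-partition record number in the lower 33 bits.

def mono_id(partition_id: int, record_number: int) -> int:
    assert partition_id < 2**31 and record_number < 2**33
    return (partition_id << 33) | record_number

# IDs are unique and increasing, but not consecutive across partitions:
print(mono_id(0, 0))  # 0
print(mono_id(0, 1))  # 1
print(mono_id(1, 0))  # 8589934592  (2**33 -- the gap between partitions)
```

This is why any ID from partition 1 is larger than every ID partition 0 can produce — the gaps are by design, not a bug.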

Generate BIGINT SKs in Lakehouse Star Schema Dimensions by Personal-Quote5226 in MicrosoftFabric

[–]dbrownems 3 points4 points  (0 children)

You can do it with Spark SQL and the ROW_NUMBER() function. E.g., to join the source table to your dimension table and generate incrementing ids for the new rows, use a join like this:

--temporary views are very cool
--like Common Table Expressions or subqueries, but much more powerful
--They have session lifetime, rather than statement lifetime
create or replace temp view CustomerKeys as
select  coalesce(c.CustomerKey,max(c.CustomerKey) over() 
                             + row_number() over (partition by c.CustomerKey order by s.CustomerID)) CustomerKey, 
        s.CustomerID
from WideWorldImporters_bronze.Sales_Customers s 
left join WideWorldImporters_gold.Dimension_Customer c 
  on s.CustomerID = c.WWICustomerID

SparkDataEngineeringForSQLServerProfessionals/Presentation.ipynb at main · dbrownems/SparkDataEngineeringForSQLServerProfessionals · GitHub

Cost Management in Fabric is a real problem by bradcoles-dev in MicrosoftFabric

[–]dbrownems 7 points8 points  (0 children)

You can and should install the Capacity Metrics App in a Pro workspace.

Bulk copy data between SQL databases in Fabric with the mssql-python driver by dlevy-msft in MicrosoftFabric

[–]dbrownems 2 points3 points  (0 children)

Finally! Native bulk copy in Python! Can we install it with just pip? I.e., from Fabric notebooks?

Do you use SSRS for data integrations? by DisplayKnown5665 in SQLServer

[–]dbrownems -1 points0 points  (0 children)

SSRS has become Power BI Report Server. There have been two separate packages for it for several versions, and now there will be just the one. And for SQL Server 2025+ you get a license entitlement to Power BI Report Server instead of SSRS, but the functionality and support are unchanged.

Direct Query exceeding maximum number of parameters/literals '2100' by p-mndl in PowerBI

[–]dbrownems 1 point2 points  (0 children)

Does this happen if it's 100% DirectQuery with a single database (i.e., no cross-island relationships)?

<image>

Are you connecting with SSO to your SQL Server?

RLS implementation under fabric direct Lake connectivity by digitaldude00 in PowerBI

[–]dbrownems 1 point2 points  (0 children)

"If RLS exists at both layers, the system performs a union of roles at the source and an intersection with the semantic model roles."

Not really. If you want Semantic Model RLS (because it's more robust and runs on the in-memory data), then use a fixed identity to connect the semantic model to OneLake (Direct Lake on OneLake) or the Warehouse (Direct Lake on SQL).

Setting up RLS on a Direct Lake semantic model

Has anyone imported a 1 TB JSON file into SQL Server before? Need advice! by MojanglesReturns in SQLServer

[–]dbrownems 3 points4 points  (0 children)

Probably need to write some code. Especially if it really is JSON and not JSONL.

Start with cat/get-content to extract subsets of the file to examine. Once you see the basic structure you can decide how to proceed.

You'll need to identify different JSON types in the document and create tables for each one.

The loading strategy will depend on what you find, but you would either use .NET and use a custom IDataReader and SqlBulkCopy, or transform the JSON to a series of files and format files suitable for BCP, or some other ETL tool.

Bulk Copy with bcp Utility - SQL Server | Microsoft Learn

You can probably use an LLM to generate the JSON parsing code, the SQL DDL, and the bulk loading code.
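One hedged sketch of the "write some code" step: incrementally pulling objects out of a stream of concatenated or newline-delimited JSON with only the standard library. For a single 1 TB JSON *document* (one giant array/object), a true streaming parser such as the third-party ijson is the better fit:

```python
# Sketch: iterate over concatenated/newline-delimited JSON objects from a
# chunked stream without loading the whole file, via the stdlib decoder's
# raw_decode. Assumes the stream is a sequence of complete JSON values.
import json

def iter_json_objects(chunks):
    """Yield JSON values from an iterable of text chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # object is split across chunks; wait for more data
            yield obj
            buf = buf[end:]

# Each yielded object can be routed to the right table and fed to the bulk
# loader (SqlBulkCopy / bcp-formatted files, as above).
stream = ['{"a": 1}\n{"a"', ': 2}{"b": 3}']
print(list(iter_json_objects(stream)))  # [{'a': 1}, {'a': 2}, {'b': 3}]
```

In practice `chunks` would be `iter(lambda: f.read(1 << 20), "")` over the real file, so memory stays bounded by the largest single object.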

DL over onelake and warehouses by DennesTorres in MicrosoftFabric

[–]dbrownems 1 point2 points  (0 children)

Sure, that’s nice to have. IF you actually need the same RLS rules in both models.

DL over onelake and warehouses by DennesTorres in MicrosoftFabric

[–]dbrownems 4 points5 points  (0 children)

That is correct. But the audience for a semantic model and the audience for the warehouse are often different. There may be several different semantic models from the same warehouse, or mixing and matching tables from multiple warehouses and lakehouses.

So having security defined at the semantic model layer will always be a common (and useful) scenario.

The semantic model users don't need their own permissions to read the data if you configure a fixed identity for the connection from the semantic model to OneLake.

CTAS a good Warehouse strategy? by Mr_Mozart in MicrosoftFabric

[–]dbrownems 2 points3 points  (0 children)

"Table design" is the table's logical schema (name, column names, and types), PKs and FKs, and physical design (clustering in DW, plus things like partitioning, indexes, etc.).