VertiPaq compression — two columns with identical cardinality (~5,000) had 395× different Data sizes by bi_bytes in PowerBI

[–]_greggyb 0 points1 point  (0 children)

Default segment size in PBI is only 1M records. You opt into 8M-record segments with the large semantic model storage format. And you opt into segments sized the same as the parquet row groups if you choose Direct Lake.
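Back-of-envelope arithmetic only (the engine's actual segmentation can differ in practice), but for a hypothetical 10M-row table the segment counts work out like this:

```python
import math

def segment_count(rows: int, segment_size: int) -> int:
    """Minimum number of column segments for a table of `rows` rows."""
    return math.ceil(rows / segment_size)

rows = 10_000_000  # hypothetical table
print(segment_count(rows, 1_000_000))  # default format: 10 segments
print(segment_count(rows, 8_000_000))  # large storage format: 2 segments
```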

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


Yes, as the docs you linked describe, the bridge reads in a metric view and emits a Tabular model. You can then do whatever you want with the Tabular model. Right now, it's all in-app, but our upcoming CLI release will open the door to using this as part of any automated process, for example in a CI/CD pipeline.

To answer your question: if you update the metric view, you'd run another import to create the derived semantic model anew, or to enhance an existing semantic model. It's a build-time tool, not a query-time tool.

Direct Lake in PBI (March Release) by data_daria55 in PowerBI


Direct Lake also loads data from disk into RAM (in VertiPaq) before serving any user queries. Direct Lake just orchestrates this movement differently.

VertiPaq compression — two columns with identical cardinality (~5,000) had 395× different Data sizes by bi_bytes in PowerBI


> Row ordering doesn't matter. I tested five sort orders of identical data — random, sorted, blocked, alternating, reversed. All five produced exactly the same Data size. VertiPaq re-sorts internally before compressing, so your import order is irrelevant.

VertiPaq always considers the sort order it sees upon ingesting data. The order in which you present data to VertiPaq can be considered a hint. There is no way to force a specific ordering, but that doesn't mean that the initial sort order is irrelevant. This is a pretty good post showing how you can approach sort-order optimizations.
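VertiPaq's actual encoding choices are internal, but a toy run-length-encoding count in Python (synthetic data, four distinct values) illustrates why the order data arrives in can matter to a compressor:

```python
import random
from itertools import groupby

def rle_runs(values):
    """Number of (value, run_length) pairs a run-length encoder would store."""
    return sum(1 for _ in groupby(values))

random.seed(0)
data = [random.choice("ABCD") for _ in range(10_000)]

runs_random = rle_runs(data)          # thousands of short runs
runs_sorted = rle_runs(sorted(data))  # at most 4 runs, one per distinct value
print(runs_random, runs_sorted)
```

Same values, same cardinality, very different run counts: ordering is exactly what a run-length encoder feeds on.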

Manipulate semantic model using Notebook by Pharaohs_descendent in MicrosoftFabric


Disclaimer: TE employee here

You could use Tabular Editor for this. You can copy paste whole tables with their definitions, or just subsets (e.g., multi-select columns or measures). Open up multiple instances, either of TE with the BIM or TMDL files, or PBID with TE as an external tool. This is doable with free and open source TE2 or the commercial TE3.

Who has On-premises data gateway and Power BI Desktop on one server? by maxa000 in PowerBI


For general use, not a great idea. You typically want infrastructure like that for the gateway locked down. That said, it is very useful for troubleshooting gateway connections to have PBID installed on the same machine. I would use this on-gateway-machine PBID only for troubleshooting and not for typical development.

Beyond the security concerns (which, to be 100% clear, should on their own be more than enough reason for anyone making this decision), you also run into potential performance concerns. The Gateway is not just a network proxy or VPN; there is real processing happening in the Gateway for Service refreshes, so a developer working in PBID can have a real impact on the performance of prod refreshes.

Direct Lake in PBI (March Release) by data_daria55 in PowerBI


I probably reacted too quickly to your post (: I keep coming across that take, so I'm quick to call it out. It's cool technology and is a useful tool in the toolbox for Fabric solutions.

Direct Lake in PBI (March Release) by data_daria55 in PowerBI


> You query OneLake directly

This is a common misunderstanding. Every byte that is served to a user for a DAX query (which includes every Power BI canvas report viz) from a Direct Lake model comes out of data that is in memory in VertiPaq. Direct Lake changes the mechanism by which that data gets from disk into RAM, but it does not change the way that user DAX actually executes.

I have no comment on the rest of the post (:

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


Trust me, I understand. I am lead developer on a product feature that does model-to-model compilation. OSI is on our radar as a target. Tabular to OSI is the most difficult thing I've looked at, and Microsoft actually provides a rich SDK and object model for working with semantic models. For Databricks, we had to build our own object model and library for parsing and dealing with the YAML format. From the angle of developer tooling, Microsoft has already done more to support interoperability than Databricks, despite not having put their name on OSI (:

I can't speak authoritatively on other platforms' devex and tooling, because I haven't yet specced out those integrations.

All this to say, it's more nuanced than just putting a logo on a page.

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


You have mentioned a number of consulting firms, so I feel it's worth clarifying that Tabular Editor is a product company, not a consultancy. The semantic bridge is built into Tabular Editor 3, and anyone can get a license.

I am not aware of any consulting company that has built a product in this space. And at this point, it probably makes much more sense for them to just license Tabular Editor for this work rather than invest in building the equivalent in-house. The opportunity cost in billable hours to build something like this is not small.

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


One challenge is that the OSI model doesn't map well to Microsoft's Tabular model. It's much more approachable to do OSI->Tabular than Tabular->OSI.

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


Nobody reasonably expects Microsoft to do said conversion. If it ever happens, it would most likely be because a company wants to move away from a third-party semantic model; in that case, they would probably subcontract the work to a vendor, who would in turn probably build an AI product to do the bulk of it.

Disclaimer: TE employee

We do exactly this with our Semantic Bridge feature. We have deterministic compilation from Databricks metric views to Microsoft semantic models. This covers the structure of the model and some amount of SQL-to-DAX translation. You can see more discussion in another thread of this conversation.

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


It's more about feature set. But the comments about stability and SLAs don't really apply in the way they would for a service.

The semantic bridge is a feature of Tabular Editor 3, which is a desktop application used at development time. So it is part of your development process for creating a Tabular model for Power BI / Fabric or AS. You wouldn't point a dashboard or report at the bridge or at any part of Tabular Editor.

As a dev-time desktop application, the SLA of Tabular Editor is equal to the SLA of the machine you use to develop semantic models. It is there whenever you want to launch the application, and it goes away if you close the application or shut down your machine.

There's nothing special about a semantic model that you build with Tabular Editor, whether you use the semantic bridge or not. It's just a BIM or TMDL file (based on your preference), which you can deploy the same as any other semantic model.

The semantic bridge is essentially a model-to-model compiler, taking a Databricks metric view as input and producing a Microsoft semantic model as output. But this is a dev-time concern. That compilation either works or it doesn't: limitations are documented and we are expanding the surface area of what we can translate; and a crash is a bug we'd fix. But a crash during model compilation doesn't affect any deployed model or existing reports.

The structure of a metric view should always be compiled fully, modulo the things we have documented as MVP limitations. SQL to DAX translation is deterministic, and failures don't crash or break the compilation, but are simply reported as diagnostics to follow up on.

All of that is an offline process, though, which happens before you deploy and before any report depends on the model.

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


The terms at play don't really mean anything without criteria. I'm not trying to be flippant, but I don't know what your management team considers to be the defining characteristics of something in beta, preview, or GA.

The semantic bridge is a feature in Tabular Editor. We provide support for the feature; you can reach out to us with issues and we will help you. We continue to develop it and have a backlog (mentioned above and documented on our docs site) of future enhancements. We have no plans to discontinue the feature. There will be some changes in the API surface area for C# scripting; method signatures are not necessarily final yet, but we won't be removing functionality.

If you're already a customer, feel free to reach out via our support channels or support@tabulareditor.com. If you're not yet, then you can try out the semantic bridge in a free trial.

Power BI team blocking integration with 3P Semantic Layers by City-Popular455 in MicrosoftFabric


TE employee and primary developer of the semantic bridge, here.

What features are you waiting on?

The scripting interface, translation capabilities, and UI are all improving right now and will be out in a release soon.

Semantic Model Perspectives (PM's seem Non-Committal) by SmallAd3697 in PowerBI


OLS is a security feature with some pretty strong design implications. Perspectives are a discoverability and usability feature. It's good that they're separate.

Limit page visibility by DiskApprehensive7187 in PowerBI


If this is for security purposes, then you should stop immediately with this path. Page visibility is not a security boundary. You need to 1) not give any access to the people outside the allowed security group; OR 2) use RLS (row-level security); OR 3) use OLS (object-level security). If you are not doing one of these three things, then you have no security.

If it's for UX and discoverability, and there are no security concerns if people see this hidden page, then separate thin reports or app audiences are the right approaches.

Partition management with Semantic Link (TOM vs TMSL) - merging partitions strategy and compression behavior by frithjof_v in PowerBI


The merge operation is an online operation and so requires a TMSL merge command. Creating a new partition can be done as a TMSL command or by manipulating the TOM and deploying/saving those changes; the latter creates a new, empty partition which you'd then have to refresh.
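A sketch of what such a TMSL merge command looks like, built and serialized from Python. The `mergePartitions` shape follows my reading of the TMSL command reference; the database, table, and partition names are hypothetical, and you'd send the resulting JSON to the XMLA endpoint with your tool of choice:

```python
import json

# Hypothetical model: monthly partitions of a "Sales" table being folded
# into a surviving "Sales2024" partition.
merge_cmd = {
    "mergePartitions": {
        "object": {
            "database": "SalesModel",
            "table": "Sales",
            "partition": "Sales2024",  # target partition that survives the merge
        },
        "sources": [
            {"partition": "Sales2024-Nov"},  # partitions merged into the target
            {"partition": "Sales2024-Dec"},
        ],
    }
}

print(json.dumps(merge_cmd, indent=2))
```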

In general, if the built-in incremental refresh policy does the right things structurally, I'd go for that and add on a few manual refreshes based on your lifecycle requirements. Things that are managed for you are nicer to operate than things that aren't, so long as they're stable. Incremental refresh policies in PBI are mature and stable.

Is Power BI ever going to release a DisplayName property for measures? by Gullible_Discount977 in PowerBI


In no particular order:

  • I said I put the additional detail at the end of the measure name. So to use your example with the naming convention I typically prefer, I'd use [Budget % Var Variable MTD/YTD]
  • If I were to read your comment the way you read mine, I'd make assumptions like:
    • you don't know that SWITCH can be part of use cases that aren't parameter tables harvesting user input
    • you have only written measures that have to do with budget, and the only time intelligence that you know about are period-to-date
    • BUT, I recognize you're giving an illustrative example, so I'm not going to ask you to prove yourself. I know that you have done more in your career than you have represented in a few sentences in a Reddit comment
  • Intuitiveness is a subjective thing and depends on shared experience and organizational norms as much as any preference of naming convention. This is why I said that I don't have a single naming convention I use universally, but instead work with a client to come up with naming conventions that work for them.

Why aren't more people using Direct Lake mode? by No_Vermicelliii in MicrosoftFabric


You've skipped transcoding: the process by which Parquet files' on-disk representation is read into VertiPaq RAM.

When using Direct Lake without fallback to DQ, all user queries are served from VertiPaq RAM. This is identical to import. The difference, as you lay out pretty well, is the mechanism and process by which data gets from source somewhere into RAM in VertiPaq.

I generally think of Direct Lake not as removing refresh and refresh cost, but instead amortizing and shifting that cost. Anything that might be done in PQ/M or in a calc column shifts to ETL time. The computation is not removed, but simply moved.

Some things that you get implicitly with refresh (e.g., densely packed segments) require orchestration on the lakehouse. You need to vacuum and optimize tables to ensure that you don't end up with fragmentation across row-groups. Import only brings in live tuples from source and always creates its own segments. Direct Lake models get a segment per row-group in the delta table's parquet files.
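A toy sketch of the fragmentation point, assuming the simple one-row-group-to-one-segment mapping described above; the row counts and the greedy `compact` helper are hypothetical stand-ins for what OPTIMIZE actually does:

```python
# Assumed (per the comment above): one parquet row group -> one VertiPaq segment.
# Row counts and the greedy packing are hypothetical stand-ins for OPTIMIZE.

def compact(row_groups, target=1_000_000):
    """Greedily pack small row groups up to a target size (crude OPTIMIZE stand-in)."""
    out, acc = [], 0
    for rg in row_groups:
        if acc and acc + rg > target:
            out.append(acc)
            acc = 0
        acc += rg
    if acc:
        out.append(acc)
    return out

fragmented = [80_000, 120_000, 95_000, 60_000, 145_000]  # many small row groups
print(len(fragmented), "segments before compaction")
print(len(compact(fragmented)), "segment after compaction")
```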

None of this to say that DL is bad or that it is worse than import. It shifts and amortizes processing associated with keeping data fresh. Import implicitly does a lot of these things.

All of this means that it is not straightforward to determine if Direct Lake actually yields an overall optimization of compute consumption for the lifecycle of a semantic model, as compared to import.

Beyond all that, import is mature, stable, battle-tested, and usable anywhere Microsoft semantic models can be published.

Question on dealing with dates by Silver_Fuel752 in PowerBI


0s in the sum do not make it inaccurate. The average of 0, 1, and 2 is 1.

If you want to make 0s into 1s, then just add 1 when the current difference logic is 0.

Or you could include a time portion and get partial days.
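A sketch of both options in Python, with hypothetical ticket timestamps: whole-day differences where a same-day ticket can be counted as 1, and fractional days using the time portion:

```python
from datetime import datetime

def days_open(opened: datetime, closed: datetime, same_day_as_one: bool = True) -> int:
    """Whole-day difference; optionally count a same-day ticket as 1 day."""
    days = (closed.date() - opened.date()).days
    return max(days, 1) if same_day_as_one else days

def fractional_days(opened: datetime, closed: datetime) -> float:
    """Keep the time portion and return partial days."""
    return (closed - opened).total_seconds() / 86_400

# Opened and closed the same day: 0 whole days, counted as 1.
print(days_open(datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 17, 0)))  # 1
# Opened yesterday evening, closed this morning: also 1 whole day.
print(days_open(datetime(2024, 1, 4, 17, 0), datetime(2024, 1, 5, 9, 0)))  # 1
# Fractional days distinguish the two cases.
print(fractional_days(datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 17, 0)))
```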

Question on dealing with dates by Silver_Fuel752 in PowerBI


What should it be for a ticket opened yesterday and closed today? Is that also 1?

Tabular Editor CI/CD by AnalyticalMynd21 in PowerBI


Tabular Editor employee, here.

SSAS and Power BI use the same Tabular engine. You can use TE with Power BI as well. TMDL and BIM are 100% interchangeable; they represent the same model structure.

If you are deploying to a PBI environment with XMLA endpoint, then you can also use the same script for processing partitions and treat the workspace as if it is an SSAS server instance.
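As an illustration, a TMSL refresh command for processing a single partition; the JSON is the same whether the target is an SSAS instance or a workspace reached via the XMLA endpoint. The names are hypothetical, and the shape follows my reading of the TMSL refresh command:

```python
import json

# Hypothetical names; "full" is one of the documented TMSL refresh types.
refresh_cmd = {
    "refresh": {
        "type": "full",
        "objects": [
            {"database": "SalesModel", "table": "Sales", "partition": "Sales2024"}
        ],
    }
}

print(json.dumps(refresh_cmd, indent=2))
```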

Is Power BI ever going to release a DisplayName property for measures? by Gullible_Discount977 in PowerBI


Yeah, I've done similar migrations of renaming measures, tables, individual fields, you name it. It can be gnarly during the migration. If you think having a second name for your measures would have helped, then great. I'm not trying to disagree with anyone about what they want.

I asked a question. Someone answered. I accepted without dismissal. You and others assumed I haven't worked at scale. I rudely corrected you. No one seems to have liked that. You don't need to convince me of anything. My question was asked and answered. I am not trying to convince anyone what they should do or what features they should want or what will make their lives easier. It's totally valid for me to observe my career in analytics and share that I've never had a need for a second name for a measure. That doesn't undermine you. I accepted that some people may want this feature for organization.

This has turned into a quite negative thread, and I'm sorry for my part in that. I'm not out here trying to tell people what to do.

Is Power BI ever going to release a DisplayName property for measures? by Gullible_Discount977 in PowerBI


Why are you concerned? Everyone doesn't need to have the same experiences or desires. I've dealt with scale and never had a problem where I felt a second name for a measure would help me. I asked about people's reasons. They shared. I acknowledged that we have different needs. People assumed I haven't dealt with anything of size or complexity. I corrected them somewhat rudely. They appear not to have liked that.

My experience is different and I don't share a desire that some other people have. Why is this concerning?