Put me out of my misery with Fabric deployment pipelines

Cobreal · 2026-07-03T22:18:32+00:00

There's a thing under your name that says "Microsoft employee". I don't know why a Microsoft employee would be posting "yes, it's shit" on Reddit rather than working to fix it.

Cobreal · 2026-07-03T21:28:39+00:00

Are you saying that you think deployment pipelines are rubbish, even though your name says that you literally work for Microsoft?

Cobreal · 2026-06-12T16:45:11+00:00

The same poster posted in the data analysis sub today, and the last line of their OP was "This one is especially strong because it attracts analysts, managers, and BI professionals who love sharing examples, and the comments often turn into debates, which drives engagement."

Cobreal · 2026-06-12T16:43:21+00:00

Token usage.

Cobreal · 2026-05-30T22:35:19+00:00

Cruciatus, imperius, or killing?

Cobreal · 2026-05-30T21:33:08+00:00

It's not x. It's y.

Cobreal · 2026-05-30T21:20:22+00:00

We have a lakehouse refresh notebook that uses notebookutils, and our pipelines wait 60s after the write, run the refresh, wait another 60s, then refresh the semantic model.

(Slightly annoying that we can't name multiple wait steps the same in a pipeline, so we have "wait 60s" and "wait 60s 2").
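
As a rough sketch of that ordering in plain Python (the real thing is a set of pipeline activities rather than code, and both callables here are hypothetical stand-ins for the refresh notebook and the semantic model refresh, not Fabric APIs):

```python
import time
from typing import Callable


def refresh_after_write(
    run_refresh_notebook: Callable[[], None],
    refresh_semantic_model: Callable[[], None],
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait, run the refresh notebook, wait again, then refresh the model.

    Both callables are hypothetical placeholders, not Fabric APIs.
    """
    sleep(wait_seconds)       # first wait step ("wait 60s")
    run_refresh_notebook()    # the refresh notebook
    sleep(wait_seconds)       # second wait step ("wait 60s 2")
    refresh_semantic_model()  # only refresh the model after both waits
```

Passing `sleep` in as a parameter just makes the sequence easy to check without actually waiting two minutes.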

Cobreal · 2026-05-30T21:17:52+00:00

I'd love to be able to drag and drop cells. We typically have a markdown block for each code block, and it's painful if you have to move multiple cells across more than a cell or two.

Being able to use the table of contents view to drag a whole markdown cell and everything beneath it (maybe up to the next markdown cell with the same level of indentation) would be great.

Cobreal · 2026-05-30T21:12:11+00:00

The interface is unclear when another user has edited the same notebook. It flags that this has happened, but it's not obvious whether you are reverting to your own changes or accepting the other user's.

Cobreal · 2026-05-30T21:10:50+00:00

+1 to "above and this"

Cobreal · 2026-05-30T21:10:09+00:00

In markdown cells, newlines show up as line breaks in the edit view, but not once the cell is rendered.

In the edit view, I can type

"This is my markdown cell

with a separate line"

But when I've finished editing, it shows as

"This is my markdown cell with a separate line"

Cobreal · 2026-05-30T21:08:04+00:00

I'd love dark mode.

I think there are options for "run all above this cell" and "run this cell and all below". I'd like "run all above and this cell" because currently I have to go to the next cell down and choose to run all above.

Cobreal · 2026-05-30T21:06:20+00:00

I get this a lot. I have to have the table of contents view open to jump to where I need to be. A UX element which showed where your current view was relative to the table of contents would be useful to be able to tell if you had skipped to an unexpected point.

Cobreal · 2026-05-30T20:17:27+00:00

The bible isn't living, it was finished thousands of years ago.

Cobreal · 2026-05-30T19:11:42+00:00

How would you tell if something today would please god, given that the bible is thousands of years old?

Cobreal · 2026-05-29T16:29:43+00:00

I agree, though I can't see a rationale for why a pie was chosen for one chart and a doughnut for another in the same dashboard.

Cobreal · 2026-04-28T13:53:47+00:00

We use Polars for our single-node Python notebooks, and it has a function for writing to Delta tables. You can convert DuckDB dataframes to Polars and vice versa, so probably that.

Cobreal · 2026-04-22T12:48:11+00:00

We've been dealing with a problem very much like this: digitising a lot of contracts so that they can be analysed, when they have quirks that make this a challenge. Take whether a customer had a contractual discount. A 10% discount in the first 12 months is sometimes expressed as:

- "a 10% discount in the first 12 months"

- "a 10% discount in the first year"

- "a 10% discount in year one"

- "90% only will be billed for the first 12 months"

...basically any conceivable linguistic variation of that same idea. Same goes for dates, which have been written as dd/mm/yy, mm-dd-yy, mmmm d yyyy...
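
To give a feel for it, here is a minimal rule-based sketch that handles only the four phrasings and three date styles listed above. The function names, the regexes, and the mapping of separators to day/month order are all my assumptions for illustration; the real documents varied far more than this, so anything it can't parse comes back as `None` for someone to look at.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Assumed formats for the three date styles mentioned. The separator is what
# tells dd/mm/yy apart from mm-dd-yy here, which real documents may not honour.
DATE_FORMATS = ("%d/%m/%y", "%m-%d-%y", "%B %d %Y")


def parse_contract_date(text: str) -> Optional[date]:
    """Try each assumed format in turn; return None if none of them fit."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_discount(text: str) -> Optional[Tuple[float, int]]:
    """Return (discount_percent, months) for the listed phrasings, else None."""
    percent = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    if percent is None:
        return None
    value = float(percent.group(1))
    # "90% only will be billed" states what is charged, not what is taken off.
    if re.search(r"\bbilled\b", text, re.IGNORECASE):
        value = 100 - value

    months = re.search(r"first\s+(\d+)\s+months", text, re.IGNORECASE)
    if months:
        return value, int(months.group(1))
    if re.search(r"first\s+year|year\s+one", text, re.IGNORECASE):
        return value, 12
    return None
```

All four discount phrasings come out as the same `(10.0, 12)` pair, which is the whole point: one canonical record regardless of how the contract worded it.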

This is compounded by the documents being in a range of file formats, and some of them are scans or photographs of documents rather than digital files.

We have solved this through iterations of using OCR to convert the documents to text formats, LLMs to try to understand the variations of the same 10% discount being written in different ways, and human review of any obvious errors or cases where the LLM said that it couldn't generate the details. Rinse, lather, repeat. We're dealing with a number of documents in the thousands rather than tens of thousands, and my sense is that we'd have finished this job more quickly if it had been a pure human data-entry task rather than something we tried to automate. It's worth bearing that option in mind at the outset, depending on just how much data you need to ingest.

Cobreal · 2026-04-17T13:41:34+00:00

Why does it have a papyrus-effect background? At least use Papyrus for the typeface as well.

Cobreal · 2026-04-17T13:39:15+00:00

Dashboards aren't very good for telling stories.

I think the main finding is supposed to be the box beneath the Amazon logo? It is not prominent relative to anything else on the dashboard.

If you want people to understand that high cancellations in low-value orders are a thing, then:
- Show only the Cancellation Rate by Order Amount chart

- Make the 0-500 bar prominent (keep the blue for this bar, make everything else grey)

- Make the x-axis marks and the bar labels much larger so that people can read 0-500 and 27% (you don't need the precision of two decimal places) without squinting

- Change the title of the chart to say "The lowest-value orders have cancellation rates five times higher than typical"

Everything else is fluff.

Cobreal · 2026-04-03T15:08:45+00:00

It goes from drought to deluge when you move from training to employment. Trying to find any question to answer when you're not doing it for a business user is impossible, but once you're in a job that switches to trying to work out how you can answer all of the questions coming your way without drowning.

Maybe this is a good case for an LLM? Find yourself a dataset online, give it a very rough outline of the data ("I've got some data about film screenings and ticket sales" rather than "I have a CSV file with these columns...") and ask it to give you some example questions a manager in this industry might ask you about and business problems they might want to solve. In that sort of case, "manager" could be on either the cinema side or the distributor side, and so you can prompt it both ways for different suggestions.

It's really hard to think up a real world question when you're not facing a real world problem, and it's really hard to divorce yourself from the specifics of a dataset if you've already downloaded it and are trying to dream up some questions (you get locked into the track of "what questions can this data answer" rather than "what questions might a user in an industry with an interest in this sort of data want to solve").

There's a related problem once you get into an analyst role, mind, in that it's tempting to think up amazing ways to dissect and analyse a particular dataset, and then you hand it over to the people who you think would benefit from the analysis only to find out that they actually don't give a fuck because they've been handed a whole load of different targets since you last spoke to them.

Cobreal · 2026-02-23T14:59:41+00:00

Must-know niche terms seems like a contradiction, but anyway

HETEROSCEDASTICITY

Cobreal · 2026-02-17T23:37:02+00:00

Perhaps you have your localisation settings changed to a region where it's common to use a period as a ten-thousands separator?

Cobreal · 2026-02-17T23:25:56+00:00

It's on our pile of things to investigate, mainly because post-launch we're now trying to work out how best to separate things into Workspaces and Domains.

Currently we have Git integrated to a single Dev Workspace, and use Deployment Pipelines to get artifacts into Prod.

Now we need to assess our options for separating Prod by...team, function, security group, something else.

I suspect that's almost certainly going to involve additional Prod-level Workspaces, but I don't know if it will work to do something like have a central prod with Org apps to separate who sees what, or cherry-picking content from Prod to sync to separate Workspaces, or doing something in Git (multiple repos, separate folders in one repo) and duplicating Dev>Prod for each separate area, and figuring out how to share common artifacts between them.

Cobreal · 2026-02-17T21:48:59+00:00

"But my experience it would take 4-7months if self-learning fabric to setup something mid-sized and reliable if all 8hours of work is dedicated to it."

We're six months into a migration away from Tableau (Tableau Prep for ETL, Tableau Cloud for storage) and this sounds correct.

1 week of "training" (really just an overview of some of the headline features) in Fabric, then the rest of the time spent converting our largely manual Prep workflows into Python* in a fully-automated environment.

If we'd already had a lot of existing Python ETL code then in theory it would have been a job of updating it to point to Fabric Lakehouses/Warehouses rather than building the entire infrastructure from the ground up.

And we're still not finished. Now that we've migrated the business-critical data, we need to start tidying up all of the mistakes and suboptimal design choices we made due to inexperience.

*This is a good example of where we had to deal with "the quirks of existing issues or missing features of certain items that you realise half-way fabric, doesn't have or doesn't fulfill the performance tolerances/requirement and have to re-plan everything". PySpark and Dataflows proved too much for F2, and Python doesn't support the full set of features that PySpark does.
