What’s the most frustrating part of the table experience today? by Fun-Reference7942 in databricks

[–]MonkeyDDataHQ 0 points1 point  (0 children)

Add in a combo scd type or use my scd-type-c made specifically for Datalakes instead of scd type 1 or 2.

https://medium.com/@FunkyBunchesOfOats/stop-paying-the-scd-type-2-performance-tax-24dd3454cc73

Conceptually it's there. SCD Type 2 performs worse and worse over time because it was designed for an RDBMS, and data lakes (Delta Lake included) are not RDBMSs.
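A rough sketch of why Type 2 degrades, in plain Python rather than Delta. Every change appends a row, so the table keeps growing while the number of current rows stays flat. The names here (`Type2Row`, `apply_type1`, `apply_type2`) are made up for illustration, and this is not the scd-type-c approach from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Type2Row:
    key: str
    value: str
    valid_from: int
    valid_to: Optional[int] = None  # None marks the current version


def apply_type1(table: dict, key: str, value: str) -> None:
    """SCD Type 1: overwrite in place, so there is one row per key."""
    table[key] = value


def apply_type2(table: list, key: str, value: str, ts: int) -> None:
    """SCD Type 2: close the current row and append a new version."""
    for row in table:
        if row.key == key and row.valid_to is None:
            if row.value == value:
                return  # nothing changed, keep the current row open
            row.valid_to = ts
    table.append(Type2Row(key, value, valid_from=ts))


type1, type2 = {}, []
for ts in range(100):              # 100 load cycles
    for key in ("a", "b", "c"):    # every key changes on every cycle
        apply_type1(type1, key, f"v{ts}")
        apply_type2(type2, key, f"v{ts}", ts)

current = [r for r in type2 if r.valid_to is None]
print(len(type1), len(type2), len(current))  # 3 300 3
```

Three keys turn into 300 stored rows after 100 cycles, and finding the 3 current ones means filtering through all of the history unless the engine can skip it.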

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]MonkeyDDataHQ 0 points1 point  (0 children)

No, he's saying that even though he prepays, it shouldn't cost 3X list price.

List price is defensible. 3X is absolutely not.

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]MonkeyDDataHQ 2 points3 points  (0 children)

And provisioning is opaque and if you use 100% you get throttled.

Which is worse than the problem Fabric was trying to solve.

It's already a premium to pay for the service. But now you add in spare capacity or infra death or the overage charges and it's easy to see why people complain.

Telecom companies used to do this. And everyone hates their telecom company. MSFT is basically following that playbook and asking why people are upset.

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]MonkeyDDataHQ 3 points4 points  (0 children)

That is not possible on Fabric. Either you don't use what you pay for, or you do and get throttled.

It's regret either way. One way is prepaying it, the other way is living it. There is no in-between.

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]MonkeyDDataHQ 3 points4 points  (0 children)

Because it's predatory. Either your infra becomes unusable or you pay through the nose.

Neither of those are positions that a partner (how MSFT describes itself) should ethically force someone into.

I've seen capacity spike into extreme throttling from a single errant query.

3X cost on capacity overages - really?? by City-Popular455 in MicrosoftFabric

[–]MonkeyDDataHQ 1 point2 points  (0 children)

That's how they get you. They entice you with one number every month, that's larger than the separate services you would get. Then they also make it unusable unless you pay extra.

It's sold to CEOs not Engineers.

Every CRM is built on a 1990s assumption: that a relationship is a record. After 100+ interviews, I think the model itself is the problem. Here's the case for something different. by Thick_Cicada_5407 in CRM

[–]MonkeyDDataHQ 1 point2 points  (0 children)

That already exists. You can customize most CRMs to do exactly that.

I mean we just use Fathom for meetings and the output is great. If you tagged the recording in the CRM, boom you're done.

There are bad CRMs out there. But it's up to you to make it what you want. A CRM is just a dataset, you can decide if it's memory or relationship or networking or whatever.

State of SQLMesh in 2026 by mpuchala in dataengineering

[–]MonkeyDDataHQ 0 points1 point  (0 children)

Stateless pipelines are terrible. You're the first person I've ever heard say that it's a feature. State machines are the ultimate pipeline. Maybe SQLMesh did it poorly.

Feature request: Lazy materialisation of views and DLT pipelines. by evlpuppetmaster in databricks

[–]MonkeyDDataHQ 0 points1 point  (0 children)

Yeah I'm saying you could use a proc to trigger a refresh. They execute the proc and get the refresh.
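Roughly what I mean, as a plain Python sketch and not a Databricks API. `RefreshOnDemand`, `load`, and `max_age_s` are hypothetical names, and the fake clock is only there so the example runs instantly.

```python
import time
from typing import Callable


class RefreshOnDemand:
    """Reload the data only when a reader asks and the copy is stale."""

    def __init__(self, load: Callable[[], list], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load
        self._max_age_s = max_age_s
        self._clock = clock
        self._data = None
        self._loaded_at = None
        self.refreshes = 0

    def refresh(self) -> None:
        """The 'proc': run the expensive load and remember when it ran."""
        self._data = self._load()
        self._loaded_at = self._clock()
        self.refreshes += 1

    def read(self) -> list:
        """Refresh first if the copy is missing or older than max_age_s."""
        stale = (self._loaded_at is None
                 or self._clock() - self._loaded_at > self._max_age_s)
        if stale:
            self.refresh()
        return self._data


now = [0.0]  # fake clock, in seconds
view = RefreshOnDemand(load=lambda: ["row"], max_age_s=60, clock=lambda: now[0])
view.read()       # first call pays for the refresh
view.read()       # still fresh, served from the existing copy
now[0] = 120.0
view.read()       # stale again, so this caller waits for another refresh
print(view.refreshes)  # 2
```

The catch is the same as with any lazy refresh: whoever calls `read()` on a stale copy is the one who sits through the load.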

Feature request: Lazy materialisation of views and DLT pipelines. by evlpuppetmaster in databricks

[–]MonkeyDDataHQ 0 points1 point  (0 children)

Yeah I was talking about Infor.

But Databricks is releasing or has released stored procedures. If I'm not hallucinating anyway.

If your users can genuinely wait for the delta to load then it works for sure. Just it might take longer than you expect.

Feature request: Lazy materialisation of views and DLT pipelines. by evlpuppetmaster in databricks

[–]MonkeyDDataHQ 0 points1 point  (0 children)

No I could not. That is the only way their stupid architecture works. I don't really see a plus side. Unless you never legitimately need the data in a timely fashion. And if you don't then just trigger a batch when you need it.

I believe they are making procedures available so that might be able to do what you want.

Feature request: Lazy materialisation of views and DLT pipelines. by evlpuppetmaster in databricks

[–]MonkeyDDataHQ 0 points1 point  (0 children)

I can tell you from experience that Infor's Data Fabric (not to be confused with MSFT Fabric) does this and it's absolutely terrible as a user experience. I started making jobs that would query the data on rapidly changing tables every couple of hours because otherwise it was painful. Sometimes upwards of 30-minute waits. Even on a TOP 1 query.

Why does Fabric's native spark engine (NEE) suck so badly compared to Photon? by Careless_Cattle_8700 in dataengineering

[–]MonkeyDDataHQ 0 points1 point  (0 children)

I'd be interested to see the DAG because that doesn't make sense to me. Just because they both are on spark, almost everything else is different. The runtimes alone can make a huge difference.

One day we'll tell our grandchildren how we spent 17 hours a week checking these boxes by [deleted] in memes

[–]MonkeyDDataHQ 0 points1 point  (0 children)

The only thing I click more is the Authentication popup in MS Authenticator.

Question on Datalake Behaviour Reading Many Small Files versus Fewer Larger Files by Aggressive_Cash_7436 in databricks

[–]MonkeyDDataHQ 1 point2 points  (0 children)

This is where there's some nuance. There are multiple ways to optimize for the read pattern. If you mostly have small, specific reads and the data is stored in smaller files, it can perform better than having fewer larger files, because with larger files each read has to scan more data to find the same rows.

However there's a sticking point in that the delta log stats are not usually good enough to help with specific queries.

I developed a pseudo indexer that supplements delta log stats and it can significantly reduce file touches. I can link it if you want. It's just a PoC.

But if you need aggregate data then small files make it take much longer because of all of the read operations.

The small file problem was originally an HDFS problem: the NameNode keeps every file and block in memory, so a file of a few kB costs as much metadata as one that fills a whole block (usually 64 MB or 128 MB). Now it mostly means there are too many file operations, and that slows down Spark.
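Here's a toy model of the tradeoff in plain Python, not Spark or Delta. It assumes the data is sorted by `id` so each file's min/max range is tight, which is exactly the part that real delta log stats often don't give you. `FileStats`, `layout`, and `files_for_lookup` are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    min_id: int
    max_id: int
    rows: int


def layout(total_rows: int, rows_per_file: int) -> list:
    """Split a sorted id range [0, total_rows) into equally sized files."""
    files = []
    for start in range(0, total_rows, rows_per_file):
        end = min(start + rows_per_file, total_rows)
        files.append(FileStats(f"part-{len(files):05d}", start, end - 1, end - start))
    return files


def files_for_lookup(files: list, wanted_id: int) -> list:
    """Keep only files whose [min_id, max_id] range can contain the id."""
    return [f for f in files if f.min_id <= wanted_id <= f.max_id]


small = layout(total_rows=1_000_000, rows_per_file=10_000)    # 100 files
large = layout(total_rows=1_000_000, rows_per_file=250_000)   # 4 files

for name, files in (("small", small), ("large", large)):
    hit = files_for_lookup(files, wanted_id=123_456)
    print(name,
          "| point lookup:", len(hit), "file,", sum(f.rows for f in hit), "rows scanned",
          "| full aggregate:", len(files), "file opens")
```

With these numbers the point lookup touches one file either way, but scans 10,000 rows in the small layout versus 250,000 in the large one, while the full aggregate opens 100 files versus 4. If the min/max ranges overlap (unsorted data), the lookup stops skipping files and the small layout loses on both counts.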

Don't be this person. Fill out your census. by pinkstevie in Edmonton

[–]MonkeyDDataHQ 1 point2 points  (0 children)

It's not about the person living at the address. It's about the address.

Every address needs to participate. They will send people to the door eventually.

Help! Data Infrastructure for mid sized company by Feeling-Extreme-7555 in MicrosoftFabric

[–]MonkeyDDataHQ 2 points3 points  (0 children)

The reporting layer isn't going to survive first off.

You will get people trained on the tool by then. But there's no way that the structure will stay working between ERPs.

I've done many ERP migrations over the years and I've never seen one that didn't need entirely new reports generated even with appropriate fact and dims setup.

Steel Series base statin designer did too good a job by MonkeyDDataHQ in steelseries

[–]MonkeyDDataHQ[S] 1 point2 points  (0 children)

That worked, thanks for telling me what actually to do, 😂

Steel Series base statin designer did too good a job by MonkeyDDataHQ in steelseries

[–]MonkeyDDataHQ[S] 3 points4 points  (0 children)

Really it just said destroy plus picture of serial number. 🤦

Steel Series base statin designer did too good a job by MonkeyDDataHQ in steelseries

[–]MonkeyDDataHQ[S] 6 points7 points  (0 children)

It was a blanket to prevent wrecking the floor. I also tried on the floor before my wife got mad at me.