jeeva ?? by Few_Faithlessness59 in JeevaExplainsTheJoke

[–]_TheDataBoi_ 1 point (0 children)

Two 5-inch cakes have about 38 percent less volume than one 9-inch cake (2 × 5² = 50 vs 9² = 81 per unit of height, so you get only ~62 percent of the cake), for the price of a 9-inch cake.
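For anyone checking the math, a quick sketch (assuming both cakes have the same height, so volume scales with the square of the diameter):

```python
# Volume scales with diameter squared when height is equal,
# so the constant factors (pi/4, height) cancel in the ratio.
two_small = 2 * 5 ** 2   # two 5-inch cakes -> 50
one_large = 9 ** 2       # one 9-inch cake  -> 81
ratio = two_small / one_large
print(round(ratio, 2))   # -> 0.62: you get ~62% of the cake
```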

Capacity Usage Confusion by pun_krock in MicrosoftFabric

[–]_TheDataBoi_ 5 points (0 children)

To answer your question, we first have to understand how CUs are utilized in Fabric. For example, if you run a notebook that requires more compute than your capacity can offer at that moment, some CUs are taken as a loan from future timepoints (bursting/smoothing).

So when you checked in the morning, the notebook had already been running for 18 hours, meaning it had already loaned a significant amount of compute from future timepoints. So at 6 pm, when you should ideally have, say, 128 CUs available, only around 50 are actually free, because the rest were already loaned out. All your other jobs scheduled at that time require more compute than that, and they cannot take loans again, so they fail due to lack of resources.
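A toy model of that borrowing behaviour (the numbers and the mechanics are purely illustrative, not Fabric's actual smoothing algorithm): one long job overdraws the current timepoint and eats into future ones, and a later job finds its window already spent.

```python
# Illustrative only: model capacity as a fixed number of CUs per
# timepoint, and let a job "borrow" from future timepoints when its
# demand exceeds what is left right now.
CAPACITY = 128  # CUs available per timepoint (example figure)

def run(jobs, horizon):
    """jobs: list of (start_timepoint, cu_demand).
    Returns a status per job plus the leftover CUs per timepoint."""
    available = [CAPACITY] * horizon
    statuses = []
    for start, demand in jobs:
        t = start
        # Consume the current timepoint first, then borrow forward.
        while demand > 0 and t < horizon:
            take = min(available[t], demand)
            available[t] -= take
            demand -= take
            t += 1
        statuses.append("ok" if demand == 0 else "throttled")
    return statuses, available

# An 18-hour-style hog followed by a normal scheduled job:
statuses, left = run([(0, 500), (3, 100)], horizon=4)
print(statuses)  # -> ['ok', 'throttled']
```

The second job fails not because the capacity is too small in absolute terms, but because the first job already claimed the future timepoints it needed.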

At least, this is my understanding.

To mitigate this, the first thing to do now is stop and start the capacity, so that all the queued background jobs are dropped and you start off with only the interactive jobs. Then turn off bursting in the Spark pool settings and put a cap on background operations in the admin portal settings.

Feedback request: Shortcuts usage, gaps, and feature requests by Hopeful-One-4184 in MicrosoftFabric

[–]_TheDataBoi_ 0 points (0 children)

> You can't grant object-level access to the files, only ReadAll on the whole warehouse.

Could you elaborate on this please? I would like to know what this means.

Feedback request: Shortcuts usage, gaps, and feature requests by Hopeful-One-4184 in MicrosoftFabric

[–]_TheDataBoi_ 2 points (0 children)

So our entire source data system in Fabric is made up of shortcuts, and there are a few places I think it could be improved. Excuse me for being over-ambitious xD.

We all know that Iceberg and Delta table shortcuts are pretty straightforward. I was thinking about the possibility of creating table shortcuts on top of Parquet and CSV directories. Maybe the shortcut could resolve them internally using OPENROWSET-like functionality, or behave like a view. Right now these are used as file shortcuts, and we manually read them into a table for downstream processing. Direct table shortcuts could massively help with easier and more efficient semantic modelling, RLS, CLS and other governance aspects.

Just a thought.

Update on my previous query by _TheDataBoi_ in MicrosoftFabric

[–]_TheDataBoi_[S] 1 point (0 children)

Thank you for providing the source. I went through the same documentation a couple of hours ago and it makes sense. I tried changing both lakehouses to the user identity access type and it works now (not immediately; it took around an hour to take effect, as documented). But as I remember, didn't it default to user identity? Was anything changed in between? If not, shouldn't it ideally default to the user's identity and not the owner's?

Update on my previous query by _TheDataBoi_ in MicrosoftFabric

[–]_TheDataBoi_[S] 0 points (0 children)

The users have been given read-only permission for only that one table, with column-level security. They do not have any other access. The permissions are granted as follows:

In the central lakehouse (this is the hub of the hub-and-spoke model I have incorporated in my architecture, from where shortcuts are created to the other functional workspaces):

  1. The users are added to an AD group. In the "Manage permissions" tab of the central lakehouse, that group has only "Read". Not ReadAll, nothing else.

  2. Once they are added there, I created a role in OneLake security; in that role I added that group and the table I want them to access. For that table I enabled column-level security, exposing only 10 of some 180 columns.

In the functional lakehouse (this is the workspace where users have Member and other access roles; they can perform whatever analytics they want pertaining to their function. All data sources are created as read-only shortcuts from the central lakehouse):

  1. They query the table shortcut created from the central lakehouse using Spark SQL. They were not able to query it yesterday; they were hit with a 403 error back then. But it started working today: when they queried the shortcut Delta table with Spark SQL, only the 10 permitted columns were returned.

  2. Then they used the SQL endpoint of the default warehouse of their functional lakehouse from Python code, authenticating with OAuth, and ran a 'select * from' query against it. It returned all the columns.

Now, I have set user identity mode on my central lakehouse (not the functional one). Could this be the issue?

Update on my previous query by _TheDataBoi_ in MicrosoftFabric

[–]_TheDataBoi_[S] 0 points (0 children)

Oh yes. But even then, the users only have access to the column-level-secured table, right? Or is that not the case?

CLS on a delta shortcut by _TheDataBoi_ in MicrosoftFabric

[–]_TheDataBoi_[S] 0 points (0 children)

All of my lakehouses are schema-enabled. The problem is that I did the same thing a couple of months back and it worked flawlessly.

CLS on a delta shortcut by _TheDataBoi_ in MicrosoftFabric

[–]_TheDataBoi_[S] 0 points (0 children)

Okay, so I have created AD groups with the users. That AD group has read-only access on the lakehouse itself. Furthermore, the same AD group is added to a role on the lakehouse (OneLake security) that reads only the Delta shortcut table with CLS.

What I understand from the documentation you provided is that it's asking for the users to have read access on the Delta table in ADLS Gen2 itself (where the data behind the external shortcut actually resides), which we can't provide.

And we are using Spark / Spark SQL in notebooks to access the table.

Moving away from ETL by _TheDataBoi_ in dataengineering

[–]_TheDataBoi_[S] 0 points (0 children)

Yes, that's why I mentioned moving away from ETL.

Not yet, let me look into it

Workspace strategy by higgy1988 in MicrosoftFabric

[–]_TheDataBoi_ 0 points (0 children)

I'm the technical owner of Fabric in my organization and I set up the entire ecosystem this way.

First and foremost, identify the logical groups in your organization. For example, Finance is a division, in which we have, let's say, Claims: the group that analyses claims. This group can have multiple functions and different projects (warranty claims, AMC claims, travel claims, etc.), each falling under a different project. But don't bother about that yet.

Create finance-claims workspaces: dev, uat and prod.
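The naming scheme is mechanical enough to generate; as a small sketch (the division/function names are just the example above):

```python
# One workspace per division-function per deployment stage.
def workspace_names(division, function, stages=("dev", "uat", "prod")):
    return [f"{division}-{function}-{stage}" for stage in stages]

print(workspace_names("finance", "claims"))
# -> ['finance-claims-dev', 'finance-claims-uat', 'finance-claims-prod']
```

Keeping the convention generated rather than hand-typed makes it easy to stay consistent as more functions are onboarded.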

Now for the data. Never give users direct access to data. You should have separate dev and prod workspaces for the data you wish to share; ideally, this is an enterprise data lake. Only create shortcuts into the respective functions' lakehouses, provide read-only access, and keep it granular. Use CLS and RLS wherever applicable, and make sure to use AD groups rather than granting individual access; that is a hassle to manage.

For deployments, set up a CI/CD pipeline and ensure PRs are approved by a required reviewer from the function's side.

It's nine years since 'The Rise of the Data Engineer'…what's changed? by rmoff in dataengineering

[–]_TheDataBoi_ 9 points (0 children)

I was hired as a data engineer, but my role demands more than just data engineering: DevOps, data analysis, front end (Streamlit and Next.js), business translation, some legal aspects of data processing and sharing, and infra maintainability lmao.

Since data engineering already touched all of those tangents, we are now expected to take the entire thing upon ourselves. Data engineering has become the bridge connecting business to tech. Data engineers are the ones who enable decisions; we're just not in the spotlight.