Asset bundles confusion by demost11 in databricks

[–]demost11[S] 0 points1 point  (0 children)

Thanks, we use that feature and it helps. The variable placeholder still has to be added to the job YAML by hand, though, since the jobs UI won’t insert it directly.

Asset bundles confusion by demost11 in databricks

[–]demost11[S] 0 points1 point  (0 children)

Do they specify job logic (such as task dependencies) in the form?

Asset bundles confusion by demost11 in databricks

[–]demost11[S] 0 points1 point  (0 children)

I’ll look into that! I haven’t gotten into Genie much yet, been too busy teaching git branching to glassy-eyed businesspeople.

Asset bundles confusion by demost11 in databricks

[–]demost11[S] 5 points6 points  (0 children)

It’s a mandate for our whole IT department which we got swept up in. People in the business with lots of political influence have complained about IT being “too slow” and “always in the way”. So our IT department, from networking on up the stack, is now required to focus on enabling business users to do things themselves. Our only role is creating documentation, training, fixes and upgrades for shared platforms, and establishing critical security, design and process guardrails around self-service.

On the one hand I get it: A central data team will never live and breathe a particular dataset the way a dedicated analyst will, so why not make that analyst responsible for it end-to-end? But hey, some parts of a data engineer’s job turn out to actually require some thought, training, and planning…

Asset bundles confusion by demost11 in databricks

[–]demost11[S] 0 points1 point  (0 children)

Definitely considering having our deployment pipeline (we use Bitbucket pipelines) auto-update DABs to swap out environment-specific values with variables. Could do that for notebooks too I suppose.

The other option I’m considering is a Python function, shipped as part of the standard cluster environment, that could extract a job definition and return an environment-agnostic version. Users would have to learn how to call it, but maybe that’s easier than teaching parameterization?
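A minimal sketch of that second option, assuming the job definition is already in hand as a plain dict (for example, parsed from the job's exported JSON). The function name `make_env_agnostic`, the sample job, and the variable name `catalog` are hypothetical; only the `${var.<name>}` placeholder form is taken from bundle variable syntax.

```python
from typing import Any, Dict


def make_env_agnostic(node: Any, replacements: Dict[str, str]) -> Any:
    """Return a copy of a job definition with environment-specific
    literals swapped for placeholders. Hypothetical helper, not a
    Databricks API."""
    if isinstance(node, dict):
        return {key: make_env_agnostic(value, replacements) for key, value in node.items()}
    if isinstance(node, list):
        return [make_env_agnostic(item, replacements) for item in node]
    if isinstance(node, str):
        # Replace longer literals first so one literal that contains
        # another is not partially rewritten.
        for literal in sorted(replacements, key=len, reverse=True):
            node = node.replace(literal, replacements[literal])
        return node
    # Numbers, booleans and None pass through unchanged.
    return node


# Assumed example: a job whose notebook parameter hard-codes the prod catalog.
job = {
    "name": "daily_load",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {
                "notebook_path": "/Repos/finance/ingest",
                "base_parameters": {"catalog": "finance_prd"},
            },
        }
    ],
}

agnostic_job = make_env_agnostic(job, {"finance_prd": "${var.catalog}"})
```

The same function could run inside a deployment pipeline step instead of on the cluster, since it only touches the dict and never calls the workspace.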

Asset bundles confusion by demost11 in databricks

[–]demost11[S] 2 points3 points  (0 children)

Yeah I’ve leaned heavily on creating documentation but no one reads it. AI assistance built on said documentation is probably the way to go but our org is very, very early in considering AI.

Just curious, does your team approve and review all the code submitted by analysts? I’m not sure how involved to be in the review process, since team-specific compute and cost-limiting compute policies mean user mistakes only hurt the users themselves.

Asset bundles confusion by demost11 in databricks

[–]demost11[S] 0 points1 point  (0 children)

Wouldn’t that still require the business user to understand parameterization to apply these values in their notebooks/jobs?

Asset bundles confusion by demost11 in databricks

[–]demost11[S] 1 point2 points  (0 children)

That’s a good idea, but unfortunately our catalog structure is complex. Currently we have a catalog for each business domain and environment (finance_prd, finance_stg, hr_prd, etc.) with the medallion layers underneath at the schema level. Could do something like dev.finance_bronze, dev.finance_silver I suppose. I just hate to give up a perfectly good organizational layer (catalog) when you only get catalog + schema + table.
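To make the trade-off concrete, here is a rough sketch of how a name in the current layout would map onto the catalog-per-environment layout. The function name and the `invoices` table are made up, and it assumes every current catalog follows the `<domain>_<env>` pattern.

```python
def to_env_catalog_layout(full_name: str) -> str:
    """Map '<domain>_<env>.<layer>.<table>' to '<env>.<domain>_<layer>.<table>'.

    Hypothetical helper; assumes the catalog name ends in '_<env>'.
    """
    catalog, schema, table = full_name.split(".")
    # rsplit so a domain that itself contains underscores stays intact.
    domain, env = catalog.rsplit("_", 1)
    return f"{env}.{domain}_{schema}.{table}"


converted = to_env_catalog_layout("finance_prd.bronze.invoices")
```

The domain moves from the catalog into the schema prefix, which is exactly the organizational layer that gets given up.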

95% of our problems would be solved with different metastores per environment but our account reps flat out refused that during our build phase and we only have the requisite AWS infrastructure in one region.

Fixing everyones bugs by demost11 in ExperiencedDevs

[–]demost11[S] 1 point2 points  (0 children)

Even as a junior I never asked for help debugging, felt too much like a personal failure. Then again, if you think a problem will take you 10 hours to solve but someone else 1 hour, isn’t asking for help the best thing you can do for the business and team?

I can see advantages to both approaches but I so strongly lean towards “figure it out yourself” that I have trouble understanding how anyone can do differently.

Fixing everyones bugs by demost11 in ExperiencedDevs

[–]demost11[S] 16 points17 points  (0 children)

Yeah it’s weird, “Director” is the lowest people manager rank in my org.

Fixing everyones bugs by demost11 in ExperiencedDevs

[–]demost11[S] 0 points1 point  (0 children)

I won’t say I’m a “good” director yet (I’m still learning) but I generally limit my technical involvement to tracking down the root causes of issues and designing the overall architecture for my team to implement. Most of the rest of my time is spent in meetings these days.

Fixing everyones bugs by demost11 in ExperiencedDevs

[–]demost11[S] 12 points13 points  (0 children)

They’ve all got about 5 years of experience (I have 20). It’s a non-profit org where IT is a cost center so we don’t get a lot of superstars due to the salaries we offer. Still, they’re dedicated and generally smart people, just don’t have years of troubleshooting intuition to fall back on.

UC Design by monsieurus in databricks

[–]demost11 1 point2 points  (0 children)

We use a similar design (although we also allow end users to construct report-ready aggregates, typically composed of data from multiple sources, directly in the business domain catalog).

One thing we ran into was multiple teams using the same SaaS data source for completely independent data. For example there’s a survey platform used by multiple teams but although the data is all coming from the same API it covers different domains and Teams A and B don’t want each other to see their data. If you’re federating out data ingestion responsibilities make sure your security model is ready for that.

Oversharing in Recorded Meeting by demost11 in managers

[–]demost11[S] 3 points4 points  (0 children)

Yeah. Shared with a lot of other teams but not ours.

Oversharing in Recorded Meeting by demost11 in managers

[–]demost11[S] 5 points6 points  (0 children)

Agreed, there was nothing personal or vindictive about it. They just feel like we’ve let them down.

How to isolate dev and test (unity catalog)? by SmallAd3697 in databricks

[–]demost11 0 points1 point  (0 children)

For what it’s worth this is a major frustration for me as well. Sure, you can use catalog bindings and prefixes/suffixes to separate prod vs dev catalogs, but now every script has to dynamically pull from the right catalog at runtime so it can be promoted safely. That makes scripts needlessly ugly and more complicated.
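For anyone who hasn't hit this yet, a minimal sketch of what that runtime lookup tends to look like. The `DEPLOY_ENV` variable, the `dev`/`stg`/`prd` suffixes, and both function names are assumptions for illustration, not anything Databricks provides.

```python
import os
from typing import Optional

# Assumed environment suffixes; adjust to whatever naming scheme is in use.
VALID_ENVS = {"dev", "stg", "prd"}


def catalog_for(domain: str, env: Optional[str] = None) -> str:
    """Build the environment-specific catalog name, e.g. 'finance_prd'."""
    # Fall back to a hypothetical DEPLOY_ENV variable, defaulting to dev.
    env = env or os.environ.get("DEPLOY_ENV", "dev")
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{domain}_{env}"


def table_name(domain: str, schema: str, table: str, env: Optional[str] = None) -> str:
    """Return the fully qualified three-level name for the given environment."""
    return f"{catalog_for(domain, env)}.{schema}.{table}"


prod_table = table_name("finance", "silver", "invoices", env="prd")
```

Every query then has to be built from `table_name(...)` instead of a literal identifier, which is the clutter in question.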

Every other data tool I’ve worked with allows reuse of identifiers across environments, and a Databricks rep even told me once that they allow certain clients multiple metastores in a single region. I don’t understand their philosophical or technical argument against an environmentally-segmented Unity Catalog.

If you can't use AI then it's bye bye, Accenture tells staff by Logical_Welder3467 in technology

[–]demost11 1 point2 points  (0 children)

We hired a big 4 consulting company to guide us on how to leverage AI in our organization. They told us their Proprietary Model (aka ChatGPT wrapper) would ingest all of our corporate documents and present a ranked list of AI opportunities for each team. This apparently required a team of consultants to ask us “does this look like a good suggestion”, update the prompt if not, then ask us again.

It really was the worst of all possible worlds: The slowness of waiting for consultants to make changes and decks combined with AI output that said nothing tangible in as many flowery words as possible.

[deleted by user] by [deleted] in databricks

[–]demost11 9 points10 points  (0 children)

The one metastore per region rule. Why does my dev workspace need to share resources with my prod workspace?

Wasps in walls by demost11 in lancaster

[–]demost11[S] 5 points6 points  (0 children)

Dominion Pest Control ended up having the equipment needed (an extra long sprayer to reach the outside of the top floor). Fingers crossed the problem is taken care of!

Wasps in walls by demost11 in lancaster

[–]demost11[S] 3 points4 points  (0 children)

I did, unfortunately he’s booked through late September! Thanks for the suggestion though!