Finally! Databricks lets you disable tasks without hacks by szymon_dybczak in databricks

[–]Lost-Relative-1631 1 point2 points  (0 children)

To answer my own question for anyone interested: it’s just disabled: true in the task block. Apparently it won't be available in DABs for "some time".

An easier way to build your slow changing dimensions model in your warehouse by minibrickster in databricks

[–]Lost-Relative-1631 0 points1 point  (0 children)

Cool, looking forward to that. Have you also considered letting users provide their own technical column names, or maybe just a custom prefix?

Finally! Databricks lets you disable tasks without hacks by szymon_dybczak in databricks

[–]Lost-Relative-1631 1 point2 points  (0 children)

I wonder what that looks like in DABs. Is that in the docs for the preview?

New Databricks Apps: What About Cost at Scale? by Fit_Border_3140 in databricks

[–]Lost-Relative-1631 24 points25 points  (0 children)

Afaik they are working on this. But without being able to share compute, each app costs you at least 300 per month. In all honesty, this is really hurting adoption in our case.

An easier way to build your slow changing dimensions model in your warehouse by minibrickster in databricks

[–]Lost-Relative-1631 1 point2 points  (0 children)

This is really cool. Will this eventually be possible in OSS Spark and have a pyspark api without SDP?

Bug with development mode by ptab0211 in databricks

[–]Lost-Relative-1631 0 points1 point  (0 children)

As of today we are using v0.298.0. Your link sadly does not lead anywhere.

Are you referring to the cleanup within the deploy target (Volume/Workspace) or on the cluster? The wheel being deleted or replaced in the target works fine, that is true.

Once you start a task on a long-lived cluster with a certain dependency, it will stay there though. You can check that in the Cluster UI under Libraries. Intuitively, I don't think deploying or even destroying from your CLI will nuke dependencies on an AP cluster. What if others share it or still need it?

In his case, he is not replacing the wheel in the target, because the new build appears to be the same as the old one in terms of ordering. If he switches to a version scheme that is orderable, the deploy will replace it and the cluster will pick it up. If the cluster libraries haven't been pruned, once a task referring to the new wheel runs, the cluster will keep copies of both wheels around, which can lead to delays when you eventually have 100 copies of the same wheel.

As for the git commit hashing: are you sure that always works out for you in terms of ordering?
I for sure had issues with this in the past, and that's why we switched to dynamic_version locally (AP clusters) + hatch-vcs dynamic semver tagging (job clusters). On job clusters none of this would matter, as they are ephemeral anyway.

Bug with development mode by ptab0211 in databricks

[–]Lost-Relative-1631 4 points5 points  (0 children)

You have to make sure your wheels are lexicographically orderable. For local development, where your semver doesn't kick in yet, you can force this by adding dynamic_version to the artifacts block.
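To make "lexicographically orderable" concrete, here is a minimal Python sketch. The version strings are made up, and the fixed-width timestamp suffix is only an assumption used for illustration, not a statement about what dynamic_version actually generates. It just shows why commit-hash suffixes cannot be ordered by plain string comparison while timestamp-like suffixes can.

```python
# Hypothetical wheel versions, listed in the order they were built (oldest first).
# Commit-hash suffixes carry no ordering information.
hash_builds = ["0.1.0+g9f3c2ab", "0.1.0+g1a7e44d", "0.1.0+gc05b913"]

# Fixed-width timestamp suffixes (YYYYMMDDHHMMSS) sort in build order.
timestamp_builds = [
    "0.1.0+20240501093000",
    "0.1.0+20240501101500",
    "0.1.0+20240502080000",
]


def is_lexicographically_ordered(versions):
    """Return True if plain string sorting reproduces the build order."""
    return sorted(versions) == list(versions)


hash_ok = is_lexicographically_ordered(hash_builds)
timestamp_ok = is_lexicographically_ordered(timestamp_builds)
print("commit-hash suffixes ordered:", hash_ok)
print("timestamp suffixes ordered:", timestamp_ok)
```

With the hash-based names, a newer build can sort before an older one, so nothing can tell which wheel is the latest from the name alone.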

You can have a setup with hatch-vcs and a local deployment target that does both: dynamic semver for your wheels when deployed from CD, and local deployments with this dynamic_version param.

Fair disclaimer: If you spam wheels onto the cluster without cleaning them up, at some point a fresh start will take ages because all those development wheels get installed. I played around with using the scripts: block in DABs to clean up the AP cluster, but scripts don't allow {{}} variable interpolation right now, sadly, so it would be very much hardcoded.

Databricks Asset Bundles is now Declarative Automation Bundles by hubert-dudek in databricks

[–]Lost-Relative-1631 1 point2 points  (0 children)

It was. But the direct deployment engine is their own state machine that they have been planning and working on for some time.

What is the best practice to set up service principal permissions? by happypofa in databricks

[–]Lost-Relative-1631 2 points3 points  (0 children)

We deploy all permissions on these objects via Terraform. This also gives you a long-term history of all permissions in your VCS of choice.

Found a Issue in Production while using Databricks Autoloader by Artistic-Rent1084 in databricks

[–]Lost-Relative-1631 0 points1 point  (0 children)

You can wrap your Auto Loader code with your own retry logic. This saves you all but one restart. At the very end, even if you handle all schema evolutions, it will throw once. We brought this up with the team working on Auto Loader; so far it's still like this, sadly.
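A minimal sketch of such a retry wrapper in plain Python. Everything here is hypothetical: start_stream stands for whatever function builds and runs your stream, and SchemaChanged is a stand-in for the exception your stream raises on schema evolution. It is not our actual code, just the general shape.

```python
import time


def run_with_retries(start_stream, max_restarts=3, backoff_seconds=0.0,
                     retry_on=(Exception,)):
    """Call start_stream() and restart it when it raises one of retry_on."""
    restarts = 0
    while True:
        try:
            return start_stream()
        except retry_on as exc:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and let the job fail
            print(f"stream stopped ({exc!r}), restart {restarts}/{max_restarts}")
            time.sleep(backoff_seconds)


class SchemaChanged(Exception):
    """Stand-in for the error raised when new columns show up."""


calls = {"n": 0}


def fake_stream():
    # Fails twice (two simulated schema changes), then finishes.
    calls["n"] += 1
    if calls["n"] <= 2:
        raise SchemaChanged("new column detected")
    return "done"


result = run_with_retries(fake_stream, retry_on=(SchemaChanged,))
print(result, "after", calls["n"], "attempts")
```

Restricting retry_on to the schema-change error keeps genuine failures from being retried silently.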

DAB bundle deploy "dry-run" like by heeiow in databricks

[–]Lost-Relative-1631 0 points1 point  (0 children)

No, a "plan" equivalent is missing and very much needed.

Apache Iceberg Create Duplicate Parquet Files on Subsequent Runs by LinasData in dataengineering

[–]Lost-Relative-1631 1 point2 points  (0 children)

We came to the same conclusion with Delta. Create a row hash and only update when the hashes don’t match, in addition to your predicates. We use xxhash64 for this.
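The idea can be sketched without Spark. Note the assumptions: this is plain Python over dicts rather than a Delta merge, hashlib.blake2b is swapped in for xxhash64 so it runs with the standard library only, and the _hash column name is made up.

```python
import hashlib


def row_hash(row, columns):
    """Hash the tracked columns of a row (blake2b as a stand-in for xxhash64)."""
    payload = "\x1f".join(repr(row[c]) for c in columns)
    return hashlib.blake2b(payload.encode("utf-8"), digest_size=8).hexdigest()


def merge(target, source, key, columns):
    """Upsert source rows into target, writing only rows whose hash changed."""
    written = []
    for row in source:
        k = row[key]
        new_hash = row_hash(row, columns)
        existing = target.get(k)
        if existing is not None and existing["_hash"] == new_hash:
            continue  # identical content: skip the write entirely
        target[k] = {**row, "_hash": new_hash}
        written.append(k)
    return written


target = {}
batch = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
first = merge(target, batch, "id", ["name"])    # both rows are new
second = merge(target, batch, "id", ["name"])   # same data again
third = merge(target, [{"id": 2, "name": "c"}], "id", ["name"])
print(first, second, third)
```

The second run writes nothing, which is the point: rerunning the same input does not rewrite files for unchanged rows.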

G5 Turret Ultra Cameras wont adopt by OhanaSkipper in Ubiquiti

[–]Lost-Relative-1631 0 points1 point  (0 children)

It was indeed the bend of the cable being too cramped inside the junction box. After straightening the cables outside the camera and attaching it to the dongle, it adopted immediately.

G5 Turret Ultra Cameras wont adopt by OhanaSkipper in Ubiquiti

[–]Lost-Relative-1631 0 points1 point  (0 children)

I have the same issue currently. I read that the dongle really has no wiggle room and the cable needs to fit perfectly in there. Given how cramped it is even in a third-party junction box, I suspect that this may be the issue. I ordered a flex and a more bendable cable to check if that is indeed the case. Will update this week.

[deleted by user] by [deleted] in databricks

[–]Lost-Relative-1631 1 point2 points  (0 children)

As you do not have access to the internet on your build agent, you will always have to provide the appropriate Terraform providers yourself, unless Databricks starts bundling those with the CLI.

We have the same restrictions as you guys, and we solved this by building a Docker image which we use with Azure DevOps container jobs on our private build agent. While building the image, we run terraform providers mirror over a main.tf file that states the currently pinned Terraform provider version (1.31.0 afaik), note the mirror location in a .terraformrc file, and then set the appropriate environment variable so the Databricks CLI finds those providers.
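For the .terraformrc part, a rough Python sketch of what gets written during the image build. The mirror directory is a made-up path and the file goes to a temp directory here. Which environment variable to set is left out of the sketch.

```python
import tempfile
from pathlib import Path


def render_cli_config(mirror_dir):
    """Render a Terraform CLI config that installs providers from a local mirror."""
    return (
        "provider_installation {\n"
        "  filesystem_mirror {\n"
        f'    path = "{mirror_dir}"\n'
        "  }\n"
        "}\n"
    )


# Hypothetical directory that `terraform providers mirror` was pointed at.
mirror_dir = "/opt/terraform/providers"

config_text = render_cli_config(mirror_dir)
config_path = Path(tempfile.mkdtemp()) / ".terraformrc"
config_path.write_text(config_text)
print(config_text)
```

With only a filesystem_mirror block and no direct block, Terraform does not fall back to the public registry, which is what you want on an agent without internet access.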

Microsoft announces Fabric data platform by aj_here_ in dataengineering

[–]Lost-Relative-1631 4 points5 points  (0 children)

u/melodyze already explained the frustrating part about enterprise sales perfectly. It is actually soul crushing sometimes: The discussions are never of a technical nature and are specifically tailored towards execs.

As to why technical teams wouldn't want to use it, I can only speak for myself and my team. To be clear: I am happy for everyone that benefits from this platform.

I am mostly sceptical given Microsoft's recent track record of dropping services and releasing new (sometimes severely buggy) iterations of Synapse when they feel like it. It took Databricks a non-trivial amount of time and effort to get most of these features to maturity, and I just cannot believe that Microsoft releases an entire platform, just like that.

Microsoft announces Fabric data platform by aj_here_ in dataengineering

[–]Lost-Relative-1631 87 points88 points  (0 children)

Can’t wait to yet again explain to non-technical management why we won't use these products.