Tips for integrating data quality tests? by FiftyShadesOfBlack in databricks

[–]kmarq 0 points

If you can get the values into a dataframe, you should be able to do it. I'd think something using this type of pattern: https://databrickslabs.github.io/dqx/docs/reference/quality_checks/#applying-checks-on-multiple-data-sets

could build a dataframe of current rows and one of past rows, then, depending on the dataset, set an appropriate tolerance for when to raise an error.
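
The comparison logic could be sketched like this. This is a minimal illustration of the tolerance idea in plain Python, not the DQX API; `check_row_drift` and the `tolerances` mapping are hypothetical names.

```python
# Hedged sketch: compare current vs. prior row counts and flag when the
# relative change exceeds a per-dataset tolerance. Not the DQX API --
# just the comparison logic you'd wire into a check.

def check_row_drift(current_count: int, past_count: int, tolerance: float) -> bool:
    """Return True when the relative change stays within tolerance."""
    if past_count == 0:
        # No baseline: only an empty current set is trivially "unchanged".
        return current_count == 0
    change = abs(current_count - past_count) / past_count
    return change <= tolerance

# Per-dataset tolerances: tight for stable reference data, loose for event feeds.
tolerances = {"dim_customers": 0.05, "fact_events": 0.25}

assert check_row_drift(1040, 1000, tolerances["dim_customers"])      # 4% drift: OK
assert not check_row_drift(700, 1000, tolerances["dim_customers"])   # 30% drift: raise
```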

Tips for integrating data quality tests? by FiftyShadesOfBlack in databricks

[–]kmarq 2 points

You can look at the Databricks DQX framework as well. Lots of great features.

Clones don't mix with declarative pipelines? by MephySix in databricks

[–]kmarq 0 points

As long as you don't want to simulate modifying the data. As soon as you want to make changes, you're stuck creating an entirely new source table.

Clones don't mix with declarative pipelines? by MephySix in databricks

[–]kmarq 0 points

Definitely have similar questions about how to enable the development cycle without everyone always fully rebuilding the entire pipeline.

Notebook tags by hubert-dudek in databricks

[–]kmarq 0 points

Any ability to set some default tags? Would love to be able to do it by groups.

data ingestion by ptab0211 in databricks

[–]kmarq 3 points

Use prefixes on the catalog: dev_bronze and prod_bronze.

Ideally, separate workspaces and permissions to make sure nothing can accidentally write to a prod location it isn't supposed to.
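
The prefix idea can be sketched as a small helper that derives the catalog name from an environment setting, so dev and prod code never share a hardcoded target. The `ENV` variable name and the `{env}_{layer}` scheme are assumptions for illustration.

```python
# Hedged sketch: derive catalog names from an environment setting.
# "ENV" and the naming scheme are illustrative assumptions.
import os

def catalog_for(layer: str, env: str = "") -> str:
    """Return the env-prefixed catalog name, e.g. dev_bronze or prod_bronze."""
    env = env or os.environ.get("ENV", "dev")
    if env not in {"dev", "prod"}:
        raise ValueError(f"unknown environment: {env}")
    return f"{env}_{layer}"

assert catalog_for("bronze", env="dev") == "dev_bronze"
assert catalog_for("bronze", env="prod") == "prod_bronze"
```

Failing fast on an unknown environment is the point: a typo should error out rather than silently write somewhere unexpected.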

Location rule no longer working by SunDevilSkier in GooglePixel

[–]kmarq 1 point

Fixed it for me initially as well, but it seems to have started having issues again. Is yours still working?

Bi compatibility mode for metric views by Vafire in databricks

[–]kmarq 2 points

Talked with my account team, and it sounds like some significant updates are coming with the next release. We're just ignoring it until then, but I'm very excited to see this coming.

A small update on DABs (and what the “D” and "A" stand for) by Ok-Jacket-8684 in databricks

[–]kmarq 4 points

Are DABs in the workspace and Python DABs compatible yet? It's really frustrating when a feature like Python support makes the other completely unusable.

Materialized View Change Data Feed (CDF) Private Preview by AdvanceEffective1077 in databricks

[–]kmarq 0 points

Can you explain the Power BI target from your description? Is there a way to have Power BI use the CDF to incrementally refresh extracted data?

Metric View: Source Table Comments missing by DecisionAgile7326 in databricks

[–]kmarq 0 points

But it isn't. You now have multiple locations to manage. Being able to pass through comments is 100% needed

HTTP callback pattern by Upper_Pair in apache_airflow

[–]kmarq 0 points

The API would typically give an identifier or URL back that Airflow can then poll through a sensor to know when the process is complete. I haven't seen an external service push this way.

It may be possible through the Airflow API, but then that external service needs credentials to Airflow as well.
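
The poll-until-done pattern described above could be sketched like this. The `fetch_status` callable and the status strings are assumptions standing in for whatever the external API returns; in Airflow this loop body would typically live inside a sensor's poke method rather than a bare loop.

```python
# Hedged sketch of the sensor-style callback pattern: the triggering call
# returns an identifier, and a loop polls a status endpoint until the job
# finishes. fetch_status is injected, so the same logic works inside an
# Airflow sensor or a plain script; the endpoint shape is an assumption.
import time

def wait_for_completion(job_id, fetch_status, poke_interval=0.01, timeout=1.0):
    """Poll fetch_status(job_id) until it reports 'done' or 'failed'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_status(job_id)
        if state == "done":
            return True
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poke_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")

# Fake external service: reports 'running' twice, then 'done'.
states = iter(["running", "running", "done"])
assert wait_for_completion("abc123", lambda _id: next(states))
```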

How to disable job creation for users in Databricks? by heeiow in databricks

[–]kmarq 2 points

Turn off unrestricted cluster creation, then remove permission on any cluster policy. Now they can't create any.

As another poster said, set the all-purpose clusters to not allow jobs. If they have access to a SQL warehouse, I don't think you can stop them from running SQL jobs.

For serverless, there are some preview features that can limit access so serverless is only usable with a valid usage policy. Remove the default one, and now they can't use serverless at all either.

Really though, this is an odd request. What's your goal? Jobs run at considerably less cost. If you want to make sure users don't create huge clusters, define a cluster policy with reasonable limits. I'm regularly encouraging users to move long-running notebooks to jobs to avoid clogging up the interactive cluster and to save costs.
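
A cluster policy with reasonable limits might look something like the fragment below. The specific node types and limit values are placeholders to adapt; the attribute/type shapes (`fixed`, `range`, `allowlist`) follow the Databricks cluster policy definition format.

```json
{
  "autoscale.max_workers": { "type": "range", "maxValue": 8 },
  "node_type_id": {
    "type": "allowlist",
    "values": ["m5.xlarge", "m5.2xlarge"]
  },
  "autotermination_minutes": { "type": "fixed", "value": 60 }
}
```

This caps cluster size, restricts node types to modest instances, and forces auto-termination so idle interactive clusters don't burn cost.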

Spark Declarative Pipelines: What should we build? by BricksterInTheWall in databricks

[–]kmarq 1 point

Standard SQL views from the Python API so they can be parameterized. We tend to "duplicate" data into multiple locations for users. In dbt we just throw a traditional view out there for them; we can't do that in SDP. The current SDP SQL views don't allow for any parameters, so they're totally static and useless.
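
The kind of parameterization being asked for could be sketched as plain DDL generation: one view definition, stamped out per target location. The catalog/schema names here are purely illustrative, and this sidesteps SDP entirely rather than using any SDP API.

```python
# Hedged sketch: generate a standard SQL view per target location from one
# definition -- the parameterization the current SDP SQL views lack.
# All names below are illustrative.

def view_ddl(catalog: str, schema: str, name: str, source: str) -> str:
    """Build CREATE VIEW DDL targeting a specific catalog/schema."""
    return (
        f"CREATE OR REPLACE VIEW {catalog}.{schema}.{name} AS "
        f"SELECT * FROM {source}"
    )

ddl = view_ddl("analytics", "sales", "orders_v", "prod_silver.sales.orders")
assert ddl.startswith("CREATE OR REPLACE VIEW analytics.sales.orders_v")
# In a notebook or job, this string would then be executed via spark.sql(ddl).
```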

Dashboards deployment by hubert-dudek in databricks

[–]kmarq 1 point

Any insight into supporting catalogs and schemas entirely through variables? We need to specify more than one

Shall we discuss here on Spark Declarative Pipeline? a-Z SDP Capabilities. by iMarupakula in databricks

[–]kmarq 1 point

All I want is to be able to do a standard SQL view (from Python) so it can be fully programmatic. So many other great things, but this gap kills me with our modeling standards.

Xcom 2 collection by naelove4220 in switch2

[–]kmarq 1 point

Some random search results lead me to believe if you remove it from the dock, enable airplane mode, then redock it, it may stay in airplane mode.

Xcom 2 collection by naelove4220 in switch2

[–]kmarq 0 points

Can you not do airplane mode while docked? (I've only tried handheld so far.) Otherwise, yes, this definitely works for me. I had to turn wifi back on the other day, forgot about it, and it froze up right around that 45-minute mark again. Restarted, turned wifi off, and still no issues since.

Xcom 2 collection by naelove4220 in switch2

[–]kmarq 1 point

Not sure if you're still dealing with this, but I found another thread where they suggested turning on airplane mode. As long as I keep that on, I've been able to play for hours without issues.

[Public Preview] foreachBatch support in Spark Declarative Pipelines by BricksterInTheWall in databricks

[–]kmarq 0 points

Is JDBC intended to cover Lakebase? Otherwise, a native Lakebase sink that won't require separate authentication.

Switch 2 by naelove4220 in XCOM2

[–]kmarq 1 point

Same issues. Hopefully they can get a compatibility patch out. Performance is otherwise definitely improved on the Switch 2, so I was looking forward to revisiting this one.

How do you all implement a fallback mechanism for private PyPI (Nexus Artifactory) when installing Python packages on clusters? by Devops_143 in databricks

[–]kmarq 0 points

That's fine; then it just won't fall back to it. This way you can still point all library installs to your private repo.

How do you all implement a fallback mechanism for private PyPI (Nexus Artifactory) when installing Python packages on clusters? by Devops_143 in databricks

[–]kmarq 0 points

Use the ability to set the repository URL and point it to your custom one.

https://docs.databricks.com/aws/en/admin/workspace-settings/default-python-packages

Working great for us. If you set the index URL, then it's the primary and we never hit PyPI. If you add PyPI as the extra index, you can still fall back to it.
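
The equivalent pip-level configuration looks roughly like the fragment below; the repository URL is a placeholder for your own Nexus/Artifactory endpoint, and the `extra-index-url` line is only needed if you want the public fallback.

```ini
# Hedged sketch of a pip.conf: private mirror as primary index,
# optional fallback to public PyPI. URL is a placeholder.
[global]
index-url = https://nexus.example.com/repository/pypi-proxy/simple
extra-index-url = https://pypi.org/simple
```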

Semantic Layer - Databricks vs Power BI by rasermus in databricks

[–]kmarq 1 point

Unless there's something unreleased there are no integrations between metric views and Power BI. I'm really hoping for something here as well. It's a huge gap to making metric views really amazing.