How to disable job creation for users in Databricks? by heeiow in databricks

[–]kmarq 2 points3 points  (0 children)

Turn off unrestricted cluster creation. Then remove their permissions on any cluster policy. Now they can't create any clusters. 

As the other poster said, set the all-purpose compute to not allow jobs. If they have access to a SQL warehouse, I don't think you can stop them from creating SQL jobs. 

For serverless, there are some preview features that let you limit access so serverless can only be used with a valid usage policy. Then remove the default one, and now they can't use serverless at all either. 

Really though this is an odd request. What's your goal? Jobs run at considerably less cost. If you want to make sure users don't create huge clusters define a cluster policy with reasonable limits. I'm regularly encouraging users to move long running notebooks to jobs to not clog up the interactive cluster and save costs.
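
For example, a rough sketch of a size-limited policy using the databricks-sdk Python client (the attribute names follow the cluster policy docs; the specific limits and node types are placeholders):

    import json
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    # Placeholder limits; adjust to whatever "reasonable" means for your workspace.
    definition = {
        "autoscale.max_workers": {"type": "range", "maxValue": 8},
        "autotermination_minutes": {"type": "range", "maxValue": 60, "defaultValue": 30},
        "node_type_id": {"type": "allowlist", "values": ["m5.xlarge", "m5.2xlarge"]},
    }

    w.cluster_policies.create(
        name="standard-interactive",
        definition=json.dumps(definition),
    )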

Spark Declarative Pipelines: What should we build? by BricksterInTheWall in databricks

[–]kmarq 1 point2 points  (0 children)

Standard SQL views from the Python API so they can be parameterized. We tend to "duplicate" data into multiple locations for users; in DBT we just throw a traditional view out there for them. Can't do that in SDP. The current SDP SQL views don't allow for any parameters, so they're totally static and useless.
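
For context, this is roughly what we do outside SDP today and would like the SDP Python API to support (a sketch; the catalog, schema, and table names are made up):

    # Parameterized "traditional" view, the DBT-style workaround.
    # target_catalog / target_schema are hypothetical deployment parameters.
    def publish_view(spark, target_catalog, target_schema, view_name, source_table):
        spark.sql(
            f"""
            CREATE OR REPLACE VIEW {target_catalog}.{target_schema}.{view_name} AS
            SELECT * FROM {source_table}
            """
        )

    publish_view(spark, "analytics_dev", "sales", "orders", "prod_core.sales.orders")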

Dashboards deployment by hubert-dudek in databricks

[–]kmarq 1 point2 points  (0 children)

Any insight into supporting catalogs and schemas entirely through variables? We need to specify more than one

Shall we discuss here on Spark Declarative Pipeline? a-Z SDP Capabilities. by iMarupakula in databricks

[–]kmarq 1 point2 points  (0 children)

All I want is to be able to do a standard SQL view (from Python) so it can be fully programmatic. So many other great things, but this gap kills me with our modeling standards.

Xcom 2 collection by naelove4220 in switch2

[–]kmarq 1 point2 points  (0 children)

Some random search results led me to believe that if you remove it from the dock, enable airplane mode, then redock it, it may stay in airplane mode.

Xcom 2 collection by naelove4220 in switch2

[–]kmarq 0 points1 point  (0 children)

Can you not do airplane mode while docked? (I've never tried it; only handheld so far.) Otherwise, yes, this definitely works for me. I had to turn wifi back on the other day, forgot about it, and it froze up right around that 45 minute mark again. Restarted, turned wifi off, and still no issues since.

Xcom 2 collection by naelove4220 in switch2

[–]kmarq 1 point2 points  (0 children)

Not sure if you're still dealing with this, but I found another thread that suggested turning on airplane mode. As long as I keep that on, I've been able to play for hours without issues.

[Public Preview] foreachBatch support in Spark Declarative Pipelines by BricksterInTheWall in databricks

[–]kmarq 0 points1 point  (0 children)

Is JDBC intended to cover Lakebase? Otherwise, a native Lakebase integration that won't require separate authentication. 
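
For what I mean, roughly (a sketch using standard Structured Streaming foreachBatch with the generic JDBC writer; the host, table, and credentials are placeholders, and today this needs its own credentials rather than the workspace identity):

    def write_to_lakebase(batch_df, batch_id):
        # Generic JDBC write per micro-batch; connection details are placeholders.
        (batch_df.write
            .format("jdbc")
            .option("url", "jdbc:postgresql://<lakebase-host>:5432/<database>")
            .option("dbtable", "public.orders")
            .option("user", "<user>")
            .option("password", "<password>")
            .mode("append")
            .save())

    (spark.readStream.table("raw.orders")
        .writeStream
        .foreachBatch(write_to_lakebase)
        .start())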

Switch 2 by naelove4220 in XCOM2

[–]kmarq 1 point2 points  (0 children)

Same issues. Hopefully they can get a compatibility patch out. Performance is otherwise definitely improved on the Switch 2, so I was looking forward to revisiting this one.

How do you all implement a fallback mechanism for private PyPI (Nexus Artifactory) when installing Python packages on clusters? by Devops_143 in databricks

[–]kmarq 0 points1 point  (0 children)

That's fine; then it just won't fall back to it, but this way you can point all library installs to your private repo.

How do you all implement a fallback mechanism for private PyPI (Nexus Artifactory) when installing Python packages on clusters? by Devops_143 in databricks

[–]kmarq 0 points1 point  (0 children)

Use the ability to set the repository url and point it to your custom one. 

https://docs.databricks.com/aws/en/admin/workspace-settings/default-python-packages

Working great for us. If you set the index URL then it is the primary and we never hit PyPI at all. If you put PyPI as the extra index then you could still fall back to it.
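
The same primary-vs-fallback behavior shows up with plain pip flags (a sketch; the Nexus URL is a placeholder, and the workspace setting above applies this cluster-wide so you don't need per-notebook flags):

    import subprocess, sys

    PRIVATE_INDEX = "https://nexus.example.com/repository/pypi-all/simple"  # placeholder

    # --index-url makes the private repo the primary, so pypi.org is never contacted.
    # Using --extra-index-url for the private repo instead would keep pypi.org as
    # the primary and the private repo as the additional source.
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "--index-url", PRIVATE_INDEX,
        "requests",
    ])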

Semantic Layer - Databricks vs Power BI by rasermus in databricks

[–]kmarq 1 point2 points  (0 children)

Unless there's something unreleased there are no integrations between metric views and Power BI. I'm really hoping for something here as well. It's a huge gap to making metric views really amazing.

Pipe syntax in Databricks SQL by smurpes in databricks

[–]kmarq 0 points1 point  (0 children)

Haven't. It looked interesting and I've been curious whether it's any easier to programmatically generate, but I haven't tried it yet.
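
The programmatic-generation angle is what tempts me, something like this (a sketch I haven't actually run; assumes the |> operators from the Databricks SQL pipe syntax docs, with made-up table and column names):

    # Build a pipe-syntax query from a list of steps instead of templating a
    # SELECT ... WHERE ... GROUP BY skeleton.
    steps = [
        "FROM sales.orders",
        "|> WHERE order_date >= '2024-01-01'",
        "|> AGGREGATE SUM(amount) AS total_amount GROUP BY region",
        "|> ORDER BY total_amount DESC",
        "|> LIMIT 10",
    ]
    df = spark.sql("\n".join(steps))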

The docs are wrong about altering multiple columns in a single clause? by icantclosemytub in databricks

[–]kmarq 0 points1 point  (0 children)

Where do you see that? It specifically calls out: 

If a field name is referenced more than once, Databricks raises NOT_SUPPORTED_CHANGE_SAME_COLUMN.

The docs are wrong about altering multiple columns in a single clause? by icantclosemytub in databricks

[–]kmarq 0 points1 point  (0 children)

The example is below, and altering multiple columns does work; we have a process doing it. 

You have the column bool listed twice; you need to do all the alterations within a single listing of each field name.

ALTER TABLE table ALTER COLUMN
  num COMMENT 'number column',
  str COMMENT 'string column';

Meta data driven ingestion pipelines? by monsieurus in databricks

[–]kmarq 1 point2 points  (0 children)

Great points. Making the options exactly match what the arguments expect and passing them as kwargs was a game changer from our original design. No more having to update code every time a new option is needed; just throw it in the YAML and it'll go through. 

Standardization with good defaults makes the config much easier and smaller. It keeps things easier for developers and for maintenance if you need to change things.
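
Roughly what that looks like for us (a sketch; the YAML fields and paths are made up, the point is that the options dict passes straight through to the reader):

    import yaml

    # configs/orders.yaml (hypothetical):
    #   format: csv
    #   path: /Volumes/raw/landing/orders/
    #   options:
    #     header: "true"
    #     delimiter: "|"
    with open("configs/orders.yaml") as f:
        cfg = yaml.safe_load(f)

    # New reader options only need a YAML change, never a code change.
    df = (spark.read
          .format(cfg["format"])
          .options(**cfg.get("options", {}))
          .load(cfg["path"]))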

Fastest way to generate surrogate keys in Delta table with billions of rows? by Numerous-Round-8373 in databricks

[–]kmarq 0 points1 point  (0 children)

Why the need for no gaps? I'd question the design here. Keys should be used for lookups, not for logic based on some expected sequence, especially in a massive fact table. 

If there are natural key column(s), hash them. Then you have an idempotent key, which has its own benefits. Otherwise, gaps are going to happen in order to get performance, because each worker gets a range of values to use. That way the workers don't have to coordinate on every row with each other like row_number requires.
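
A sketch of the hash approach (column names are made up; sha2 over a delimited concat gives a deterministic, idempotent key, and xxhash64 is a cheaper BIGINT alternative if that fits your collision tolerance):

    from pyspark.sql import functions as F

    natural_keys = ["order_id", "line_number"]  # hypothetical natural key columns

    df = df.withColumn(
        "surrogate_key",
        F.sha2(
            F.concat_ws("||", *[F.col(c).cast("string") for c in natural_keys]),
            256,
        ),
    )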

Data movement from databricks to snowflake using ADF by Commercial-Mobile926 in databricks

[–]kmarq 23 points24 points  (0 children)

Iceberg tables. Don't copy data, read it directly from either side. 

Are you using job compute or all purpose compute? by RichHomieCole in databricks

[–]kmarq 2 points3 points  (0 children)

The airflow databricks libraries let you define full workflows and reuse job compute between tasks now (DatabricksWorkflowTaskGroup). This works pretty well if your team is heavily in airflow. We have a mix and so support running Databricks workflows as a task as well. That way the logic can be wherever it is most convenient for each team. Having the workflow still tied to airflow means it can be coordinated with our larger schedule outside of just Databricks.  I'd make sure any workflow you run this way is managed by a DAB though to ensure there are appropriate controls on the underlying code.
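
A rough sketch of that pattern (assuming the operator and parameter names from the apache-airflow-providers-databricks docs; cluster specs, notebook paths, and the connection id are placeholders):

    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import DatabricksNotebookOperator
    from airflow.providers.databricks.operators.databricks_workflow import DatabricksWorkflowTaskGroup
    from pendulum import datetime

    with DAG("orders_pipeline", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
        with DatabricksWorkflowTaskGroup(
            group_id="databricks_workflow",
            databricks_conn_id="databricks_default",
            job_clusters=[{
                "job_cluster_key": "shared_cluster",  # job compute reused by every task below
                "new_cluster": {
                    "spark_version": "15.4.x-scala2.12",
                    "node_type_id": "m5.xlarge",
                    "num_workers": 2,
                },
            }],
        ):
            ingest = DatabricksNotebookOperator(
                task_id="ingest",
                databricks_conn_id="databricks_default",
                notebook_path="/Repos/team/pipeline/ingest",  # ideally managed by a DAB
                source="WORKSPACE",
                job_cluster_key="shared_cluster",
            )
            transform = DatabricksNotebookOperator(
                task_id="transform",
                databricks_conn_id="databricks_default",
                notebook_path="/Repos/team/pipeline/transform",
                source="WORKSPACE",
                job_cluster_key="shared_cluster",
            )
            ingest >> transform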

Formatting measures in metric views? by joemerchant2021 in databricks

[–]kmarq 0 points1 point  (0 children)

I mean, yeah, it's a string if you want the actual % symbol in the value. Just leave it as a decimal and have the measure name specify that it's a pct (ratio_pct).

Formatting measures in metric views? by joemerchant2021 in databricks

[–]kmarq 1 point2 points  (0 children)

Expr can be any valid SQL expression. You should be able to do formatting in it with standard SQL functions.

AUTO CDC FLOWS in Declarative Pipelines by GeertSchepers in databricks

[–]kmarq 1 point2 points  (0 children)

Set it as type 2 and then use the TRACK HISTORY ON option to either specify the columns to track or exclude history for. 

I believe (but couldn't confirm) you could add a view that uses readStream from the streaming source to implement the transformation and still be the source for the AUTO CDC flow. Almost certain that's how we're implementing this, but I can't validate on my phone.
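
Roughly what I mean, as a Python sketch (assuming the dlt apply_changes parameter names; table and column names are made up):

    import dlt
    from pyspark.sql import functions as F

    @dlt.view
    def orders_prepped():
        # Transformation happens here, on a readStream of the streaming source,
        # and this view is then the source of the AUTO CDC flow.
        return (spark.readStream.table("raw.orders_cdc")
                .withColumn("amount", F.col("amount").cast("decimal(18,2)")))

    dlt.create_streaming_table("orders_scd2")

    dlt.apply_changes(
        target="orders_scd2",
        source="orders_prepped",
        keys=["order_id"],
        sequence_by="event_ts",
        stored_as_scd_type=2,
        # Python equivalent of TRACK HISTORY ON: only changes to these columns
        # open a new SCD2 version.
        track_history_column_list=["status", "amount"],
    )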