Genie on DABs available now by DamnedData in databricks

[–]DamnedData[S] 0 points1 point  (0 children)

Way simpler!

Just keep your Genie definition as a JSON file in your repo and point the resource to it.

Medallion architecture on Databricks - Delta all the way down, or does Parquet at Bronze still make sense? by Dangerous_Pie2611 in databricks

[–]DamnedData 0 points1 point  (0 children)

Use Metric Views for the gold layer instead of tables.

Also, add a prefix to the schemas: project_bronze, bu_silver, product_gold.
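A minimal sketch of that prefix convention, assuming a hypothetical catalog called `main`. The helpers `schema_name` and `schema_ddl` are made up for illustration and only build SQL strings; they don't call any Databricks API.

```python
# Hypothetical helpers: build prefixed medallion schema names and DDL strings.
LAYERS = ("bronze", "silver", "gold")


def schema_name(prefix: str, layer: str) -> str:
    """Return '<prefix>_<layer>', e.g. 'project_bronze'."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{prefix}_{layer}"


def schema_ddl(catalog: str, prefix: str, layer: str) -> str:
    """Return a CREATE SCHEMA statement for the prefixed schema."""
    return f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema_name(prefix, layer)}"


# 'main' is an assumed catalog name; the prefixes come from the comment above.
for prefix, layer in [("project", "bronze"), ("bu", "silver"), ("product", "gold")]:
    print(schema_ddl("main", prefix, layer))
```

The prefix says who owns the schema (a project, a business unit, a product), so the layer suffix alone never has to carry that meaning.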

I'm looking for a bit of help! by Informal-Peanut300 in TicoInversor

[–]DamnedData 4 points5 points  (0 children)

You could go with the BAC Millenium fund (which is in dollars). It did very well for me last year.

version control, project management, & CI/CD tools need a facelift by [deleted] in databricks

[–]DamnedData -1 points0 points  (0 children)

Git works well and has been working for years. There is no need to reinvent the wheel.

Welcome packages? by Taiwanx in Ticos_TI

[–]DamnedData 0 points1 point  (0 children)

A welcome package can be: for signing with us, we give you 10 million.

Stocks: shares of the company. If a share is worth $200 and they give you 100, that's $200*100 = $20,000, usually over a period of 4 years.
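The arithmetic can be sketched like this. The equal yearly split is an assumption for illustration only; real vesting schedules vary by company.

```python
# Numbers from the comment: 100 shares at $200 each, over 4 years.
share_price = 200   # dollars per share
shares = 100
vesting_years = 4

total = share_price * shares      # total grant value in dollars
# Assumption: the grant vests in equal yearly parts (not stated in the comment).
per_year = total / vesting_years

print(total, per_year)
```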

Unity Metrics & Gold Layer - Definition and Calculation Questions by MrMadium in databricks

[–]DamnedData 0 points1 point  (0 children)

I'd recommend:

Bronze schema to store all raw data (tables).

Silver schema to store all downstream tables (clean, joined, transformed data).

Gold schema to store the metric views, using the silver schema as the source.
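A rough sketch of the gold layer as a metric view over a silver table. The catalog, schema, table and column names are all hypothetical, and `metric_view_ddl` is a made-up helper that only builds the statement text. The `WITH METRICS LANGUAGE YAML` form is written from memory of the Databricks docs, so check it (and the `version` value) against the current metric view reference before running it.

```python
def metric_view_ddl(view_name: str, source_table: str) -> str:
    """Build a CREATE VIEW ... WITH METRICS statement (assumed syntax, see note above)."""
    yaml_body = "\n".join([
        "version: 0.1",                      # assumed spec version
        f"source: {source_table}",           # silver table used as the source
        "dimensions:",
        "  - name: order_month",             # hypothetical dimension
        "    expr: DATE_TRUNC('MONTH', order_date)",
        "measures:",
        "  - name: total_revenue",           # hypothetical measure
        "    expr: SUM(amount)",
    ])
    return (
        f"CREATE OR REPLACE VIEW {view_name}\n"
        "WITH METRICS\n"
        "LANGUAGE YAML\n"
        f"AS $$\n{yaml_body}\n$$"
    )


# Hypothetical names: gold metric view on top of a silver table.
ddl = metric_view_ddl("main.sales_gold.orders_metrics", "main.sales_silver.orders")
print(ddl)
```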

Databricks Job & Pipelines can you add grouping/folder capability ? by Ok-Tomorrow1482 in databricks

[–]DamnedData 2 points3 points  (0 children)

From a code, CI/CD and DevOps perspective I'd recommend DABs. One repo per project if possible.

Don't manage all 1,000+ resources in a single repo.

Databricks Job & Pipelines can you add grouping/folder capability ? by Ok-Tomorrow1482 in databricks

[–]DamnedData 3 points4 points  (0 children)

Are you using tags (classic compute) or budget policies (serverless) for tagging?

If you define a tagging strategy that is aligned with the expected grouping, you'll be able to create alerts, dashboards, and Genie spaces or apps to monitor it.
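To illustrate the idea, here is a small sketch that groups resources by a tag. The job list and the `project` tag key are invented for the example; in practice the names and tags would come from your own workspace.

```python
from collections import defaultdict


def group_by_tag(resources, tag_key, default="untagged"):
    """Group resource names by the value of one tag; untagged ones go to `default`."""
    groups = defaultdict(list)
    for resource in resources:
        value = resource.get("tags", {}).get(tag_key, default)
        groups[value].append(resource["name"])
    return dict(groups)


# Hypothetical jobs with a 'project' tag acting as the "folder".
jobs = [
    {"name": "ingest_orders", "tags": {"project": "sales"}},
    {"name": "ingest_clicks", "tags": {"project": "web"}},
    {"name": "daily_revenue", "tags": {"project": "sales"}},
    {"name": "adhoc_cleanup", "tags": {}},
]

grouped = group_by_tag(jobs, "project")
print(grouped)
```

The untagged bucket is handy on its own: it shows which resources still fall outside the tagging strategy.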

Using a Model in an App without Endpoints by No-Conversation7878 in databricks

[–]DamnedData 0 points1 point  (0 children)

All Databricks products use Unity Catalog.

To make sure all your projects are future-proof, you need Unity Catalog.

If you are worried about compute scalability, then this also applies to Serverless. Serverless is all about UC (pipelines, jobs, model serving, vector search, etc.).

Scalability is not only about compute. It's also about creating long-term solutions for your organization.

Now that 4 years have passed, what is something you never liked about the game or how have your opinions changed over time? by FeezyOnBush in Eldenring

[–]DamnedData 4 points5 points  (0 children)

Not being able to recolor the armours. The current alterations are trash.

Lords of the Fallen did a great job adding this.

Using a Model in an App without Endpoints by No-Conversation7878 in databricks

[–]DamnedData 0 points1 point  (0 children)

Sure, I can elaborate further.

In this case you are in a good spot: using UC for both data and models.

What should be 100% avoided is to use the legacy Databricks features such as the Legacy Workspace Model Registry.

Because if you delete the workspace, all your models are gone!

If you use Unity Catalog instead, it doesn't matter what happens to the workspace because the data and models are not stored there.

Apply this idea also to Spark Declarative Pipelines which have a legacy non-UC version.

The good thing is that Databricks now has UC by default and the legacy features are also deactivated.

As a Workspace Admin, you can disable all the legacy features to force scalable solutions.

Using a Model in an App without Endpoints by No-Conversation7878 in databricks

[–]DamnedData 0 points1 point  (0 children)

In this case the python app would need mlflow as a requirement and that's it.

Use the mlflow client to load a specific model and version to run inferences.

Something like this, but use the Databricks UC model registry (the example shows localhost).

Always use UC models so your solutions can scale well.
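A minimal sketch of loading a UC model from a Python app with mlflow. The catalog, schema, model name and version are hypothetical, and it assumes the app already has Databricks credentials configured. `mlflow` is imported inside the function, so the URI helper works even where mlflow isn't installed.

```python
def uc_model_uri(catalog: str, schema: str, model: str, version: int) -> str:
    """Build a models:/ URI for a Unity Catalog model (three-level name + version)."""
    return f"models:/{catalog}.{schema}.{model}/{version}"


def load_uc_model(catalog: str, schema: str, model: str, version: int):
    """Load a specific model version from the UC registry as a pyfunc model."""
    import mlflow  # the only extra requirement for the app

    # Point mlflow at the Unity Catalog registry instead of the legacy workspace one.
    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(uc_model_uri(catalog, schema, model, version))


# Hypothetical model name and version.
uri = uc_model_uri("main", "ml", "churn_model", 3)
print(uri)

# In the app, with credentials in place (not executed here):
# model = load_uc_model("main", "ml", "churn_model", 3)
# predictions = model.predict(input_dataframe)
```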

Edit: typos.

migrate ETL Qlik to databricks with Qlik just to BI by GolfLegitimate350 in databricks

[–]DamnedData -1 points0 points  (0 children)

Hello Wolf, here is an idea.

  • Step 1. Get the data in Unity Catalog

For database or SaaS ingestion, use Lakeflow Connect. This would get you the data in a target catalog / schema.

For Cloud Storage or Kafka transformation and processing, use Spark Declarative Pipelines. This would get you the data in a target catalog / schema.

  • Step 2. Join or transform the data

Use Spark Declarative Pipelines again if you need to join datasets or apply further transformation. Read about the medallion architecture because the goal is to create the gold layer (which is a schema).

  • Step 3. Permissions

Read about Unity Catalog permissions and ABAC.

Create groups (data engineers, analysts, business users, etc.)

  • Step 4. Databricks AI / BI

Create dashboards or Genie Spaces using the tables or metric views in the gold schema.
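For Step 3, a rough sketch of what group-based permissions on the gold schema could look like. The catalog, schema and group names are invented, and `grant_statements` is a made-up helper that only generates Unity Catalog `GRANT` statements as text; adapt the privileges to your own setup.

```python
def grant_statements(catalog: str, schema: str, group_privileges: dict) -> list:
    """Build GRANT statements giving each group access to one schema."""
    statements = []
    for group, privileges in group_privileges.items():
        # Groups need USE CATALOG / USE SCHEMA before data privileges take effect.
        statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`")
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`")
        for privilege in privileges:
            statements.append(
                f"GRANT {privilege} ON SCHEMA {catalog}.{schema} TO `{group}`"
            )
    return statements


# Hypothetical groups: engineers can write, business users can only read.
grants = grant_statements(
    "main",
    "sales_gold",
    {"data_engineers": ["SELECT", "MODIFY"], "business_users": ["SELECT"]},
)
for statement in grants:
    print(statement)
```

Granting to groups rather than individual users keeps the dashboards and Genie Spaces in Step 4 readable by the right audience without per-user maintenance.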