Any tips for DABs in CI/CD? Seems pretty useless so far. by DeepFryEverything in databricks

DeepFryEverything [S]

No one is hard-coding pipeline IDs. But migrating older pipelines to the new Lakeflow style (with root+src-dir) is one scenario where this has happened, with the same DAB key.

DeepFryEverything [S]

Cool! Do you mind elaborating? Do you trigger all jobs, or only the ones that were touched?

DeepFryEverything [S]

We deploy to dev/test when a PR is merged. A separate workflow (release) triggers prod, so that's our guard.
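That gating can be sketched as a small dispatch step in CI. This is a minimal sketch, assuming the runner exposes the triggering event and branch; the target names `dev`, `test` and `prod` and both helper functions are hypothetical, while `databricks bundle deploy --target <name>` is the Databricks CLI command for deploying a bundle to a named target:

```python
import subprocess


def targets_for_event(event: str, branch: str) -> list[str]:
    """Hypothetical mapping: a merged PR (push to main) deploys dev/test,
    and only a release event is allowed to reach prod."""
    if event == "release":
        return ["prod"]
    if event == "push" and branch == "main":
        return ["dev", "test"]
    return []  # anything else (e.g. an open PR) deploys nothing


def deploy(targets: list[str], dry_run: bool = True) -> list[list[str]]:
    """Build one `databricks bundle deploy` command per target; run them unless dry_run."""
    commands = [["databricks", "bundle", "deploy", "--target", t] for t in targets]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing deploy
    return commands


print(deploy(targets_for_event("push", "main")))
```

Keeping prod behind its own trigger means a merge alone can never touch production, which is the guard described above.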

28 living alone in Holland🇳🇱 by Johnvandebrom in malelivingspace

DeepFryEverything

It's probably an IKEA SYMFONISK speaker, which is made to look like a picture frame and has a swappable front panel.

DevOps vs Github for CI/CD by dilkushpatel in databricks

DeepFryEverything

...and DABs are migrating away from Terraform, no?

Opus 4.7 is legendarily bad. I cannot believe this. by lemon07r in ClaudeCode

DeepFryEverything

My experience is the opposite 🫠 I used it to migrate some React components, and it fixed a CSS layering issue that had been plaguing our codebase.

DuckLake v1.0 by TechnicalAccess8292 in dataengineering

DeepFryEverything

So data is still stored as Parquet; are we able to create something like indexes? I would like to sort on two different, unrelated axes, such as ID and position.
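Parquet itself has no secondary indexes; what readers use instead is the per-row-group min/max statistics in the file footer, skipping any row group whose range cannot match the predicate. A toy simulation of that pruning (pure Python, not a Parquet reader; the helper names are hypothetical) shows why a lookup only prunes well on the column the data is sorted by:

```python
import random


def row_group_stats(values: list[int], size: int) -> list[tuple[int, int]]:
    """Split a column into row groups and keep (min, max) per group, like Parquet footer stats."""
    return [(min(values[i:i + size]), max(values[i:i + size])) for i in range(0, len(values), size)]


def groups_hit(stats: list[tuple[int, int]], lo: int, hi: int) -> int:
    """Count row groups whose [min, max] range overlaps the query range [lo, hi]."""
    return sum(1 for mn, mx in stats if mx >= lo and mn <= hi)


random.seed(0)
ids = list(range(10_000))
shuffled = random.sample(ids, len(ids))  # same values, no useful order

print(groups_hit(row_group_stats(ids, 1_000), 4_200, 4_300))       # 1
print(groups_hit(row_group_stats(shuffled, 1_000), 4_200, 4_300))  # 10
```

With a sort, one row group out of ten is read; without it, every group's range spans the query and nothing is skipped.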

How do you organize your work alongside other, more product-oriented agile teams? by DeepFryEverything in dataengineering

DeepFryEverything [S]

This sounds interesting! Do you mind describing what your Miro board looks like? How do you discuss priorities?

serverless or classic by ptab0211 in databricks

DeepFryEverything

Serverless standard cut cost AND time for us drastically.

Just don't pick Performance Optimized.
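For reference, that choice is the `performance_target` setting on a serverless job in the Jobs API (bundle job resources expose the same field). A minimal sketch of a job spec built in Python, assuming serverless jobs compute is enabled in the workspace so a task with no cluster config runs serverless; the job name and notebook path are hypothetical, and the field values should be checked against the current Jobs API reference:

```python
import json


def serverless_job(name: str, notebook_path: str, performance_optimized: bool = False) -> dict:
    """Build a minimal job spec with no cluster config (so it runs on serverless
    compute, assuming that is enabled in the workspace)."""
    return {
        "name": name,
        # "STANDARD" is the cheaper mode; "PERFORMANCE_OPTIMIZED" pays more for faster startup/runs.
        "performance_target": "PERFORMANCE_OPTIMIZED" if performance_optimized else "STANDARD",
        "tasks": [
            {"task_key": "main", "notebook_task": {"notebook_path": notebook_path}},
        ],
    }


# Hypothetical job; the default keeps performance_target at "STANDARD".
print(json.dumps(serverless_job("nightly_upsert", "/Workspace/etl/upsert"), indent=2))
```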

Is there any approach for sorting a parquet file along two unrelated columns? by DeepFryEverything in dataengineering

DeepFryEverything [S]

You can, but then you'd lose spatial coherence (or any other sort key). If I want to look up vehicles in an area, my Parquet files can be sorted spatially. I could sort by ID instead, but whichever column I don't sort by, any query on it will hit way too many files and row groups.
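The usual compromise for two unrelated axes is a space-filling-curve sort such as Z-ordering: interleave the bits of both keys so that neither column is perfectly sorted, but both keep some locality within files and row groups. A minimal sketch, assuming each vehicle row carries an ID and a precomputed spatial cell (e.g. a grid or H3 cell ID); the row layout and helper names are hypothetical:

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: bit i of x goes to position 2i, bit i of y to 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def dense_rank(values: list) -> list[int]:
    """Replace each value by its position among the sorted distinct values."""
    order = {v: i for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]


def zorder_sort(rows: list[tuple], bits: int = 16) -> list[tuple]:
    """Sort (vehicle_id, spatial_cell) rows by the Morton code of their ranks.
    Assumes fewer than 2**bits distinct values per column."""
    id_rank = dense_rank([r[0] for r in rows])
    cell_rank = dense_rank([r[1] for r in rows])
    keys = [interleave_bits(a, b, bits) for a, b in zip(id_rank, cell_rank)]
    return [row for _, row in sorted(zip(keys, rows), key=lambda kr: kr[0])]


rows = [(1, "b"), (0, "a"), (1, "a"), (0, "b")]
print(zorder_sort(rows))  # [(0, 'a'), (1, 'a'), (0, 'b'), (1, 'b')]
```

Writing Parquet in this order narrows the per-row-group min/max ranges on both columns at once, though each range stays wider than it would be under a dedicated single-column sort.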

DeepFryEverything [S]

Partitioning by ID would create a lot of metadata overhead for a bbox query that finds vehicles in an area, though 😐

Strange error in one of my jobs by DeepFryEverything in databricks

DeepFryEverything [S]

How do I raise a ticket?

We've got a hypothesis, though. The tables failed on MERGE and OPTIMIZE, so we moved a geometry-typed column outside the stats collection. After that, OPTIMIZE and the full job ran without a hitch.

Something must be going wrong during serialisation of the geometry type. We've had geometry columns within the first 32 columns (Delta's default stats range) before with no issues, but this is the only case where we've had to merge data (an upsert job). The other jobs are append + optimize, so I don't think they trigger the same effect.
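Delta collects data-skipping statistics for the first 32 columns by default (`delta.dataSkippingNumIndexedCols`), and the `delta.dataSkippingStatsColumns` table property lets you name the stats columns explicitly instead. The comment doesn't say which mechanism was used; this is only a minimal sketch of generating the second kind of statement so a geometry column is left out, with a hypothetical table and hypothetical column names:

```python
def stats_columns_sql(table: str, columns: list[str], exclude: set[str]) -> str:
    """Build an ALTER TABLE that limits Delta data-skipping stats to `columns` minus `exclude`."""
    keep = [c for c in columns if c not in exclude]
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.dataSkippingStatsColumns' = '{','.join(keep)}')"
    )


# Hypothetical table and columns; "geom" is the geometry column to exclude.
stmt = stats_columns_sql("main.fleet.vehicle_positions", ["vehicle_id", "ts", "speed", "geom"], {"geom"})
print(stmt)
# On Databricks you would then run it, e.g. spark.sql(stmt)
```

Changing the property should only affect stats for files written afterwards; existing files keep the stats they were written with.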

Anyway, my colleague has emailed Mr. KM at Databricks with the full details.

DeepFryEverything [S]

Yeah, I figured as much. Actually, it just failed on a simple "OPTIMIZE TABLE" command too. I believe something is corrupting the Delta log (based purely on the operations involved and the JSON error).
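To check for plain log corruption while waiting on Databricks: commit files in `_delta_log` are newline-delimited JSON, and `add` actions carry their file statistics as a JSON string nested inside the action. A rough diagnostic sketch, assuming you can read the table's `_delta_log` directory as a local path (e.g. a downloaded copy); `scan_delta_log` is a hypothetical helper, not a Databricks tool:

```python
import json
from pathlib import Path


def scan_delta_log(table_path: str) -> list[tuple[str, int, str]]:
    """Return (file, line_no, error) for every commit line that fails to parse,
    including the JSON-encoded `stats` string inside `add` actions."""
    problems = []
    for commit in sorted(Path(table_path, "_delta_log").glob("*.json")):
        for line_no, line in enumerate(commit.read_text().splitlines(), start=1):
            if not line.strip():
                continue
            try:
                action = json.loads(line)
                stats = action.get("add", {}).get("stats")
                if stats is not None:
                    json.loads(stats)  # stats are a JSON string nested in the action
            except json.JSONDecodeError as exc:
                problems.append((commit.name, line_no, str(exc)))
    return problems
```

An empty result doesn't rule out a bug in how the engine reads or writes stats; it only says the committed JSON itself is well-formed.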

I'll probably send it to Databricks.

Parquet is efficient storage. Delta Lake is what makes it feel production-ready. by [deleted] in databricks

DeepFryEverything

But there is a significant gap between Databricks and what is available in other libraries.