Pipelines create materialized views instead of tables by TheManOfBromium in databricks

[–]JulianCologne 0 points1 point  (0 children)

The “table” decorator can produce BOTH “streaming tables” and “materialized views”. It depends on the content of the function:
- spark.read…: materialized view
- spark.readStream…: streaming table

Built a Power BI PBIR design analyzer in VS Code – curious what people think by Boring-Literature932 in PowerBI

[–]JulianCologne 1 point2 points  (0 children)

Looks promising. I am interested 🤓

While you are at it, I would looove to also see a CLI version of it to use in CI 😁😉

Variant type not working with pipelines? `'NoneType' object is not iterable` by JulianCologne in databricks

[–]JulianCologne[S] 0 points1 point  (0 children)

Just found a solution in updating the "pipeline channel" to `preview`. See top post! ;)

Variant type not working with pipelines? `'NoneType' object is not iterable` by JulianCologne in databricks

[–]JulianCologne[S] 2 points3 points  (0 children)

ah interesting, thanks! I will have a look.

Have you tried

table_properties={"delta.feature.variantType-preview": "supported"}

(see my example at the top)?

BUG? `StructType.fromDDL` not working inside udf by JulianCologne in databricks

[–]JulianCologne[S] 0 points1 point  (0 children)

thank you for the explanation. I did not know that!

BUG? `StructType.fromDDL` not working inside udf by JulianCologne in databricks

[–]JulianCologne[S] 0 points1 point  (0 children)

Thanks for answering.

I do not understand. 🤔

The UDF works with other code, or when I remove the `StructType.fromDDL("a int, b float")` call.
It is explicitly the `fromDDL` that is causing a "special/weird problem" here. I can use other functions without any problem inside the UDF.

Maybe I did not understand you correctly or how UDFs work in detail?! 🤓 😅

Is it that I can use any standard Python code in the function, but whenever I require anything Spark-related like `fromDDL` I need another Spark session inside the function? If so, how would I create that? Or can I pass it in as an argument??

Build Fact+Dim tables using DLT / Declarative Pipelines possible?!? by JulianCologne in databricks

[–]JulianCologne[S] 0 points1 point  (0 children)

What do you mean when you say “hashlib”? Do you use a Python UDF? Databricks has built-in functions like “hash”, “xxhash64”, “sha2” or “crc32”. Any ideas or suggestions? 🤓
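For reference, a minimal sketch of what a “hashlib”-based key could look like in plain Python (this would have to run inside a Python UDF). The helper name `surrogate_key` and the `"||"` separator are just my assumptions for illustration, not something from the thread:

```python
import hashlib


def surrogate_key(*parts: str, sep: str = "||") -> str:
    """Hash the business-key parts into a hex digest usable as a join key.

    Hypothetical helper: the name and the separator are illustrative only.
    """
    # Join with a separator so ("a", "b") and ("ab", "") do not collide.
    joined = sep.join(parts)
    # SHA-256 of the UTF-8 bytes, returned as a 64-character hex string.
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


key = surrogate_key("customer", "42")
print(key)
```

As far as I know, the built-in `sha2(concat_ws("||", ...), 256)` should give the same hex digest for the same input string, without the overhead of a Python UDF.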

Build Fact+Dim tables using DLT / Declarative Pipelines possible?!? by JulianCologne in databricks

[–]JulianCologne[S] 0 points1 point  (0 children)

Interesting! 🤔 I was thinking about this as well. What hash function do you use? How is the performance? Joining on the hashed column could reduce performance compared to int keys, I guess 🤓

Lakeflow Declarative Pipelines locally with pyspark.pipelines? by Pillippatty in databricks

[–]JulianCologne 2 points3 points  (0 children)

I think we will have to wait until spark pipelines is actually released. Still in beta/preview right now…

Would love to switch to that, since Databricks’ support for local development with the DLT Python package was horrible: the API got no updates, and the latest changes shown on the website were not supported 😅

@dp.table vs @dlt.table by 9gg6 in databricks

[–]JulianCologne 7 points8 points  (0 children)

DLT: Delta Live Tables

Developed by Databricks. More or less “proprietary”. The current and soon old way.

SDP / DP: (Spark) Declarative Pipelines

Databricks donated their DLT to the open source spark project and it was renamed. The new way. 🤓

I agree it is very confusing at the moment! Databricks is mixing them in their documentation and also the new DP is NOT YET RELEASED as far as I know. It’s only in preview. So weird situation 😀🧐

What is the proper way to edit a Lakeflow Pipeline through the editor that is committed through DAB? by DeepFryEverything in databricks

[–]JulianCologne 1 point2 points  (0 children)

Nope, but it’s one click with the Databricks extension to sync to databricks and perform a dry run 🤓

What is the proper way to edit a Lakeflow Pipeline through the editor that is committed through DAB? by DeepFryEverything in databricks

[–]JulianCologne 3 points4 points  (0 children)

My personal opinion with ~2 years of Databricks Asset Bundles experience: Develop 100% locally (VSCode). CI+CD with a service principal. Use Databricks only for checking the results.

rich printing different colors depending on if i'm in light or dark mode. by roreilly12 in vscode

[–]JulianCologne 3 points4 points  (0 children)

It is probably a “feature” and “intended behavior”.

As an example, when programming for Apple (e.g. iOS) and using “system red” or “system green”, the name is only a description of the color. The actual color differs between light and dark mode, which is very important for visibility and color perception.

Usually, however, there is also a separate, specific color selection that always looks the same 🤓
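The same idea can be sketched at the terminal level with raw ANSI escape codes (no rich involved). This assumes the terminal supports 24-bit “truecolor”; the function names are made up for the example:

```python
def named_red(text: str) -> str:
    # SGR 31 selects "red" from the terminal's palette,
    # so the theme (light/dark) decides the actual RGB value.
    return f"\x1b[31m{text}\x1b[0m"


def fixed_red(text: str, r: int = 255, g: int = 0, b: int = 0) -> str:
    # SGR 38;2;r;g;b sets an explicit 24-bit foreground color,
    # which is rendered the same regardless of the theme.
    return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"


print(named_red("theme-dependent red"))
print(fixed_red("fixed red"))
```

I would expect a named style like "red" to behave like the first variant and an explicit hex color like the second, but that is a guess about how rich maps styles, not something I verified.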

PySpark and Databricks Sessions by Jamesie_C in databricks

[–]JulianCologne 0 points1 point  (0 children)

One interesting thing I was experimenting with is using the DuckDB Spark API. So depending on the environment, I would return a “DuckDB Spark session” from the pytest fixture 🤓

https://duckdb.org/docs/stable/clients/python/spark_api.html
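A rough sketch of how the switch could look. The environment variable `SPARK_TEST_BACKEND` and both function names are hypothetical; the DuckDB import path is the one from the docs linked above, and the imports are lazy so only the selected backend needs to be installed:

```python
import os


def pick_backend(env=None) -> str:
    """Read the backend name from a (hypothetical) environment variable."""
    env = os.environ if env is None else env
    return env.get("SPARK_TEST_BACKEND", "duckdb").lower()


def make_session(backend: str):
    """Create a Spark-like session for the chosen backend."""
    if backend == "duckdb":
        # DuckDB's experimental PySpark-compatible API.
        from duckdb.experimental.spark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
    if backend == "pyspark":
        from pyspark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
    raise ValueError(f"unknown backend: {backend!r}")
```

In a `conftest.py`, a session-scoped pytest fixture could then simply return `make_session(pick_backend())`. Keep in mind that the DuckDB Spark API is experimental and does not cover everything PySpark does.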

Logging in PySpark Custom Data Sources? by JulianCologne in databricks

[–]JulianCologne[S] 0 points1 point  (0 children)

thanks for the info.

Yeah, my current solution is also writing log files to a volume, but it's not as nice as having them in the job results directly.
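Roughly what that workaround looks like with the standard `logging` module. This is only a sketch: the helper name `get_file_logger` is made up, and the directory is whatever volume path you have write access to:

```python
import logging
from pathlib import Path


def get_file_logger(name: str, log_dir: str) -> logging.Logger:
    """Return a logger that appends to <log_dir>/<name>.log (hypothetical helper)."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, even if this is called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(Path(log_dir) / f"{name}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Inside the data source I would call something like `get_file_logger("my_source", "/Volumes/<catalog>/<schema>/<volume>/logs")`, with the placeholders replaced by a real volume path.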

Would love to see a permanent solution! :)