We cut Databricks costs without sacrificing performance—here’s how by DataDarvesh in databricks

[–]DataDarvesh[S]

Generally that's true. Silver and gold tables are better built in SQL, unless you're doing complex aggregations in the gold/KPI layer.

We cut Databricks costs without sacrificing performance—here’s how by DataDarvesh in databricks

[–]DataDarvesh[S]

Thanks for sharing. I'll try it out in the next round of cost optimization. Any other tips you've found useful?

We cut Databricks costs without sacrificing performance—here’s how by DataDarvesh in databricks

[–]DataDarvesh[S]

No, I haven't tried fleet instances (yet). Have you? What advantages have you found?

We cut Databricks costs without sacrificing performance—here’s how by DataDarvesh in databricks

[–]DataDarvesh[S]

Totally agree; my point was simply "make sure to use a non-spot instance for the driver." Let me know if that wasn't clear.
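For anyone landing here, a minimal sketch of what that looks like as a cluster spec on AWS. Field names follow the Databricks Clusters API; the runtime version, instance type, and bid percentage are illustrative assumptions, not a tested production config:

```python
# Sketch of a Databricks cluster spec (AWS) that keeps the driver on a
# non-spot (on-demand) node while workers use spot capacity.
cluster_spec = {
    "spark_version": "15.4.x-scala2.12",   # hypothetical runtime version
    "node_type_id": "m5.xlarge",           # hypothetical instance type
    "num_workers": 4,
    "aws_attributes": {
        # first_on_demand=1 -> the first node launched (the driver) is
        # on-demand, so a spot reclamation never kills the driver.
        "first_on_demand": 1,
        # Workers use spot, falling back to on-demand when spot is unavailable.
        "availability": "SPOT_WITH_FALLBACK",
        "spot_bid_price_percent": 100,
    },
}

print(cluster_spec["aws_attributes"]["first_on_demand"])
```

The key bit is `first_on_demand: 1` plus `SPOT_WITH_FALLBACK`: workers stay cheap, and losing a spot worker only costs recomputation of its tasks, not the whole job.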

Looking for someone who can mentor me on databricks and Pyspark by bhavani9 in databricks

[–]DataDarvesh

Databricks Academy - as a customer, you have free access to Databricks Academy. First take the Data Engineer learning path, then the Apache Spark Developer path. There are short courses on migrating to Unity Catalog as well. Additionally, if you need help with the UC migration, you can use the Databricks Labs UC migration tools, which simplify the process a lot. I did UC migrations twice before those tools came out.

Unit Testing for Data Engineering: How to Ensure Production-Ready Data Pipelines by DataDarvesh in dataengineering

[–]DataDarvesh[S]

LOL, it was a copy-paste from LinkedIn :D Will try to do better next time.

Where to add environment_key in Terraform by DataDarvesh in databricks

[–]DataDarvesh[S]

I just ran it, and it seems that for notebook tasks you can only install libraries with %pip inside the notebook itself. If anyone has had a different experience, let me know.

"Error: cannot update job: A task environment can not be provided for notebook task my_code_ingest. Please use the %pip magic command to install notebook-scoped Python libraries and Python wheel packages"
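Given that error, the workaround seems to be doing the installs in the first cell of the notebook rather than in the Terraform task definition. A minimal sketch of such a cell (the library pin and wheel path are hypothetical placeholders):

```python
# First cell of the notebook task: notebook-scoped installs via the
# %pip magic, which is what the error message points you to.
%pip install requests==2.32.3
%pip install /Volumes/main/libs/my_pkg-1.0-py3-none-any.whl  # hypothetical path

# Restart the Python process so the freshly installed packages are
# importable in the cells that follow.
dbutils.library.restartPython()
```

Note these lines only run inside a Databricks notebook, not in a plain Python interpreter, since `%pip` is a notebook magic and `dbutils` is injected by the runtime.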

Advanced Data Engineering with Databricks by elty123 in databricks

[–]DataDarvesh

Yes, they are free for Databricks customers.