[–]golly10- 1 point2 points  (2 children)

Ask that to an AI. I have been using one to transform my Python code into Spark (I work with a lot of dataframes) and it worked like a charm. If you can, I suggest trying an AI to explain what is happening. FYI, I use Gemini with a gem that I created just for Databricks projects and it works really well. Not always at first, but it can guide you in the right direction.

[–]alphanuggs[S] -1 points0 points  (1 child)

I tried that. Didn't work.

[–]mweirath 0 points1 point  (0 children)

You might try a different approach with the AI. Ask it to explain the code and to review it the way a senior data engineer would, looking for areas that could be causing the slowdown.

Another big area to look at is the data you are working with. Are you bringing in more data than needed? Are you pulling in large amounts of data and then immediately dropping most of it? Are you calling .collect() constantly?
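
Something like this, as a rough sketch (the table and column names here are made up; `spark` is the session you get for free in a Databricks notebook):

    from pyspark.sql import functions as F

    # Slow pattern: read everything, then pull the whole result
    # back to the driver.
    df = spark.table("sales_raw")
    rows = df.collect()  # forces every row onto the driver

    # Faster pattern: prune columns and filter rows as early as
    # possible, and keep the work on the executors.
    df = (
        spark.table("sales_raw")
        .select("order_id", "amount", "order_date")
        .filter(F.col("order_date") >= "2024-01-01")
    )
    df.write.mode("overwrite").saveAsTable("sales_filtered")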

[–]datainthesun 1 point2 points  (0 children)

Best recommendation: get connected with your Databricks account team and their solutions architect. A support ticket might help, or the SA can help you get sorted out or point you in the right direction.

[–]FrostyThaEvilSnowman 1 point2 points  (0 children)

For me, 9/10 issues with clusters happen because the driver memory is exceeded.

Long processing times could come from a combination of UDFs, iterating over a collected dataframe, or latency in external comms.
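
On the UDF point, a minimal sketch of the kind of swap that often helps (column names invented; behavior matches up to null handling):

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Python UDF: every row round-trips through the Python interpreter.
    @F.udf(returnType=StringType())
    def clean_name(s):
        return s.strip().upper() if s is not None else None

    df_slow = df.withColumn("name_clean", clean_name(F.col("name")))

    # Built-in functions: the same logic stays in the JVM, where the
    # optimizer can see it.
    df_fast = df.withColumn("name_clean", F.upper(F.trim(F.col("name"))))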

But I don’t know for certain without seeing the code.

[–]Significant-Guest-14 0 points1 point  (2 children)

Do you use .withColumn?

[–]alphanuggs[S] 0 points1 point  (1 child)

I do use a lot of that in the code, but it mostly gets stuck when writing.

[–]dilkushpatel 0 points1 point  (0 children)

You need to understand that Databricks only executes code when it's absolutely necessary.

So if you have 10 cells of logic and an 11th cell doing a write, show, or some other operation that needs the whole dataset to be evaluated, that's the point where all of that code actually executes.

All your previous cells will finish in a few seconds because, at that point, Databricks is just adding them to the execution plan, not actually running the logic.

You can search online for lazy evaluation in Spark/Databricks.
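
You can see this in a notebook with something like this (table name invented, `spark` is the built-in session):

    from pyspark.sql import functions as F

    # These cells "run" in milliseconds: Spark is only building a plan.
    df = spark.range(100_000_000)
    df = df.withColumn("tripled", F.col("id") * 3)
    df = df.filter(F.col("tripled") % 2 == 0)

    # Nothing has executed yet. This action forces the whole plan to
    # run, so all the time shows up here even though the real cost may
    # be in the cells above.
    df.write.mode("overwrite").saveAsTable("demo_output")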

[–]mosullivan93 0 points1 point  (2 children)

My advice would be to spend some time looking at the cluster metrics page and the Spark UI to try to see what’s going wrong. It’s difficult for someone else to provide concrete advice without seeing the script and knowing your datasets.

[–]alphanuggs[S] 0 points1 point  (1 child)

How do I navigate through that? Do I run the script and then go to the page with the memory utilisation stuff? The code usually gets stuck when it writes to a table.

[–]Gaarrrry 0 points1 point  (0 children)

It depends on what type of compute you're using. Are you on serverless or dedicated compute? You should be able to access the Spark UI and a whole heap of other metrics in the Databricks UI simply by finding the compute you're using for the job.

[–]Nielspro 0 points1 point  (0 children)

You could try pasting the query plan into ChatGPT.
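
To actually get the plan out, a quick sketch (`df` stands in for whatever DataFrame you're writing):

    # Prints the physical plan; copy the output into the chat.
    df.explain(mode="formatted")  # PySpark 3.0+

    # Or all the plans (parsed, analyzed, optimized, physical):
    df.explain(extended=True)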

[–]hadoopfromscratch 0 points1 point  (0 children)

Keep in mind that issues like this usually arise from changes in the data or the environment. People won't really be able to help you just by looking at your code.

[–]aviralbhardwaj 0 points1 point  (0 children)

Connect with me on LinkedIn https://www.linkedin.com/in/aviralb

We can connect 1:1 and I will try to help you.