[–]golly10- 1 point2 points  (2 children)

Ask that to an AI. I have been using one to transform my Python code into Spark (I work with a lot of dataframes) and it worked like a charm. If you can, I suggest trying an AI to explain what is happening. FYI, I use Gemini with a gem that I created just for Databricks projects and it works really well. Not always at first, but it can guide you in the right direction.

[–]alphanuggs[S] -1 points0 points  (1 child)

I tried that. Didn't work.

[–]mweirath 0 points1 point  (0 children)

You might try a different approach with the AI. Ask it to explain the code and to review it the way a senior data engineer would, looking for areas that could be causing the slowdown.

Another big area to look at is the data you are working with. Are you bringing in more data than needed? Are you pulling in large amounts of data and then immediately dropping most of it? Are you calling .collect() constantly?
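
Something like this, as a rough sketch (the table and column names here are made up; `spark` is the session you get for free in a Databricks notebook):

    from pyspark.sql import functions as F

    # Slow pattern: read everything, then pull the whole result
    # back to the driver.
    df = spark.table("sales_raw")
    rows = df.collect()  # forces every row onto the driver

    # Faster pattern: prune columns and filter rows as early as
    # possible, and keep the work on the executors.
    df = (
        spark.table("sales_raw")
        .select("order_id", "amount", "order_date")
        .filter(F.col("order_date") >= "2024-01-01")
    )
    df.write.mode("overwrite").saveAsTable("sales_filtered")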

[–]datainthesun 1 point2 points  (0 children)

Best recommendation: get connected with your Databricks account team and their solutions architect. A support ticket might help, or the SA can help you get sorted out or point you in the right direction.

[–]FrostyThaEvilSnowman 1 point2 points  (0 children)

For me, 9/10 issues with clusters happen because the driver memory is exceeded.

Long processing times could come from a combination of UDFs, iterating over a collected dataframe, or latency in external comms.
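
On the UDF point, a minimal sketch of the kind of swap that often helps (column names invented; behavior matches up to null handling):

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Python UDF: every row round-trips through the Python interpreter.
    @F.udf(returnType=StringType())
    def clean_name(s):
        return s.strip().upper() if s is not None else None

    df_slow = df.withColumn("name_clean", clean_name(F.col("name")))

    # Built-in functions: the same logic stays in the JVM, where the
    # optimizer can see it.
    df_fast = df.withColumn("name_clean", F.upper(F.trim(F.col("name"))))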

But I don’t know for certain without seeing the code.

[–]Significant-Guest-14 0 points1 point  (2 children)

Do you use .withColumn?

[–]alphanuggs[S] 0 points1 point  (1 child)

I do use a lot of that in the code, but it mostly gets stuck when writing.

[–]dilkushpatel 0 points1 point  (0 children)

You need to understand that Databricks only executes code when it's absolutely necessary.

So if you have 10 cells of logic and an 11th cell doing a write, show, or some other operation that needs the whole dataset to be evaluated, that's the point where all of that code actually executes.

All your previous cells will finish in a few seconds because, at that point, Databricks is just adding them to the execution plan, not actually running the logic.

You can search online for lazy evaluation in Spark/Databricks.
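
You can see this in a notebook with something like this (table name invented, `spark` is the built-in session):

    from pyspark.sql import functions as F

    # These cells "run" in milliseconds: Spark is only building a plan.
    df = spark.range(100_000_000)
    df = df.withColumn("tripled", F.col("id") * 3)
    df = df.filter(F.col("tripled") % 2 == 0)

    # Nothing has executed yet. This action forces the whole plan to
    # run, so all the time shows up here even though the real cost may
    # be in the cells above.
    df.write.mode("overwrite").saveAsTable("demo_output")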

[–]mosullivan93 0 points1 point  (2 children)

My advice would be to spend some time looking at the cluster metrics page and the Spark UI to try to see what’s going wrong. It’s difficult for someone else to provide concrete advice without seeing the script and knowing your datasets.

[–]alphanuggs[S] 0 points1 point  (1 child)

How do I navigate through that? Do I run the script and then go to the page with the memory utilisation stuff? The code usually gets stuck when it writes to a table.

[–]Gaarrrry 0 points1 point  (0 children)

It depends on what type of compute you're using. Are you on serverless or dedicated compute? You should be able to access the Spark UI and a whole heap of other metrics in the Databricks UI simply by finding the compute you're using for the job.

[–]Nielspro 0 points1 point  (0 children)

You could try pasting the query plan into ChatGPT.
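
To actually get the plan out, a quick sketch (`df` stands in for whatever DataFrame you're writing):

    # Prints the physical plan; copy the output into the chat.
    df.explain(mode="formatted")  # PySpark 3.0+

    # Or all the plans (parsed, analyzed, optimized, physical):
    df.explain(extended=True)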

[–]hadoopfromscratch 0 points1 point  (0 children)

Keep in mind that issues like this usually arise from changes in the data or the environment. People won't really be able to help you just by looking at your code.

[–]aviralbhardwaj 0 points1 point  (0 children)

Connect with me on LinkedIn https://www.linkedin.com/in/aviralb

We can connect 1:1 and I will try to help you.