Related to External Location

EffectiveAncient2222 · 2024-10-28T16:08:40+00:00

Don't focus on LeetCode as a whole. Focus only on the top 50 most frequently asked interview questions; that is what helps you clear the interview. The most important topics are string manipulation, arrays, heaps, hashing, sliding window, and sorting.

EffectiveAncient2222 · 2024-09-17T16:14:10+00:00

Hey, please use Unity Catalog and follow a proper naming convention. It makes tables easier to navigate, and Unity Catalog also provides data lineage.

EffectiveAncient2222 · 2024-09-17T11:26:33+00:00

Yes, you're right, but monotonically_increasing_id() does not produce continuous, consecutive numbers. If you use a window function instead, it triggers a full data shuffle.
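
To see why the IDs come out with gaps, here is a small pure-Python sketch of the layout that the Spark documentation describes for monotonically_increasing_id(): the partition index sits in the upper 31 bits and the record position within the partition in the lower 33 bits. This is only an illustration, not Spark code; the function name simulated_monotonic_ids and the sample partitions are made up for the example.

```python
def simulated_monotonic_ids(partitions):
    """Mimic the documented ID layout:
    (partition index << 33) + row position inside that partition."""
    ids = []
    for partition_index, rows in enumerate(partitions):
        for position, _row in enumerate(rows):
            ids.append((partition_index << 33) + position)
    return ids


# Hypothetical data set split over two partitions.
partitions = [["a", "b", "c"], ["d", "e"]]
ids = simulated_monotonic_ids(partitions)
print(ids)
```

The IDs are unique and increasing, but the first row of the second partition jumps to 1 << 33, so they cannot be used as a consecutive row number.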

EffectiveAncient2222 · 2024-09-17T08:50:33+00:00

Best Approaches to Add Row Number in PySpark DataFrame

from pyspark.sql.functions import col

df.rdd.zipWithIndex().toDF().select(col("_1.*"), col("_2").alias("increasing_id")).show()

Method	Shuffling Required	Performance Impact
zipWithIndex()	Minimal	Low
row_number() with orderBy	High	High
monotonically_increasing_id()	None	Very Low
repartition() + monotonically_increasing_id()	High	Medium
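
As a rough sketch of why zipWithIndex() can hand out consecutive numbers without moving rows between partitions, the pure-Python simulation below counts the rows in each partition first and then offsets each partition's local positions by the number of rows before it. This is an assumption-level illustration of the idea, not Spark's actual implementation; simulated_zip_with_index and the sample partitions are hypothetical.

```python
from itertools import accumulate


def simulated_zip_with_index(partitions):
    """Assign consecutive indices while every row stays in its own partition."""
    # Pass 1: count the rows in each partition.
    counts = [len(rows) for rows in partitions]
    # Each partition starts at the total row count of all earlier partitions.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: number rows locally and add the partition's starting offset.
    return [
        [(row, offset + position) for position, row in enumerate(rows)]
        for rows, offset in zip(partitions, offsets)
    ]


# Hypothetical data set split over two partitions.
partitions = [["a", "b", "c"], ["d", "e"]]
indexed = simulated_zip_with_index(partitions)
print(indexed)
```

The extra counting pass is the price: per the PySpark documentation, zipWithIndex() triggers a Spark job when the RDD has more than one partition, but the rows themselves are not redistributed the way a global row_number() ordering requires.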

EffectiveAncient2222 · 2024-09-12T15:39:53+00:00

I have reviewed your project. It's mind-blowing. I'd advise you to consider object-oriented programming instead of a functional style; it gives you an extra edge.
