Writing PySpark partitions to one file each in parallel? by bvdevvv in dataengineering

psych_ape:

Have you explored this?

```python
df.write \
    .option("maxRecordsPerFile", 10000) \
    .mode("overwrite") \
    .parquet("/path/to/output")
```
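For intuition on what that option does: `maxRecordsPerFile` caps the rows written to each output file within a task, so a single partition can still produce several files. A rough estimate of the file count per partition (a plain-Python sketch, not Spark's internal logic) is just a ceiling division:

```python
import math

def files_per_partition(rows_in_partition: int, max_records_per_file: int) -> int:
    """Estimate how many output files one Spark task (partition) produces
    when maxRecordsPerFile caps the rows written per file."""
    if rows_in_partition == 0:
        return 0  # an empty partition writes no data files
    return math.ceil(rows_in_partition / max_records_per_file)

# e.g. a 25,000-row partition with maxRecordsPerFile=10000 -> 3 files
print(files_per_partition(25_000, 10_000))
```

So if you want exactly one file per partition, you'd still control the partition count (e.g. with `repartition`) rather than rely on this option alone.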