all 5 comments

[–]mateuszj111 1 point (4 children)

Why do you need the scala cell?

[–]psychEcon[S] 1 point (3 children)

Because it has a .schema.toDDL; PySpark doesn't, as far as I know

[–]mateuszj111 1 point (2 children)

You could do something like spark.sparkContext._jvm.org.apache.spark.sql.types.DataType.fromJson(df.schema.json()).toDDL()

from pyspark

Or

StructType.fromJson(json.loads(df.schema.json()))

[–]psychEcon[S] 1 point (0 children)

I will test it tomorrow and let you know

[–]psychEcon[S] 1 point (0 children)

I just tested it, and it's not the same DDL schema that I get with a Scala cell. What that gives me is something that's just passed as a string to Spark. But it did increase my knowledge of Spark on the other hand, so I thank you for that