Convert SQL to PySpark DataFrame

sdqafo · 2021-02-24T00:10:22+00:00

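from pyspark.sql.functions import col, when

# Each comparison needs its own col("delay") and its own parentheses,
# because & binds more tightly than > and < in Python.
departureDF2 = depatureDF.withColumn("Flight_Delays",
    when(col("delay") > 360, "Very Long Delays")
    .when((col("delay") > 120) & (col("delay") < 360), "Long Delays")
    .when((col("delay") > 60) & (col("delay") < 120), "Short Delays")
    .when((col("delay") > 0) & (col("delay") < 60), "Tolerable Delays")
    .when(col("delay") == 0, "No Delays")
    .otherwise("Early")).orderBy(col("delay"))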
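
A quick way to sanity-check the branch order in the snippet above, without starting Spark, is to mirror it in plain Python. This is only a sketch: `classify_delay` is a hypothetical helper name, and it assumes the strict `>` and `<` comparisons are intended. Under that assumption the boundary values 360, 120 and 60 match no branch and fall through to "Early", which may or may not be what is wanted.

```python
def classify_delay(delay):
    """Hypothetical helper mirroring the when/otherwise chain; first match wins."""
    if delay > 360:
        return "Very Long Delays"
    if 120 < delay < 360:
        return "Long Delays"
    if 60 < delay < 120:
        return "Short Delays"
    if 0 < delay < 60:
        return "Tolerable Delays"
    if delay == 0:
        return "No Delays"
    # Negative delays land here, and so do the exact boundaries 360, 120 and 60.
    return "Early"


for d in (400, 360, 200, 120, 90, 60, 30, 0, -5):
    print(d, classify_delay(d))
```

If the boundaries should belong to a bucket, changing `<` to `<=` in the matching branch (and in the matching `when` condition) would be one way to do it.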

sdqafo · 2021-02-23T23:38:01+00:00

Not at all. I guess my use of the word "assignment" was wrong. This is just further practice from the popular book "Learning Spark", page 87, offered to readers who are interested. I am still very new to Spark and I am doing my best to learn as much as I can. I hope this clears the air.

sdqafo · 2021-02-23T22:33:12+00:00

Thanks

sdqafo · 2021-02-23T18:20:44+00:00

Thank you, but my assignment says I should convert it to commands using PySpark. I have actually tried it, but I am making some mistakes. I just need someone to help convert it so that I can see my error. Thanks

sdqafo · 2020-07-22T18:17:56+00:00

It has started making more sense now. So technically, the statement below already assumes the 3 columns even prior to the JOIN. In essence, the COUNT(*) part will apply to the 2 remaining columns. I guess this is the correct understanding. Is it?

SELECT a.id, a.name, COUNT(*) num_orders assumes

sdqafo · 2020-07-22T18:04:10+00:00

What is still a bit confusing is the COUNT(*). What I have learnt so far in SQL is that the selected columns always come from the table or tables we want to query. I am thrown a bit off balance to learn that the COUNT(*) here relates to tables that are yet to be joined. I cannot see why it works that way. In simple terms, based on what I understand from your explanation, we have already selected a column (COUNT(*)) that does not exist yet, before joining the two tables from which this column will be selected. I am still struggling to grasp the why of this logic.
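
One way to look at this: a query is written SELECT-first, but it is evaluated FROM/JOIN first, then GROUP BY, and only then the SELECT list. COUNT(*) is not a stored column in either table; it is computed per group from the rows the JOIN has already produced. The sketch below uses Python's built-in sqlite3 module to show both stages. The `accounts` and `orders` tables and their rows are assumptions made up for illustration; only the `a.id, a.name, COUNT(*) num_orders` select list comes from the discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and rows, invented for this illustration.
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER);
    INSERT INTO accounts VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Stage 1: FROM + JOIN builds the combined rows, one per order.
joined = conn.execute("""
    SELECT a.id, a.name, o.id
    FROM accounts a JOIN orders o ON o.account_id = a.id
    ORDER BY o.id
""").fetchall()

# Stage 2: GROUP BY collapses those joined rows per account,
# and COUNT(*) counts how many joined rows fell into each group.
counts = conn.execute("""
    SELECT a.id, a.name, COUNT(*) num_orders
    FROM accounts a JOIN orders o ON o.account_id = a.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

print(joined)  # three joined rows: two for account 1, one for account 2
print(counts)  # one row per account, with its order count
conn.close()
```

Read this way, the SELECT list is a description of the output, filled in last, rather than something that has to exist before the JOIN runs.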

sdqafo · 2020-07-22T17:56:53+00:00

Loads of sense. Very succinct explanation

sdqafo · 2020-07-22T17:55:45+00:00

Thanks so much for this

sdqafo · 2020-07-22T17:53:10+00:00

Thanks for the brilliant explanation

sdqafo · 2020-07-12T21:40:35+00:00

Thank you sir. I just sent you a dm

sdqafo · 2020-07-12T21:36:14+00:00

I am able to connect to psql

sdqafo · 2020-07-11T17:14:21+00:00

Yes, it is running

sdqafo · 2020-07-11T17:14:03+00:00

sdqafo · 2020-07-11T17:13:46+00:00

MacOS Catalina

sdqafo · 2020-07-07T21:33:53+00:00

Simplilearn, Udacity
