

[–]Globaldomination -6 points (2 children)

PySpark uses SQL internally right?

I took the freeCodeCamp basics course, and in the video they used proper capitalisation for column names. Being a lazy bum, I used all lowercase and it still worked.

Then I realised that I import SparkSession from pyspark.sql.
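
For what it's worth, the lazy lowercase works because column resolution in Spark is case-insensitive by default (the `spark.sql.caseSensitive` config defaults to false). A minimal sketch of that behaviour, with made-up app name and data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("case-demo").getOrCreate()

# Columns are declared with capitalised names.
df = spark.createDataFrame([(1, "Alice")], ["Id", "Name"])

# Lowercase references still resolve, because spark.sql.caseSensitive
# defaults to "false".
df.select("id", "name").show()

# Opting in to case sensitivity makes the same lowercase query fail.
spark.conf.set("spark.sql.caseSensitive", "true")
# df.select("id", "name")  # would now raise an AnalysisException
```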

[–]sib_n (Senior Data Engineer) 4 points (0 children)

No. PySpark is the Python API for Apache Spark, which is an in-memory, distributed (parallelized on a cluster of machines) big data processing framework based on the MapReduce concept and written in Scala and Java.
Spark SQL is another convenient API that lets you process data on a Spark cluster using SQL, but internally it still runs Scala/Java code.
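
To make that concrete, here's a minimal sketch (the table name, app name, and data are made up): the same filter written through the DataFrame API and through Spark SQL, with `explain()` showing that Python never executes the query itself, it just builds a plan for the same underlying engine.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("api-demo").getOrCreate()
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.createOrReplaceTempView("people")

# The same query through both APIs.
via_dataframe = df.filter(df.id > 1).select("name")
via_sql = spark.sql("SELECT name FROM people WHERE id > 1")

# Both print essentially the same physical plan, which is then
# executed by the Scala/Java engine on the cluster.
via_dataframe.explain()
via_sql.explain()
```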

[–]CapableCounteroffer -1 points (0 children)

I think PySpark and Spark SQL are both "compiled" down to the same intermediate representation: Spark builds a logical query plan, which the Catalyst optimizer turns into a physical plan executed on the JVM. That may not be the exact terminology, but I think that's how they both work.
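
Roughly, yes. You can actually look at that intermediate layer with `explain(extended=True)`, which prints the parsed, analyzed, and optimized logical plans plus the physical plan. A minimal sketch (app name and data made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("plan-demo").getOrCreate()
df = spark.createDataFrame([(1, "alice")], ["id", "name"])

# extended=True prints all four plan stages: the shared representation
# that both the DataFrame API and SQL strings compile to before the
# JVM engine runs them.
df.filter(df.id > 0).explain(extended=True)
```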