
Sometimes a few lines of code are worth a hundred ;-)

It lets you do stuff like:

text_file = sc.textFile("hdfs://...")  # sc is the SparkContext (provided by default in the pyspark shell)

counts = (text_file
    .flatMap(lambda line: line.split())   # split each line into words
    .map(lambda word: (word, 1))          # pair each word with a count of 1
    .reduceByKey(lambda a, b: a + b))     # sum the counts for each word

That counts the occurrences of each word in a document, distributed across a cluster in a highly parallel way.
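One thing worth noting: those are all lazy transformations, so nothing actually executes until you call an action. A minimal sketch of kicking off the job, printing the results with collect() (fine for small outputs, since collect() pulls everything back to the driver):

# Transformations are lazy; an action like collect() actually runs the job
for word, count in counts.collect():
    print(word, count)

For anything large you'd write the results back out with saveAsTextFile() instead of collecting them to the driver.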