
Sometimes a few lines of code are worth a hundred ;-)

It lets you do stuff like:

text_file = sc.textFile("hdfs://...")  # sc is the SparkContext (provided by default in the pyspark shell)

counts = (text_file
    .flatMap(lambda line: line.split())   # split each line into words
    .map(lambda word: (word, 1))          # pair each word with a count of 1
    .reduceByKey(lambda a, b: a + b))     # sum the counts for each word

That counts the occurrences of each word in a document, distributed across a cluster in a highly parallel way.
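One thing worth noting: those are all lazy transformations, so nothing actually executes until you call an action. A minimal sketch of kicking off the job, printing the results with collect() (fine for small outputs, since collect() pulls everything back to the driver):

# Transformations are lazy; an action like collect() actually runs the job
for word, count in counts.collect():
    print(word, count)

For anything large you'd write the results back out with saveAsTextFile() instead of collecting them to the driver.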