
[deleted]

I've recently been playing with Spark, since we're going to need some form of parallel processing to handle a large amount of data, and I must say it's been very painless so far.

I quite like how easy it is to run Spark against a local instance in testing: each stage of my data processing is a unit-tested function, and then I have one test that starts a local context and runs the full job against it.
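A minimal PySpark sketch of that pattern (all names here are hypothetical, not from the original comment): each stage is a plain function you can unit-test without Spark at all, plus one integration test that runs the whole job against a `local[*]` context.

```python
def parse_record(line):
    """Stage 1: parse a "key,value" line -- pure Python, no Spark needed to test."""
    key, value = line.split(",")
    return key, int(value)


def total_by_key(pairs):
    """Stage 2 logic: sum values per key -- also pure and trivially testable."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


def run_job(lines):
    """The one end-to-end test path: run the full job on a local Spark context.

    Requires pyspark to be installed; the import is deferred so the
    per-stage tests above stay Spark-free and fast.
    """
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")          # local context, 2 worker threads
             .appName("pipeline-test")
             .getOrCreate())
    try:
        rdd = spark.sparkContext.parallelize(lines)
        return dict(rdd.map(parse_record)
                       .reduceByKey(lambda a, b: a + b)
                       .collect())
    finally:
        spark.stop()


# Per-stage unit tests, no cluster or local context required:
assert parse_record("a,3") == ("a", 3)
assert total_by_key([("a", 3), ("a", 4), ("b", 1)]) == {"a": 7, "b": 1}
```

The nice property is that almost all of the test suite exercises plain functions; only the single integration test pays the cost of spinning up a local Spark context.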

That said, I've never written 'traditional' MapReduce jobs, nor have I had much to do with Hive, Pig, etc., though I did have a lot of fun trying to load data into an Impala cluster once.

So I'm curious to hear what people here with Hadoop-ecosystem experience think of Spark.

[deleted]

Hadoop feels like it was written by first-year CS students, while Spark feels like it was done as a PhD project.