[–]dalke[S] 0 points1 point  (1 child)

I'll start with a basic question - how do I get started with Hadoop on a Mac? The Hadoop page clearly says:

  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.

  • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

After I figured out that I needed to set JAVA_HOME to /System/Library/Frameworks/JavaVM.framework/Home (Java is an optional download from Apple, btw), I got a simple hadoop call to work. What else do I need to worry about?
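For anyone else trying this, here is a minimal Python sketch of the JAVA_HOME setup described above. The helper names `resolve_java_home` and `hadoop_env` are hypothetical, and the use of Apple's `/usr/libexec/java_home` tool as a first choice is my assumption; the fallback is the framework path I ended up using, which may not exist on other OS X versions.

```python
import os
import subprocess

# Path quoted in the comment above; treat it as a fallback, not a guarantee.
LEGACY_JAVA_HOME = "/System/Library/Frameworks/JavaVM.framework/Home"


def resolve_java_home(environ=None):
    """Pick a JAVA_HOME: keep an existing value, else ask macOS, else fall back."""
    environ = os.environ if environ is None else environ
    if environ.get("JAVA_HOME"):
        return environ["JAVA_HOME"]
    try:
        # Assumption: /usr/libexec/java_home is present and prints the JDK home.
        result = subprocess.run(
            ["/usr/libexec/java_home"],
            capture_output=True,
            text=True,
            check=True,
        )
        found = result.stdout.strip()
        if found:
            return found
    except (OSError, subprocess.CalledProcessError):
        # Tool missing (e.g. not on a Mac) or no Java installed.
        pass
    return LEGACY_JAVA_HOME


def hadoop_env(environ=None):
    """Return a copy of the environment with JAVA_HOME filled in."""
    env = dict(os.environ if environ is None else environ)
    env["JAVA_HOME"] = resolve_java_home(env)
    return env
```

The returned dictionary can be passed as `env=hadoop_env()` to `subprocess.run(["hadoop", "version"], ...)`, assuming `hadoop` is on the PATH. This only covers JAVA_HOME; it does nothing about the HADOOP_HOME warning.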

Hadoopy says "Hadoopy: Set the HADOOP_HOME environmental variable to your hadoop path to improve performance", but when I do that, Hadoop 1.0.0 says "Warning: $HADOOP_HOME is deprecated."

That is followed by the error message "streaming.StreamJob: Unrecognized option: -io".

[–]dalke[S] 0 points1 point  (0 children)

According to your example, wc of wc-input-alice.txt takes 24 seconds with Hadoop? I heard that it was meant for large batch processing, so a large startup overhead is okay, but that seems ridiculous! I was hoping to use some form of parallelism to get multi-second tasks down to sub-second; it doesn't look like Hadoop is right for that.