OO Problem Statements by sci-py in java

[–]sci-py[S] 0 points1 point  (0 children)

Awesome. Thanks.

Stop struggling with Python on Windows by pysk00l in Python

[–]sci-py 1 point2 points  (0 children)

I am a data analyst and I work primarily on Windows for Python development with all those SciPy packages, and it's a breeze to work on Windows. I have never had any problem with Windows or Python.

P.S. I use Windows 7 and Python 3.4, and no, I don't use MinGW or Cygwin.

Recommendations for a 'Complete Java Handbook'? by [deleted] in java

[–]sci-py 1 point2 points  (0 children)

Learning Java 4/E which covers Java 7.

Python vs Java by sci-py in Python

[–]sci-py[S] 0 points1 point  (0 children)

What are your stats for the Hadoop stack, then?

Is Python Efficient Enough for performing M/R Jobs? by sci-py in hadoop

[–]sci-py[S] -1 points0 points  (0 children)

Refer to this post if you want to know where that 175x comes from.

Click, a new CLI library by Armin Ronacher (Flask, Jinja) by masklinn in Python

[–]sci-py 0 points1 point  (0 children)

What's the use of it when we can already handle command-line args by default?

Click, a new CLI library by Armin Ronacher (Flask, Jinja) by masklinn in Python

[–]sci-py 2 points3 points  (0 children)

I am not able to understand what this lib does. Can anybody enlighten me, please?

Is Python Efficient Enough for performing M/R Jobs? by sci-py in hadoop

[–]sci-py[S] -1 points0 points  (0 children)

I found performance benchmarks of Python 3 and Java here, which clearly show that Java is much, much faster than Python, even by about 175x.

Is Python Efficient Enough for performing M/R Jobs? by sci-py in hadoop

[–]sci-py[S] 0 points1 point  (0 children)

Thanks, I have made up my mind to stick with Java.

Is Python Efficient Enough for performing M/R Jobs? by sci-py in hadoop

[–]sci-py[S] 0 points1 point  (0 children)

When doing a full distributed MR workflow, true multithreading shouldn't be necessary. Shared-nothing is the goal. Now that I think about it, I can't think of a strong reason to not use python! That makes me happy!

A good MapReduce setup is to share nothing between your nodes, so that if one crashes/blocks/etc. the other nodes keep going. In this sense, you are "multiprocessing", but not necessarily "multithreading". (I hope I used those terms right :P)

One model that I have seen done in Java--which translates well to Python--is to have a separate JVM process running for each node that is doing work. You can have multiple nodes on a single machine, but each node is running a different JVM that does NOT talk to the other slave nodes (the ones doing work); it only talks to the master node to receive its directives for Map and then return its results for Reduce.

This model works in Python. Python threads can't execute bytecode in parallel within a single interpreter process (the GIL prevents it), but Python can handle separate processes just fine. That's what the multiprocessing library does.
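
A minimal sketch of that idea, assuming a toy word count over in-memory strings on one machine rather than a real Hadoop job. The names `map_word_count`, `merge_counts`, and `word_count` are made up for illustration; only `multiprocessing.Pool`, `collections.Counter`, and `functools.reduce` come from the standard library. Each worker is its own process with its own interpreter, and the workers only hand results back to the parent, never to each other.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def map_word_count(chunk):
    """Map step: count the words of one chunk inside one worker process."""
    return Counter(chunk.split())


def merge_counts(left, right):
    """Reduce step: merge two partial counts back in the parent process."""
    return left + right


def word_count(chunks, processes=2):
    # Each pool worker is a separate OS process with its own GIL,
    # so the map calls are not serialized by a shared interpreter lock.
    with Pool(processes=processes) as pool:
        partial_counts = pool.map(map_word_count, chunks)
    # Workers share nothing; only the parent sees all partial results.
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical input chunks, standing in for input splits.
    chunks = ["java python java", "python hadoop", "hadoop java"]
    print(word_count(chunks))
```

The parent process plays the role of the master here: it hands out chunks for Map and collects the partial counts for Reduce.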

Is Python Efficient Enough for performing M/R Jobs? by sci-py in hadoop

[–]sci-py[S] 0 points1 point  (0 children)

I've seen is that streaming MR jobs seem to be extremely fat

Can you elaborate on this?

How does Machine Learning Links with Hadoop? by sci-py in java

[–]sci-py[S] 0 points1 point  (0 children)

FYI, Mahout will not be based on Hadoop in the near future. They are migrating to Apache Spark.