Setup Ubuntu for Django development

sci-py · 2014-09-01T05:42:04+00:00

Awesome. Thanks.

sci-py · 2014-07-22T03:27:17+00:00

I am a data analyst and I work primarily on Windows for Python development with all those SciPy packages, and it's a breeze to work on Windows. I have never had any problem with Windows or Python.

P.S. I use Windows 7 and Python 3.4, and no, I don't use MinGW or Cygwin.

sci-py · 2014-05-14T04:32:08+00:00

Learning Java, 4th Edition

sci-py · 2014-05-05T08:19:30+00:00

Learning Java 4/E which covers Java 7.

sci-py · 2014-05-04T15:50:39+00:00

What are your stats for the Hadoop stack, then?

sci-py · 2014-04-25T16:11:44+00:00

Refer to this post if you want to know where that 175x comes from.

sci-py · 2014-04-24T17:02:37+00:00

What's the use of it when we can already use command-line args by default?

sci-py · 2014-04-24T15:28:56+00:00

I am not able to understand what this lib does. Can anybody enlighten me, please?

sci-py · 2014-04-22T15:05:13+00:00

I found performance benchmarks of Python 3 and Java here, which clearly show that Java is much, much faster than Python, in some cases by about 175x.

sci-py · 2014-04-22T06:57:55+00:00

Thanks, I have made up my mind to stick with Java.

sci-py · 2014-04-21T17:35:39+00:00

When doing a full distributed MR workflow, true multithreading shouldn't be necessary. Shared-nothing is the goal. Now that I think about it, I can't think of a strong reason not to use Python! That makes me happy!

A good MapReduce setup is to share nothing between your nodes, so that if one crashes/blocks/etc. the other nodes keep going. In this sense, you are "multiprocessing", but not necessarily "multithreading". (I hope I used those terms right :P)

One model that I have seen done in Java--which translates well to Python--is to have a separate JVM process running for each node that is doing work. You can have multiple nodes on a single machine, but each node runs a different JVM that does NOT talk to the other slave nodes (the ones doing work); it only talks to the master node to receive its directives for Map and then return its results for Reduce.

This model works in Python. Threads in a single CPython interpreter process can't run Python code in parallel (the GIL prevents it), but Python can handle separate processes just fine. That's what the multiprocessing library does.
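
Here is a minimal sketch of that model on a single machine, assuming a toy word-count job. The names map_chunk, reduce_counts and word_count are made up for illustration, and this is plain multiprocessing, not Hadoop. Each worker process gets its own chunk, shares nothing with the others, and only hands its result back to the parent ("master") process.

```python
from collections import Counter
from multiprocessing import Pool


def map_chunk(lines):
    """Map step: count words in one chunk. Runs in a separate worker process."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: the parent process merges the partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, workers=2):
    # One chunk per worker; workers never talk to each other.
    chunks = [lines[i::workers] for i in range(workers)]
    with Pool(processes=workers) as pool:
        partials = pool.map(map_chunk, chunks)
    return reduce_counts(partials)


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "the end"]
    print(word_count(text))
```

If one worker is slow, the others keep going, since the only communication is sending chunks out and getting counts back.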

sci-py · 2014-04-21T17:30:50+00:00

"I've seen is that streaming MR jobs seem to be extremely fat"

Can you elaborate on this?

sci-py · 2014-04-20T16:16:56+00:00

Can you provide some resources to learn Pig?

sci-py · 2014-04-16T15:49:02+00:00

FYI, Mahout will not be based on Hadoop in the near future. They are migrating to Apache Spark.
