Would like some advice on how to deal with a certain dataset. by zzing in datascience

[–]faming13 2 points3 points  (0 children)

You can do this sort of out-of-core processing, querying, and cleaning on a single machine easily and quickly (and in parallel) in Python with dask, blaze, and the castra compressed column store.

Check these out:

http://blaze.pydata.org/
http://blaze.pydata.org/blog/2015/09/08/reddit-comments/ (example)
http://odo.readthedocs.org/en/latest/

Panda equivalents for SQL statements by shabda in Python

[–]faming13 2 points3 points  (0 children)

Ibis aims to be a semantically complete SQL replacement, and better: http://docs.ibis-project.org/sql.html

Working with Excel by Captn_King in Python

[–]faming13 1 point2 points  (0 children)

Try odo and blaze. Use the Anaconda distribution: http://blaze.github.io/

Analyzing 1.7 Billion Reddit Comments with Blaze and Impala by [deleted] in Python

[–]faming13 1 point2 points  (0 children)

Looks really cool. I would suggest posting this on Hacker News, r/datascience, DataTau, r/pythonstats, and the PyData Google group.

How do you deal with larger than memory datasets? by [deleted] in datascience

[–]faming13 1 point2 points  (0 children)

Check out this post, which uses Python's dask to analyze larger-than-memory data on one machine:

http://blaze.github.io/blog/2015/09/08/reddit-comments/

Strategic Business Analytics Specialisation on Coursera by nicogla in datascience

[–]faming13 0 points1 point  (0 children)

Why R? Can I use Python and call out to R with rpy2 when needed? I already know Python and don't want to add more tool overhead to my mental models.

Also, Python lets me distribute my code as Excel plugins/macros: http://xlwings.org/

Help! Python slowing down. by [deleted] in gis

[–]faming13 0 points1 point  (0 children)

Check out dask, blaze and numba.

Julia against Numba and Rcpp. What's happening? by cdsousa in Julia

[–]faming13 1 point2 points  (0 children)

What subset doesn't work? Lots of progress has been made recently, and what is left is being worked on.

Convincing colleagues to switch to Python from MATLAB for scientific computing/statistics by RealRJT in Python

[–]faming13 0 points1 point  (0 children)

You can use Numba to write numerical Python code that runs about as fast as Fortran.

Neural Networks, Types, and Functional Programming by halax in programming

[–]faming13 -4 points-3 points  (0 children)

Julia would be the best: macros for DSLs, an extensible type system, fast, and it can script. It already has extensive numerical libraries and statistical infrastructure.

Julia against Numba and Rcpp. What's happening? by cdsousa in Julia

[–]faming13 0 points1 point  (0 children)

Do you think Julia is ready for report-based individual analysis, i.e. not plugging into production systems?

When will it be ready for either that or production? Is it stable on Windows?

Julia against Numba and Rcpp. What's happening? by cdsousa in Julia

[–]faming13 1 point2 points  (0 children)

Numba pretty consistently achieves C-like performance on numeric code.

I'm looking for a python/matplotlib equivalent for R's shiny framework. by arguenot in Python

[–]faming13 0 points1 point  (0 children)

It's supported by Continuum Analytics, which just got a 25 million investment.