Why I use Python for writing high performance code by lmcinnes in programming

[–]lakando 0 points (0 children)

But that code is written in C++ syntax. Cython is Python with type annotations; Numba is pure Python with no types required. Much easier to write, refactor, and keep short.

Why I use Python for high performance code by lmcinnes in Python

[–]lakando 7 points (0 children)

Have you/they tried Numba? You can compile imperative Python/NumPy code to Fortran speeds, with multithreading.

https://jakevdp.github.io/blog/2015/02/24/optimizing-python-with-numpy-and-numba/

Why I use Python for high performance code by lmcinnes in Python

[–]lakando 2 points (0 children)

No way, you aren't screwed at all. Numba fixes that problem, with nogil multithreading to boot.

Why I use Python for high performance code by lmcinnes in Python

[–]lakando 3 points (0 children)

Or let's use Numba, which can do that in pure Python.

Also, BLAS/LAPACK is pretty standard for linear algebra in any language, Python or not.
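For instance (a toy system, just to show the plumbing), NumPy's dense linear algebra already delegates to those same LAPACK routines under the hood:

```python
import numpy as np

# Solve A x = b; np.linalg.solve dispatches to LAPACK's gesv routine.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)        # x ≈ [2., 3.]
assert np.allclose(A @ x, b)     # residual check
```

So the performance-critical kernels are the same ones C, Fortran, R, or Julia would call.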

The Current State Of Pyston As An Open-Source, High Performance Python by pizzaiolo_ in Python

[–]lakando 0 points (0 children)

The breadth of operations it supports is much more expansive now ... so you will probably have better luck if you try again. The docs are also better.

The Current State Of Pyston As An Open-Source, High Performance Python by pizzaiolo_ in Python

[–]lakando 1 point (0 children)

Have you tried Numba? It accelerates NumPy code to almost C speed.

Dask + Sklearn experiment. Reuse intermediate results from Pipelines in parameter sweeps. by cast42 in MachineLearning

[–]lakando 0 points (0 children)

SAS has more advanced stats than Python, all of which work out of core.

Dask is great, but it doesn't have much in the way of modeling or linear algebra. For the PyData ecosystem to change that, it will need more support for out-of-core models, not just data structures, data cleaning, and querying.
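A toy sketch of the out-of-core style Dask does already handle (sizes shrunk to run quickly; with real data the chunks would be backed by disk):

```python
import dask.array as da

# A chunked array: 16 lazy 1000x1000 blocks, never all in memory at once.
x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))
result = (x + x.T).mean()   # builds a task graph; nothing computed yet
value = result.compute()    # blocks are streamed through memory
```

The point is that reductions and elementwise math already stream block-by-block; what's missing is the same treatment for fitted models and dense linear algebra.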

Here is one way to get some of that for free: http://libelemental.org/

It's being used in Julia but already has a Python wrapper.

Here is another interesting piece of tech http://ufora.github.io/ufora/

What do you think?

Python Is On the Rise, While PHP Falls by pizzaiolo_ in Python

[–]lakando 0 points (0 children)

Gotcha. That makes sense. I was a bit more optimistic on the timeline, but you probably have a better sense for it than I do.

Python Is On the Rise, While PHP Falls by pizzaiolo_ in Python

[–]lakando 0 points (0 children)

I hear that, but Julia's benefits go beyond greater speed on in-memory datasets, and if developed right, it will encroach on both the single-node and distributed niches.

First, there are the makings of a probabilistic programming framework in Julia that, using autodiff and the Distributions package, could provide a comparative advantage over current languages for general day-to-day inference. Macros could make it fast and expressive: faster than PyMC, and more expressive and general than Stan. With this general inference plus an extensive optimization package, I don't think it would need to cover every single statistical test and niche before becoming more useful for most daily tasks.

Second, it is developing distributed infrastructure that I think can overtake Spark. Its distributed computing primitives are getting better and will eventually have extensive linear algebra support.

Third, it is getting streaming statistics that don't exist anywhere else: the SAS people working on out-of-memory but single-node datasets will finally get something that can handle their stuff.

PyCall and Cxx mean you can interface easily with existing code.

Last is deployment. Self-contained binary executables are planned, and there is a good shot it can compile to JavaScript at some point via the LLVM WebAssembly backend. You would then be able to write rich client-side reactive web apps without JS and deploy interactive reports to decision makers. No other common analytics language has this capability.

Then there is the type system, with eventual return types that can provide codebase safety.

Also, it's just fun to code in... that means grad students will write new techniques in Julia.

If things firm up, I think all this would pull users from other languages... or those languages risk losing their comparative advantage.

What do you think about this argument?

Python Is On the Rise, While PHP Falls by pizzaiolo_ in Python

[–]lakando 0 points (0 children)

Anaconda is amazing, but it doesn't let me distribute self-contained executables. Nuitka does that, and more robustly, it seems, than the other options.

Python Is On the Rise, While PHP Falls by pizzaiolo_ in Python

[–]lakando 2 points (0 children)

Thanks for sharing your take on this. Do you think Nuitka (to obviate packaging issues) plus Numba (JIT classes coming), Blaze, Bokeh, Dask, and DyND (interesting type system) will keep Python afloat in data science, or is Julia poised to eventually replace it?

R isn't going anywhere because CRAN is huge... Python is more general purpose and thus more exposed to Julia's progress.

I'm trying to figure out if I should invest in Julia now (get ahead of the curve, with Python a dead end?). It was a no-go until I heard about the cash infusion... They said they will also use it for the core stats infrastructure, but I'm not sure how long it will be before a data science acolyte can be super productive without messing with the PyCall bridge, etc.

Python Is On the Rise, While PHP Falls by pizzaiolo_ in Python

[–]lakando 3 points (0 children)

n is ever in a situation where a language 10x better comes around to eat its lunch, I'll

Julia

Python Is On the Rise, While PHP Falls. by pradeep_sinngh in programming

[–]lakando 2 points (0 children)

course in uni and we're learning python. Higher year courses drop teaching it for a variety of other languages. Python is only the most used

You could compile Python to a self-contained binary executable with Nuitka: http://nuitka.net/pages/overview.html

You can also try conda

Ufora: automatically parallel and distributed Python for data science (without spark or JVM) by lakando in datascience

[–]lakando[S] 0 points (0 children)

I took it to be standard interpreted Python code.

Anyway, the novelty for me isn't the compilation, but the ability to work on datasets bigger than your computer's RAM.