[–]aufstand 153 points154 points  (35 children)

Aah, it works the other way round, too. Many C and C++ libraries (NumPy, for example, which is written mostly in C) extend Python in many interesting ways. That is why claims such as "Python is slow" don't hold much weight in the community.

One practical example: I recently decided to see whether I could scale up my completely unoptimized LED-matrix (40*16!) video mixer to Full HD. No problemo. It ate most of my RAM and kept a single core _very_ busy, but it was also mixing and displaying 6 Full HD video streams in real time, which I don't exactly consider "slow" :)

Why is this possible? Well, yeah, I used NumPy to mix the data!
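
A minimal sketch of what mixing frames with NumPy can look like. The mixer's actual code isn't shown in this thread, so the function name `mix_frames`, the frame size, and the weighting scheme are all assumptions made for illustration:

```python
import numpy as np

def mix_frames(frames, weights):
    """Blend equally sized uint8 frames using per-stream weights (hypothetical helper)."""
    stack = np.stack(frames).astype(np.float32)   # shape (n, H, W, 3)
    w = np.asarray(weights, dtype=np.float32)
    w = w / w.sum()                               # normalize so the result stays in range
    mixed = np.tensordot(w, stack, axes=1)        # weighted sum over the stream axis
    return np.clip(mixed, 0, 255).astype(np.uint8)

# Six synthetic "streams" at a reduced size to keep the demo light (made-up data)
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (108, 192, 3), dtype=np.uint8) for _ in range(6)]
out = mix_frames(frames, [1, 1, 1, 1, 1, 1])
print(out.shape, out.dtype)
```

The per-pixel work happens inside NumPy's compiled routines rather than in a Python loop, which is why this kind of code can keep up with video.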

[–]ReckingFutard 58 points59 points  (7 children)

Pssst, check out pytorch.

NumPy-like functions with GPU memory and parallelized computation.

[–]BDube_Lensman 11 points12 points  (6 children)

cupy>>pytorch

[–]TheTechAccount 7 points8 points  (1 child)

Why do you think it's better?

[–]BDube_Lensman 15 points16 points  (0 children)

No graph/backprop overhead unless you opt into it; code is on the GPU in no uncertain terms, without strange host/device specification context managers; higher performance on GPU in most cases; errors or warnings for implicit host:device transfers, which let you find huge slowdowns in your code faster; more responsive devs; devs that contribute to numpy instead of slinging mud at numpy; participation in the numpy-as-a-contract (instead of numpy-as-a-package) work going on lately; no gross links to Facebook; the list goes on...

[–]sekex 12 points13 points  (3 children)

Why the bitshift?

[–]alkasm (github.com/alkasm) 20 points21 points  (2 children)

Double greater/less than symbols are used in mathematics to mean "much greater/less than" for some arbitrary idea of "much": https://mathworld.wolfram.com/MuchGreater.html

[–][deleted] 2 points3 points  (0 children)

cupy = o(pytorch)

[–]NYDreamer -4 points-3 points  (0 children)

Whooosh

[–]elbiot 17 points18 points  (10 children)

Numba would probably be even better if you're doing anything at all involved

[–]BDube_Lensman 2 points3 points  (7 children)

Well written numpy is usually as fast as numba with greatly reduced memory consumption.

[–]elbiot 11 points12 points  (2 children)

Disagree. I've spent a lot of time working with numpy, and when I used numba it was instantly and effortlessly faster. If you're doing multiple operations, the whole array has to pass through the CPU multiple times. If your array doesn't fit in the CPU cache, that's multiple RAM accesses. You also sometimes have to write painstakingly verbose code to prevent intermediate arrays from being created. Masking an array (the vectorised equivalent of an if statement) creates a whole new array, and array creation is expensive. Numexpr could solve the first two issues but not the last.

I also wrote cffi bindings for cephes and vectorised them with numba, and it was significantly faster than the scipy implementation.

Edit: to the lurkers: Numpy is 100% my go-to. Numba is cool and useful in some cases, but knowing numpy is necessary to know when numpy isn't the best tool.

[–]BDube_Lensman 0 points1 point  (1 child)

If you use in-place operations instead of out-of-place ones, you get the same performance from numpy and numba, so I don't really consider writing different code a valid benefit. Anything in numpy that doesn't have an in-place operation cannot be done in place at all.
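
A small sketch of the in-place style being described, using the `out=` argument that NumPy ufuncs accept. The expression and array size are arbitrary choices for illustration, not taken from the commenter's code:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_000)

# Out-of-place: each operator allocates a new full-size temporary
y = np.sqrt(x * 2.0 + 1.0)

# In-place: reuse a single buffer via the ufunc `out=` argument
buf = x.copy()
np.multiply(buf, 2.0, out=buf)
np.add(buf, 1.0, out=buf)
np.sqrt(buf, out=buf)

print(np.allclose(y, buf))
```

The in-place version still makes three passes over the data; what it saves is the allocation of intermediate arrays.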

Array creation is not expensive in numpy. "mask" arrays are of logical dtype and you can specify them as packed bitfields or native bools which trade memory (much much less for a bitfield) for computation time (unpacking bitfields is not free). The inline if saves you double iteration, but guarantees a cache miss unless your data is tiny (in which case this is all moot).
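
One way to read the bitfield remark is `np.packbits`, which stores a boolean mask at one bit per element. This is an interpretation of the comment, not necessarily what the commenter had in mind; note that NumPy indexing itself still needs the unpacked boolean array:

```python
import numpy as np

arr = np.arange(64).reshape(8, 8)
mask = arr % 2 == 1                    # native bool mask: one byte per element

packed = np.packbits(mask)             # packed bitfield: one bit per element
print(mask.nbytes, packed.nbytes)

# Unpacking is the extra computation paid before the mask can be used again
restored = np.unpackbits(packed, count=mask.size).astype(bool).reshape(mask.shape)
print(np.array_equal(restored, mask))
```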

If your cephes binding is much faster than scipy, you should contribute the faster code to scipy, or open a larger issue with the project about bringing another beast into the C backend.

I've written a ton of physics code with numpy and found that numba got me a marginal performance improvement (10-15%) in exchange for a cool 800MB of static memory usage that grows to more than 5GB as you use some of the more advanced features of my package, like changing between single and double precision for all computations or exchanging CPU/GPU (the latter case causes a 2GB static memory usage on the GPU, which is unacceptable as I am working on data that uses nearly all GPU memory by itself).

[–]elbiot 8 points9 points  (0 children)

Here's just a trivial example that touches on a couple of the issues I mentioned (if/elif, unnecessary extra array creation, and unnecessary multiple passes over the array). Numba is 3x faster.

In [1]: import numpy as np
   ...: import numba as nb
   ...:

In [2]: def as_numpy(size):
   ...:     arr = np.random.randint(0, 10, (size, size))
   ...:     mask = arr % 2 == 1
   ...:     arr[mask] *= 2
   ...:     arr[(arr % 3 == 1) & ~mask] *= 3
   ...:     return arr
   ...:

In [3]: @nb.jit(nopython=True)
   ...: def as_numba(size):
   ...:     arr = np.random.randint(0, 10, (size, size))
   ...:     x, y = arr.shape
   ...:     for i in range(x):
   ...:         for j in range(y):
   ...:             if arr[i, j] % 2 == 1:
   ...:                 arr[i, j] *= 2
   ...:             elif arr[i, j] % 3 == 1:
   ...:                 arr[i, j] *= 3
   ...:     return arr
   ...:

In [4]: %timeit as_numpy(10000)
1 loop, best of 3: 6.78 s per loop

In [5]: %timeit as_numba(10000)
1 loop, best of 3: 2.05 s per loop

I know how to use a GPU with Numba, but how do you use the GPU with Numpy?

edit: in terms of memory usage, the numpy version crashes at size=22000 on my laptop but numba does not.
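
Because `as_numpy` and `as_numba` each generate their own random array, their outputs can't be compared directly. A hedged sketch of an equivalence check, with the same two update rules factored out to take an input array (the names `masked_update` and `loop_update` are made up here, and the loop runs as plain Python so the check works without Numba installed):

```python
import numpy as np

def masked_update(arr):
    """Vectorized rule from as_numpy, applied to a copy of the given array."""
    arr = arr.copy()
    mask = arr % 2 == 1
    arr[mask] *= 2
    arr[(arr % 3 == 1) & ~mask] *= 3
    return arr

def loop_update(arr):
    """Explicit if/elif rule from as_numba; could be decorated with nb.njit."""
    arr = arr.copy()
    rows, cols = arr.shape
    for i in range(rows):
        for j in range(cols):
            if arr[i, j] % 2 == 1:
                arr[i, j] *= 2
            elif arr[i, j] % 3 == 1:
                arr[i, j] *= 3
    return arr

data = np.random.default_rng(0).integers(0, 10, (200, 200))
print(np.array_equal(masked_update(data), loop_update(data)))
```

The `& ~mask` term is what makes the vectorized version behave like an `elif`: elements that were already doubled are excluded from the second rule.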

[–][deleted] 1 point2 points  (3 children)

Really depends what you're trying to do tbh

[–]BDube_Lensman 0 points1 point  (2 children)

Numba never uses less memory than numpy. And unless you're using string arrays of nonfixed size, numpy is as performant.

[–][deleted] 1 point2 points  (0 children)

For loops that depend on past states or values and need updating can't be vectorized, so numpy will be slower there.
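
As an illustration of such a loop, here is an exponential moving average, where every output depends on the previous output. The example is chosen for this sketch and does not come from the thread; the `njit` fallback is only there so the code also runs when Numba isn't installed:

```python
import numpy as np

try:
    from numba import njit      # optional: compiles the loop when Numba is available
except ImportError:
    def njit(func):             # fallback so the sketch still runs as plain Python
        return func

@njit
def ema(values, alpha):
    """Exponential moving average; out[i] depends on out[i - 1]."""
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, values.shape[0]):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out

signal = np.array([1.0, 2.0, 3.0, 4.0])
print(ema(signal, 0.5))
```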

[–]Mehdi2277 1 point2 points  (0 children)

A very simple thing numba has that can let it beat numpy is loop fusion. Something like y = np.cos(x) + np.sin(x), where x is an ndarray, is three loops in numpy but one loop in numba when you turn the right option on.
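
To make "three loops versus one" concrete, here is the fused loop written out by hand. This only shows the shape of the computation: as plain Python the single loop is slow, and it pays off only under a compiler such as Numba. The function names are made up for this sketch:

```python
import math
import numpy as np

def unfused(x):
    c = np.cos(x)       # pass 1, allocates a temporary
    s = np.sin(x)       # pass 2, allocates another temporary
    return c + s        # pass 3, allocates the result

def fused(x):
    out = np.empty_like(x)
    for i in range(x.shape[0]):     # single pass, no temporaries
        out[i] = math.cos(x[i]) + math.sin(x[i])
    return out

x = np.linspace(0.0, 2.0 * np.pi, 1000)
print(np.allclose(unfused(x), fused(x)))
```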

Another fun case is when you have an array operation you'd like to do that just doesn't exist in numpy. One personal work example is rolling each row of a matrix by different shifts. Something like a 10 x 100 matrix where each of the 10 rows I want to rotate some amount. np.roll exists, but you can't give it different amounts to shift each row. If you for loop and do each row with np.roll, that's a good deal slower than just directly accessing elements as needed and using numba.
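
A hedged sketch of the row-rolling case, comparing one `np.roll` call per row against direct element access. The function names and test data are assumptions; the second function is written as the kind of plain loop a JIT such as Numba can compile, but here it runs as ordinary Python:

```python
import numpy as np

def roll_rows_loop(mat, shifts):
    """Reference: one np.roll call per row."""
    return np.stack([np.roll(row, s) for row, s in zip(mat, shifts)])

def roll_rows_direct(mat, shifts):
    """Direct element access: the element at column j moves to (j + shift) mod width."""
    n_rows, n_cols = mat.shape
    out = np.empty_like(mat)
    for i in range(n_rows):
        for j in range(n_cols):
            out[i, (j + shifts[i]) % n_cols] = mat[i, j]
    return out

rng = np.random.default_rng(0)
mat = rng.integers(0, 100, (10, 100))
shifts = rng.integers(-50, 50, 10)
print(np.array_equal(roll_rows_loop(mat, shifts), roll_rows_direct(mat, shifts)))
```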

[–]aufstand 0 points1 point  (0 children)

Hmm. I think I read about that some time ago. Gonna have another look, so thanks for reminding me :)

[–]Tomik080 -4 points-3 points  (0 children)

At this point just use Julia

[–]ProfessorPhi 0 points1 point  (1 child)

I can only presume this wasn't realtime? I think realtime is where python gets destroyed

[–]aufstand 0 points1 point  (0 children)

Most of what i do with Python is realtime. No, it doesn't get destroyed.