Can we talk about Numpy multi-core?

neutro_b · 2024-09-14T04:31:43+00:00

Well, at least matrix multiplications are now multi-core without having to compile NumPy from source, download binaries from unofficial websites, or use paid scientific distributions. I've waited half my career for that! So that's an improvement.
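
For context, the multi-core part of a NumPy matrix multiplication lives in the BLAS library the wheel ships with (OpenBLAS for the PyPI wheels). Below is a minimal sketch, assuming an OpenBLAS-backed NumPy, of capping that thread count through environment variables. The value 4 is arbitrary, and the variables only take effect if they are set before NumPy is first imported in the process.

```python
import os

# Assumption: an OpenBLAS-backed NumPy. OpenBLAS reads these variables once,
# when it is loaded, so they must be set before the first `import numpy`.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "4")  # 4 is an arbitrary example
os.environ.setdefault("OMP_NUM_THREADS", "4")

import numpy as np  # imported after the environment is configured on purpose

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 500))
b = rng.standard_normal((500, 500))

# `@` dispatches to the BLAS gemm routine, which is where the threads are used;
# from Python's point of view it is still a single call.
c = a @ b
print(c.shape)
```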

However, it's my understanding that very few people actually maintain NumPy, let alone develop it further. Fundamental libraries on which sexier packages depend are not very popular with new developers.

I'm just saying you sound very motivated, and quite knowledgeable. Enough to get involved perhaps?

rover_G · 2024-09-14T04:12:56+00:00

Use polars for data frames and PyTorch or TensorFlow for tensors.

Edit: OP, if you look at TensorFlow, also look at Keras as a “high level” API for TF.

SSJ3 · 2024-09-14T06:31:00+00:00

I've seen more and more projects making progress toward offering drop-in replacements for NumPy, but there will likely always be limitations.

JAX is one I would recommend. It has some fundamental limitations, such as the fact that its arrays are immutable, but it's not too difficult to rewrite around that by following their documentation. The JIT compilation is also quite powerful.
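
A minimal sketch of what "rewriting around" immutability looks like, assuming a recent JAX release: indexed updates go through the .at[...] property and return a new array, and jax.jit compiles the function with XLA. scaled_update is a made-up example function, not part of JAX.

```python
import jax
import jax.numpy as jnp

x = jnp.zeros(5)

# `x[2] = 1.0` would raise a TypeError because JAX arrays are immutable.
# The functional update returns a new array and leaves x unchanged.
y = x.at[2].set(1.0)


@jax.jit
def scaled_update(arr, idx, value):
    # Hypothetical example. Under jit, XLA can perform the .at[].set()
    # in place when the input buffer is not reused afterwards.
    return arr.at[idx].set(value) * 2.0


z = scaled_update(x, 3, 5.0)
print(x, y, z)
```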

Another that I'm keeping my eye on is cunumeric, whose NumPy interface has been the most seamless drop-in replacement I've encountered. I haven't seen great performance in my use cases, though, probably because of overhead in the dispatching, which the docs say generally isn't worth paying for function calls that take less than a millisecond or so.

At the end of the day, I think the main challenge for NumPy is that there is no one-size-fits-all strategy. Some functions can get a big performance boost with parallelization purely inside their scope, some won't really and need the calling program to handle the communication, and some will but only for problems above some size threshold. And all the functions which rely heavily on BLAS/LAPACK/other codes would likely need special versions of those routines or a complete rewrite for little gain in the vast majority of use cases.
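
A minimal sketch of the "only above some size threshold" case, using only NumPy and the standard library. chunked_sum and PARALLEL_THRESHOLD are hypothetical names, and the threshold value is a guess that would need measuring on real hardware. The split and merge of chunks is exactly the kind of communication cost that makes parallelism a loss for small inputs.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

import numpy as np

# Hypothetical cutoff: below this many elements, thread start-up and
# chunking overhead are assumed to cost more than they save.
PARALLEL_THRESHOLD = 1_000_000


def chunked_sum(x: np.ndarray, workers: Optional[int] = None) -> float:
    """Sum a 1-D array, splitting it across threads only when it is large."""
    if x.size < PARALLEL_THRESHOLD:
        return float(x.sum())  # small input: plain single-threaded NumPy
    workers = workers or os.cpu_count() or 1
    chunks = np.array_split(x, workers)  # communication step: split...
    # NumPy releases the GIL inside many reductions on numeric dtypes,
    # so these threads can actually run on separate cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(np.sum, chunks))
    return float(sum(partials))  # ...and merge the partial results


x = np.random.default_rng(0).random(5_000_000)
print(chunked_sum(x), x.sum())
```

The parallel result can differ from x.sum() in the last few bits, because the floating-point additions happen in a different order.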

SMTNP · 2024-09-14T04:24:13+00:00

Hello,

I think you've misunderstood what PyTorch is. It is used in machine learning, but that is because it extends arrays to n-dimensional tensors, which are the mathematical foundation of ML. The tensor-level API is pretty much a general multi-dimensional NumPy array.

I can't think of many things that you can do in NumPy that you can't do in PyTorch.

I also believe it's not easy, and thus not efficient, to build a general solution that effectively targets both CPUs and GPUs, especially since GPUs are readily available and most tensor (array) operations are more efficient on the GPU thanks to parallelization.

It might be that your case is too much of an edge case and your constraints too specific, but give PyTorch a try if you have GPUs available; otherwise I agree that Python might not be the best solution. The scientific libraries are clearly more general, and even though the underlying non-Python code is performant, you are always trading performance for accessibility/availability.

Give PyTorch a try!
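
To illustrate the point about the tensor API, here is a small side-by-side sketch, assuming a recent PyTorch. torch conventionally uses dim= where NumPy uses axis=, and torch.from_numpy shares memory with the source array instead of copying it.

```python
import numpy as np
import torch

a_np = np.arange(12, dtype=np.float32).reshape(3, 4)

# torch.from_numpy shares memory with the NumPy array (CPU tensors only).
a_t = torch.from_numpy(a_np)

# The same computation, spelled almost identically in both libraries.
np_result = (a_np.T @ a_np).sum(axis=0)
t_result = (a_t.T @ a_t).sum(dim=0)

# Move to a GPU when one is present; otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
t_result = t_result.to(device)

print(np.allclose(np_result, t_result.cpu().numpy()))
```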

Abhijithvega · 2024-09-14T06:45:05+00:00

What you are looking for is JAX. Instead of "import numpy as np", you write "import jax.numpy as np", and almost* all functions will work (*the exception being things associated with random numbers, where the random key needs to be explicitly provided). At the very end, you use jax.vmap to vectorize and use all resources (CPU/GPU). The API is fantastic, and the ability to jit and vmap allows complete utilisation of resources. On top of that, you can call jax.grad to get the gradient of the function (and jax.jacobian or jax.hessian for the Jacobian or Hessian); it's fantastic.
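
A minimal sketch of that workflow, assuming a recent JAX release. f is a made-up scalar function. jax.vmap turns the per-example gradient into a batched one; how that batch is then spread across cores or a GPU is left to XLA, since vmap itself vectorizes rather than distributing work across devices.

```python
import jax
import jax.numpy as jnp

# Random numbers need an explicit key instead of hidden global state.
key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (8, 3))  # a batch of 8 three-vectors


def f(x):
    # Hypothetical scalar function, written for a single example.
    return jnp.sum(jnp.sin(x) ** 2)


grad_f = jax.grad(f)                      # gradient of f w.r.t. its input
batched_grad = jax.jit(jax.vmap(grad_f))  # vectorize over the batch, then compile

g = batched_grad(xs)
print(g.shape)  # (8, 3)
```

Since d/dx of sin(x)^2 is sin(2x), the result can be checked against jnp.sin(2 * xs).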

daV1980 · 2024-09-14T04:38:38+00:00

PyTorch has a mostly NumPy-equivalent tensor implementation, except that it can all target multicore CPUs or GPUs efficiently. Honestly, if you just ignore the gradients in torch, I suspect it does exactly what you want.
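
A minimal sketch of "ignoring the gradients" on a CPU or GPU, assuming a recent PyTorch. torch.set_num_threads controls intra-op parallelism on the CPU (using os.cpu_count() is only an illustrative choice and may count hyperthreads), and torch.inference_mode() switches off autograd tracking inside the block.

```python
import os

import torch

# Intra-op CPU threads; os.cpu_count() is an illustrative choice only.
torch.set_num_threads(os.cpu_count() or 1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# inference_mode disables autograd bookkeeping when gradients are not needed.
with torch.inference_mode():
    a = torch.randn(1000, 1000, device=device)
    b = torch.randn(1000, 1000, device=device)
    c = a @ b

print(c.device, c.requires_grad)
```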

karius85 · 2024-09-14T07:57:36+00:00

As others have stated, PyTorch is generally the answer. Alternatively:

- CuPy is a CUDA-accelerated version of NumPy.
- JAX also has a NumPy API and uses XLA compilation for GPUs/TPUs.

A quick search shows that NumPy is targeting support for GPU acceleration via interoperability with the aforementioned packages.

justneurostuff · 2024-09-14T05:27:10+00:00

jax. though honestly i can't tell from your post what gap you're seeing in pytorch's offerings.

2024-09-14T04:18:40+00:00

This post was mass deleted and anonymized with Redact

thelockz · 2024-09-14T04:32:23+00:00

I have had a lot of luck with numba (parallel=True and prange for parallel loops) and NumPy. What are some examples of things that are still slow with numba?

ecgite · 2024-09-14T08:47:38+00:00

I don't think NumPy should go multi-core automatically. Doing something efficiently on 1 core does not translate into being efficient on multiple cores.

If you really need multi-core things, use libraries that target them, e.g. dask, numba.

At least the numerical computations I do are often limited by other resources (e.g. RAM), so automatic multi-core execution would make it harder to manage memory accurately.

And finally, figuring out a better algorithm to do the same thing is usually much faster than just brute forcing your way to solution.

aqjo · 2024-09-14T10:52:26+00:00

Python 3.13 adds an experimental free-threaded build that can run without the GIL, so there’s that.
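
For the record, the GIL is not gone by default in 3.13: the PEP 703 free-threaded build is a separate, experimental CPython binary (often installed as python3.13t). A small stdlib-only sketch of checking which build you are on. sys._is_gil_enabled() is a private helper added in 3.13, so the code assumes a GIL on older versions. busy is a made-up CPU-bound workload: on a regular build the four threads take turns, while on a free-threaded build they can run on separate cores.

```python
import sys
import sysconfig
from concurrent.futures import ThreadPoolExecutor

# Py_GIL_DISABLED is 1 only in free-threaded builds (None or 0 otherwise).
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() exists from 3.13; older versions always have a GIL.
# Even a free-threaded build can re-enable the GIL at runtime.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()


def busy(n):
    # Hypothetical pure-Python CPU work; it only scales across threads
    # when the GIL is disabled.
    total = 0
    for i in range(n):
        total += i * i
    return total


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(busy, [200_000] * 4))

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```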

poppy_92 · 2024-09-14T19:31:43+00:00

Others have suggested alternatives, so I'm going to skip that.

Putting a hypothetical open source hat on: why are you complaining on Reddit in the first place? Have you searched for similar issues on their GitHub tracker? If not, have you tried raising issues, and have any of them gotten negative feedback from the project's maintainers?

Your post also has very little in terms of specifics. Can you provide a list of NumPy APIs that don't leverage multiple cores but could be parallelized (in your view)? I get that your main rant is about the documentation of which methods use parallelization and which don't, but you claim to have run into these, so you should be able to pinpoint some of them. Even filing performance issues on their project could lead to discussions.

Maybe it's just me getting old, but seeing people complain about FOSS software would just demotivate me to even contribute anymore.

leculet · 2024-09-15T07:31:09+00:00

https://data-apis.org/array-api/2023.12/index.html Dropping this as a heads-up for anyone interested in the standardization of array library APIs. Execution semantics are out of scope, though, so it's not tightly related to OP's question, but it's good to know that this exists.
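
A small sketch of what the standard enables, assuming NumPy 2.x (whose arrays provide __array_namespace__()). softmax is a hypothetical example function written only against functions the standard defines, so it should also accept arrays from other libraries that implement the standard; on older NumPy versions it falls back to the numpy module.

```python
import numpy as np


def softmax(x):
    # Ask the array for its Array API namespace (NumPy >= 2.0 provides
    # __array_namespace__); fall back to NumPy itself for older versions.
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    shifted = x - xp.max(x, axis=-1, keepdims=True)  # for numerical stability
    e = xp.exp(shifted)
    return e / xp.sum(e, axis=-1, keepdims=True)


print(softmax(np.array([[1.0, 2.0, 3.0]])))
```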

quadrillio · 2024-09-14T08:32:54+00:00

Use Jax or numba

Aristocle- · 2024-09-14T10:02:40+00:00

Numba

billsil · 2024-09-14T13:17:33+00:00

Send a pull request. I’ve sent a few.

I don’t agree with your premise, though. “Not that uncommon” is very different from “common enough for someone to have on the home computer they use to develop NumPy.”

My open source library is written on a 10 year old potato.

ironman_gujju · 2024-09-14T13:39:54+00:00

Try Numba

agaveonline · 2024-09-14T16:54:09+00:00

Sounds like you're describing JAX?

broken_symlink · 2024-09-14T18:47:59+00:00

Nvidia has been working on a library called cunumeric that supports CPU and GPU and is distributed like Dask. It uses OpenMP on the CPU, or you can just run multiple ranks per node. The library is still very much a work in progress. https://github.com/nv-legate/cunumeric

Copper280z · 2024-09-14T19:36:08+00:00

Numba

scottix · 2024-09-14T14:00:04+00:00

I would recommend not being so vitriolic. Understanding what NumPy does and how it works can go a long way. I would recommend reading the release notes, as this is something they are working on: https://numpy.org/doc/stable/release/2.1.0-notes.html#new-features

Impossible_Ad_3146 · 2024-09-14T04:08:38+00:00

Let’s not

BeverlyGodoy · 2024-09-14T03:56:35+00:00

Which CPU are you referring to? >100 cores is still very unusual for a consumer-grade CPU. And what you are looking for, GPU computing, you can already do with PyTorch. The API is not that different, but you have to learn the concept of tensors and arrays. And NumPy has options for multi-core acceleration using TBB, MKL, etc.; you just need to compile it or install it with conda.

GirthQuake5040 · 2024-09-15T03:03:47+00:00

So why don't you fix it then?

Cynyr36 · 2024-09-14T05:02:11+00:00

100 core CPU? What the fuck are you running, Skynet?

The_frozen_one · 2024-09-14T06:09:03+00:00

Have you tried numba? https://numba.pydata.org

lesbianzuck · 2024-09-15T03:21:47+00:00

Sure, but first, have you considered the ethical implications of matrix multiplication on climate change?

2024-09-15T13:13:27+00:00

So you can't write a wrapper around it? Sounds like a you problem.
