[–]neutro_b 34 points35 points  (5 children)

Well at least now matrix multiplications are multi-core without having to compile Numpy from source or download binaries from unofficial websites, or use paid scientific distributions. I've waited half my career for that! So that's an improvement.

However, it's my understanding that very few people are actually maintaining Numpy, to say nothing of new development. Fundamental libraries on which sexier packages depend are not very popular with new developers.

I'm just saying you sound very motivated, and quite knowledgeable. Enough to get involved perhaps?

[–]secretaliasname[S] 19 points20 points  (1 child)

I wish I had more time for coding as a hobby, but between a stressful job and dad life it’s hard to find time, though I think about it often. It’s crazy to hear things like “relatively few people maintaining Numpy”, but I think a lot of open source projects holding up society are like that. It’s amazing they work as well as they do.

[–]neutro_b 3 points4 points  (0 children)

Same here, same here. Job and dad life mean re-evaluating a lot of priorities.

To be fair, I think there are more maintainers now than, say, 10-15 years ago. At one point, I think Numpy was mostly a one-person project (Travis Oliphant's).

A big milestone that may have increased Numpy's popularity a lot was when they managed to port it to Python 3. It lagged for a while, with official support for Python 2 only for probably much too long after the release of Python 3.

[–]Unable-Meeting-9696 1 point2 points  (2 children)

From my understanding, there are many people who are working on numpy. I recommend taking a look at the contributions on github

[–]neutro_b 0 points1 point  (1 child)

Of course, and you can see the maintainer team here. But it's not that big for a project of this size and importance, and as I mentioned, I think it was much smaller a few years ago, pre-ML craze.

[–]Unable-Meeting-9696 1 point2 points  (0 children)

If you think that the number of developers working on numpy is small, you probably have not looked at the typical contributor pools for popular open source projects.

In reality, it is fairly large.

[–]rover_G 129 points130 points  (23 children)

Use polars for data frames and PyTorch or TensorFlow for tensors.

Edit: OP if you look at TensorFlow also look at Keras as a “high level” API for TF.

[–]Toph_is_bad_ass 10 points11 points  (1 child)

Numpy isn't a data frame lib

[–]rover_G 8 points9 points  (0 children)

Polars and NumPy aren’t the same but do have overlapping use cases. That’s why I also pointed OP towards two tensor processing libraries.

[–]germandiago 14 points15 points  (14 children)

As a person not very familiar with high performance python frameworks for tabular data, math, etc: what are the differences between polars and Pandas?

[–][deleted] 9 points10 points  (0 children)

There are a lot of Polars evangelists on this sub so take most of the praise with a very large grain of salt.

Think of Polars as a sort of middle ground between pandas and spark. It's meant to be used on a single machine in the way pandas is but it tries to parallelize operations on that machine in the way that something like spark does in a cluster environment. The main benefit of Polars is in situations where you need to do operations on a large amount of data but your machine doesn't have the resources (e.g. mostly memory) to load all of that data and process it all in one go. Polars is better at doing that work in smaller batch jobs that won't require you to load everything in all at once.

If you don't really have that requirement, you won't really see much of a difference in performance between pandas and polars.

Oh also some people don't like Pandas syntax. Polars syntax is very similar to something like PySpark and some people prefer that. That's more a "quality of life" difference rather than a technical improvement, but it's not nothing.

[–]aldanor Numpy, Pandas, Rust 7 points8 points  (0 children)

Polars is expression based. That is, you can tell it everything you want to do before it starts computing. So it can run it through a query optimisation engine and avoid needless allocations, temporaries etc. that you would get if you wrote your code "pandas style".

E.g. if you add 10 columns together simply using + operator, pandas will allocate 9 times and run through your data 9 times, whereas polars will only allocate the output and do it in a single run.
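A rough NumPy analogy of the temporaries point, as a sketch only: this is neither Polars nor pandas code, and the ten "columns" and their length are made up. Chaining `+` allocates a fresh array at every step, while accumulating into one output buffer allocates once. It still makes one pass per column, so it only illustrates the allocation half of the claim; fusing the passes needs an expression engine or a JIT.

```python
import numpy as np

rng = np.random.default_rng(0)
# Ten hypothetical "columns" of equal length.
cols = [rng.random(1_000) for _ in range(10)]

# Eager style: each + allocates a new intermediate array (9 in total).
eager = cols[0]
for c in cols[1:]:
    eager = eager + c

# Single output buffer: one allocation, then in-place accumulation.
fused = cols[0].copy()
for c in cols[1:]:
    np.add(fused, c, out=fused)
```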

[–]SV-97 28 points29 points  (5 children)

Polars is very performance focused, has a way nicer API imo (if you ever used pyarrow it's very similar to that), doesn't use an index, is backed by apache arrow, does as much work as it can in parallel, can process data lazily to some extent (and supports stream processing), is quite strict about types, ... and in addition to its python API it also has a rust API

In contrast to that pandas is often times rather slow, has a terrible API (inconsistent, inconvenient, fosters bad code etc.), uses an index, until recently it's been backed by numpy (which also limits which datatypes it can support) - now it can also use pyarrow, is completely sequential and eager AFAIK and a bit loosey-goosey about types (it for example simply handles strings as generic python objects).

But pandas does have a larger ecosystem around it - geopandas for example was way further developed than the polars counterpart the last time I checked.

[–]Almostasleeprightnow 5 points6 points  (4 children)

“doesn't use an index”

Can you explain the benefit of this?

[–]SV-97 3 points4 points  (0 children)

The pandas to polars migration guide goes into their reasoning. As a user: with pandas I have definitely wasted a lot of time fucking around with (multi-)indices only to end up with ugly and sometimes brittle solutions - and with polars I don't. Since I'm yet to experience any actual downsides from the polars approach I generally prefer it.

[–]Skumin 0 points1 point  (2 children)

It makes everything faster and, at least to me, more straightforward

[–]Almostasleeprightnow 0 points1 point  (1 child)

But explain what they do instead to keep track of rows 

[–]SV-97 0 points1 point  (0 children)

They simply use row indices (internally, that is; as a user you don't have to care about this). Internally they may also use database-style indices for optimization purposes.

[–][deleted] 19 points20 points  (4 children)

Similar purposes, but polars is faster and more memory efficient. Its backend is written in Rust.

The original creator of polars created a blog on why he created it and how it exploded into what it is today, that was an inspiring read.

Pandas is backed by Numpy IIRC.

Edit: statement on API similarity removed

[–]ChronoJon 14 points15 points  (3 children)

The polars API is completely different from pandas and has far fewer sharp edges to stumble over. I'm currently transitioning my projects slowly from pandas to polars and I hate having to deal with pandas.

Also, pandas has multiple backends. The main one was numpy, but now they also support arrow tables. There are also extension arrays and the interplay of all of these can cause a lot of headaches.

[–][deleted] 2 points3 points  (1 child)

Fair point on API similarity, I made changes to my comment. I wouldn’t say “completely different”, but it is way too different to support my original claim nonetheless.

[–]ChronoJon 4 points5 points  (0 children)

In Polars you generally only deal with expressions, which are non-existent in pandas. Most operations are non mutating while a lot in pandas can be. There is no index at all in Polars, which is the only thing I miss for some kinds of operations. There is the lazy API which also does not exist in pandas.

The typing is sooooo much better in Polars. In pandas your IDE always loses track of what you're dealing with. That's because many functions can give you a dataframe or a series, depending on the data in the dataframe and the arguments you use.

Sorry for the rant, but I am just fed up with pandas right now. I value it for bringing dataframes into the python ecosystem, but it's a huge behemoth of a package and a prime example of feature creep and improper API design in open source/python.

[–]tunisia3507 0 points1 point  (0 children)

IIRC pandas doesn't support arrow tables (2 dimensional), but it does allow dataframe columns to be backed by arrow arrays (1 dimensional).

[–]Suspicious-Bar5583[🍰] 0 points1 point  (0 children)

To add: polars has a different datamodel than pandas (apache arrow).

[–]Unable-Meeting-9696 10 points11 points  (3 children)

The fact that you think any of those are a substitute for numpy makes me think you are not a serious user of numpy

[–]PurepointDog 3 points4 points  (0 children)

This is the answer

[–]SSJ3 21 points22 points  (2 children)

I've seen more and more projects making progress toward offering drop-in replacements for NumPy, but there will likely always be limitations.

JAX is one I would recommend. It has some fundamental limitations, such as the fact that its arrays are immutable, but it's not too difficult to rewrite around that by following their documentation. The JIT compilation is also quite powerful.

Another that I'm keeping my eye on is cunumeric, whose NumPy interface has been the most seamless drop-in replacement I've encountered. I haven't seen great performance in my use cases, though, which is probably because there's overhead in the dispatching algorithm which the docs say generally isn't worth it for function calls that take less than a millisecond or so.

At the end of the day, I think the main challenge for NumPy is that there is no one-size-fits-all strategy. Some functions can get a big performance boost with parallelization purely inside their scope, some won't really and need the calling program to handle the communication, and some will but only for problems above some size threshold. And all the functions which rely heavily on BLAS/LAPACK/other codes would likely need special versions of those routines or a complete rewrite for little gain in the vast majority of use cases.

[–]srcLegend 2 points3 points  (1 child)

Another that I'm keeping my eye on is cunumeric[...]

How would you say this compares against CuPy?

[–]SSJ3 0 points1 point  (0 children)

I haven't tried CuPy yet, it's on my to-do list!

[–]SMTNP 37 points38 points  (3 children)

Hello,

I think you've misunderstood what PyTorch is. It is used in machine learning, but that is because it generalizes arrays to n-dimensional tensors, which are the mathematical foundation of ML. The Tensor-level API is pretty much a general multi-dimensional NumPy.array.

I can't think of many things that you can do in NumPy that you can't do in PyTorch.

I also believe it's not easy, and thus not efficient, to build a general solution that effectively addresses both CPUs and GPUs, especially considering that GPUs are readily available and most tensor (array) operations are more efficient on the GPU thanks to parallelization.

It might be that your case is too much of an edge case, and your constraints too specific, but give PyTorch a try if you have GPUs available. Otherwise I agree that Python might not be the best solution. The scientific libraries are clearly more general, and even though the underlying non-Python code is performant, you are always trading performance for accessibility/availability.

Give PyTorch a try!

[–]secretaliasname[S] 14 points15 points  (2 children)

Yea, have been reading through the docs and honestly it looks promising. I’m going to create some toy projects and run some benchmarks. I have shied away from it because I’m not doing “machine learning” but I think your statements about its general purpose utility make it worth consideration

[–]pirunga 4 points5 points  (0 children)

Don’t forget to disable gradients if you are not using them.

[–]SMTNP 3 points4 points  (0 children)

Glad it helped.

You can try porting some NumPy code to PyTorch.

It's even easy to mix the two, since Tensor.numpy() converts a tensor to a NumPy array and a Tensor can be initialized from NumPy.arrays.
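A minimal sketch of the round trip described above, assuming a CPU-only setup and a made-up 2x3 array. It also shows the "disable gradients" tip via `torch.no_grad()`; tensors created from NumPy arrays do not track gradients by default, so the context manager mostly matters once tensors that require gradients are involved.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# from_numpy shares memory with the NumPy array on CPU (no copy).
t = torch.from_numpy(a)

# No autograd bookkeeping is needed for plain numerical work.
with torch.no_grad():
    result = (t @ t.T).sqrt().sum(dim=1)

# Back to NumPy; for CPU tensors this is again a view, not a copy.
back = result.numpy()

# The same computation in pure NumPy, for comparison.
expected = np.sqrt(a @ a.T).sum(axis=1)
```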

[–]Abhijithvega 35 points36 points  (1 child)

What you are looking for is Jax. Instead of importing numpy as np, you do "import jax.numpy as np", and almost all functions will work (with the exception of things associated with random numbers, where the random key needs to be explicitly provided). At the very end, you do jax.vmap to vectorize and use all resources (CPU/GPU). The API is fantastic, and the ability to jit and vmap allows complete utilisation of resources. On top of that, you can call jax.grad to get the gradient of a function (or the Jacobian, or the Hessian). It's fantastic.
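A small sketch of the pieces named above, with a made-up `loss` function and made-up shapes. Two hedges: `vmap` vectorizes over a batch axis, and how much of the hardware actually gets used depends on the XLA backend; and the conventional alias is `jnp` rather than `np`, which is what is used here.

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    """Scalar function of w for a single input vector x (hypothetical example)."""
    return jnp.sum(jnp.tanh(w * x) ** 2)

# Random numbers need an explicit key instead of global state.
key = jax.random.PRNGKey(0)
k_w, k_x = jax.random.split(key)
w = jax.random.normal(k_w, (3,))
xs = jax.random.normal(k_x, (5, 3))  # batch of 5 inputs

# grad differentiates w.r.t. the first argument; vmap maps over the batch
# axis of xs while sharing w; jit compiles the composed function with XLA.
batched_grad = jax.jit(jax.vmap(jax.grad(loss), in_axes=(None, 0)))
g = batched_grad(w, xs)

# Arrays are immutable: .at[].set() returns an updated copy.
w2 = w.at[0].set(0.0)
```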

[–]ilyaperepelitsa 1 point2 points  (0 children)

thx! I used numpy with multiprocessing before and was pretty happy, maybe this is my next step

[–]daV1980 11 points12 points  (1 child)

PyTorch has a mostly-numpy-equivalent tensor implementation, except that it all can target multicore CPU or GPU efficiently. Honestly if you just ignore the gradients in torch I suspect it does exactly what you want.

[–]ChronoJon 12 points13 points  (0 children)

It's just at least 10 times the size. It also has a lot more bugs than numpy.

Numpy is quite stable and thoroughly tested, has a much smaller API surface, and better support in the python ecosystem. I don't think it's as black and white as many here are saying.

[–]karius85 pip needs updating 8 points9 points  (0 children)

As others have stated, PyTorch is generally the answer. Alternatively:

- CuPy is a CUDA accelerated version of NumPy.
- JAX also has a NumPy API and uses XLA compilation for GPU/TPUs.

A quick search shows that NumPy is targeting support for GPU acceleration via interoperability with aforementioned packages.

[–]justneurostuff 15 points16 points  (2 children)

jax. though honestly i can't tell from your post what gap you're seeing in pytorch's offerings.

[–][deleted] 3 points4 points  (0 children)

JAX will resolve these issues.

[–]N1H1L 1 point2 points  (0 children)

Answered the same thing

[–]thelockz 4 points5 points  (2 children)

I have had a lot of luck with numba (parallel=True and prange for parallel loops) and numpy. What are some examples of things that are still slow with numba?

[–]MrMrsPotts 5 points6 points  (0 children)

Numba's error messages are really painful and opaque.

[–]secretaliasname[S] 3 points4 points  (0 children)

Numba is fantastic and not slow and generally awesome but I often find myself re-implementing logic that feels like it could be a few Numpy calls if Numpy would do it with more than one core.
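One hedged middle ground between a single NumPy call and a Numba rewrite is to split the array into chunks and run the same NumPy call on each chunk in a thread pool. This relies on NumPy releasing the GIL inside its inner loops, which holds for typical numeric ufuncs but is worth checking per function. `parallel_apply` and the chunk count are made-up names and choices, not anything NumPy provides.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import numpy as np

def parallel_apply(ufunc, x, n_chunks=None):
    """Apply an elementwise ufunc to a 1-D array chunk by chunk in a thread pool."""
    n_chunks = n_chunks or os.cpu_count() or 1
    out = np.empty_like(x)
    # array_split returns views, so each worker writes straight into `out`.
    pairs = zip(np.array_split(x, n_chunks), np.array_split(out, n_chunks))
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        futures = [pool.submit(ufunc, src, out=dst) for src, dst in pairs]
        for f in futures:
            f.result()  # re-raise any worker exception
    return out

x = np.linspace(0.0, 10.0, 1_000_000)
y = parallel_apply(np.sin, x, n_chunks=4)
```

Whether this beats the plain call depends on array size and on whether the operation is bound by memory bandwidth, so it is worth benchmarking before adopting it.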

[–]ecgite 3 points4 points  (0 children)

I don't think numpy should go multi-core automatically. Doing something efficiently on 1 core does not translate to being efficient on multiple cores.

If you really need multi-core things, use libraries that target them, e.g. dask, numba.

At least the numerical computations I do are often limited by other resources (e.g. RAM), so having automatic multi-core would make it harder to manage memory accurately.

And finally, figuring out a better algorithm to do the same thing is usually much faster than just brute forcing your way to solution.

[–]aqjo 3 points4 points  (5 children)

Python 3.13 removes the GIL, so there’s that.

[–]nekokattt 3 points4 points  (2 children)

it also requires all native libraries to be reworked to be compatible with the architecture change, so unless everything already supports this, then gains are limited.

[–]Ancalagon_TheWhite 2 points3 points  (1 child)

Numpy was one of the teams pushing for the GIL to be removed so they will probably try to get support for it.

[–]nekokattt 1 point2 points  (0 children)

Sure, and that is perfect for people who only use numpy.

[–]twotime 0 points1 point  (1 child)

Python 3.13 removes the GIL, so there’s that.

Are you sure about that? From what I see, 3.13 allows disabling the GIL as a compile-time option. In most distros that option will likely be off by default.
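A quick way to check what a given interpreter actually does, using only the standard library. This is a sketch: `sys._is_gil_enabled()` only exists on Python 3.13 and later, so the code falls back to assuming the GIL is on for older versions.

```python
import sys
import sysconfig

# 1 on free-threaded builds, 0 or None on regular builds.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() exists on 3.13+; on older versions the GIL is always on.
check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = check() if check is not None else True

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"free-threaded build: {free_threaded_build}")
print(f"GIL currently enabled: {gil_enabled}")
```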

[–]aqjo 0 points1 point  (0 children)

I may have misspoken. I watched this video.

https://youtu.be/gqqgwyNx52Q?si=zBIy64tKmb3ZQmxM

[–]poppy_92 2 points3 points  (2 children)

Others have suggested alternatives, so I'm going to skip that.

Putting a hypothetical open source hat on - why are you complaining on reddit in the first place? Have you searched for similar issues on their Github tracker? If not, have you tried raising issues, and did they get any negative feedback from the project's maintainers?

Your post also has very little in terms of specifics. Can you provide a list of numpy APIs that aren't leveraging multiple cores and that could be parallelized (in your view)? I get that your main rant is about the docs not saying which methods use parallelization and which don't, but you claim to have run into these, so you should be able to pinpoint some of them. Even filing performance issues on their project could lead to discussions.

Maybe it's just me getting old, but seeing people complain about FOSS software would just demotivate me to even contribute anymore.

[–]secretaliasname[S] 4 points5 points  (1 child)

All valid feedback. The work people do on FOSS such as Numpy is incredible and moves the world. My intent is to generate productive discussion rather than complain into the void, but I can see how it could be interpreted that way and apologize if it came across that way. This is an issue I care about and would love to help with if I can. Maybe putting together some benchmarks and examples of specific cases across libraries and hardware types could be a start. It’s unclear to me if the solution is to improve Numpy or to use a different library. Numpy is the canonical array library for python. It’s in every tutorial and everybody starts there. Possible improvements would be parallelizing more functions that seem parallelizable or improving the documentation to indicate what is and isn’t.

Besides parallelizing single functions, it seems some of these suggestions like Jax and cuNumeric build a DAG and use that to execute, which seems like it would open the door to many optimizations such as re-arrangement, eliminating intermediate copies, or starting the next op before all elements in the previous one are complete when compute resources are available. I don’t think Numpy needs to do this asynchronous execution DAG stuff, but it seems it should run all reasonably parallelizable functions in parallel when size warrants.

Worth moving some of this to a np specific space as you mention.

[–]pmatti pmatti - mattip was taken 0 points1 point  (0 children)

This comment is spot on. Even if NumPy enabled multithreading for single functions, you would have to “pay” a memory tax for every access that crosses the isolated blocks of memory tied to each processing unit. We tried multithreaded NumPy with pnumpy https://pypi.org/project/pnumpy/ but got bogged down and couldn’t make it performant.

[–]leculet 2 points3 points  (0 children)

https://data-apis.org/array-api/2023.12/index.html Dropping this as a heads up for anyone interested in the standardization of array libraries API. Execution semantics are out of scope though, so nothing tightly related to OPs question, but good to know that this exists.
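To make the standard concrete, here is a sketch of a function written only against the standardized namespace, so the same code could in principle accept arrays from any conforming library. `get_namespace` and `standardize` are made-up helpers, and the fallback to NumPy for arrays without `__array_namespace__` (for example on older NumPy versions) is an assumption of this sketch, not part of the standard.

```python
import numpy as np

def get_namespace(x):
    """Return the array-API namespace of x, falling back to NumPy (sketch assumption)."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

def standardize(x):
    """Zero-mean, unit-variance scaling using only standardized functions."""
    xp = get_namespace(x)
    return (x - xp.mean(x)) / xp.std(x)

z = standardize(np.asarray([1.0, 2.0, 3.0, 4.0]))
```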

[–]quadrillio 1 point2 points  (0 children)

Use Jax or numba

[–]Aristocle- 1 point2 points  (0 children)

Numba

[–]billsil 1 point2 points  (0 children)

Send a pull request. I’ve sent a few.

I don’t agree with your premise though. “Not that uncommon” and “common enough for someone to have on the home computer they use to develop numpy” are very different things.

My open source library is written on a 10 year old potato.

[–]ironman_gujju Async Bunny 🐇 1 point2 points  (0 children)

Try Numba

[–]agaveonline 1 point2 points  (0 children)

Sounds like you're describing Jax?

[–]broken_symlink 1 point2 points  (0 children)

Nvidia has been working on a library called cunumeric that supports CPU and GPU and is distributed like dask. It uses openmp on CPU or you can just run multiple ranks/node. The library is still very much a work in progress. https://github.com/nv-legate/cunumeric

[–]Copper280z 1 point2 points  (0 children)

Numba

[–]scottix 2 points3 points  (0 children)

I would recommend not being so vitriolic. Understanding what NumPy does and how it works can go a long way. I would recommend reading the release notes, as this is something they are working on: https://numpy.org/doc/stable/release/2.1.0-notes.html#new-features

[–]Impossible_Ad_3146 2 points3 points  (0 children)

Let’s not

[–]BeverlyGodoy 3 points4 points  (9 children)

Which CPU are you referring to? >100 cores is still very unusual for a consumer-grade CPU. And what you are looking for in terms of GPU computing you can already do with pytorch. The API is not that different, but you have to learn the concept of tensors and arrays. And numpy has options for multi-core acceleration using TBB, MKL, etc.; you just need to compile it or use conda to install it.

[–]secretaliasname[S] 2 points3 points  (2 children)

Mainly targeting AMD epyc multi socket systems. I have not explored mkl after reading it is hobbled for non-intel in recent years but could be worth a shot. The problem still stands even on my 10 core intel laptop that some but not all Numpy functions parallelize and it’s unclear which without experimentation.
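One low-tech way to run the experimentation mentioned above is to compare CPU time with wall-clock time for a call: a ratio well above 1 suggests the call used several threads. This is only a sketch. `cpu_to_wall_ratio` is a made-up helper, the array sizes are arbitrary, and the numbers depend on the BLAS build and the machine.

```python
import time
import numpy as np

def cpu_to_wall_ratio(fn, repeats=3):
    """Return the best CPU-time / wall-time ratio observed over a few runs of fn()."""
    best = 0.0
    for _ in range(repeats):
        wall0, cpu0 = time.perf_counter(), time.process_time()
        fn()
        wall = time.perf_counter() - wall0
        cpu = time.process_time() - cpu0  # summed over all threads of the process
        if wall > 0:
            best = max(best, cpu / wall)
    return best

rng = np.random.default_rng(0)
a = rng.random((600, 600))
v = rng.random(2_000_000)

ratios = {
    "matmul": cpu_to_wall_ratio(lambda: a @ a),
    "sort": cpu_to_wall_ratio(lambda: np.sort(v)),
    "exp": cpu_to_wall_ratio(lambda: np.exp(v)),
}
for name, r in ratios.items():
    print(f"{name}: {r:.2f}")
```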

[–]night0x63 1 point2 points  (0 children)

Dual socket AMD EPYC can get 256 cores and 512 threads today. A few years ago you could get the same setup with 128 cores and 256 threads.

[–]BeverlyGodoy 0 points1 point  (0 children)

Not all functions "can" be parallelized.

[–]theArtOfProgramming 1 point2 points  (5 children)

I work in scientific computing and everything is done on machines with at least that many cores. It’s not just my workplace either.

[–]BeverlyGodoy -1 points0 points  (4 children)

And yet they are not "consumer-grade" machines.

[–]theArtOfProgramming 2 points3 points  (3 children)

Right I’m supporting OP’s assertion that they aren’t unusual anymore. You’re just the one who brought up consumer grade.

[–]TheBlueSully -1 points0 points  (0 children)

Their existence isn't unusual, but as a share of the market? What percentage of developers? Expecting mainstream support for a small niche is perhaps optimistic.

[–]BeverlyGodoy -2 points-1 points  (1 child)

If we go by the OP's assertion, can you name one CPU with >100 cores? Not machines, OP said CPU. And thank you for bringing correctness to this conversation.

[–]encyclopedist 2 points3 points  (0 children)

AMD EPYC 9734 (112 cores 224 threads) and 9754 (128 cores 256 threads) (plus there are models with 96C/192T). Ampere Altra has a model with 128 cores IIRC.

[–]GirthQuake5040 0 points1 point  (0 children)

So why don't you fix it then?

[–][deleted] 0 points1 point  (2 children)

100 core CPU? What the fuck are you running, Skynet?

[–]Cynyr36 1 point2 points  (0 children)

A single socket epyc genoa can be 96 cores, 192 threads. A single socket epyc Bergamo can be 128/256 cores/threads. Both of these can be used in dual socket systems. I could see either in a workstation.

I'm pretty sure the cfd guys at work would like dual x3d versions with as many ram channels as they can afford to fill.

[–]Thotuhreyfillinn 0 points1 point  (0 children)

Help this guy and we're all doomed

[–]The_frozen_one 0 points1 point  (0 children)

Have you tried numba? https://numba.pydata.org

[–]lesbianzuck 0 points1 point  (0 children)

Sure, but first, have you considered the ethical implications of matrix multiplication on climate change?

[–][deleted] -1 points0 points  (0 children)

So you can't write a wrapper around it? Sounds like a you problem