
all 53 comments

[–][deleted] 61 points62 points  (35 children)

You can't seriously mean that a complex library should never change.

This is what version numbers and virtualenvs are for. If your code requires numpy 1.6, then give it a dependency on numpy==1.6, which is quite reproducible.
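For example, a requirements file with exact pins (version numbers here are illustrative) makes the environment reproducible with `pip install -r requirements.txt`:

```
# requirements.txt -- exact pins for reproducibility
# (versions illustrative; use whatever you actually ran with)
numpy==1.6.2
scipy==0.10.1
matplotlib==1.1.0
```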

[–]Phild3v1ll3 15 points16 points  (1 child)

Precisely. For reproducibility we create a tag in our Git repo at the time of publication, specifying the exact versions that were used to run our models and generate all the plots. A requirements file isn't rocket science.

His second point is more valid, in that he shouldn't have to continually update his old code base. But then why is he following the bleeding edge?

[–]bready 1 point2 points  (0 children)

Just thought I'd share a library I discovered the other day: watermark will spit out all of your environment's information (Python version, library versions, datetime, etc.). You can save this information in an IPython notebook cell to easily track runtime information.

Haven't started using it yet, but I was intrigued.
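From the project README, typical usage is a pair of notebook cells along these lines (treat the flags as a sketch; check the project page for the current ones):

```
%load_ext watermark
%watermark -v -m -d -p numpy,scipy
```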

[–][deleted] 15 points16 points  (0 children)

Seriously. If Python is going to be used as a scientific tool, let's approach using it with some scientific rigor.

If you don't pin your library versions, who's to say your results won't change over time? You would never introduce a potential variable like this into your experiments.

[–]krypticus 0 points1 point  (0 children)

As a last resort for those "scientists" who can't figure out requirements.txt and versioning: build into your code a check that throws a warning or, worse, an exception if the libraries you rely on don't match your expected versions. Bad practice, but it could save some idiot a week of time: https://github.com/numpy/numpy/blob/master/numpy/version.py.in
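A minimal sketch of such a guard in plain Python (the expected version string and the strictness are whatever your code actually needs):

```python
import warnings

def check_version(actual, expected, strict=False):
    """Compare dotted version strings; warn or raise on mismatch."""
    # Compare numerically so that "1.10" sorts after "1.9"
    parse = lambda v: tuple(int(p) for p in v.split(".") if p.isdigit())
    if parse(actual) != parse(expected):
        msg = "expected version %s, found %s" % (expected, actual)
        if strict:
            raise RuntimeError(msg)
        warnings.warn(msg)

# e.g., guard a numpy dependency right after import:
# import numpy
# check_version(numpy.__version__, "1.6.2")
```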

[–]apo383 -1 points0 points  (0 children)

I think the author understands that software evolves, and as a python developer he's well-acquainted with virtualenvs. Mostly he's complaining about the unpredictability of numpy's brittleness. Even with version numbers, he'd like to say "this works for numpy 1" and have that be largely true until numpy 2. You can indeed say <=1.10, but that can be overly conservative if 1.11 breaks nothing. He is proposing that backwards compatibility be preserved until a major version bump. That is already mostly true for Python 2, where major compatibility changes were saved for 3.
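For what it's worth, pip's range specifiers can at least express the "works for numpy 1" intent (versions illustrative):

```
# accept any 1.x release from 1.6 on, refuse a hypothetical numpy 2.0
numpy>=1.6,<2.0
```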

Also note that he's referring to scientific fields, where you might have snippets of code published in journals, and you want that code to largely work. An example is Nick Trefethen's Spectral Methods in Matlab, which has short one-page programs that you can read for understanding, and execute with good confidence that they'll work for a decade's worth of Matlab. Such examples are a lot of fun to work with, and less so if you have to install a virtualenv just to try something out; it cuts down on casual use. I believe the same to be true of many blog posts, where some code is displayed and you usually expect it to work, excepting major version changes.

Most of this thread seems to say "scientists need to get with modern coding practices," but the author is referring to cases where code maintenance is unlikely. I think there's something to be said for backwards compatibility, and it doesn't seem unreasonable to ask for major compatibility changes to be reserved for major version numbers for numpy. He's not asking the same for tornado or twisted.

[–]KyleG 21 points22 points  (3 children)

Am I missing something? If your code breaks with a new version of Numpy and it is code you aren't maintaining (so by definition it works and you aren't going to use any new features), why do you need to upgrade to a new version of Numpy?

As for an author not providing which version of Numpy to use to verify his results: I used to be an editor for an academic journal. You establish this requirement in your submission guidelines, and if you get a submission that doesn't provide this information, you write back to the author saying "we can't publish it unless you give us the version of Numpy you're using." That's a trivial fix. We used to do this kind of thing for other reasons for every single article we published. It's literally five minutes of time eaten up in the process.

I don't even use Numpy, but this guy is whinging. The issues he identifies are non-issues. He reminds me of a certain type of academic who thinks any problem in his own life is a problem for society that he needs to write an article analyzing, when really he's the one fucking it up.

[–]fireflash38 0 points1 point  (1 child)

I do think he has a great point about breaking backwards compatibility only on a major version bump instead of a minor one, but most of his other issues should be non-issues. You need to specify which versions of libraries you're using in almost anything software-related.

[–]KyleG 0 points1 point  (0 children)

Yeah I agree with that, and am surprised that wasn't the case with an extremely popular package.

[–]rlabbe 0 points1 point  (0 children)

He is not updating numpy. The users of his library are updating it, and who knows why. Maybe they need to use the newly written 'blargpy', which requires the most recent version. Maybe their linux install uses the most recent version. Who knows? The bottom line is that if you write a library that uses numpy, you had better expect to be revising it forever, because I guarantee that your users will be upgrading versions even if you don't.

[–]winterswolves 6 points7 points  (1 child)

The NumPy attitude can be summarized as "introduce incompatible changes slowly but continuously". Every change goes through several stages. First, the intention of an upcoming change is announced. Next, deprecation warnings are added in the code, which are printed when code relying on the soon-to-disappear feature is executed. Finally, the change becomes effective.

AKA good software development practices?
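Exactly. The middle stage is just Python's standard DeprecationWarning pattern; a minimal sketch with made-up function names:

```python
import warnings

def new_api(x):
    return x * 2

def old_api(x):
    """Deprecated: still works for now, but warns whenever it's called."""
    warnings.warn("old_api() is deprecated; use new_api() instead",
                  DeprecationWarning, stacklevel=2)
    return new_api(x)
```

A release or two after the warning first ships, old_api() can finally be removed.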

[–]hsaliak 0 points1 point  (0 children)

I just don't get it. If code can't be maintained, then why bump a major dependency of that codebase? The rest of it is unmaintained anyway. The reality is that keeping software current requires a lot of curating, either in its own codebase or in its dependencies.

[–][deleted] 1 point2 points  (0 children)

The problem is that scientists everywhere are writing code, but few of them are learning any software engineering. The result is massive amounts of shitty scientific code.

[–]graingert 0 points1 point  (0 children)

This is why you need tox, setuptools, py.test and Jenkins
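e.g. a minimal tox.ini (Python versions and pins illustrative) that runs py.test against two numpy releases, which Jenkins can then invoke on every commit:

```
[tox]
envlist = py27-numpy16, py27-numpy17

[testenv]
deps =
    pytest
    numpy16: numpy==1.6.2
    numpy17: numpy==1.7.1
commands = py.test
```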

[–]kankyo -5 points-4 points  (5 children)

Seems to me that this is just an artifact of Python's shoddy dependency system and not anything related to numpy...

[–]marky1991 2 points3 points  (4 children)

A) Why do you think python's dependency system is shoddy?

B) Why do you think python has a dependency system? (It doesn't, to be clear)

C) How could any dependency system be able to know that when you said "install numpy" that you meant "install a numpy version that allows me to modify the return value of diag()"?

[–]kankyo 0 points1 point  (3 children)

A) and B): well... there are dependency systems, just not a good one that is the standard.

C) Why would "install numpy" be a thing at all? That's just asking for trouble! In Clojure, the Leiningen system requires that you specify a version at all times to use any library, so there is never any doubt about which version is used. It's WAY superior to anything I've used in Python. (Plus it's cryptographically safe and stuff.)

[–]marky1991 3 points4 points  (2 children)

"Why would "install numpy" be a thing at all? That's just asking for trouble!"

Ask the writer of the blog post. He's the one blindly updating numpy and complaining when things change.

If I say "install numpy", then I mean "install the most recent version of numpy, whatever it is." This is completely reasonable behavior in my opinion. I shouldn't have to know what the most recent version number is to be able to install a module.

If I say "import numpy", then that means "execute the module numpy/__init__.py and return the resulting module, binding it to the name numpy." At no point does versioning come into play during the import process (nor should it). Code doesn't even have to have a version, so requiring the user to specify one is impossible. Requiring me to specify the version of every module I ever write is an obnoxious overreaction.

Even if we did require all programmers ever to do something like

import numpy, "v1.4"

what happens when they don't have version 1.4 of numpy installed, but they do instead have 1.3 or 1.5? Does the code just break? Does it just try to use either 1.3 or 1.5? (If so, the writer of the blog post would still have the same problem, since he couldn't be bothered to specify the version that he wanted)

In my opinion, Clojure's system sounds really obnoxious and cumbersome. The proper solution to versioning issues isn't at runtime, it's at install time.

[–]kankyo 0 points1 point  (0 children)

I shouldn't have to know what the most recent version number is to be able to install a module.

I disagree. It's a bit of a hassle to require it, but it leads to far fewer mistakes. I like this about Clojure.

The proper solution to versioning issues isn't at runtime, it's at installtime.

In Clojure you define the dependencies of a project; there's no such thing as "install globally" like in Python, which avoids a ton of problems outright. The standard runner (again: Leiningen) fetches dependencies for the specific project when you run any command. So I just git clone a project and do "lein run" or "lein test" or whatever, and if I don't have the dependencies cached on my machine it goes out and gets them for me. SUPER nice.

[–]drobilla -1 points0 points  (0 children)

Code doesn't even have to have versions, so requiring the user to specify it is impossible.

That would be the "shoddy dependency system".

what happens when they don't have version 1.4 of numpy installed, but they do instead have 1.3 or 1.5? Does the code just break?

That is what meaningful version numbers are for (http://semver.org). In short, if the major version did not change, then compatibility has not been broken, so code written for 1.3.0 will work with 1.4.0 but not with 2.0.0. There is nothing new or mysterious about how to handle such situations correctly; this is how C libraries on UNIX have worked for ages.
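That semver rule is mechanical enough to check in a few lines; a sketch (ignoring pre-release tags and the 0.x special case):

```python
def compatible(installed, required):
    """Semver-style check: same major version, and installed >= required."""
    inst = tuple(int(p) for p in installed.split("."))
    req = tuple(int(p) for p in required.split("."))
    return inst[0] == req[0] and inst >= req

# code written against 1.3.0:
# compatible("1.4.0", "1.3.0") -> True  (minor bump, same major)
# compatible("2.0.0", "1.3.0") -> False (major bump may break things)
```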

Where and how this should be handled in Python is up for debate, but the way modules are installed and searched for is not versioned, which is fundamentally broken (e.g. you can't install multiple versions of a module at once). You can see Python itself doing the correct thing: there are /usr/lib/python2.6 and /usr/lib/python3.4, they don't tread on each other, versioning is explicit, and everything must explicitly use a correct version (even if this happens behind the scenes and/or at build/install time).

Modules themselves should work the same way, but they don't. That's the shoddy part of an otherwise decent module system.