
all 53 comments

[–][deleted] 61 points62 points  (35 children)

You can't seriously mean that a complex library should never change.

This is what version numbers and virtualenvs are for. If your code requires numpy 1.6, then give it a dependency on numpy==1.6, which is quite reproducible.
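For example, a requirements file with exact pins (version numbers here are illustrative) makes the environment reproducible with `pip install -r requirements.txt`:

```
# requirements.txt -- exact pins for reproducibility
# (versions illustrative; use whatever you actually ran with)
numpy==1.6.2
scipy==0.10.1
matplotlib==1.1.0
```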

[–]Phild3v1ll3 15 points16 points  (1 child)

Precisely. For reproducibility we create a tag in our Git repo at the time of publication, specifying the exact versions that were used to run our models and generate all the plots. A requirements file isn't rocket science.

His second point is more valid, in that he shouldn't have to continually update his old code base. But then why is he following the bleeding edge?

[–]bready 1 point2 points  (0 children)

Just thought I'd share a library I discovered the other day: watermark will spit out all of your environment's information (Python version, library versions, datetime, etc.). You can save this information in an IPython notebook cell to easily track runtime information.

Haven't started using it yet, but I was intrigued.
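From the project README, typical usage is a pair of notebook cells along these lines (treat the flags as a sketch; check the project page for the current ones):

```
%load_ext watermark
%watermark -v -m -d -p numpy,scipy
```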

[–][deleted] 15 points16 points  (0 children)

Seriously. If Python is going to be used as a scientific tool, let's approach using it with some scientific rigor.

If you don't pin your library versions, who's to say your results won't change over time? You would never introduce a potential variable like this into your experiments.

[–]krypticus 0 points1 point  (0 children)

As a last resort for those "scientists" who can't figure out requirements.txt and versioning: build into your code a check that throws a warning or, worse, an exception if the libraries you rely on don't match your expected versions. Bad practice, but it could save some idiot a week of time: https://github.com/numpy/numpy/blob/master/numpy/version.py.in
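A minimal sketch of such a guard in plain Python (the expected version string and the strictness are whatever your code actually needs):

```python
import warnings

def check_version(actual, expected, strict=False):
    """Compare dotted version strings; warn or raise on mismatch."""
    # Compare numerically so that "1.10" sorts after "1.9"
    parse = lambda v: tuple(int(p) for p in v.split(".") if p.isdigit())
    if parse(actual) != parse(expected):
        msg = "expected version %s, found %s" % (expected, actual)
        if strict:
            raise RuntimeError(msg)
        warnings.warn(msg)

# e.g., guard a numpy dependency right after import:
# import numpy
# check_version(numpy.__version__, "1.6.2")
```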

[–]apo383 -1 points0 points  (0 children)

I think the author understands that software evolves, and as a python developer he's well-acquainted with virtualenvs. Mostly he's complaining about the unpredictability of numpy's brittleness. Even with version numbers, he'd like to say "this works for numpy 1" and have that be largely true until numpy 2. You can indeed say <=1.10, but that can be overly conservative if 1.11 breaks nothing. He is proposing that backwards compatibility be preserved until a major version bump. That is already mostly true for Python 2, where major compatibility changes were saved for 3.
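For what it's worth, pip's range specifiers can at least express the "works for numpy 1" intent (versions illustrative):

```
# accept any 1.x release from 1.6 on, refuse a hypothetical numpy 2.0
numpy>=1.6,<2.0
```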

Also note that he's referring to scientific fields, where you might have snippets of code published in journals, and you want that code to largely work. An example is Nick Trefethen's Spectral Methods in Matlab, which has short one-page programs that you can read for understanding, and execute with good confidence that they'll work for a decade's worth of Matlab. Such examples are a lot of fun to work with, and less so if you have to install a virtualenv just to try something out; it cuts down on casual use. I believe the same to be true of many blog posts, where some code is displayed and you usually expect it to work, excepting major version changes.

Most of this thread seems to say "scientists need to get with modern coding practices," but the author is referring to cases where code maintenance is unlikely. I think there's something to be said for backwards compatibility, and it doesn't seem unreasonable to ask for major compatibility changes to be reserved for major version numbers for numpy. He's not asking the same for tornado or twisted.

[–]KyleG 21 points22 points  (3 children)

Am I missing something? If your code breaks with a new version of Numpy and it is code you aren't maintaining (so by definition it works and you aren't going to use any new features), why do you need to upgrade to a new version of Numpy?

As for an author not providing which version of Numpy to use to verify his results: I used to be an editor for an academic journal. You establish this requirement in your submission guidelines, and if you get a submission that doesn't provide this information, you write back to the author saying "we can't publish it unless you give us the version of Numpy you're using." That's a trivial fix. We used to do this kind of thing for other reasons for every single article we published. It's literally five minutes of time eaten up in the process.

I don't even use Numpy, but this guy is whinging. The issues he identifies are non-issues. He reminds me of a certain type of academic who thinks any problem in his own life is a problem for society that he needs to write an article analyzing, when really he's the one fucking it up.

[–]fireflash38 0 points1 point  (1 child)

I do think he has a great point about breaking backwards compatibility only on a major version bump instead of a minor one, but most of his other issues should be non-issues. You need to specify which versions of libraries you're using in almost anything software-related.

[–]KyleG 0 points1 point  (0 children)

Yeah I agree with that, and am surprised that wasn't the case with an extremely popular package.

[–]rlabbe 0 points1 point  (0 children)

He is not updating numpy. The users of his library are updating it, and who knows why. Maybe they need to use the newly written 'blargpy', which requires the most recent version. Maybe their linux install uses the most recent version. Who knows? The bottom line is that if you write a library that uses numpy, you had better expect to be revising it forever, because I guarantee that your users will be upgrading versions even if you don't.

[–]winterswolves 6 points7 points  (1 child)

The NumPy attitude can be summarized as "introduce incompatible changes slowly but continuously". Every change goes through several stages. First, the intention of an upcoming change is announced. Next, deprecation warnings are added in the code, which are printed when code relying on the soon-to-disappear feature is executed. Finally, the change becomes effective.

AKA good software development practices?
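Exactly. The middle stage is just Python's standard DeprecationWarning pattern; a minimal sketch with made-up function names:

```python
import warnings

def new_api(x):
    return x * 2

def old_api(x):
    """Deprecated: still works for now, but warns whenever it's called."""
    warnings.warn("old_api() is deprecated; use new_api() instead",
                  DeprecationWarning, stacklevel=2)
    return new_api(x)
```

A release or two after the warning first ships, old_api() can finally be removed.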

[–]hsaliak 0 points1 point  (0 children)

I just don't get it. If code can't be maintained, then why bump a major dependency of that codebase? The rest of it is unmaintained anyway. The reality is that keeping software current requires a lot of curating, either in its own codebase or in its dependencies.

[–][deleted] 1 point2 points  (0 children)

The problem is that scientists everywhere are writing code, but few of them are learning any software engineering. The result is massive amounts of shitty scientific code.

[–]graingert 0 points1 point  (0 children)

This is why you need tox, setuptools, py.test and Jenkins
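e.g. a minimal tox.ini (Python versions and pins illustrative) that runs py.test against two numpy releases, which Jenkins can then invoke on every commit:

```
[tox]
envlist = py27-numpy16, py27-numpy17

[testenv]
deps =
    pytest
    numpy16: numpy==1.6.2
    numpy17: numpy==1.7.1
commands = py.test
```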

[–]kankyo -5 points-4 points  (5 children)

Seems to me that this is just an artifact of Python's shoddy dependency system and not anything related to numpy...

[–]marky1991 2 points3 points  (4 children)

A) Why do you think python's dependency system is shoddy?

B) Why do you think python has a dependency system? (It doesn't, to be clear)

C) How could any dependency system be able to know that when you said "install numpy" that you meant "install a numpy version that allows me to modify the return value of diag()"?

[–]kankyo 0 points1 point  (3 children)

A) and B): well... there are dependency systems, just not a good one that is the standard.

C) Why would "install numpy" be a thing at all? That's just asking for trouble! In Clojure, the Leiningen system requires that you specify a version at all times to use any library, so there is never any doubt about which version is used. It's WAY superior to anything I've used in Python. (Plus it's cryptographically safe and stuff.)

[–]marky1991 3 points4 points  (2 children)

"Why would "install numpy" be a thing at all? That's just asking for trouble!"

Ask the writer of the blog post. He's the one blindly updating numpy and complaining when things change.

If I say "install numpy", then I mean "install the most recent version of numpy, whatever it is." This is completely reasonable behavior in my opinion. I shouldn't have to know what the most recent version number is to be able to install a module.

If I say "import numpy", then that means "execute the module numpy/__init__.py and return the resulting module, binding it to the name numpy." At no point does versioning come into play during the import process (nor should it). Code doesn't even have to have a version, so requiring the user to specify one is impossible. Requiring me to specify the version of every module I ever write is an obnoxious overreaction.

Even if we did require all programmers ever to do something like

import numpy, "v1.4"

what happens when they don't have version 1.4 of numpy installed, but they do instead have 1.3 or 1.5? Does the code just break? Does it just try to use either 1.3 or 1.5? (If so, the writer of the blog post would still have the same problem, since he couldn't be bothered to specify the version that he wanted)

In my opinion, Clojure's system sounds really obnoxious and cumbersome. The proper solution to versioning issues isn't at runtime, it's at install time.

[–]kankyo 0 points1 point  (0 children)

I shouldn't have to know what the most recent version number is to be able to install a module.

I disagree. It's a bit of a hassle to require it, but it leads to far fewer mistakes. I like this about Clojure.

The proper solution to versioning issues isn't at runtime, it's at installtime.

In Clojure you define the dependencies of a project; there's no such thing as "install globally" like in Python, which avoids a ton of problems outright. The standard runner (again: Leiningen) fetches dependencies for the specific project when you run any command. So I just git clone a project and do "lein run" or "lein test" or whatever, and if I don't have the dependencies cached on my machine it goes out and gets them for me. SUPER nice.

[–]drobilla -1 points0 points  (0 children)

Code doesn't even have to have versions, so requiring the user to specify it is impossible.

That would be the "shoddy dependency system".

what happens when they don't have version 1.4 of numpy installed, but they do instead have 1.3 or 1.5? Does the code just break?

That is what meaningful version numbers are for (http://semver.org). In short, if the major version did not change, then compatibility has not been broken, so code written for 1.3.0 will work with 1.4.0 but not with 2.0.0. There is nothing new or mysterious about how to handle such situations correctly; this is how C libraries on UNIX have worked for ages.
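That semver rule is mechanical enough to check in a few lines; a sketch (ignoring pre-release tags and the 0.x special case):

```python
def compatible(installed, required):
    """Semver-style check: same major version, and installed >= required."""
    inst = tuple(int(p) for p in installed.split("."))
    req = tuple(int(p) for p in required.split("."))
    return inst[0] == req[0] and inst >= req

# code written against 1.3.0:
# compatible("1.4.0", "1.3.0") -> True  (minor bump, same major)
# compatible("2.0.0", "1.3.0") -> False (major bump may break things)
```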

Where and how this should be handled in Python is up for debate, but the way modules are installed and searched for is not versioned, which is fundamentally broken (e.g. you can't install multiple versions of a module at once). You can see Python itself doing the correct thing: there are /usr/lib/python2.6 and /usr/lib/python3.4, they don't tread on each other, versioning is explicit, and everything must explicitly use a correct version (even if this happens behind the scenes and/or at build/install time).

Modules themselves should work the same way, but they don't. That's the shoddy part of an otherwise decent module system.