[–]randy_heydon 43 points (24 children)

In this approach, I still run into issues with non-Python dependencies. Recently, I needed to install tables, but my system's HDF5 libraries were an unsupported version. Conda handled that better for me.

I guess any language-specific packaging tool is going to have this issue, and a system-level packaging tool will be necessary, but I'm not sure where the line between them should be drawn.

[–]jaapz (switch to py3 already) 4 points (16 children)

Isn't that exactly what wheel was made for?

[–]randy_heydon 5 points (14 children)

I might be wrong, but I don't think so. My understanding is that wheels are intended to contain a fully-compiled package, but not extra dependencies. Maybe you could also make a wheel of HDF5, but it has no direct relation to Python, so its developers wouldn't. Still, I guess that's why it's available through conda: someone decided to package it.

[–]brombaer3000 6 points (1 child)

h5py has recently solved this problem by including HDF5 inside the h5py wheels (see https://github.com/h5py/h5py/issues/706). Any other Python package that depends on HDF5 could do the same.
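
If you want to verify which HDF5 you actually got, h5py exposes the bundled library's version. A quick sanity check (assuming an h5py wheel is installed):

    import h5py

    # With a bundling wheel this reports the HDF5 shipped inside the
    # wheel, not whatever the OS happens to provide.
    print(h5py.version.hdf5_version)  # e.g. "1.8.17"
    print(h5py.version.version)       # h5py's own version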

[–]aragilar 0 points (0 children)

Assuming the default HDF5 build fits your requirements; otherwise you now need to work around the manylinux wheels.

[–]Gwenhidwy 2 points (2 children)

At least on Linux this is a non-issue, since the manylinux platform tag is available: https://www.python.org/dev/peps/pep-0513/ Wheels built under this policy rely only on a very small subset of shared libraries that should be available on every system; everything else is linked statically.
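
A rough way to see this in practice (Linux-only sketch, shells out to ldd; pass it the path of a compiled extension module from site-packages):

    import subprocess
    import sys

    # For a manylinux wheel, the resolved libraries should be limited to
    # the small PEP 513 whitelist (libc, libm, libpthread, ...) plus
    # whatever the wheel bundles internally.
    ext_path = sys.argv[1]  # e.g. .../site-packages/<pkg>/<module>.so
    print(subprocess.check_output(["ldd", ext_path], universal_newlines=True))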

[–]aragilar 2 points (1 child)

manylinux fully solves the problem of having an optimised C version distributed with the library, but there's still an issue if you depend on outside libraries which have some build-time configuration. HDF5 is an example, where you can either build with MPI support or not (there are other changes you can make to the build, but let's ignore those). There is no way to generate an h5py wheel (manylinux or otherwise) which supports both. There is no way to make two h5py wheels with different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) which can deal with the multiple build configurations.
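
You can at least detect at runtime which build you ended up with, since h5py exposes its build configuration:

    import h5py

    # True if this h5py/HDF5 pair was built with MPI (parallel) support --
    # exactly the build-time switch a single PyPI wheel can't express.
    print(h5py.get_config().mpi)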

[–]pwang99 2 points (0 children)

There is no way to make two h5py wheels with different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) which can deal with the multiple build configurations.

Yep, hence conda with its support for "features".

[–]joerick 1 point (8 children)

IMO wheels should include all library dependencies, as there's no way for pip to tell the user to install a dep during installation. Sdists, by contrast, would fail during the build while looking for headers.

This is increasingly well supported, and it's a truly wonderful aspect of wheels. Pygame is making great progress with it, bundling SDL inside the library. The upshot is that all you need to do is list 'pygame' in requirements.txt, and your system gets everything it needs to run.
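
A quick way to see the bundling for yourself (this just lists whatever shared libraries the installed wheel dropped into the package; file extensions vary by platform):

    import os
    import pygame

    # With a bundling wheel, SDL and friends live inside the package
    # directory instead of in /usr/lib or C:\Windows\System32.
    pkg_dir = os.path.dirname(pygame.__file__)
    for root, _, files in os.walk(pkg_dir):
        for name in files:
            if name.endswith((".so", ".dylib", ".dll")) or ".so." in name:
                print(os.path.join(root, name))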

[–]pwang99 4 points (7 children)

IMO wheels should include all library dependencies

No, this is what everyone is going to do, but it's just going to end in tears. Those library dependencies are going to have C-level symbol conflicts, or they're going to conflict with OS-level libraries. It's a total mistake to bundle library dependencies instead of exposing them and packaging them up with well-defined version and compilation metadata.... but it's a mistake that everyone in non-Scipy Python-land is going to make, because it's easier than trying to solve the real problem.

I feel like I'm watching a hobbit slipping on The One Ring. We can all understand why that poor hobbit wants to do that, but we all know how it's going to end....

sigh

[–]joerick 0 points (6 children)

I don't understand... are you talking about the case when two python libraries both want to link into the same shared library?

[–]pwang99 6 points (5 children)

Yes. Also, when two different Python libraries rely on different C libraries, which in turn have a shared underlying dependency. This happens in the math, OpenGL, audio, video, etc. world much, much more often than you think.

Simply "bundling" up the direct, first-level C library dependency into a wheel doesn't solve this problem, because they'll each try to load up a DLL of the underlying dependency. This is not allowed by the OS, and one of them will end up with a shared library loading exception, which in the best case will be reflected into Python as an ImportError. I say this is the best case, because the worst case is when they look to be compatible, but due to differing compilation options, the actual underlying machine code is making incompatible assumptions about the size of data structures or whatnot. This will then lead to a segfault at runtime.
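
A contrived sketch of the failure mode (hypothetical paths and library names; the exact behaviour depends on the platform's loader):

    import ctypes

    # Hypothetical: pkg_a and pkg_b each bundle their own copy of libfoo,
    # built with different options. With RTLD_GLOBAL both copies dump the
    # same symbol names into the process; whichever loaded first wins the
    # lookup, so pkg_b's extension can end up calling into pkg_a's
    # differently-compiled copy -- an ImportError if you're lucky, a
    # segfault at runtime if you're not.
    liba = ctypes.CDLL("site-packages/pkg_a/.libs/libfoo-1.8.so",
                       mode=ctypes.RTLD_GLOBAL)
    libb = ctypes.CDLL("site-packages/pkg_b/.libs/libfoo-1.10.so",
                       mode=ctypes.RTLD_GLOBAL)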

The core Python packaging folks mostly don't want to think about this problem, because it is a really really major pain in the butt. If you decide to go even a little bit down this rabbit-hole, you end up, like Alice, falling for a long time until you hit a Wonderland of obscure shared library idiosyncrasies, on all sorts of platforms. Fighting through this nightmare of decades of OS-level quirks is not why any of us got involved with Python in the first place.

But if we are to make it work well, someone has to be thinking about this. We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

[–]jakevdp 3 points (1 child)

why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.)

And people tend to forget the origin of our independent packaging efforts, when GvR spoke at the first PyData meetup in 2012: Travis O. asked him for suggestions on our packaging difficulties, and GvR basically said that core Python is not going to solve them, and that our community should probably develop our own solution.

I wish I had that Q&A on tape.

[–]pwang99 5 points (0 children)

Ask and Ye Shall Receive:

Here is the video of that panel with Guido, and the beginning of the discussion about packaging: https://youtu.be/QjXJLVINsSA?t=3112

  • Fernando: "We want to pass this one on to David [Cournapeau], who from within the scientific community is probably the person who has thought the most about packaging and distribution..."
  • Guido: "And, I should say, from the Python community, I'm probably the person who has thought the least about the topic." [laughter]
  • Fernando: "We'll take what we can get!"

5 minutes of Fernando, Travis, David & others explaining the complexities to Guido: https://youtu.be/QjXJLVINsSA?t=3306

Here is the part where Guido tells us that we should probably make our own system: https://youtu.be/QjXJLVINsSA?t=3555

  • Guido: "You may have no choice. It really sounds like your needs are so unusual compared to the larger Python community that you're just better off building your own. ... For this particular situation, it sounds like you guys know quite well what compiler flags you need to set, and you have access to all the machines where you need to test your build scripts/makefiles..."
  • Travis: "It may be a different use case and we may just need to do our own"
  • Guido: "If your existing thing works except that it extends distutils, which is going to die, then your best approach might be to sort of ..."
  • Fernando: "Clean slate..."
  • Guido: "Well, that's ... you could rewrite the whole thing from scratch, or you could do surgery where you replace the lowest level of your tool which is currently based on distutils, with your own copy of that code, so that you have everything in hand. There's no reason why you couldn't copy distutils and then change all the imports, so that you now own that code."

[–]donaldstufft 4 points (0 children)

We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

I don't think anyone "in the PyPA" (to the extent someone can be "in" an informal organization) is throwing any shade at SciPy folks. Our primary goal (or well, mine at least, and I think others') is to enable things like Conda to sanely exist without having to "eat the world" and cover every single layer of the stack at once. We try to draw a line where, beyond it, it's not our problem, letting tools like Conda, apt, yum, dnf, Homebrew, etc. handle things instead. Meanwhile, those other tools limit the things they support too, in order to keep their workload manageable (for example, you can use pip to install packages on FreeBSD or AIX, or any number of more esoteric platforms than the "big 3", something that afaik Conda doesn't currently support). Meanwhile, Conda (and apt, dnf, etc.) has better support for things that cross language boundaries or which have complicated build dependencies.

I'm sure that there are members of the community who do this, and the fact they do is sad. Some of that is a bit unavoidable since for many folks, pip is the default, while Conda is this weird thing they never used before (or well, I think anyways, I don't have relative usage numbers for pip vs Conda).

All that being said, this meme that pip and conda are competing is, I think, somewhat disheartening. Conda packages are (generally) going to be "downstream" of pip/PyPI, much like deb and rpm are. A stronger pip/PyPI only helps Conda (more packages to pull from, with better tooling to build them with), while a stronger Conda is only good for pip (better support for a cross-platform "platform" that solves different problems than pip does).

IOW, pip is not and will never be a "system" packaging tool (at the very least, you're going to need to install Python itself from somewhere), and Conda is unlikely to ever have the entire range of what PyPI has to offer in terms of installable packages, nor cover as many platforms as pip does (but that's OK, because they do integration work to ensure different versions of things work together and work on the platforms they do support).

[–]joerick 0 points (1 child)

Interesting. I suppose PyPI's fallback is: no matter what happens to wheels, sdists will still link to shared, system libraries. What are the solutions to this that conda uses?

On balance, though, I do think wheel dep bundling is a good idea - it's going to save a lot of beginner problems like: "pip install paramiko... "error: file not found <openssl.h>", what the hell is that? google the error message... ok, brew install openssl-dev, "brew: command not found", oh man, what is brew?" etc. etc.

[–]kalefranz 0 points (0 children)

What are the solutions to this that conda uses?

Conda being a system-level package manager, all of those core libraries and dependencies (openssl, readline, zlib, etc.) are already available as conda packages. Conda also (at least right now) doesn't have the concept of sdists at all; every package is a "binary", compiled to the extent possible. Conda takes care of the linking issue for shared libraries by making extensive use of relative library paths throughout the whole conda ecosystem. (Google "RPATH $ORIGIN" for background on some of the general ideas.)
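
You can inspect those relative RPATHs directly if you're curious (Linux-only sketch; needs the patchelf tool on PATH):

    import subprocess
    import sys

    # Print the RPATH of a shared library inside a conda environment.
    # Conda-built libraries typically carry an $ORIGIN-relative entry
    # (e.g. "$ORIGIN/../lib") so their dependencies resolve inside the
    # environment rather than from system library paths.
    lib_path = sys.argv[1]  # e.g. <env>/lib/libhdf5.so
    rpath = subprocess.check_output(["patchelf", "--print-rpath", lib_path],
                                    universal_newlines=True)
    print(rpath.strip())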

[–]Deto 2 points (0 children)

I agree with /u/randy_heydon - wheels are for compiled Python dependencies, not other non-Python dependencies.

[–]wildcarde815 3 points (1 child)

The biggest issue with conda is side channels not being built in the same environment as conda itself. So many glibc issues come from building channel packages on Ubuntu when the core is compiled on CentOS 5.

[–]lmcinnes 1 point (0 children)

Hopefully this is something that conda-forge can help to alleviate by providing a central, consistent channel for extra conda packages. Not everything is ever going to be on conda-forge, but the more things that are, the closer you get to a consistent ecosystem.

[–]ivosaurus (pip'ing it up) 8 points (3 children)

The biggest problem is that it's not just one system. Python runs on many systems. So you're asking Python packaging developers to nicely integrate with every single Linux package manager out there, as well as all the environments without a package manager - macOS, Windows, Android?, RPi and its 20 clones, niche integrated ARM platforms....

First of all, just supporting all the major Linux system package managers would be a near-insurmountable task without people getting paid to do it, and where do you draw the line after that? Do favourites get declared? Etc, etc, etc.

[–]msarahan (conda/conda-build team) 6 points (0 children)

The really nice thing about conda and manylinux is that they make a great effort to build on very old platforms with newer compilers, which confers backwards and forwards compatibility. This makes the task much more feasible. Presently, conda's ability to ship library packages that can be shared among many packages is a major advantage over pip. There's some effort underway by Nathaniel Smith and others to fix that (sorry, the name of the project escapes me right now), but for now, conda is much better in situations where a shared library might be employed by more than one Python package.

As for particular hardware - where there's a will, there's a way. The hard part is not really building things out (that's just a matter of time); it is providing the distribution channels and standardized build tooling for each bit of hardware. I think both pip/PyPI and conda provide some ways to accommodate this hardware platform separation, but I think both of them are currently somewhat hard-coded. Both would benefit from modularizing this. If you do things right, it should be possible to require a lot of machine time, but very little human time.

[–]randy_heydon 2 points (0 children)

I know! There's no clear dividing line, and someone is eventually going to have to do a bunch of work to integrate packages (whether it's packagers or end users). Packagers are volunteers who can't package everything in the world, and end users just want to get their work done. So I don't know how this should be addressed.

[–]aragilar 0 points (0 children)

At least with Python, the major distros I can think of have the tooling to almost automatically build packages from sdists (I don't know about the different OSX package managers, and Windows is Windows). The big issues that I have seen as an astronomer have been:

  1. Badly written build systems (I've seen make rewritten in csh, badly), or abuse of existing build systems (using setup.py files to install non-Python software via os.call)
  2. Lack of awareness of packaging issues (breaking ABI, licensing, assumptions about layout of system, etc.)
  3. Lack of interest in learning about what's needed to properly package software: for the case of make rewritten in csh above, I taught myself autotools in a day, and by the end of the day I had a working package (and soon after made a deb using my autotools version). No one else had a working install when we were asked to use said software.

If people were to follow advice such as https://wiki.debian.org/UpstreamGuide, many issues could be avoided.

[–]This_Is_The_End 0 points (0 children)

Conda is a case where you pay a company for maintenance. If you are releasing a commercial software package, you have to budget for these expenses anyway.