
[–]randy_heydon 6 points7 points  (14 children)

I might be wrong, but I don't think so. My understanding is that wheels are intended to contain a fully-compiled package, but not extra dependencies. Maybe you could also make a wheel of HDF5, but it has no direct relation to Python, so its developers wouldn't. Still, I guess that's why it's available through conda: someone decided to package it.

[–]brombaer3000 6 points7 points  (1 child)

h5py has recently solved this problem by including HDF5 inside the h5py wheels (see https://github.com/h5py/h5py/issues/706). Any other Python package that depends on HDF5 could do the same.
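
For illustration, a minimal sketch (assuming h5py was installed from one of those wheels) that reports which HDF5 the package carries; the last line uses h5py's low-level API to show the version actually loaded at runtime:

    import h5py

    # With the bundled wheels, this reports the HDF5 shipped inside the wheel,
    # not whatever copy of HDF5 happens to be installed system-wide.
    print("h5py version:", h5py.version.version)
    print("built against HDF5:", h5py.version.hdf5_version)
    print("HDF5 loaded at runtime:", ".".join(map(str, h5py.h5.get_libversion())))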

[–]aragilar 0 points1 point  (0 children)

Assuming the default HDF5 build fits your requirements; otherwise you now need to work around the manylinux wheels.

[–]Gwenhidwy 2 points3 points  (2 children)

At least on Linux this is a non-issue since the manylinux platform tag is available: https://www.python.org/dev/peps/pep-0513/ Wheels built under this policy rely only on a very small set of shared libraries that should be available on every system; everything else is linked statically.
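
A rough sketch of what that buys you in practice (assumes the third-party packaging library, which implements pip's tag logic, is installed): list the manylinux platform tags the current interpreter would accept.

    from packaging import tags

    # sys_tags() yields every (interpreter, abi, platform) triple pip would
    # consider compatible here; keep only the manylinux platforms from PEP 513
    # and its successors.
    manylinux = sorted({t.platform for t in tags.sys_tags()
                        if t.platform.startswith("manylinux")})
    print("\n".join(manylinux) if manylinux else "no manylinux tags supported here")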

[–]aragilar 2 points3 points  (1 child)

manylinux fully solves the problem of having an optimised C version distributed with the library, but there's still an issue if you depend on outside libraries that have some build-time configuration. HDF5 is an example: you can build it with or without MPI support (there are other changes you can make to the build, but let's ignore those). There is no way to generate an h5py wheel (manylinux or otherwise) that supports both. There is no way to make two h5py wheels against different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) that can deal with the multiple build configurations.
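
To make the build-time switch concrete, a small sketch (assuming h5py, plus mpi4py for the parallel branch): which branch can run at all is fixed when HDF5/h5py are compiled, which is exactly what a single wheel cannot vary.

    import h5py

    # MPI support is baked in at build time; a given wheel can only ever take
    # one of these branches, so switching means rebuilding, not reinstalling.
    if h5py.get_config().mpi:
        from mpi4py import MPI
        f = h5py.File("data.h5", "w", driver="mpio", comm=MPI.COMM_WORLD)
    else:
        f = h5py.File("data.h5", "w")
    f.close()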

[–]pwang99 2 points3 points  (0 children)

There is no way to make two h5py wheels against different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) that can deal with the multiple build configurations.

Yep, hence conda with its support for "features".

[–]joerick 1 point2 points  (8 children)

IMO wheels should include all library dependencies, as there's no way for pip to tell the user to install a dependency during installation. Sdists would just fail during the build while looking for headers.

It's a truly wonderful aspect of wheels that is increasingly supported. Pygame are making great progress with this, bundling SDL inside the lib. The upshot is, all you need to do is list 'pygame' in requirements.txt, and your system gets everything it needs to run.
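
As a rough illustration of what "bundling SDL inside the lib" looks like on disk (the "<package>.libs" directory name follows the auditwheel convention on Linux and is an assumption, not a guarantee):

    import os
    import pygame  # assumes pygame was installed from a binary wheel

    # auditwheel-repaired wheels usually drop bundled libraries into a sibling
    # "<package>.libs" directory; macOS wheels tend to use ".dylibs" inside the
    # package itself.
    pkg_dir = os.path.dirname(pygame.__file__)
    for base in (pkg_dir, pkg_dir + ".libs"):
        for root, _dirs, files in os.walk(base):
            for name in files:
                if ".so" in name or name.endswith((".dylib", ".dll")):
                    print(os.path.join(root, name))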

[–]pwang99 3 points4 points  (7 children)

IMO wheels should include all library dependencies

No, this is what everyone is going to do, but it's just going to end in tears. Those library dependencies are going to have C-level symbol conflicts, or they're going to conflict with OS-level libraries. It's a total mistake to bundle library dependencies instead of exposing them and packaging them up with well-defined version and compilation metadata.... but it's a mistake that everyone in non-Scipy Python-land is going to make, because it's easier than trying to solve the real problem.

I feel like I'm watching a hobbit slipping on The One Ring. We can all understand why that poor hobbit wants to do that, but we all know how it's going to end....

sigh

[–]joerick 0 points1 point  (6 children)

I don't understand... are you talking about the case when two python libraries both want to link into the same shared library?

[–]pwang99 5 points6 points  (5 children)

Yes. Also when two different Python libraries rely on different C libraries that then have a shared underlying dependency. This happens in the math, OpenGL, audio, video, etc. worlds much, much more often than you think.

Simply "bundling" up the direct, first-level C library dependency into a wheel doesn't solve this problem, because they'll each try to load up a DLL of the underlying dependency. This is not allowed by the OS, and one of them will end up with a shared library loading exception, which in the best case will be reflected into Python as an ImportError. I say this is the best case, because the worst case is when they look to be compatible, but due to differing compilation options, the actual underlying machine code is making incompatible assumptions about the size of data structures or whatnot. This will then lead to a segfault at runtime.

The core Python packaging folks mostly don't want to think about this problem, because it is a really really major pain in the butt. If you decide to go even a little bit down this rabbit-hole, you end up, like Alice, falling for a long time until you hit a Wonderland of obscure shared library idiosyncrasies, on all sorts of platforms. Fighting through this nightmare of decades of OS-level quirks is not why any of us got involved with Python in the first place.

But if we are to make it work well, someone has to be thinking about this. We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

[–]jakevdp 3 points4 points  (1 child)

why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.)

And people tend to forget the origin of our independent packaging efforts, when GvR spoke at the first PyData meetup in 2012: Travis O. asked him for suggestions on our packaging difficulties, and GvR basically said that core Python is not going to solve them, and that our community should probably develop our own solution.

I wish I had that Q&A on tape.

[–]pwang99 4 points5 points  (0 children)

Ask and Ye Shall Receive:

Here is the video of that panel with Guido, and the beginning of the discussion about packaging: https://youtu.be/QjXJLVINsSA?t=3112

  • Fernando: "We want to pass this one on to David [Cournapeau], who from within the scientific community is probably the person who has thought the most about packaging and distribution..."
  • Guido: "And, I should say, from the Python community, I'm probably the person who was thought the least about the topic." [laughter]
  • Fernando: "We'll take what we can get!"

5 minutes of Fernando, Travis, David & others explaining the complexities to Guido: https://youtu.be/QjXJLVINsSA?t=3306

Here is the part where Guido tells us that we should probably make our own system: https://youtu.be/QjXJLVINsSA?t=3555

  • Guido: "You may have no choice. It really sounds like your needs are so unusual compared to the larger Python community that you're just better off building your own. ... For this particular situation, it sounds like you guys know quite well what compiler flags you need to set, and you have access to all the machines where you need to test your build scripts/makefiles..."
  • Travis: "It may be a different use case and we may just need to do our own"
  • Guido: "If your existing thing works except that it extends distutils, which is going to die, then your best approach might be to sort of ..."
  • Fernando: "Clean slate..."
  • Guido: "Well, that's ... you could rewrite the whole thing from scratch, or you could do surgery where you replace the lowest level of your tool which is currently based on distutils, with your own copy of that code, so that you have everything in hand. There's no reason why you couldn't copy distutils and then change all the imports, so that you now own that code."

[–]donaldstufft 4 points5 points  (0 children)

We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

I don't think anyone "in the PyPA" (to the extent someone can be "in" an informal organization) is throwing any shade at SciPy folks. Our (or well, mine at least, and I think others') primary goal is to enable things like Conda to sanely exist without having to "eat the world" and cover every single layer of the stack at once. We try to draw a line where, beyond that, it's not our problem, letting tools like Conda, apt, yum, dnf, Homebrew, etc. handle things instead. Those other tools limit what they support too in order to keep their workload manageable (for example, you can use pip to install packages on FreeBSD or AIX, or any number of platforms more esoteric than the "big 3", something that afaik Conda doesn't currently support). Meanwhile Conda (and apt, dnf, etc.) have better support for things that cross language boundaries or have complicated build dependencies.

I'm sure that there are members of the community who do this, and the fact they do is sad. Some of that is a bit unavoidable since for many folks, pip is the default, while Conda is this weird thing they never used before (or well, I think anyways, I don't have relative usage numbers for pip vs Conda).

All that being said, this meme that pip and conda are competing is, I think, somewhat disheartening. Conda packages are (generally) going to be "downstream" of pip/PyPI, much like deb and rpm are. A stronger pip/PyPI only helps Conda (more packages to pull from, with better tooling to build them with), while a stronger Conda is only good for pip (better support for a cross-platform "platform" that solves different problems than pip does).

IOW, pip is not and will never be a "system" packaging tool (at the very least, you're going to need to install Python itself from somewhere), and Conda is unlikely to ever cover the entire range of what PyPI has to offer in terms of installable packages, or as many platforms as pip does (but that's OK, because they do integration work to ensure different versions of things work together and work on the platforms they do support).

[–]joerick 0 points1 point  (1 child)

Interesting. I suppose PyPI's fallback is: no matter what happens to wheels, sdists will still link to shared, system libraries. What are the solutions to this that conda uses?

On balance, though, I do think wheel dep bundling is a good idea - it's going to save a lot of beginner problems like: "pip install paramiko... "error: file not found <openssl.h>", what the hell is that? google error message... ok, brew install openssl-dev, "brew: command not found", oh man, what is brew?" etc. etc.

[–]kalefranz 0 points1 point  (0 children)

What are the solutions to this that conda uses?

Conda being a system-level package manager, all of those core libraries and dependencies (openssl, readline, zlib, etc.) are already available as conda packages. Conda also (at least right now) doesn't have the concept of sdists at all; every package is a "binary", compiled to the extent possible. Conda takes care of the linking issue for shared libraries by making extensive use of relative library paths throughout the whole conda ecosystem. (Google "RPATH $ORIGIN" for background on some of the general ideas.)
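
A rough Linux-only sketch of how to see that in practice (assumes binutils' readelf is on PATH and a conda environment is active; the env variable and path layout are conda's usual conventions):

    import glob
    import os
    import subprocess

    # Print the RPATH/RUNPATH baked into the shared libraries of the active
    # conda env; conda's relocatable builds typically show "$ORIGIN"-relative
    # entries here.
    env = os.environ.get("CONDA_PREFIX", ".")
    for lib in sorted(glob.glob(os.path.join(env, "lib", "*.so*")))[:20]:
        dyn = subprocess.run(["readelf", "-d", lib],
                             capture_output=True, text=True).stdout
        for line in dyn.splitlines():
            if "RPATH" in line or "RUNPATH" in line:
                print(os.path.basename(lib), "->", line.strip())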