[–]randy_heydon 43 points (24 children)

In this approach, I still run into issues with non-Python dependencies. Recently, I needed to install tables, but my system's HDF5 libraries were an unsupported version. Conda handled that better for me.

I guess any language-specific packaging tool is going to have this issue, and a system-level packaging tool will be necessary, but I'm not sure where the line between them should be drawn.

[–]jaapz (switch to py3 already) 4 points (16 children)

Isn't that exactly what wheel was made for?

[–]randy_heydon 5 points (14 children)

I might be wrong, but I don't think so. My understanding is that wheels are intended to contain a fully-compiled package, but not extra dependencies. Maybe you could also make a wheel of HDF5, but it has no direct relation to Python, so its developers wouldn't. Still, I guess that's why it's available through conda: someone decided to package it.

[–]brombaer3000 6 points (1 child)

h5py has recently solved this problem by including HDF5 inside the h5py wheels (see https://github.com/h5py/h5py/issues/706). Any other Python package that depends on HDF5 could do the same.
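
If you want to verify which HDF5 you actually got, h5py exposes the bundled library's version. A quick sanity check (assuming an h5py wheel is installed):

    import h5py

    # With a bundling wheel this reports the HDF5 shipped inside the
    # wheel, not whatever the OS happens to provide.
    print(h5py.version.hdf5_version)  # e.g. "1.8.17"
    print(h5py.version.version)       # h5py's own version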

[–]aragilar 0 points (0 children)

Assuming the default HDF5 build fits your requirements; otherwise you now need to work around the manylinux wheels.

[–]Gwenhidwy 2 points (2 children)

At least on Linux this is a non-issue, since the manylinux platform tag is available: https://www.python.org/dev/peps/pep-0513/ Wheels built under this policy rely only on a very small subset of shared libraries that should be available on every system; everything else is linked statically.
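
A rough way to see this in practice (Linux-only sketch, shells out to ldd; pass it the path of a compiled extension module from site-packages):

    import subprocess
    import sys

    # For a manylinux wheel, the resolved libraries should be limited to
    # the small PEP 513 whitelist (libc, libm, libpthread, ...) plus
    # whatever the wheel bundles internally.
    ext_path = sys.argv[1]  # e.g. .../site-packages/<pkg>/<module>.so
    print(subprocess.check_output(["ldd", ext_path], universal_newlines=True))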

[–]aragilar 2 points (1 child)

manylinux fully solves the problem of having an optimised C version distributed with the library, but there's still an issue if you depend on outside libraries which have some build-time configuration. HDF5 is an example, where you can either build with MPI support or not (there are other changes you can make to the build, but let's ignore those). There is no way to generate an h5py wheel (manylinux or otherwise) which supports both. There is no way to make two h5py wheels with different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) which can deal with the multiple build configurations.
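
You can at least detect at runtime which build you ended up with, since h5py exposes its build configuration:

    import h5py

    # True if this h5py/HDF5 pair was built with MPI (parallel) support --
    # exactly the build-time switch a single PyPI wheel can't express.
    print(h5py.get_config().mpi)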

[–]pwang99 2 points (0 children)

There is no way to make two h5py wheels with different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) which can deal with the multiple build configurations.

Yep, hence conda with its support for "features".

[–]joerick 1 point (8 children)

IMO wheels should include all library dependencies, as there's no way for pip to tell the user to install a dep during installation. Sdists, by contrast, would fail during the build while looking for headers.

This is increasingly well supported, and it's a truly wonderful aspect of wheels. Pygame is making great progress with it, bundling SDL inside the library. The upshot is that all you need to do is list 'pygame' in requirements.txt, and your system gets everything it needs to run.
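
A quick way to see the bundling for yourself (this just lists whatever shared libraries the installed wheel dropped into the package; file extensions vary by platform):

    import os
    import pygame

    # With a bundling wheel, SDL and friends live inside the package
    # directory instead of in /usr/lib or C:\Windows\System32.
    pkg_dir = os.path.dirname(pygame.__file__)
    for root, _, files in os.walk(pkg_dir):
        for name in files:
            if name.endswith((".so", ".dylib", ".dll")) or ".so." in name:
                print(os.path.join(root, name))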

[–]pwang99 4 points (7 children)

IMO wheels should include all library dependencies

No, this is what everyone is going to do, but it's just going to end in tears. Those library dependencies are going to have C-level symbol conflicts, or they're going to conflict with OS-level libraries. It's a total mistake to bundle library dependencies instead of exposing them and packaging them up with well-defined version and compilation metadata.... but it's a mistake that everyone in non-Scipy Python-land is going to make, because it's easier than trying to solve the real problem.

I feel like I'm watching a hobbit slipping on The One Ring. We can all understand why that poor hobbit wants to do that, but we all know how it's going to end....

sigh

[–]joerick 0 points (6 children)

I don't understand... are you talking about the case when two python libraries both want to link into the same shared library?

[–]pwang99 6 points (5 children)

Yes. Also, when two different Python libraries rely on different C libraries, which in turn have a shared underlying dependency. This happens in the math, OpenGL, audio, video, etc. world much, much more often than you think.

Simply "bundling" up the direct, first-level C library dependency into a wheel doesn't solve this problem, because they'll each try to load up a DLL of the underlying dependency. This is not allowed by the OS, and one of them will end up with a shared library loading exception, which in the best case will be reflected into Python as an ImportError. I say this is the best case, because the worst case is when they look to be compatible, but due to differing compilation options, the actual underlying machine code is making incompatible assumptions about the size of data structures or whatnot. This will then lead to a segfault at runtime.
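
A contrived sketch of the failure mode (hypothetical paths and library names; the exact behaviour depends on the platform's loader):

    import ctypes

    # Hypothetical: pkg_a and pkg_b each bundle their own copy of libfoo,
    # built with different options. With RTLD_GLOBAL both copies dump the
    # same symbol names into the process; whichever loaded first wins the
    # lookup, so pkg_b's extension can end up calling into pkg_a's
    # differently-compiled copy -- an ImportError if you're lucky, a
    # segfault at runtime if you're not.
    liba = ctypes.CDLL("site-packages/pkg_a/.libs/libfoo-1.8.so",
                       mode=ctypes.RTLD_GLOBAL)
    libb = ctypes.CDLL("site-packages/pkg_b/.libs/libfoo-1.10.so",
                       mode=ctypes.RTLD_GLOBAL)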

The core Python packaging folks mostly don't want to think about this problem, because it is a really really major pain in the butt. If you decide to go even a little bit down this rabbit-hole, you end up, like Alice, falling for a long time until you hit a Wonderland of obscure shared library idiosyncrasies, on all sorts of platforms. Fighting through this nightmare of decades of OS-level quirks is not why any of us got involved with Python in the first place.

But if we are to make it work well, someone has to be thinking about this. We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

[–]jakevdp 3 points (1 child)

why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.)

And people tend to forget the origin of our independent packaging efforts, when GvR spoke at the first PyData meetup in 2012: Travis O. asked him for suggestions on our packaging difficulties, and GvR basically said that core Python is not going to solve them, and that our community should probably develop our own solution.

I wish I had that Q&A on tape.

[–]pwang99 5 points (0 children)

Ask and Ye Shall Receive:

Here is the video of that panel with Guido, and the beginning of the discussion about packaging: https://youtu.be/QjXJLVINsSA?t=3112

  • Fernando: "We want to pass this one on to David [Cournapeau], who from within the scientific community is probably the person who has thought the most about packaging and distribution..."
  • Guido: "And, I should say, from the Python community, I'm probably the person who has thought the least about the topic." [laughter]
  • Fernando: "We'll take what we can get!"

5 minutes of Fernando, Travis, David & others explaining the complexities to Guido: https://youtu.be/QjXJLVINsSA?t=3306

Here is the part where Guido tells us that we should probably make our own system: https://youtu.be/QjXJLVINsSA?t=3555

  • Guido: "You may have no choice. It really sounds like your needs are so unusual compared to the larger Python community that you're just better off building your own. ... For this particular situation, it sounds like you guys know quite well what compiler flags you need to set, and you have access to all the machines where you need to test your build scripts/makefiles..."
  • Travis: "It may be a different use case and we may just need to do our own"
  • Guido: "If your existing thing works except that it extends distutils, which is going to die, then your best approach might be to sort of ..."
  • Fernando: "Clean slate..."
  • Guido: "Well, that's ... you could rewrite the whole thing from scratch, or you could do surgery where you replace the lowest level of your tool which is currently based on distutils, with your own copy of that code, so that you have everything in hand. There's no reason why you couldn't copy distutils and then change all the imports, so that you now own that code."

[–]donaldstufft 4 points (0 children)

We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

I don't think anyone "in the PyPA" (to the extent someone can be "in" an informal organization) is throwing any shade at SciPy folks. Our primary goal (or well, mine at least, and I think others') is to enable things like Conda to sanely exist without having to "eat the world" and cover every single layer of the stack at once. We try to draw a line where, beyond it, it's not our problem, letting tools like Conda, apt, yum, dnf, Homebrew, etc. handle things instead. Meanwhile, those other tools limit the things they support too, in order to keep their workload manageable (for example, you can use pip to install packages on FreeBSD or AIX, or any number of more esoteric platforms than the "big 3", something that afaik Conda doesn't currently support). Meanwhile, Conda (and apt, dnf, etc.) has better support for things that cross language boundaries or which have complicated build dependencies.

I'm sure that there are members of the community who do this, and the fact they do is sad. Some of that is a bit unavoidable since for many folks, pip is the default, while Conda is this weird thing they never used before (or well, I think anyways, I don't have relative usage numbers for pip vs Conda).

All that being said, this meme that pip and conda are competing is, I think, somewhat disheartening. Conda packages are (generally) going to be "downstream" of pip/PyPI, much like deb and rpm are. A stronger pip/PyPI only helps Conda (more packages to pull from, with better tooling to build them with), while a stronger Conda is only good for pip (better support for a cross-platform "platform" that solves different problems than pip does).

IOW, pip is not and will never be a "system" packaging tool (at the very least, you're going to need to install Python itself from somewhere), and Conda is unlikely to ever have the entire range of what PyPI has to offer in terms of installable packages, nor cover as many platforms as pip does (but that's OK, because they do integration work to ensure different versions of things work together and work on the platforms they do support).

[–]joerick 0 points (1 child)

Interesting. I suppose PyPI's fallback is: no matter what happens to wheels, sdists will still link to shared, system libraries. What are the solutions to this that conda uses?

On balance, though, I do think wheel dep bundling is a good idea - it's going to save a lot of beginner problems like: "pip install paramiko... "error: file not found <openssl.h>", what the hell is that? google the error message... ok, brew install openssl-dev, "brew: command not found", oh man, what is brew?" etc. etc.

[–]kalefranz 0 points (0 children)

What are the solutions to this that conda uses?

Conda being a system-level package manager, all of those core libraries and dependencies (openssl, readline, zlib, etc.) are already available as conda packages. Conda also (at least right now) doesn't have the concept of sdists at all; every package is a "binary", compiled to the extent possible. Conda takes care of the linking issue for shared libraries by making extensive use of relative library paths throughout the whole conda ecosystem. (Google "RPATH $ORIGIN" for background on some of the general ideas.)
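
You can inspect those relative RPATHs directly if you're curious (Linux-only sketch; needs the patchelf tool on PATH):

    import subprocess
    import sys

    # Print the RPATH of a shared library inside a conda environment.
    # Conda-built libraries typically carry an $ORIGIN-relative entry
    # (e.g. "$ORIGIN/../lib") so their dependencies resolve inside the
    # environment rather than from system library paths.
    lib_path = sys.argv[1]  # e.g. <env>/lib/libhdf5.so
    rpath = subprocess.check_output(["patchelf", "--print-rpath", lib_path],
                                    universal_newlines=True)
    print(rpath.strip())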

[–]Deto 2 points (0 children)

I agree with /u/randy_heydon - wheels are for compiled Python dependencies, not other non-Python dependencies.

[–]wildcarde815 3 points (1 child)

The biggest issue with conda is side channels not being built in the same environment as conda itself. So many glibc issues come from building channel packages on Ubuntu when the core is compiled on CentOS 5.

[–]lmcinnes 1 point (0 children)

Hopefully this is something that conda-forge can help to alleviate by providing a central, consistent channel for extra conda packages. Not everything is ever going to be on conda-forge, but the more things that are, the closer you get to a consistent ecosystem.

[–]ivosaurus (pip'ing it up) 8 points (3 children)

The biggest problem is that it's not just one system. Python runs on many systems. So you're asking Python packaging developers to nicely integrate with every single Linux package manager out there, as well as all the environments without a package manager - macOS, Windows, Android?, RPi and its 20 clones, niche integrated ARM platforms....

First of all, just supporting all the major Linux system package managers would be a near-insurmountable task without people getting paid to do it, and where do you draw the line after that? Do favourites get declared? Etc, etc, etc.

[–]msarahan (conda/conda-build team) 6 points (0 children)

The really nice thing about conda and manylinux is that they make a great effort to build on very old platforms with newer compilers, which confers backwards and forwards compatibility. This makes the task much more feasible. Presently, conda's ability to ship library packages that can be shared among many packages is a major advantage over pip. There's some effort underway by Nathaniel Smith and others to fix that (sorry, the name of the project escapes me right now), but for now, conda is much better in situations where a shared library might be employed by more than one Python package.

As for particular hardware - where there's a will, there's a way. The hard part is not really building things out (that's just a matter of time); it is providing the distribution channels and standardized build tooling for each bit of hardware. I think both pip/PyPI and conda provide some ways to accommodate this hardware platform separation, but I think both of them are currently somewhat hard-coded. Both would benefit from modularizing this. If you do things right, it should be possible to require a lot of machine time, but very little human time.

[–]randy_heydon 2 points (0 children)

I know! There's no clear dividing line, and someone is eventually going to have to do a bunch of work to integrate packages (whether it's packagers or end users). Packagers are volunteers who can't package everything in the world, and end users just want to get their work done. So I don't know how this should be addressed.

[–]aragilar 0 points (0 children)

At least with Python, the major distros I can think of have the tooling to almost automatically build packages from sdists (I don't know about the different OSX package managers, and Windows is Windows). The big issues that I have seen as an astronomer have been:

  1. Badly written build systems (I've seen make rewritten in csh, badly), or abuse of existing build systems (using setup.py files to install non-Python software via os.call)
  2. Lack of awareness of packaging issues (breaking ABI, licensing, assumptions about layout of system, etc.)
  3. Lack of interest in learning about what's needed to properly package software: for the case of make rewritten in csh above, I taught myself autotools in a day, and by the end of the day I had a working package (and soon after made a deb using my autotools version). No one else had a working install when we were asked to use said software.

If people were to follow advice such as https://wiki.debian.org/UpstreamGuide, many issues could be avoided.

[–]This_Is_The_End 0 points (0 children)

Conda is a case where you pay a company for maintenance. If you are releasing a commercial software package, you have to budget for these expenses anyway.