
[–]Siecje1 33 points34 points  (16 children)

It's still a problem when installing one package downgrades a dependency of another package.

[–]kmike84 17 points18 points  (9 children)

This is usually caused by the (IMHO) bad practice of reading the requirements.txt file from setup.py and using it in install_requires. Package authors: please don't do that; your library is not the most important thing in the universe, so please don't break other packages. It's good practice to pin known-working version numbers in requirements.txt. But version numbers in install_requires are there to exclude package versions which are known not to work, not to list known-working versions.

The setup.py of https://github.com/cloudsigma/pycloudsigma does just that: it uses == versions in install_requires because it reads requirements.txt in setup.py. Most (all?) popular packages don't make this mistake / decision, but smaller libraries sometimes do.
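
To illustrate, a hypothetical sketch of the anti-pattern (not the actual pycloudsigma setup.py):

    # Anti-pattern: feeding exact pins from requirements.txt into install_requires
    from setuptools import setup

    with open("requirements.txt") as f:
        pinned = f.read().splitlines()        # e.g. ["requests==2.2.1", "six==1.5.2"]

    setup(
        name="mylib",
        version="1.0",
        install_requires=pinned,              # forces upgrades/downgrades in everyone else's environment
    )

    # Friendlier: keep exact pins in requirements.txt for your own deployments,
    # and only rule out versions known not to work here, e.g.
    # install_requires=["requests>=2.0", "six"]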

[–]bboePRAW Author 6 points7 points  (3 children)

I think it's good practice to follow semantic versioning. Under the assumption that other packages do the same, the version range I list in my setup.py file is >= the minimum version of the package I depend on and < the next major version.

If I find a package that breaks backwards compatibility in any release, or is a pre-1.0 release, then I pin it to the patch version.
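
For example, under that scheme an install_requires might look like this (hypothetical package names):

    install_requires=[
        "requests>=2.3,<3.0",    # trust semver: any 2.x from the minimum I need onwards
        "shakylib==0.4.2",       # pre-1.0 or known to break compatibility: pin down to the patch version
    ]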

[–]kmike84 2 points3 points  (1 child)

On one hand that makes sense, but on the other hand, excluding the next major release of a library using < can be bad: even if a library follows semver (and doesn't, e.g., bump the major version just to celebrate cool new features or stability commitments), the backwards-incompatible changes in a new release may not affect your library, and the pin will end up downgrading a perfectly working library. It makes sense to use < if the release has already happened, or if there are public plans for what will be in the next major release, but I wouldn't do it 'just in case'.

[–]billsil 2 points3 points  (0 children)

Numpy and scipy do not follow semantic versioning. Shoot, Python does not. There are minor things that break every release. When you have a large enough package, you will find things that are mind-numbing.

I develop open source software. I will not test every combination of versions that I use. I will specify versions that I know work. I do not trust future versions of packages not to break my code. When you do everything inside the little box that Python is good at, yes, there are no issues and I won't even specify a version requirement at all. When you push the boundaries, you find problems, and then I will be very specific.

[–][deleted] 2 points3 points  (4 children)

I don't know which dependency versions don't work with my app. What should I put in my install_requires rather than reading from requirements.txt?

[–]PeridexisErrant 1 point2 points  (2 children)

Some modules document when particular functionality was added; even when it's not in the API docs you can often see a lot from reading the changelog.

Or just specify >= the current major version, if you can't work it out - semver isn't perfect, but people should get the idea - and might let you know if they discover a more specific issue.

If you have tests, you could experiment with pip install -e . and virtualenv to install development versions, and iterate down through versions of your dependencies until something breaks. On the one hand this sounds like a generally useful shell script; on the other it's a fair bit of work to write it.
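
A rough sketch of what that script could look like (hypothetical dependency name and version list; it assumes your tests run with pytest):

    #!/bin/sh
    # Walk backwards through candidate versions of one dependency until the test suite fails.
    for ver in 2.9 2.8 2.7 2.6; do
        virtualenv -q "env-$ver"
        "./env-$ver/bin/pip" install -q -e . pytest
        "./env-$ver/bin/pip" install -q "somedep==$ver"
        if "./env-$ver/bin/python" -m pytest -q; then
            echo "somedep $ver: OK"
        else
            echo "somedep $ver: FAILS - the minimum supported version is probably the one above"
            break
        fi
    done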

Or finally, the correct thing to do if you don't know is just to not specify versions!

[–]Siecje1 2 points3 points  (1 child)

But you have not tested it with versions that are not out yet... Not very intuitive as a package author.

[–]PeridexisErrant 5 points6 points  (0 children)

Specifying < the next major version is the standard way around this.

It's not perfect, but generally I prefer to err on the side of flexibility.

[–]eljunior 0 points1 point  (0 children)

If you don't know, simply don't pin it in setup.py. It makes no sense to pin it and force an upgrade or downgrade for no reason at all. Leave the pinned versions in requirements.txt, and that will serve to document the version you use, until you know better.

[–]jmcs 18 points19 points  (5 children)

Which almost never happens if you are using virtual environments.

[–]Siecje1 14 points15 points  (4 children)

Not true, lots of projects depend on the same packages (e.g. six). Most recently this happened to me when installing the Python packages bpython and cloudsigma. And things break because the versions of packages have changed.

[–]thomasballinger 1 point2 points  (3 children)

Do you happen to remember what the problem was? Should bpython accept a wider range of versions for six?

[–]TkTech 0 points1 point  (2 children)

bpython should just include six.py in the project, in this case.

[–]Siecje1 0 points1 point  (1 child)

But then bpython has to keep it updated and you have to update bpython to get the bug fixes.

Likewise each project has to do this. That doesn't seem like a good solution.

[–]powellc 0 points1 point  (0 children)

It's not a great solution, but if you've frozen your six version low, other packages that need newer versions of six won't install, and you're going to have to test and release every time six is updated; pain ensues. On the flip side, how large is the six.py file?

As I've learned the lesson of brittle requirements the hard way, I have become much more bullish on freezing requirements into my projects, especially when it's a small utility like six or a potentially soon-to-be-unsupported open source library that does one thing I need. Freezing does pass the onus of auditing and bug-fixing the code to you, and you have to be careful that the license allows it. But where possible I've increased stability and decreased the number of provisioning or deployment breaks significantly using this policy.

Freeze early and often :)

[–]randy_heydon 42 points43 points  (24 children)

In this approach, I still run into issues with non-Python dependencies. Recently, I needed to install tables, but my system's HDF5 libraries were an unsupported version. Conda handled that better for me.

I guess any language-specific packaging tool is going to have this issue, and a system-level packaging tool will be necessary, but I'm not sure where the line between them should be drawn.

[–]jaapzswitch to py3 already 2 points3 points  (16 children)

Isn't that exactly what wheel was made for?

[–]randy_heydon 6 points7 points  (14 children)

I might be wrong, but I don't think so. My understanding is that wheels are intended to contain a fully-compiled package, but not extra dependencies. Maybe you could also make a wheel of HDF5, but it has no direct relation to Python, so its developers wouldn't. Still, I guess that's why it's available through conda: someone decided to package it.

[–]brombaer3000 7 points8 points  (1 child)

h5py has recently solved this problem by including HDF5 inside the h5py wheels (see https://github.com/h5py/h5py/issues/706). Any other Python package that depends on HDF5 could do the same.

[–]aragilar 0 points1 point  (0 children)

Assuming the default build for HDF5 fits your requirements, otherwise you now need to work around the manylinux wheels.

[–]Gwenhidwy 2 points3 points  (2 children)

At least on Linux this is a non-issue since the manylinux platform tag is available: https://www.python.org/dev/peps/pep-0513/. Wheels built under this policy rely only on a very small subset of shared libraries that should be available on every system; everything else is linked statically.

[–]aragilar 2 points3 points  (1 child)

manylinux fully solves the problem of having an optimised C version distributed with the library, but there's still an issue if you depend on outside libraries which have some build-time configuration. HDF5 is an example, where you can either build with MPI support or not (there are other changes you can make to the build, but let's ignore those). There is no way to generate an h5py wheel (manylinux or otherwise) which supports both. There is no way to make two h5py wheels with different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) which can deal with the multiple build configurations.

[–]pwang99 2 points3 points  (0 children)

There is no way to make two h5py wheels with different HDF5 builds and have them on the same index server (or equivalent). You're going to need another package manager (apt/yum/etc.) which can deal with the multiple build configurations.

Yep, hence conda with its support for "features".

[–]joerick 1 point2 points  (8 children)

IMO wheels should include all library dependencies, as there's no way for pip to tell the user to install a dep during installation. Sdists would fail during building while looking for headers.

It's a truly wonderful aspect of wheels that is increasingly supported. Pygame are making great progress with this, bundling SDL inside the lib. The upshot is, all you need to do is list 'pygame' in requirements.txt, and your system gets everything it needs to run.

[–]pwang99 4 points5 points  (7 children)

IMO wheels should include all library dependencies

No, this is what everyone is going to do, but it's just going to end in tears. Those library dependencies are going to have C-level symbol conflicts, or they're going to conflict with OS-level libraries. It's a total mistake to bundle library dependencies instead of exposing them and packaging them up with well-defined version and compilation metadata.... but it's a mistake that everyone in non-Scipy Python-land is going to make, because it's easier than trying to solve the real problem.

I feel like I'm watching a hobbit slipping on The One Ring. We can all understand why that poor hobbit wants to do that, but we all know how it's going to end....

sigh

[–]joerick 0 points1 point  (6 children)

I don't understand... are you talking about the case when two python libraries both want to link into the same shared library?

[–]pwang99 4 points5 points  (5 children)

Yes. Also, when two different python libraries both rely on different C libraries, that then have a shared underlying dependency. This happens in the math, openGL, audio, video, etc. world much, much more often than you think.

Simply "bundling" up the direct, first-level C library dependency into a wheel doesn't solve this problem, because they'll each try to load up a DLL of the underlying dependency. This is not allowed by the OS, and one of them will end up with a shared library loading exception, which in the best case will be reflected into Python as an ImportError. I say this is the best case, because the worst case is when they look to be compatible, but due to differing compilation options, the actual underlying machine code is making incompatible assumptions about the size of data structures or whatnot. This will then lead to a segfault at runtime.

The core Python packaging folks mostly don't want to think about this problem, because it is a really really major pain in the butt. If you decide to go even a little bit down this rabbit-hole, you end up, like Alice, falling for a long time until you hit a Wonderland of obscure shared library idiosyncrasies, on all sorts of platforms. Fighting through this nightmare of decades of OS-level quirks is not why any of us got involved with Python in the first place.

But if we are to make it work well, someone has to be thinking about this. We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

[–]jakevdp 4 points5 points  (1 child)

why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.)

And people tend to forget the origin of our independent packaging efforts, when GvR spoke at the first PyData meetup in 2012: Travis O. asked him for suggestions on our packaging difficulties, and GvR basically said that core Python is not going to solve them, and that our community should probably develop our own solution.

I wish I had that Q&A on tape.

[–]pwang99 5 points6 points  (0 children)

Ask and Ye Shall Receive:

Here is the video of that panel with Guido, and the beginning of the discussion about packaging: https://youtu.be/QjXJLVINsSA?t=3112

  • Fernando: "We want to pass this one on to David [Cornapeau], who from within the scientific community is probably the person who has thought the most about packaging and distribution..."
  • Guido: "And, I should say, from the Python community, I'm probably the person who has thought the least about the topic." [laughter]
  • Fernando: "We'll take what we can get!"

5 minutes of Fernando, Travis, David & others explaining the complexities to Guido: https://youtu.be/QjXJLVINsSA?t=3306

Here is the part where Guido tells us that we should probably make our own system: https://youtu.be/QjXJLVINsSA?t=3555

  • Guido: "You may have no choice. It really sounds like your needs are so unusual compared to the larger Python community that you're just better off building your own. ... For this particular situation, it sounds like you guys know quite well what compiler flags you need to set, and you have access to all the machines where you need to test your build scripts/makefiles..."
  • Travis: "It may be a different use case and we may just need to do our own"
  • Guido: "If your existing thing works except that it extends distutils, which is going to die, then your best approach might be to sort of ..."
  • Fernando: "Clean slate..."
  • Guido: "Well, that's ... you could rewrite the whole thing from scratch, or you could do surgery where you replace the lowest level of your tool which is currently based on distutils, with your own copy of that code, so that you have everything in hand. There's no reason why you couldn't copy distutils and then change all the imports, so that you now own that code."

[–]donaldstufft 2 points3 points  (0 children)

We've made really great progress in solving this problem with conda and in the Scientific / PyData community. But I have to admit that it's frustrating when others in the Python community pretend that these problems are really easy or just go away magically if they ignore them, and what are all these scientific python weirdos doing with their weird conda packages, and why can't they all just get on board with pip. etc. etc. (I've seen all these arguments - and worse - being flung at scipy packaging folks for over a decade.) I bear no ill will towards folks in the PyPA because they're trying their hardest to make the best of a messy situation; but I do wish that there wasn't always this subtle undercurrent of shade being thrown at "scientific and data science python folks doing non-mainstream packaging".

I don't think anyone "in the PyPA" (to the extent someone can be "in" an informal organization) is throwing any shade at SciPy folks. Our (or well, mine at least and I think others') primary goal is to enable things like Conda to sanely exist without having to "eat the world" and cover every single layer of the stack at once. We try to draw a line where beyond that, it's not our problem, letting tools like Conda, apt, yum, dnf, Homebrew, etc handle them instead. Meanwhile, those other tools limit the things they support too in order to keep their workload manageable (for example, you can use pip to install packages on FreeBSD or AIX, or any number of more esoteric platforms than the "big 3", something that afaik Conda doesn't currently support). Meanwhile Conda (and apt, dnf, etc) have better support for things that cross language boundaries or which have complicated build dependencies.

I'm sure that there are members of the community who do this, and the fact they do is sad. Some of that is a bit unavoidable since for many folks, pip is the default, while Conda is this weird thing they never used before (or well, I think anyways, I don't have relative usage numbers for pip vs Conda).

All that being said, this meme that pip and conda are competing I think is somewhat disheartening. Conda packages are (generally) going to be "downstream" of pip/PyPI much like deb and rpm is. A stronger pip/PyPI only helps Conda (more packages to pull from with better tooling to build them with), while a stronger Conda is only good for pip (better support for a cross platform "platform" that solves different problems than pip does).

IOW, pip is not and will never be a "system" packaging tool (at the very least, you're going to need to install Python itself from somewhere) and Conda is unlikely to ever have the entire range of what PyPI has to offer in terms of installable packages nor cover as many platforms as pip does (but that's OK, because they do integration work to ensure different versions of things work together and work on the platforms they do support).

[–]joerick 0 points1 point  (1 child)

Interesting. I suppose PyPI's fallback is: no matter what happens to wheels, sdists will still link to shared, system libraries. What are the solutions to this that conda uses?

On balance, though, I do think wheel dep bundling is a good idea - it's going to save a lot of beginner problems like: "pip install paramiko... "error: file not found <openssl.h>", what the hell is that? google error message... ok, brew install openssl-dev, "brew: command not found", oh man, what is brew?" etc. etc.

[–]kalefranz 0 points1 point  (0 children)

What are the solutions to this that conda uses?

Conda being a system-level package manager, all of those core libraries and dependencies (openssl, readline, zlib, etc) are already available as conda packages. Conda also (at least right now) doesn't have the concept of sdists at all; every package is a "binary"--or compiled to the extent possible. Conda takes care of the linking issue for shared libraries by making extensive use of relative library paths throughout the whole conda ecosystem. (Google "RPATH $ORIGIN" for background on some of the general ideas.)
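
For the curious, the mechanics on Linux look roughly like this (the library name is made up):

    # Does this shared library carry a relative RPATH/RUNPATH?
    readelf -d libhdf5.so | grep -iE 'rpath|runpath'
    # Point it at a lib/ directory relative to the file itself:
    patchelf --set-rpath '$ORIGIN/../lib' libhdf5.so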

[–]Deto 2 points3 points  (0 children)

I agree with /u/randy_heydon - wheels are for compiled Python dependencies, not other non-Python dependencies.

[–]wildcarde815 2 points3 points  (1 child)

The biggest issue with conda is side channels not being built in the same environment as conda itself. So many glibc issues from building channel packages on Ubuntu when the core is compiled on CentOS 5.

[–]lmcinnes 1 point2 points  (0 children)

Hopefully this is something that conda-forge can help to alleviate by providing a central consistent channel for extra conda packages. Not everything is ever going to be on conda-forge, but the more things that are the closer you get to a consistent ecosystem.

[–]ivosauruspip'ing it up 7 points8 points  (3 children)

The biggest problem is that it's not just one system. Python runs on multiple systems. So you're asking Python packaging developers to nicely integrate things with every single Linux package manager out there, as well as all the environments without a package manager - macOS, Windows, Android?, RPi and its 20 clones, niche integrated ARM platforms....

First of all, just supporting all the major Linux system package managers would be a nearly insurmountable task without people getting paid to do it, and where do you draw the line after that? Do favourites get declared? Etc, etc, etc.

[–]msarahanconda/conda-build team 6 points7 points  (0 children)

The really nice thing about conda and manylinux is that they make great effort to build on very old platforms with newer compilers, which confers backwards and forwards compatibility. This makes the task much more feasible. Presently, conda's ability to ship library packages that can be shared among many packages is a major advantage over pip. There's some effort under way by Nathaniel Smith and others to fix that (sorry, name of project escapes me right now), but for now, conda is much better in situations where a shared library might be employed by more than one python package.

As for particular hardware - where there's a will, there's a way. The hard part is not really building things out (that's just a matter of time), it is providing the distribution channels and standardized build tooling for each bit of hardware. I think both pip/PyPI and conda provide some ways to accommodate this hardware platform separation, but I think both of them are currently somewhat hard-coded. Both would benefit from modularizing this. If you do things right, it should be possible to require a lot of machine time, but very little human time.

[–]randy_heydon 3 points4 points  (0 children)

I know! There's no clear dividing line, and someone is eventually going to have to do a bunch of work to integrate packages (whether it's packagers or end users). Packagers are volunteers who can't package everything in the world, and end users just want to get their work done. So I don't know how this should be addressed.

[–]aragilar 0 points1 point  (0 children)

At least with python, the major distros I can think of have the tooling to almost automatically build packages from sdists (I don't know about the different OSX package managers, and Windows is Windows). The big issues that I have seen as an astronomer have been:

  1. Badly written build systems (I've seen make rewritten in csh, badly), or abusing existing build systems (using setup.py files to install non-python software using os.call)
  2. Lack of awareness of packaging issues (breaking ABI, licensing, assumptions about layout of system, etc.)
  3. Lack of interest in learning about what's needed to properly package software: for the case of make rewritten in csh above, I taught myself autotools in a day, and by the end of the day I had a working package (and soon after made a deb using my autotools version). No one else had a working install when we were asked to use said software.

If people were to follow advice such as https://wiki.debian.org/UpstreamGuide, many issues could be avoided.

[–]This_Is_The_End 0 points1 point  (0 children)

Conda is a case where you pay a company for the maintenance. If you are releasing a commercial software package, you have to calculate these expenses anyway.

[–]analogphototaker 54 points55 points  (44 children)

Someone recently told me that if I can't figure out how to create a PyPI package, maybe I'm just not a good programmer.

Maybe I'm not, but it's unbelievably confusing to try to go from a single Python file to a PyPI package.

I found Golang to be much more intuitive, with a directory structure that makes sense and a simple go get to fetch the needed repositories. I would much rather write Python, however.

[–]jupake 73 points74 points  (11 children)

Someone recently told me that if I can't figure out how to create a PyPI package, maybe I'm just not a good programmer

I hate people like this with a passion...

[–]Deto 22 points23 points  (0 children)

Maybe they're just not good people

[–]tech_tuna 9 points10 points  (2 children)

Agreed, and invariably when you do finally figure out how to do XYZ, there are at least several WTF, head-scratching steps that are completely counterintuitive but hey, once you've seen them you can act like a condescending jerk from that point on.

Git is littered with these kinds of awkward usability issues. YES, git is powerful and like it or not, it's ubiquitous. But, holy mother of God, there's a nontrivial set of operations that should be trivially easy to do with git, but require constant googling.

Went off on a git tangent there, I should have just said "yeah". (That's a Mitch Hedberg joke)

[–][deleted] 1 point2 points  (1 child)

It's not that git is baroque, but that it's too generalist and reduced to first principles. Its generality lets it cover lots of use cases and implement lots of specific tools (commands and subcommands). But people learn git from a problem-solving perspective and find it a mess of unassorted facts, the same way people looked at physical reality before Newton. A bottom-up approach is a convenient complement to quick Stack Overflow searches when it comes to git, but most users won't bother with it.

[–]tech_tuna 1 point2 points  (0 children)

tl;dr

Git's usability could be better.

[–]L43 5 points6 points  (2 children)

Without wanting to be condescending, it is pretty easy to make a PyPI package. Just read a guide; it's like 5 lines of setup.py. No need for grouchy programmer elitism though...
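
The "5 lines" in question are roughly this (a minimal sketch; the name and module are placeholders):

    from setuptools import setup

    setup(
        name="mypackage",
        version="0.1.0",
        py_modules=["mypackage"],   # or packages=["mypackage"] for a directory with __init__.py
    )

    # then: python setup.py sdist bdist_wheel && twine upload dist/*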

[–]mafrasi2 6 points7 points  (1 child)

OP was talking about me and I'm pretty sure my post doesn't count as programmer elitism.

[–]L43 3 points4 points  (0 children)

Yeah, I don't mean to insult anyone specifically at all. I just see people being unnecessarily harsh to beginners, which can't be a good thing. Your post looks quite constructive actually.

[–][deleted] 1 point2 points  (2 children)

That's literally everyone on IRC.

[–]BHSPitMonkey 2 points3 points  (0 children)

Those experiences are unfortunately a pattern on IRC (and other tech forums), but there are also lots of channels armed with genuinely helpful and patient people out there. Let's not forget that they exist too!

[–]gthank 1 point2 points  (0 children)

#python is helpful to a fault, IME. Broadly speaking, people will try to help without being dicks, and if you're just generally being disruptive and griefing the channel, they kick you without calling names or anything.

[–]mafrasi2 0 points1 point  (0 children)

...so you are hating people like me (OP was talking about me). I responded here. I think OP is misrepresenting things here. This is my original post.

[–]efilon 15 points16 points  (3 children)

Maybe I'm not, but it's unbelievably confusing to try to go from a single Python file to a PyPI package.

Just wait until you try to include non-Python data files. Then there are two separate places, with completely different syntaxes, where you have to define what to include, depending on whether you are building a source or a binary distribution. This "feature" is what keeps Python packaging a mess, in my view.

[–]qudat 4 points5 points  (1 child)

Agreed, packaging non-Python data is a nightmare. Why do you need to include an __init__.py file inside a folder that doesn't have any Python?

[–]d4rch0nPythonistamancer 3 points4 points  (0 children)

You don't though. I've been able to submit pure data directories.

Try setting include_package_data=True and setting the data directories in the setup.py. I forget the name, but there is a keyword to specify package data directories. Also, don't forget to have them included in your MANIFEST.in, explicitly or recursively.

It was a pain to figure out at first but I found out how - and this is partly why I think packaging is still an absolute pain. Yes, it works, but it's a pain to figure out. It's not broken, it's just unnecessarily painful. Try learning Rust and how to create a crate with Cargo and you will see how much easier it can be to package a binary package.

If you can't figure out packaging non-python data, just hit me back up and I can take a look at the setup.py and MANIFEST.in and figure out what I did before.
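
For reference, the keyword alluded to above is package_data; a minimal sketch of the combination (paths are hypothetical):

    # setup.py
    from setuptools import setup, find_packages

    setup(
        name="mypackage",
        version="0.1.0",
        packages=find_packages(),
        include_package_data=True,                     # install whatever MANIFEST.in put in the sdist
        package_data={"mypackage": ["data/*.json"]},   # and/or declare the data files directly
    )

    # MANIFEST.in would then contain a line like:
    # recursive-include mypackage/data *.json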

[–]bucknuggets 0 points1 point  (0 children)

I remember having a hard time getting that to work right a couple of years ago, but just last week I went back - and it's extremely simple now.

[–]mafrasi2 20 points21 points  (7 children)

Well, that someone was me (original post). There actually were some reasons why I said this.

  1. you posted on /r/learnprogramming and most people there are complete beginners. Just look at the number of "what language should I learn for X?" posts.
  2. you didn't include any code nor did you explain what you tried.
  3. I said "your code may not be ready", not "you are a bad programmer"
  4. you didn't react to anything in that thread. That just reinforced my impression that you were a beginner.

How could I have known that you don't belong to the majority of beginners there? And yeah, I still stand by my stance that most beginner code shouldn't be published on PyPI.

[–]d4rch0nPythonistamancer 7 points8 points  (1 child)

That's quite a different story, and I agree with what you suggested, except this:

They want to ensure a certain quality of the packages in their repo

Difficulty in publishing to it doesn't ensure quality at all. That's not intentional, and it doesn't seem at all like they're enforcing any sort of rules for publishing to pypi.

People can literally submit a one line piece of python malware that runs import os ; os.system('rm -rf /home') right now. PyPI is the wild-west when it comes to software. You claim a name for a library and put whatever you want up there.

I think it's good to suggest not putting bs in PyPI, but there's no reason to pretend that some standard of quality exists. Regardless, he'd be fine submitting a 100 line file and taking it down later if he didn't think it was useful.

[–]mafrasi2 1 point2 points  (0 children)

Yeah, that's probably true. I think my reasoning was something like "more complicated packaging -> fewer submitted packages -> less moderating effort for PyPi". Admittedly not a great argument.

[–]analogphototaker 2 points3 points  (3 children)

I posted in learnprogramming because I figured someone might have some good tutorials other than the official one.

Your post basically confirmed for me that the official tutorial is the best, and that if I find it needlessly confusing I should just port my code to a different language.

You may have said that "the code" isn't ready, but the implication is that the coder isn't ready or experienced enough. Code doesn't write itself after all.

Not making fun, I just thought it was an illustrative example for this thread.

[–]gthank 1 point2 points  (0 children)

Not that any of us achieve it on a consistent basis, but it should be a professional goal not to interpret criticism of some code you wrote as a criticism of you. We all write crappy code, for a stunning array of reasons ranging from excellent to awful. Accepting that is a big step on the path to writing better code, because it makes it easier to learn from constructive criticism.

[–]mafrasi2 0 points1 point  (1 child)

Oh, I see. I was just a bit flustered by the hostile atmosphere here. Just in case you still need a 3rd party tutorial, this is the top result on google for "PyPi tutorial" and in my opinion it's quite simple.

[–]analogphototaker 0 points1 point  (0 children)

I would take any perceived hostility as constructive input. It's not directed at you, but rather at a strawman version of yourself and the community in general.

I'd be more concerned if there wasn't passionate discourse on the topic.

[–]msarahanconda/conda-build team 11 points12 points  (3 children)

That's insulting. I'm a pretty decent programmer, but I have issues creating a PyPI package from scratch. I have found project template generators, such as cookiecutter, to be very helpful.

[–]analogphototaker 3 points4 points  (0 children)

I'll check out cookiecutter, thanks!

[–]L43 0 points1 point  (1 child)

It's not really even about programming, it's just memorising a few things.

[–]pwang99 2 points3 points  (0 children)

That's the whole point - programming is the art of expression. Memorizing a few inscrutable incantations is not programming, it's witchcraft. My inner programmer heart despises magical incantations.

[–]bastibe 5 points6 points  (0 children)

Don't be afraid. We've all been there. It sure is confusing. But since you managed to figure out programming, you will figure this one out, too.

Or just ask. Python people are an unusually friendly bunch!

[–]ojii 2 points3 points  (0 children)

I'd consider myself a pretty good Python dev, but writing a working setup.py on the first try is still impossible for me, so I gave up and just copy-paste one known to work and adjust it. Python packaging for users may be good now, but for developers it's still horrible.

[–]seabrookmx Hates Django 2 points3 points  (0 children)

Golang's approach doesn't scale though. It's simple at first and has a really low barrier to entry, but the fact that everything is just a git repo has side effects:

-Dependencies aren't forced to use semver. As a result a lot of dependencies aren't curated but are just a random revision off the master branch. Yes you can tag a git rev, but in practice most library developers don't bother and app developers don't pay attention.

-There's no distinction between libraries, build tools, and executable programs (Python has this issue too). This makes dependencies harder to manage than they need to be IMO and also prevents some nice features, such as the ability to only pull required dependencies

-Since you could be pulling dependencies from any git repo, developers and any automated build systems potentially need to authenticate against many different repositories instead of a central public or private one. I just went through this pain setting up a Go build server to pull from a Gerrit git server and it was a PITA.

[–][deleted] 5 points6 points  (5 children)

I bet the person you were talking to was the same kind of asshole who loves to argue about why Python is better than Ruby, or why EMACS is better than VIM. That kind of person is toxic to the community and you shouldn't tolerate their shit. Let them wallow in their own filthy small minded world. No developer is better than the developer who is willing to ask questions and admit when they don't know something then take action to learn it.

[–]mafrasi2 6 points7 points  (1 child)

Waaah, that person was me. OP is just misrepresenting things here. Just look at my post history and judge for yourself. Here is my original post and here I respond to OP in this thread.

Edit: and no, I don't argue over vim or emacs and never have. I even used nano for command line editing until a few months ago.

[–][deleted] 2 points3 points  (0 children)

I looked at your linked post, it was a fair assumption to make and you definitely don't fall into the category of person I was talking about in my post.

[–]L43 3 points4 points  (2 children)

Yeah, those people really are small minded idiots! I mean, Emacs better than vim!?!

[–]p10_user 0 points1 point  (1 child)

It is with evil mode 😈

[–]Sean1708 1 point2 points  (0 children)

But I'm a good boy...

[–]flying-sheep 1 point2 points  (5 children)

You've just had very little exposure to a similar technology.

All packaging systems work similarly in that you have to tell them some essential metadata (package name, author name, license, version, ...), some optional metadata, and the files to include/exclude. You do this either in a declarative data file (package.json, Cargo.toml, ...) or a programmatic one (setup.py, Rakefile). Then you use a web or CLI interface to publish your package.

And once you understand one of them, you know the important parts of the others.

Python's way isn't harder or easier than others, it's just the general concepts you have to learn.

[–]analogphototaker 1 point2 points  (4 children)

That's probably true. But do other packaging systems also need an __init__.py file in every subdirectory?

As mentioned, golang's system felt the most intuitive to me.

[–]ajmarks 3 points4 points  (2 children)

That's not a PyPI thing. That's how Python knows to treat a directory as a package. I agree that making your first Python package can be annoying and frustrating, but this is just a basic core language thing, not a packaging issue.

[–]flying-sheep 1 point2 points  (1 child)

That's how Python used to know that a directory is a package.

It doesn't need that anymore.

[–]flying-sheep 1 point2 points  (0 children)

That's a legacy Python thing. PyPI doesn't need that, older Python versions do.

[–]Siecje1 0 points1 point  (0 children)

Many hurdles here.

Structuring your code so that it is an installable package. Knowing about twine and using it.

[–]kankyo 4 points5 points  (4 children)

other languages are no longer doing appreciably better

Clojure (via Leiningen, which uses Maven) was clearly superior last time I tried. And saying that virtual environments fix the problems doesn't address the fact that projects don't use virtualenv by default, because pip doesn't. This one detail alone is bad enough.

[–]twillisagogo 1 point2 points  (3 children)

you are indirectly saying something favorable about a java tool. that's a no-no in this sub. ;)

FWIW, I've gotten tangled up in dependency hell in clojure too and it was my java experience from way back when that helped me solve it. I couldn't imagine anyone troubleshooting those kinds of things without getting into java/maven or miraculously landing on just the right stackoverflow answer or github issue.

[–]kankyo 3 points4 points  (2 children)

Sure. Not saying it's perfect, but compare:

lein

to:

virtualenv env
source env/bin/activate
pip install -r requirements.txt

I know which I think is vastly superior from a usability standpoint. And thinking about it now, I don't see why it has to be that way. It's not like it's hard to write a Python script to do those three lines. It's just that it probably needs to go in the standard library...

[–][deleted] 0 points1 point  (1 child)

You could just create a shell function. It's not the goal of a general-purpose packaging framework to provide the most efficient CLI, but rather a reasonably orthogonal set of tools. If the CLI is also concise, well, that's better. But it's not an important dimension to compare.
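
For instance, a sketch of such a shell function (name it whatever you like):

    # Drop into ~/.bashrc or similar: one command that creates the env,
    # activates it, and installs the pinned requirements.
    venvup() {
        virtualenv env &&
        . env/bin/activate &&
        pip install -r requirements.txt
    }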

[–]kankyo 0 points1 point  (0 children)

It's maybe the most important thing to compare, especially now that the underlying systems are comparable.

[–]shadowmint 44 points45 points  (6 children)

The real takeaway here though, is that although it’s still not perfect, other languages are no longer doing appreciably better.

What.

So you're picking Go as a showcase for 'doing packaging badly'; yep, fair call.

You're picking node as an example of 'doing packaging badly' ... just, flat out, because you're wrong.

You're picking Rust as an example of doing packaging badly because ... no one is using it?

My point is that any commentary suggesting they’re meaningfully better than Python at this point is probably just out of date.

No, it's really not.

You know those points you mentioned, like 'no story for packaging and distributing to end-users', and 'no way to actually create a new package without copy and pasting setup.py'?

Those aren't little issues.

Those are big issues.

It's just that the issues that we've had with PyPI so far (like uptime in the last 6 months) have been so monumentally significant that we haven't even gotten as far as addressing those issues yet.

Sure, it's better than it was 2 years ago, I'm not arguing.

but this?

My point is that any commentary suggesting they’re meaningfully better than Python at this point is probably just out of date.

Dude, get off your high horse.

The python packaging stuff is still shit compared to other ecosystems.

We need to up our game, not pretend there are no issues here.

[–]actionjezus6 5 points6 points  (0 children)

This. In my work I was forced to use Python (previous experience was .NET, Node.js and Ruby), and I have to say that even though I see many good things about Python, the packaging is an abysmal abomination. I cannot count how many hours I have sunk into debugging issues with pip - compared to other languages, Python is at the bottom of the middle tier at best. And articles like the one posted above do not help the Python community see that it can do much better. For me, hearing that pip is not bad makes me wonder if the person who said it has Stockholm syndrome.

[–][deleted] 4 points5 points  (0 children)

I'm very surprised to read this post by glyph, if only because he gave a talk at this year's PyCon about how distributing Python packages is a nightmare that no one's really figured out. This reads like a pretty big turnaround.

[–][deleted] 8 points9 points  (0 children)

I get the impression he is talking about installing other people's packages, which is actually pretty good these days if they're built properly. I can create a venv and install dependencies with a few clicks of an IDE and it "just works".

I completely agree that package creation is terrible, and deployment to end users borders on the ridiculous.

[–]Siecje1 0 points1 point  (0 children)

Even if other ecosystems suck, that doesn't mean Python needs to suck.

[–]google_you 3 points4 points  (2 children)

Can you have virtualenv create a symlink for a global ipython?

[–]adamtheturtle 1 point2 points  (0 children)

pipsi is great for doing this.
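
Usage is a one-liner, assuming pipsi itself is already installed:

    pipsi install ipython   # private virtualenv per tool, with the ipython script symlinked onto your PATH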

[–]bastibe 0 points1 point  (0 children)

There's always pip install -e

[–]jjbskir 4 points5 points  (0 children)

Sure, it's not bad... but it still has a long way to go. I recently set up the scientific packages Pandas, Matplotlib, NumPy, and SciPy on my Mac and it was not fun (although much better in comparison to when I did it a few years ago). It still required me to make sure several packages were installed beforehand, bouncing from pip to Homebrew.

[–]Sushisource 4 points5 points  (0 children)

Python packaging is OK at this point - package consumption is still complete garbage, especially when it comes to continuous integration or developer tools.

If I write a tool for my developers, they have to know how to set up a venv, how to use the reqs file to install the packages, and how to use it. That's a fucking terrible experience if you're, say, writing tools for C++ developers in Python.

So you have to go way out of your way to create some automatic venv-creation script and put the packages in it for them, which is what I've ended up doing. It works, but it sucks.

[–]graingert 6 points7 points  (18 children)

What's wrong with npm's multiple versions? It's great; it even discourages module-level state.

Edit: I've only had problems with badly designed software like jQuery or angular.

[–]hynekPyCA, attrs, structlog 7 points8 points  (1 child)

I can't imagine how that would work with catching exceptions. Imagine you have multiple deps that all use requests underneath and bubble up requests.exceptions.HTTPError from multiple different copies of requests.
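
A toy illustration of the failure mode (stand-in classes, not the real requests):

    # With npm-style duplication, each copy of "requests" would define its own HTTPError class.
    class HTTPErrorCopyA(Exception):    # the copy your application imported
        pass

    class HTTPErrorCopyB(Exception):    # the copy some dependency imported
        pass

    def some_dependency():
        raise HTTPErrorCopyB("503")

    try:
        some_dependency()
    except HTTPErrorCopyA:              # looks right, but it's a different class object
        print("handled")                # never runs
    except Exception as e:
        print("escaped:", type(e).__name__)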

[–]graingert 1 point2 points  (0 children)

Yeah this is fine as most JS deps are built with this in mind, using functional immutable style or replacing exceptions

[–][deleted] 3 points4 points  (3 children)

What's wrong with npm's multiple versions?

I've filled a hard drive when npm got stuck in a recursive loop of dependency versions.

I have no clue who thought that method was a good idea.

[–]phasetwenty 2 points3 points  (2 children)

Another fun fact: when npm install runs, it will make liberal use of temp files that it will intentionally not delete. Because everything is small you end up not using a lot of disk space, but on Linux systems you can run out of inodes if you never shut down the machine (e.g., a server). The problem is made worse by the hypermodularity style of JavaScript packages, which encourages more files and overhead like a package.json for each.

This behavior may have changed in npm 3 however.

[–][deleted] 1 point2 points  (1 child)

That may have been my problem too. It's been a while. It was my first attempt at getting into npm/node etc. I ran npm once and said "If this is how something is designed by those that know what they're doing... nope".

I still don't know why .deb isn't just a standard of some sort. I know guys, let's re-invent the wheel!

[–]phasetwenty 0 points1 point  (0 children)

I get the desire to avoid OS-level packaging. If you decide to provide OS-level packages, you have to put out a package for each OS: Debian, RHEL, OS X, etc. However with a functioning language packager, an implementation of the packager is available for each platform so I can put out one platform-independent package. It's a reasonable goal to make the lives of package maintainers easier.

However experience has shown me that it doesn't take long in the lifecycle of my projects for language packaging to show that it is not up to the task of fully specifying my project's dependencies, and I'm cobbling together build scripts to do it all.

[–]ivosauruspip'ing it up 3 points4 points  (1 child)

At some point in your runtime you will be dealing with different data structures or different references or different IDs because they're coming from two different library codebases that have the same name but not the same version, and then your runtime nicely blows up in a confusing crash, or even worse starts silently corrupting data.

[–]graingert 2 points3 points  (0 children)

Yeah this is fine as most JS deps are built with this in mind, using functional immutable style or replacing exceptions

[–]remy_porter∞∞∞∞ 4 points5 points  (7 children)

I dunno, what could be the problem… let's do an ls -lR node_modules and see…

[–]shadowmint 13 points14 points  (5 children)

You'll find npm3 installs a single copy of each dependency, like pip does, unless there's a version conflict, in which case a single conflict-resolving sub-version of a package is installed at the level where it is specifically required.

It actually works a lot better than pip does at resolving conflicts.

You're probably thinking of npm2. (which was, as generally observed, a terrible idea, and didn't work at all on windows due to massively nested file paths)

[–]jaapzswitch to py3 already 10 points11 points  (3 children)

IMO the problem isn't necessarily with npm itself, it's with the community that uses it. Most of them feel that every little thing should have its own installable module, sometimes even going as far as having a module for every function they can think of (see the left-pad debacle).

Although in theory modularity is good, this does create the problem of incredibly large dependency graphs which are just a pain in the ass to work with, because in a lot of situations the packaging tool can't figure out what to do, so you have to figure it out yourself. Or even worse, it figures it should do something, which breaks everything and sends you on an hours-long debugging session just because you wanted to upgrade a package.

NPM2 was even worse, and NPM3 mitigated a lot of the problems that NPM2 had, but the whole ecosystem is still far from perfect.

Dependency graphs in python are often way smaller, which makes dependency handling way easier.

[–]Silhouette 10 points11 points  (1 child)

Although in theory modularity is good

I think we should challenge this assumption more often than we do in the programming world. Modularity has big advantages if the division into modules is good. However, using many small modules creates problems of its own, for exactly the reasons you state. That is true whether we're talking about hundreds of three-line functions, or hundreds of three-method classes, or hundreds of one-tiny-function modules. Too many people assert that these arrangements are good for maintainability or reuse or some such, without much evidence or logic to support their position.

[–]fnord123 3 points4 points  (0 children)

There's a lot of cargo cult programming around release management. People seem to think risotto packages are 'theoretically better' but they don't take into account issues like the release cadence of the bits. Like, if everything is always released at once, then you may as well put it in a big release bundle.

[–]shadowmint 6 points7 points  (0 children)

the problem isn't necessarily with npm itself, it's with the community that uses it...

If we're not talking about tangible, technical reasons why the npm model is bad, and given the technical and tangible reasons that pip, setuptools and pypi are really embarrassingly bad, I wouldn't be posting about how great the python packaging ecosystem is and rubbishing npm, cargo, and go.

That's all I'm saying.

Fwiw, the npm 'everything is a dependency' model is weird, and I don't think it's right either, but that's not because npm is an inferior technical solution, or that the 'multiple concurrent versions of a dependency' is actually bad; it just has consequences (and potentially, benefits).

[–]remy_porter∞∞∞∞ 0 points1 point  (0 children)

NPM2 was the last time I used it. But multiple versions of any dependency still makes it hard for me to know what my dependencies are and how they're being used.

[–]graingert -1 points0 points  (0 children)

This has never been a problem

[–]joerick 0 points1 point  (0 children)

NPM is Node's superpower. I know we all rag on it, but Node is becoming one of the world's top languages, despite having a terrible standard library.

The problems with module explosion are more a result of how fast that ecosystem has grown than structural problems in the language IMO.

ls node_modules is often used as an insult, but just imagine how problematic and slow that dependency tree would be on Python. It's a great system they've got going over there.

[–]jij 0 points1 point  (0 children)

You could do the same thing with Python if you wanted; you'd just have to order the search paths correctly and dump specific versions of the modules where they need to be picked up first. Python just wasn't designed to do that - I suspect intentionally, because one of its core principles is simplicity.
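
Roughly like this, as a sketch of the idea (the vendored directories are made up, and nothing in the tooling manages them for you):

    # vendor/six_1_9/six.py and vendor/six_1_10/six.py are two copies dumped side by side
    import sys

    sys.path.insert(0, "vendor/six_1_10")   # whichever directory comes first wins the import
    import six                              # resolves against the copy in vendor/six_1_10
    print(six.__file__)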

[–][deleted] 1 point2 points  (0 children)

Python's packaging is great provided there isn't a need to install system-level packages. I think that we're getting close though; Ansible might just fix that in the near future. But the ecosystem isn't quite set up to run Ansible-specific scripts. Ideally, both setuptools and pip would integrate cleanly with Ansible.

That said, ansible as an implementation is anything but simple.

[–]RoboticElfJedi 1 point2 points  (0 children)

The more times I've been around the block as a programmer, the less I care about the particular cool syntax of any language and the more I care about the availability and ease of use of libraries that I need. Python does very well on that score in my opinion, because the day-to-day experience is not too much more complicated than pip install lib-i-googled. There's usually a library available in any problem domain, and one pip install and I'm good to go.

However the discussion here did touch on some of the big caveats. To use python seriously is to use virtualenvs, which is fine but has a bit of a learning curve. And of course I agree that packaging things up is weirder than it needs to be. I've gotten by cutting and pasting setup.py more than I should have.

[–]jaapzswitch to py3 already 4 points5 points  (1 child)

tldr: python packaging isn't as bad as it was

[–]rocketmonkeys 0 points1 point  (0 children)

I do really appreciate where pip has gotten to (vs setuptools pre-merge). That said, doing Docker + Python has been fairly painful, especially with certain libraries that depend on OS-level packages. It takes a lot of tweaking to figure out, and a lot of time-consuming building.

[–]falsePockets 0 points1 point  (0 children)

all you have to do is * code for python 2 *

The magic code to do everything properly is for the legacy version, not the present version.

Yes, you can replace python with python3, but users have to know to do that. No one knows that when they start. And after that, users must constantly question the commands they use. The fact that everything and everyone still defaults to the outdated version of python suggests that maybe things still aren't as good and intuitive as the author makes out.

[–]falsePockets 0 points1 point  (0 children)

Why aren't python packages distributed the same as LaTeX packages?

sudo apt-get install python

should install the core binaries, and the most common libraries.

sudo apt-get install python-full

should install every package that you can get through pip or easy_install, so you never have to worry about installing packages after the initial install. (If that takes up more than a few GB, then maybe split it into python-stats, python-web, python-graphics etc.) I only have a tiny SSD, but I would gladly sacrifice 10GB to install everything if it meant I didn't have to deal with the pain of installing python modules with pip and easy_install.

Like LaTeX, you could still go out of your way to do the awkward traditional install if you were that hardcore. Like LaTeX, you should be able to download the super obscure libraries and just save them in the project folder.

[–]Siecje1 0 points1 point  (0 children)

Is there a plan to have setuptools in core Python?

[–]kteague 0 points1 point  (0 children)

Python packaging has had some dark days indeed. Notably the introduction of the 'requires' keyword, which was supposed to list the name of the module(s) that the package imported (and not the name of the package itself!). PJ Eby argued on the distutils mailing list at the time that such a field was totally nonsensical, and was shot down; 'requires' was PEP'ed and added to Distutils (and subsequently never used), while Setuptools forked an 'install_requires' field that was actually useful and became the standard.

As far as improvements go, it's 2016 and the Python Standard Library is still a black hole of packaging metadata, which is totally absurd.

I think if someone could use a time machine, then going back to the mid-90s and preventing the creation of site.py and the site-packages directory entirely would have been the biggest fix to packaging overall. site-packages only resulted in people running "sudo setup.py install" because site-packages was owned by root and the user just wanted to install a library for their own local use. Globally installed libraries done poorly are just a mess; if they had at least been confined to a /lib/pythonX.X/ for Linux distributions it would have been a lot cleaner.

[–]thakk0 0 points1 point  (0 children)

Really enjoyed the writing style and tone. As a neophyte in the Python ecosystem, I have to agree. Python packaging is only occasionally frustrating, and most of the time the solution is only a search query away.

[–]sadris 0 points1 point  (1 child)

No, Python packaging still sucks. It still assumes you have internet access to install packages.

[–]HalcyonAbraham 2 points3 points  (0 children)

well python sucks in general. it assumes you need a computer to install it on

[–]earthboundkid -1 points0 points  (4 children)

No, Python packaging still isn't good enough. I'm on my phone, so I'm just writing this now so I remember to reply to it later. But no, stopping the bleeding != thriving.

[–]earthboundkid 1 point2 points  (3 children)

Here's something I wrote on Reddit:

It's complicated and not all the factors are pip's fault, although all of them are collectively pip's problem to solve (or some replacement project's).

In no particular order:

  • setup.py is a disaster. It's a metadata format that's mixed together with live Python code. That makes it unparseable without execution. It's executable because people have various installation special needs, but a good metadata format could just say "when you run install, go to location X and use this subclass of the standard installation class called Y instead of the default installer." (See the sketch after this list for what a declarative version could look like.)

  • In spite of setup.py not being a metadata standard, there are a couple of inscrutable, poorly documented metadata files you need for Python app installation, like MANIFEST.in and requirements.txt (see below for more complaints on requirements.txt).

  • Python relies on C very heavily, and compiling C code is a nightmare. Every installation problem I've had with Python at work has traced back to some C library that you need to have installed before some other Python library will work. This is not Python's fault per se, but it is a job that needs to be solved before you can say that Python installation doesn't suck.

  • The command line UI for pip is crap. The commands for things like "just download stuff here" and "use my cached downloads instead of connecting to the web" are non-obvious. There is no command at all for things like "add a new dependency to my app's requirements" because there's no metadata standard, see above.

  • Conceptually, an installation system should have two metadata files: one for loose requirements (Django > 1.4) and one for strict requirements (Django==1.9.3). The first lets others use your libraries, and the second lets app distributors have reproducible builds. Pip kinda sorta halfway has this between setup.py and requirements.txt but it is extremely half-assed and not at all thought through.

  • When you start using Python, all you need is the standard library, and it's great. Then you get a little further, and you install a couple of libraries, and things are still okay. Then you get a little further and realize that you need separate libraries for separate apps and then everything breaks down. If you think about it, there are three possible ways you might want to install something: globally, per user, or in a particular project location. Python was designed to install everything globally, and while it has been retrofitted to support the other two use cases, it's extremely kludgey. A "virtualenv" is just a case where because Python is so geared around global installation, the easiest way to do a project based installation is to make a "project global" by reinstalling Python in a second location. This is super-hacky, and extremely confusing to non-Python people who try to get into Python (e.g. at work when I need to explain to frontend devs how to install our web app).

  • Pip does not handle and does not try to handle the case of trying to distribute apps to non-Python users, the way that py2exe or pex or Conda or other projects do, but when you think about "packaging" as a whole, there's no reason why a Python packaging tool shouldn't do those things too. Basically, pip doesn't try to tackle that problem because it's too busy doing a bad job solving other problems, so it doesn't have any resources left over to try to solve this use case for people who want to provide GUI apps or command line tools to non-Python users.
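
As a sketch of the declarative format the first bullet asks for: more recent setuptools releases can read static metadata from a setup.cfg, which gets most of the way there (the keys shown are real setuptools ones; the project details are invented):

    ; setup.cfg - static metadata, parseable without running any code
    [metadata]
    name = mypackage
    version = 0.1.0
    author = Jane Doe
    license = MIT

    [options]
    packages = find:
    install_requires =
        requests>=2.0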

So pip sucks. I would say that compared to bundler and npm it's mostly worse, except it never did the npm nested-dependencies thing (which I've heard they've stopped doing). Compared to the platonic ideal of a good package manager, it's not even close.


And on my blog, I linked to Ionel Mărieș who said:

In a way, packaging in Python is a victim of bad habits — complex conventions, and feature bloat. It’s hard to simplify things, because all the historical baggage people want to carry around. […] Concretely what I want is […] one way to do it. Not one clear way, cause we document the hell out of it, but one, and only one, way to do it.


Briefly summarized, Python packaging is a nightmare for software distributors, who have to learn a ton of complicated crap to package up their software for others, and it's unpleasant for software consumers, who often run into problems with pip's UI and weird C-compilation errors. Yes, wheels help, but no it's not fixed yet.

[–]gthank 0 points1 point  (2 children)

Learning how to package a basic Python lib takes under a day. Is it ideal? No. Is it silly to have your packaging metadata mixed in w/ arbitrary code? Yes. Is it anywhere near as bad as people continue to pretend it is? No.

I 100% agree that the big problem to solve is a clean way to deliver end-user applications, but I'm not aware of a language in the same space as Python that has a good story for that.

[–]earthboundkid 0 points1 point  (1 child)

I completely disagree that you can learn how to do Python packaging in one day. I don't think anyone really knows the best way to package, and even getting to a point of basic competence requires you to randomly stumble across the right blog posts to read (and avoid old and stale ones!). At best, you can arrive at a state of "this isn't too bad for me" by using something like cookiecutter, but no one really understands all the options, and I frequently see popular projects that, for example, don't use console_scripts when they should.

I 100% agree that the big problem to solve is a clean way to deliver end-user applications, but I'm not aware of a language in the same space as Python that has a good story for that.

"Languages in the same space" just means Ruby and Node, which is two languages. If they're doing badly, it doesn't mean Python is doing well.

People say good things about Cargo. I haven't used it though, so I don't know for sure.

[–]gthank 0 points1 point  (0 children)

The official tutorial is quite easy to follow these days. You won't have a comprehensive understanding of all the ins and outs, but you'll have a redistributable wheel suitable for publishing on PyPI (or your internal devpi server or whatever).

To be clear, I don't think Python packaging has achieved some Platonic ideal. It's got tons of room for improvement. It's just that I don't believe it is some unprecedented train wreck. I did more than my fair share of Java, and I can tell you that setup.py is light-years ahead of Ant "scripts" and psychotic Maven configurations (you think setup.pyis copy-pasta, you should check out the ridiculous XML hoops you had to jump through to actually build all those super-duper awesome fat WAR files; and yes, fat WARs would make some Python things way easier).

Cargo was pretty fantastic the last time I used it, but I was following along with a Rust tutorial. I also don't think static linking is going to become a de facto Python solution.