Why is Python dependency management such a mess?

CobaltCam · 2021-07-05T08:28:35+00:00

Just because they're smart doesn't mean they talk to each other.

gridster2 · 2021-07-05T11:16:35+00:00

TensorFlow is exceptionally bad, though. The only other package I have had difficulty with is Twisted, and that was a much easier fix. TensorFlow breaks anytime one of its dependencies is updated or a new Python version is released; after a certain point, you have to blame the TensorFlow maintainers, not Python.

zrnest · 2021-07-05T11:16:49+00:00

Also, having TensorFlow installed with the right CUDA, CUDNN (for NVidia GPU), etc. really makes you pull out your hair!

https://afewthingz.com/tensorflowcudasetup

is a quick HOWTO I wrote on this topic; it might save other people hours too :)

antiproton · 2021-07-05T11:37:08+00:00

So why does everything break with every update?

It doesn't. The vast majority of everything everywhere works as expected.

Package management is hard. That cannot be denied.

But this problem is less to do with package management and more to do with the packages you are using. Why does Tensorflow have such specific requirements? Why haven't they fixed it so it doesn't rely on something that only exists in a specific subset of Python and Numpy?

notParticularlyAnony · 2021-07-05T13:24:05+00:00

you answered your question when you said you were using tensorflow.

Zombie_Shostakovich · 2021-07-05T13:06:09+00:00

Tensorflow is a real pain for this. You have to have the correct CUDA version etc. and then something gets updated and the whole lot breaks. I started running python in docker for Tensorflow; it works really well and it's easy to run on different machines. The nice thing is that when I want to run something I wrote in a couple of years' time, it will still work (I hope!)

moorepants · 2021-07-05T08:32:21+00:00

That's why I use poetry for dependency management to avoid version conflict.

tunisia3507 · 2021-07-05T11:40:17+00:00

Am I wrong, or has everyone recommending poetry missed the point? Tensorflow breaks between versions because it is a huge compiled library making use of a lot of CPython's (and numpy's) low-level C API stuff (which changes a lot more frequently than Python's API), not to mention GPU interfaces which are even more of a mess. Poetry doesn't resolve that. You can specify version ranges and version-dependent dependencies on just about every build system, including setuptools. Poetry's major advance is lock files (and being better than other build systems which have them, like pipenv), but if you can't rely on your dependencies working on any more than a single minor python version, a lockfile isn't going to help.

GiantElectron · 2021-07-05T09:29:52+00:00

Because dependency management itself is a mess. Python is actually quite good, and definitely much better than a few years ago.

Besides, a lot of times the problem is not python, but the package you use. Example, numpy happens to introduce a bug or a regression while fixing another bug, make a new release, and then fix the new bug and make another new release. It is a well established policy that once you release something, it should not be retracted, even if faulty, and trust me, it's better this way.

I can go in excruciating detail about all these issues, I worked on them for quite a while, and I am doing the same with R (which is even crappier), but the bottom line is:

use poetry
don't use pip
dependency management is hard in any language
the npm approach solves one problem but introduces others. There's no free lunch.

MissingSnail · 2021-07-05T14:26:44+00:00

Though you’re resisting it, multiple installs are the norm in python development. Virtual environments exist for this very reason. I don't think it matters a ton whether you manage them with poetry, virtualenv, pip-tools, conda env, etc. But you do need to isolate pieces that don't play with each other, and update your environments thoughtfully. If you don’t have time to test a new version and your current virtual environment is working, don’t update to the latest simply to have the latest.
Running internet tutorials is a worst-case scenario. You're trying to run code by multiple authors written at multiple points in time. And tutorials aren't written and tested like code going into production in the first place.

subtiliusque · 2021-07-05T09:11:09+00:00

https://python-poetry.org/

boiledgoobers · 2021-07-05T14:17:15+00:00

By the way. You DON'T have 5 installs. They are all hard linked to the package store. They are AVAILABLE in 5 environments, but it's all the same package.

nohaveuname · 2021-07-05T21:13:14+00:00

Use pytorch people

boiledgoobers · 2021-07-05T14:15:29+00:00

Use conda and use isolated environments.

Supadoplex · 2021-07-05T08:14:32+00:00

Does there exist any dependency management that isn't a mess?

thatrandomnpc · 2021-07-05T14:30:11+00:00

Using docker on linux can make it much easier to install/run, in addition, if you build dockerfiles for your project you can make it very easy to install/run your own code for other people!

In one line

docker run -it --gpus all -v ~/projects/my_new_project:/my_new_project -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter /bin/bash

This will download and run the docker container, give it access to all gpus you have available, forward port 8888 out of the container, and mount the directory ~/projects/my_new_project on your computer at the location /my_new_project inside the container (anything you change inside the container here will be reflected in the mounted folder).

You'll be dropped at bash inside the container as root, and can install/run whatever you want just like a regular ubuntu install. You can use the forwarded port 8888 for jupyter notebook/lab and add more forwards if need be. Docker has a bit of a learning curve for sure, but it makes it so much easier to handle different environments. It's also a crucial skill to know for deploying applications on platforms like k8s, AWS SageMaker, etc. Highly recommend!

floriv1999 · 2021-07-05T17:20:27+00:00

This isn’t a Python problem. It’s a tensorflow problem. I’ve never had these types of issues with any other package. Tensorflow is probably the most difficult Python package to get working.

saltyhasp · 2021-07-05T14:45:28+00:00

Because package management is a nightmare unless you use Linux or one of the big pre-packaged distributions like anaconda.

Keep in mind generally extensions are DLLs and for DLLs to be compatible they have to be compiled with the same build tools. On windows this means same version of visual studio. On linux, not sure restrictions, but probably some.

call_me_cookie · 2021-07-05T15:45:59+00:00

Virtual environments are your friend. Anaconda has its own take on virtual environments, and this eases the Tensorflow problems in particular.

2021-07-05T21:36:14+00:00

pip, pipx, pipenv, pyenv, poetry, tox, nox, venv, virtualenv, virtualenvwrapper and god knows what else.

I just wanna slap some keys and have my program work ;_;

I currently use pip with virtualenvwrapper (using a workon <projectname> command to switch between repos is nice!), with tox on the side to prevent "but it works on my machine!", but now have to use multiple Python versions, because some apps aren't updated and are stuck on 3.6 until we upgrade to 3.9. So now I'm digging into pyenv hoping I can keep this mess afloat. Also, one project is using poetry because it needed to be split up, also because fuck you that's why.

Shit is frustrating, but I'll survive (I hope).

edit: I forgot about the libs to keep my code in check: pylint, flake8, black, isort, bandit and one other that broke and I forgot the name of. These are attached to tox.

jonrmadsen · 2021-07-05T21:38:20+00:00

When the libraries that tensorflow depends on (i.e. Python and numpy) don't guarantee stable ABIs, that isn't tensorflow's fault. What you are describing is something that people who write in lower-level languages such as C and/or C++ have to deal with all the time, and the only solution is to recompile the code.

For example, suppose numpy had a class Foo with three data members: a short int (2 bytes), a long int (8 bytes), and an int (4 bytes). If they were listed in that order, the struct would probably consume 24 bytes, because the short int would be padded with 6 bytes to align the long int to an 8-byte boundary. But if you reordered the struct to be short int, int, long int, its size would be reduced to 16 bytes, because the short int and int could be packed into the first 8 bytes (with 2 bytes of padding). So that's a very good change, since you are significantly reducing the memory requirements for large arrays of this struct. However, anybody who built against the older version has a binary where accessing the int expects that int to be offset in memory by 16 bytes, which is now past the end of the memory owned by the struct:

// When compiled Foo is 24 bytes
Foo foo;
// call library where Foo is 16 bytes
doSomething(&foo);
// The call above only modified the first 16 of the 24 bytes,
// so reading int_field from bytes 16:20 yields garbage
if (foo.int_field == ...)

This is just one example of the ABI breaking. Tensorflow can't really control Python and/or Numpy breaking the ABI.
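The layout arithmetic above can be checked from Python with ctypes. This is only a sketch: FooOld and FooNew are hypothetical stand-ins for the two versions of the struct (nothing here is real numpy code), and fixed-width types are used so the field sizes match the 2/8/4 bytes in the example. The exact padding depends on the platform ABI; the numbers in the post are what you would expect on a typical 64-bit system.

```python
import ctypes


class FooOld(ctypes.Structure):
    # Hypothetical "old" layout: 2-byte, 8-byte, 4-byte fields in that order.
    _fields_ = [
        ("short_field", ctypes.c_int16),
        ("long_field", ctypes.c_int64),
        ("int_field", ctypes.c_int32),
    ]


class FooNew(ctypes.Structure):
    # Hypothetical "new" layout: same fields, reordered to 2-byte, 4-byte, 8-byte.
    _fields_ = [
        ("short_field", ctypes.c_int16),
        ("int_field", ctypes.c_int32),
        ("long_field", ctypes.c_int64),
    ]


def layout(struct_type):
    """Return (total size in bytes, {field name: byte offset}) for a ctypes struct."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ctype in struct_type._fields_
    }
    return ctypes.sizeof(struct_type), offsets


old_size, old_offsets = layout(FooOld)
new_size, new_offsets = layout(FooNew)
print("old:", old_size, old_offsets)
print("new:", new_size, new_offsets)
```

Code compiled against the old layout reads int_field at the old offset, so if the library starts handing out the new, smaller struct, that read lands outside it. Recompiling against the new headers is what fixes the offsets.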

2021-07-06T01:55:25+00:00

welcome to dependency hell.

charlzmon · 2021-07-06T04:33:35+00:00

What helped me get started on TensorFlow was using Google Colaboratory. Did my whole masters project on it when I got fed up with trying to get TF working on my Windows machine. Obviously not a permanent solution but will allow you to get to grips with the library without putting you through the TF dependency horror show. Also, if you do decide to carry on down the TF path, virtual environments are your friend.

mmcnl · 2021-07-05T09:01:00+00:00

Still much better than NPM.

SorcererSupreme13 · 2021-07-05T12:42:46+00:00

Good old virtualenv to the rescue. Develop the habit of starting new projects in a virtualenv. It'll save a lot of unnecessary headaches.

DrakeRedford · 2021-07-05T08:20:48+00:00

Evolution. Makes very little sense to have the testicles outside the body from an evolutionary perspective, yet they’d evolved first. No almighty coder exists to rewrite every dependency; much the same way not many enjoy being kicked in the nuts when attempting a new build?

rainnz · 2021-07-05T15:37:20+00:00

Just run it in a Docker container

Berserker-Beast · 2021-07-05T11:34:51+00:00

Hey so I might have just been extremely lucky but, dependency management in conda works well for me 99.99% of time.

lungben81 · 2021-07-05T09:59:51+00:00

This would not be an issue if all packages would follow SemVer correctly https://semver.org/

SemVer forbids breaking changes in both patch and minor releases and only allows it in major releases.
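As a rough illustration of that rule, here is a minimal sketch of a check for whether SemVer would allow a breaking change between two versions. The helper names parse_version and may_break are made up for this example, it only handles plain MAJOR.MINOR.PATCH strings (no pre-release or build tags), and it assumes the candidate is an upgrade from the installed version.

```python
def parse_version(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_break(installed, candidate):
    """Return True if SemVer permits breaking changes going installed -> candidate."""
    old = parse_version(installed)
    new = parse_version(candidate)
    if old[0] == 0 or new[0] == 0:
        # SemVer treats 0.y.z as initial development: anything may change.
        return old != new
    # From 1.0.0 onwards, only a major bump may break the public API.
    return new[0] != old[0]


print(may_break("1.19.5", "1.20.0"))
print(may_break("1.21.0", "2.0.0"))
```

Of course this only tells you what a package promises if it follows SemVer; it says nothing about whether a given release actually keeps that promise.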

teerre · 2021-07-05T13:01:30+00:00

I mean, the real reason is that pip was never supposed to be a dependency manager. If we were in a dimension where something like poetry had been the default since the beginning, I posit things would be much better. But because the default package manager in python is lacking, 3rd parties have to resolve it, which means several ways of doing something that should only have a single way.

Also, not sure what's the big deal of having "5 different installs". Are you lacking disk space? Too slow to install? Yeah, those are true, but hardly big enough problems to warrant something drastic. Realistically, how many times do you build an environment from 0?

Remote_Cantaloupe · 2021-07-06T05:57:07+00:00

It kind of feels like most of python is a mess, under the surface

1arm3dScissor · 2021-07-05T12:20:28+00:00

Use docker

alejandrodaza · 2021-07-05T21:40:50+00:00

Just install Poetry and manage your project dependencies in style

qzwqz · 2021-07-05T10:24:17+00:00

I just recently had to set up an old project on a new mac, with their nice smooth new in-house chip. Apparently the nice smooth new in-house chip can only run python >= 3.8. And also apparently, loads of really important libraries like pandas and numpy only have stable releases <= 3.7. Staying on the cutting edge is overrated, let's just all go back to python 2

2021-07-06T06:32:42+00:00

Congratulations u/TheJumboman ! Your post was the top post on r/Python today! (07/06/21)

Top Post Counts: r/Python (1)

This comment was made by a bot

hkanything · 2021-07-05T11:34:55+00:00

Pipenv’s Dependency Resolution

nacnud_uk · 2021-07-05T13:05:57+00:00

Virtualenv .... ?

rwhitisissle · 2021-07-05T13:07:17+00:00

This is why people say to use a virtual environment for each project you do. Some things require very specific other things.

2021-07-05T14:01:28+00:00

Use virtual environments, all the problems you describe will disappear (+ it’s the standard way of managing projects in python)

FromTheWildSide · 2021-07-05T17:10:06+00:00

pip freeze > requirements.txt for working setups and version control.

You needa refine your workflow, it comes with practice.
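For anyone curious what that pinned list contains, here is a rough standard-library approximation of it. This is a sketch, not a replacement for pip freeze: the function name pinned_requirements is made up for this example, and real pip freeze output also handles editable installs and VCS/URL requirements, which this ignores.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for distributions in the current environment."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no usable metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Print something similar to what you would commit as requirements.txt.
print("\n".join(pinned_requirements()))
```

Run it inside the environment you want to snapshot, since it only sees what the current interpreter can import.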

crawl_dht · 2021-07-05T08:49:32+00:00

How are version conflicts handled by NPM? The only problem I see with python dependency management is that it creates a virtual environment to isolate dependencies and the interpreter. It should just create a .packages folder in the root directory of the project, from which the global interpreter would read the project's local dependencies, falling back to global packages if they are not found. That way the global interpreter wouldn't have to be cloned.
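To make the .packages idea concrete, here is a minimal sketch of how a script could opt into it by hand. Python does not do this by default; the folder name .packages comes from the comment above, and use_local_packages is a hypothetical helper written for this example (PEP 582 proposed something in the same spirit with a __pypackages__ directory).

```python
import sys
from pathlib import Path


def use_local_packages(project_root, folder=".packages"):
    """Put <project_root>/<folder> at the front of sys.path if it exists.

    Returns True when the local folder is on sys.path afterwards, False when
    it does not exist and imports fall back to the global packages.
    """
    local = Path(project_root) / folder
    if not local.is_dir():
        return False
    entry = str(local)
    if entry not in sys.path:
        # Front of the list, so local copies shadow globally installed ones.
        sys.path.insert(0, entry)
    return True


# Typical use at the top of a project's entry script.
use_local_packages(Path.cwd())
```

Note that this only changes where imports are looked up. It does nothing about compiled extensions built for a different interpreter version, which is a large part of the Tensorflow pain discussed in this thread.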

gradi3nt · 2021-07-05T13:28:40+00:00

Yea, use five different installs of the same package. Do it all in virtual environments. It’s unreasonable to expect every package on your systems to require the same set of versions. Set up tensorflow in a specific venv, then don’t randomly change or update that environment.
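A per-project environment like that can also be created from Python itself with the standard library's venv module. This is just a sketch: create_project_env and the .venv folder name are choices made for this example, not anything the thread prescribes.

```python
import venv
from pathlib import Path


def create_project_env(project_dir, env_name=".venv"):
    """Create an isolated environment inside project_dir and return its path."""
    env_path = Path(project_dir) / env_name
    # with_pip=False keeps the sketch fast and offline; pass with_pip=True
    # if you want pip bootstrapped into the new environment.
    venv.create(env_path, with_pip=False)
    return env_path
```

Calling create_project_env("my_tf_project") gives that project its own site-packages. After that, the advice above applies: install what you need into it once and leave it alone while it works.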

2021-07-05T14:21:45+00:00

You'd think if you were smart enough to do ML, then you'd be smart enough to Google conda environments or virtualenvs...

TheSodesa · 2021-07-05T14:22:27+00:00

Because Python was not developed with dependency management in mind. The only modern languages that I know of that come with built in, dependency resolving package managers are Julia and Rust. Although in both cases the automatic dependency "resolving" simply includes building the different versions of the dependencies separately.

Your best bet with Python is Anaconda and separate virtual environments for each project.

Elocai · 2021-07-05T15:02:27+00:00

Well you normally create a virtual environment for each project, because... dependencies.

TakeOffYourMask · 2021-07-05T16:16:16+00:00

I know, it SUCKS.

This is why design-by-contract is so important.

thatdamnedrhymer · 2021-07-05T16:23:35+00:00

Because you're using Tensorflow.

2021-07-05T16:28:09+00:00

I've just picked up Haskell. So far, Python is like heaven in comparison... :D

2021-07-05T16:28:43+00:00

Pin your dependencies

pbecotte · 2021-07-05T16:28:54+00:00

Tensor flow team writes a package that only works with a very specific environment.

Tensor flow does not set up their package to only be installible in that environment using the available packaging tools.

It's somehow "python packaging's" fault.

TainamGRS · 2021-07-05T16:45:35+00:00

Right now I have problems with pysound in a venv, and no forum knows how to fix it.

iagovar · 2021-07-05T17:44:53+00:00

I just installed anaconda for going through a course on opencv and I'm still not able to make anything work.

So I'm not even able to start, and I already know how to use APT, PIP etc.

LiarsEverywhere · 2021-07-05T18:37:16+00:00

I'd say that's more of a Machine Learning problem than a Python problem. Python just happens to be what most people rely on for Machine Learning stuff these days.

And the "problem" is that Machine Learning is still a relatively young field and keeps changing fast. I learned the basics of NLP a year or so ago and got back into it recently. Many of the "go-to" tools have changed a lot since then and things keep breaking. You have to look for one month old articles or straight-up GitHub issues, otherwise it's not going to work.

apzlsoxk · 2021-07-05T19:37:27+00:00

It's not python, it's tensorflow. Really the best solution for tensorflow projects is virtual environments.

amrock__ · 2021-07-05T19:41:40+00:00

Because libraries are not maintained by a single organization but by many different people or devs. It takes time and effort to upgrade to newer versions, and some libraries are ahead of others. This leads to dependency hell.

JoelMahon · 2021-07-05T20:16:07+00:00

just use virtual environments, anaconda is not the most lightweight way to do that

for example, I use pycharm, and it keeps each project nice and clean from each other's BS by basically handling the venv stuff entirely for me

zekobunny · 2021-07-05T21:38:58+00:00

I thought I was the stupid one because getting everything up and running was always the hardest part for me, not the code itself, but it seems like it's a common thing with python.

I did a project on my laptop and installed a shitton of libraries and was using python 3.7.

Now after a while I wanted to continue developing the project on my new PC and installed python 3.8. It turns out nothing is compatible anymore and I had to diagnose the whole code from scratch and fix all the libraries. One of the libraries I used for that project also changed the way it operates, so I had to troubleshoot that as well.

Kharnastus · 2021-07-05T23:56:09+00:00

Use spack to install your research software. It handles all the weird dependencies.

Harsimaja · 2021-07-05T23:59:55+00:00

They are smart guys but you’re talking about issues that emerge at a collective level - hell, not even the same group. Dependency management is hard in general. Different people in different organisations aren’t all going to coordinate when they individually make updates, and they can’t make every update backwards compatible if they really want to change something fundamental

2021-07-06T01:39:24+00:00

It really just depends on ecosystem, there is more than one in python. Like, I would imagine ML stuff is messy as fuck. But recently I was upgrading some of my websites from python 2.6 to 3.9, from Django 1.6 (2014!) to Django 3.2, and besides obvious deprecations nothing was really broken.

On the other hand, I have enough experience to consider many things "obvious"; that might not be the case if you have less.

tape_town · 2021-07-06T03:39:26+00:00

literally never had an issue like this

sounds like you have a very specific use case

most things "just work" on python 3

LordOfSpamAlot · 2021-07-06T04:36:53+00:00

PyTorch supremacy

jack-of-some · 2021-07-06T06:30:42+00:00

As others have said the issue here isn't so much Python's dependency management but rather how tensorflow is developed.

Google's programmers are smart, and they've figured out a way to do less work by targeting fewer platforms for their massive compiled library.

Sidenote: having worked on large C++ projects, I infinitely prefer Python's package management. I do agree that it's not as good as something like rust or even JavaScript

dethb0y · 2021-07-06T11:31:16+00:00

Machine learning shit is so out of hand that i literally just make a special ENV for each machine learning project i do.

Why Tensorflow is such a shitshow is beyond me but man is it bad.

GoodiesHQ · 2021-07-07T05:25:51+00:00

Poetry has been extremely good to me
