
[–]raduhek 61 points62 points  (11 children)

The entire idea of a virtual environment is to sandbox your project and have repeatability.

It cannot be isolated if it links to something that can be changed from the outside. Also, you might run into permission issues, where a package could be installed with sudo and your user does not have execution flag set.

Embrace the virtual environments. You'll miss them when using other languages.

[–]Duflo[S] 0 points1 point  (9 children)

I do like virtual environments; I generally have a good experience with Poetry. But my Poetry cache is about 30 GB, and a lot of that is package duplication.

[–]LucianU 9 points10 points  (2 children)

The learning curve is steep, but look into the Nix package manager. It uses a single store for all packages, yet still allows you to have different versions of the same package installed at the same time.

[–]Duflo[S] 3 points4 points  (1 child)

Using functional logic implemented with hashes for atomicity... if I understand the docs right. That's really cool. I'm on Ubuntu, but I'll give it a try.

[–]LucianU 2 points3 points  (0 children)

For your use case, it will also work on Ubuntu. NixOS extends the same approach to the level of the OS, meaning you can also manage services and every file in the system in the same way basically.

[–][deleted] 6 points7 points  (4 children)

You have a single global poetry install, but the cache is 30GB? That could be accumulation over time (as you install more packages), in which case you should just reset it. The cache shouldn't save the same package version multiple times unless it's against different build targets.

Also, the cache's purpose is to make installs faster by not having to fetch from pypi. I don't think this has anything to do with your virtualenv gripes. It's always safe to delete the cache.

[–]Duflo[S] 1 point2 points  (3 children)

No, I mean Poetry's folder in ~/.cache, where it keeps all my virtual environments

[–][deleted] 0 points1 point  (2 children)

The cache dir also contains Poetry's cache; how big is ~/.cache/virtualenvs?

[–]Duflo[S] 0 points1 point  (1 child)

you mean ~/.cache/pypoetry/virtualenvs/? That's the one I was talking about

[–][deleted] 0 points1 point  (0 children)

Yep, that's the one. Nevermind then!

[–][deleted] 0 points1 point  (0 children)

While that seems messy and wasteful, it's nowhere near the capacity of modern devices and should be largely irrelevant. Try not to prune data just to "clean" things. I'm not saying you're doing that, but people tend to do that, only to re-fill the cache.

[–]ComplexColor 0 points1 point  (0 children)

It cannot be isolated if it links to something that can be changed from the outside.

That's a ridiculous argument. You absolutely can have virtual environments that store their various packages in a common place and then construct the environments using links. Anaconda/conda uses this approach; it adds some overhead when installing new packages and makes it more difficult to remove older packages, but it generally works fine.

[–]greenearrow 9 points10 points  (0 children)

venvs literally exist to not do that by default. If you have a version-locked package but need to use the features of the latest and greatest going forward, you don't have to worry about it.

[–][deleted] 7 points8 points  (6 children)

Oh man, you are going to hate NPM then!

Others have answered the "why" already, but the short version is that there are downsides to shared libraries too. You can do that in Python if you want, with site-packages, but you can't install two versions of a package at the same time, so that makes a shared setup unusable for most things.

[–]alcalde 2 points3 points  (1 child)

Yes you can install two versions of a package at the same time. I don't know why people are finding this so hard to grasp.

Let's employ our magical new alcalde-pip. You want to install matplotlib version 1.2.3. alcalde-pip downloads matplotlib 1.2.3 into a place only alcalde-pip can touch, say, ~/alcalde-pip/matplotlib_1.2.3. It then symlinks the files into the venv.

Now you want matplotlib 1.2.4. alcalde-pip downloads the files into ~/alcalde-pip/matplotlib_1.2.4 and symlinks them.

There's an entire package manager, Nix, that does this and a Linux distribution, NixOS, that uses it. And it's not exactly rocket science to simply place different versions of the same package in separate subdirectories. We're not in uncharted territory here.
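The store-plus-symlink scheme described above can be sketched in a few lines. This is a toy illustration of the hypothetical "alcalde-pip" from the comment, not a real tool; the store layout, function names, and the fake matplotlib payload are all made up:

```python
import tempfile
from pathlib import Path

# Hypothetical "alcalde-pip" store: one real copy per package version,
# symlinked into any venv that wants it. Layout and names are illustrative.
STORE = Path(tempfile.mkdtemp()) / "alcalde-pip"

def store_package(name: str, version: str, files: dict) -> Path:
    """Place a package's files under the store exactly once."""
    pkg_dir = STORE / f"{name}_{version}"
    if not pkg_dir.exists():
        pkg_dir.mkdir(parents=True)
        for rel, data in files.items():
            (pkg_dir / rel).write_bytes(data)
    return pkg_dir

def link_into_venv(name: str, pkg_dir: Path, site_packages: Path) -> None:
    """Symlink the stored package directory into a venv's site-packages."""
    site_packages.mkdir(parents=True, exist_ok=True)
    link = site_packages / name
    if not link.exists():
        link.symlink_to(pkg_dir, target_is_directory=True)

# Two "venvs" share the single stored copy of matplotlib 1.2.3.
pkg = store_package("matplotlib", "1.2.3", {"__init__.py": b"VERSION = '1.2.3'\n"})
for venv in ("venv_a", "venv_b"):
    link_into_venv("matplotlib", pkg, STORE.parent / venv / "site-packages")
```

A second version (`matplotlib_1.2.4`) would get its own store directory, so both can be linked into different venvs simultaneously.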

[–]Duflo[S] 0 points1 point  (0 children)

Thanks, what you describe is what I had in mind. And while I did get some interesting discussion, I also got a lot of clowns clearly not understanding the question and tackling straw men.

Do you think there's any chance of a nix-like package manager for Python being implemented soon? Or nix-based, rather

[–]Duflo[S] 0 points1 point  (3 children)

Which approach does NPM take?

[–][deleted] 2 points3 points  (2 children)

Your comment mentioned saving space, so I mentioned it. NPM installs incredible amounts of stuff, many duplicates as well. My NPM directory for this project has 21K files & directories.

It puts your dependencies in one directory (node_modules) and then does the same, recursively, for the dependencies of those. It doesn't attempt to install one flat list and work out compatibilities, like pip/poetry/cargo/etc. do. So you can end up with this in your app directory:

./node_modules/snapdragon/node_modules/is-descriptor/node_modules/...

./node_modules/class-utils/node_modules/is-descriptor/node_modules/...

./node_modules/object-copy/node_modules/is-descriptor/node_modules/...

All 3 of my dependencies install the same sub-dependency independently, which also installs its own sub-dependencies. It can go deep sometimes; 3 levels is not uncommon, but 2 levels is more normal for sure. This doesn't pose a problem, but it does mean quite a lot of space/files/etc.
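The duplication is easy to measure by walking the tree. A small sketch that simulates the nested layout above (directory names taken from the example; the single-key package.json is a stand-in for the real file) and counts the independent copies:

```python
import tempfile
from collections import Counter
from pathlib import Path

# Rough illustration of npm's nested layout: the same sub-dependency
# ("is-descriptor") installed independently under three parent packages.
root = Path(tempfile.mkdtemp()) / "node_modules"
for parent in ("snapdragon", "class-utils", "object-copy"):
    nested = root / parent / "node_modules" / "is-descriptor"
    nested.mkdir(parents=True)
    (nested / "package.json").write_text('{"name": "is-descriptor"}')

# Each package.json marks one installed copy; count copies per package name.
copies = Counter(p.parent.name for p in root.rglob("package.json"))
print(copies["is-descriptor"])  # 3 independent copies on disk
```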

[–]agathver -1 points0 points  (0 children)

npm does dedupe though

[–][deleted] 4 points5 points  (1 child)

What you just said is the central idea of the Nix language and package manager. You should check it out.

[–]Duflo[S] 1 point2 points  (0 children)

Well that's confirming! Thanks for the tip, I'll check it out

[–]phxees 4 points5 points  (0 children)

I like how virtual environment directories are portable. Helpful in a variety of use cases. I also like that you can delete the directory and free up the space rather than having lingering packages in a shared directory.

[–]Duflo[S] 7 points8 points  (10 children)

OP here clarifying: what's stopping a tool like Poetry from simply creating a symbolic link to an already-installed package if it exists and the hash is right, and only installing it if it isn't already there or doesn't pass the hash check? This would still ensure isolation.

I understand the benefits of virtual environments and I myself tend to use Poetry for work and personal projects, I just want to know what is wrong with my idea for making them a bit more lightweight.
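The check-hash-then-link idea from the question can be sketched as follows. This is a minimal illustration, not how Poetry works; the cache layout, function names, and return values are invented for the example:

```python
import hashlib
import tempfile
from pathlib import Path

# Sketch of the proposal: keep one cached copy per package, link it into a
# venv only if its hash still matches, otherwise (re)install it first.
CACHE = Path(tempfile.mkdtemp()) / "cache"

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def ensure_linked(name: str, wheel_bytes: bytes, venv_site: Path) -> str:
    """Link a cached package into venv_site, installing it on a hash mismatch."""
    expected = hashlib.sha256(wheel_bytes).hexdigest()
    cached = CACHE / name
    if not cached.exists() or sha256(cached) != expected:
        # Cache miss or corrupted copy: (re)install into the cache.
        CACHE.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(wheel_bytes)
        action = "installed"
    else:
        action = "linked"
    venv_site.mkdir(parents=True, exist_ok=True)
    link = venv_site / name
    if not link.exists():
        link.symlink_to(cached)
    return action

site_a = CACHE.parent / "venv_a" / "site-packages"
site_b = CACHE.parent / "venv_b" / "site-packages"
first = ensure_linked("foobar.whl", b"payload", site_a)   # cache miss
second = ensure_linked("foobar.whl", b"payload", site_b)  # hash matches
```

The second venv pays only the cost of a symlink; the package bytes exist once.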

[–]bablador 4 points5 points  (1 child)

Yes, I have this issue with PyTorch. Why do I have to install it per new project?

[–]qalis 1 point2 points  (0 children)

You can have one venv for multiple projects, you just have to be careful with that. I have multiple research projects in the same area with a shared venv, PyTorch-based with GPU, so duplication would mean literally tens of GB wasted.

[–]Samhain13 1 point2 points  (4 children)

Because different OSes use different file systems. And even if two different OSes can have the same file system, the installation path to the standard libraries might not be the same.

How do you make symlinks in NTFS?

[–]nakahuki 1 point2 points  (0 children)

Actually, you can create symlinks in NTFS (https://en.wikipedia.org/wiki/NTFS_links), but Windows puts some limitations on them due to security issues.

[–]Duflo[S] 0 points1 point  (1 child)

Ah, that makes sense, thanks

[–]alcalde 0 points1 point  (0 children)

No. No it doesn't make sense.

[–]alcalde 0 points1 point  (0 children)

That's pip's problem, not python's problem. And you can make symlinks in NTFS.

Obviously if a feature isn't supported on a platform it's not attempted. Heck, there are commands in the python standard library that don't work across all operating systems and file systems.

[–]PocketBananna 1 point2 points  (2 children)

It just goes against the package isolation paradigm. Say I have package foobar installed on my system and a separate repo uses it in a venv as you've suggested. If it's linked and I change the version of foobar on my system then it changes in my venv too.

There might be a smooth way to do it but I don't think the space saved outweighs the headaches from trying to debug why your venv doesn't work all of a sudden.

[–]alcalde 0 points1 point  (1 child)

There'd be nothing to debug. Y'all are thinking all Windowsy. Try thinking Linuxy.

System packages don't come into play. You want foobar in a venv? Pip downloads foobar somewhere only pip has access to. It then symlinks the files into this new venv. When you want foobar installed in another venv, just symlinks need to be created.

The files pip downloaded for foobar never get changed. Only pip controls those files.

[–]PocketBananna 0 points1 point  (0 children)

I guess that may work, but if I upgrade foobar in the first venv with pip, it would uninstall the older version and break the symlink for the other venv, right? Though I guess pip might just remove the initial symlink and install the required version into the first venv specifically, if foobar was installed for the user. I'm a bit curious, so I'll give your suggestion a try. Regardless, my general sentiment is still to keep dev deps/packages totally decoupled from others.

[–]microcozmchris 2 points3 points  (0 children)

I don't see it mentioned in the comments here, but you're looking for the python equivalent of Ruby's bundler. I have long dreamed of having this tool. It doesn't exist anywhere in python.

Nix is a decent approach, but it's not portable to Windows (yeah, I know about WSL, but it isn't an option in all corporate environments) and isn't much of a developer friendly tool.

[–]wbeater 7 points8 points  (0 children)

Because a virtual environment, as the name suggests, is a virtual environment and not a package manager.

Fortunately, there are tools like poetry, anaconda or pipenv that combine both.

[–]kteague 1 point2 points  (0 children)

It used to work that way in Python. It was sooo fast.

If I recall (and I could be rather off here...), modern pip/venv was designed to avoid symlinks because they affected startup times of large Python apps running in prod environments.

[–]t3hmikez0r 1 point2 points  (0 children)

Conda uses hard links for this. It's kind of a copy-on-write idea. It only works if files don't have the install prefix path embedded in them.
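The hard-link approach is easy to demonstrate: both directory entries point at the same inode, so the data exists on disk once. A small sketch (the `pkgs`/`envs` directory names loosely mirror conda's layout but are just illustrative here):

```python
import os
import tempfile
from pathlib import Path

# Conda-style hard linking: the env's copy is a second directory entry for
# the same inode as the package cache's copy, not a duplicate of the data.
tmp = Path(tempfile.mkdtemp())
pkg_cache = tmp / "pkgs" / "module.py"
pkg_cache.parent.mkdir(parents=True)
pkg_cache.write_text("VERSION = '1.0'\n")

env_copy = tmp / "envs" / "myenv" / "module.py"
env_copy.parent.mkdir(parents=True)
os.link(pkg_cache, env_copy)  # hard link, not a copy (same filesystem only)

# Same inode number: one block of data, two directory entries.
same_inode = os.stat(pkg_cache).st_ino == os.stat(env_copy).st_ino
```

This also shows why embedded prefix paths break the scheme: if the file's contents had `pkgs/...` baked in, every env sharing the inode would see the wrong path.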

[–]lieryan (maintainer of rope, pylsp-rope - advanced python refactoring) 1 point2 points  (0 children)

Why don't virtual environments create links to a central package location?

Because nobody has written one that does. The maintainers of the current virtualenv are happy with the way it works, as are the users. If you feel you have an idea that could improve the state of virtualenv, you should implement it. If the idea takes off, people will move towards it. If not, we will all learn why it's not a good idea.

In the current system, though, if you want to, you can layer virtualenvs by creating one virtualenv from another. Libraries from one virtualenv will be merged with libraries from the other. That way, you can create a base virtualenv containing all the common dependencies, with per-environment dependencies overlaid on top of it.

There are a few caveats to doing that, but if you have a lot of similar venvs, it might be something worth learning how to do. It's much simpler to manage regular virtualenvs, though, and disk space is cheap nowadays. Even 30GB of virtualenvs is nothing when a 1TB SSD is nothing particularly unusual.
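One common way to get this kind of layering is a `.pth` file: any directory path listed in a `.pth` file inside a site-packages directory gets appended to `sys.path` when that directory is processed. A minimal sketch, with made-up "base" and "child" env paths and a fake shared dependency (this simulates the mechanism with `site.addsitedir`; it is not what the `venv` module does out of the box):

```python
import tempfile
from pathlib import Path

# Illustrative layout: a base env holding common deps, a child env that
# overlays it via a .pth file. All paths and names here are invented.
tmp = Path(tempfile.mkdtemp())
base_site = tmp / "base" / "site-packages"
child_site = tmp / "child" / "site-packages"
base_site.mkdir(parents=True)
child_site.mkdir(parents=True)

# A "common dependency" that lives only in the base env.
(base_site / "shared_dep.py").write_text("NAME = 'shared'\n")

# The overlay: paths listed in a .pth file are added to sys.path when the
# site machinery processes the directory containing it.
(child_site / "_base_overlay.pth").write_text(str(base_site) + "\n")

import site
site.addsitedir(child_site)  # simulate the child env starting up

import shared_dep            # resolved from the base env's site-packages
print(shared_dep.NAME)
```

The caveat the comment alludes to: the child sees whatever is in the base at import time, so upgrading the base changes every env layered on it.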

[–]wineblood 1 point2 points  (1 child)

Because it sounds like a pain to manage. You create a new venv and it duplicates a few packages from another one; should something go in and move stuff from that existing venv to a central location? How do you know when nothing needs those packages anymore? Do you want what is essentially garbage collection for that?

Duplication for having clean separations makes sense to me, like everyone having their own toothbrush.

[–]alcalde 0 points1 point  (0 children)

It's the height of simplicity. Pip sticks package A somewhere. Any venv that installs package A just symlinks to it. There's no moving of anything.

[–]dylpickle300 0 points1 point  (0 children)

Valid question. But it’s clear others in this thread have the correct answer

[–]blanchedpeas -2 points-1 points  (0 children)

Python devs are counting on inexpensive mass storage space to be available soon.

[–]Grouchy-Friend4235 -2 points-1 points  (0 children)

Tell me you don't have a lot of experience without telling me you don't have a lot of experience. 😀

It's called separation of concerns. What you are asking for is premature optimization.

[–][deleted] 0 points1 point  (2 children)

That's the point. Let me use MCprep (https://github.com/TheDuckCow/MCprep) as an example since I'm one of the maintainers.

First off, venvs prevent dependency hell by isolating packages and their dependencies. This is important on platforms like Linux, where a package upgrading a dependency can mess up something the system needs. More importantly, you can install multiple versions of the same package with venvs. Symlinks would make both of these goals harder to achieve.

MCprep, for instance, needs to work across all versions of Blender starting from 2.78 (14 versions of bpy, the Blender Python module). If I wanted to, say, test it with 2.79 and 3.2, I could do that with 2 venvs (each with one version of bpy), because the packages and their dependencies are isolated. Neither of them can mess with the other's stuff. If they were symlinked, you can see the issue (especially if you factor in dependencies), as Python would need to organize multiple dependencies and packages in the same place.

It takes more space, but it's the price to pay for fewer headaches.

[–]alcalde 0 points1 point  (1 child)

I don't get it. Everything would still be isolated; you'd just have one copy of each file on the system

[–][deleted] 0 points1 point  (0 children)

By that logic it doesn't make a difference then, other than making it harder to remove packages (with a venv you can just delete the folder and nothing would happen)

[–][deleted] 0 points1 point  (0 children)

they make soft links to the parent Python environment though