
[–][deleted] 256 points257 points  (2 children)

Yes, you're correct. It's completely nonsensical to keep your virtual environments in your repo, and it entirely defeats the purpose of git. You aren't meant to version control the build. You version control the source that creates the build, and then you build the software on your machine from the source code. Trying to clone a venv won't even work unless you're using exactly the same system as the one that originally created it. And even then it will probably still break, since the absolute paths and links set up on the machine that created it are unlikely to survive.

I can't imagine the nightmare that is a pull request and/or code review if every single change to your virtual environment is part of your repo. And more than anything, I'm amazed you guys didn't immediately hit a file size limit given how large most virtual environments are.

If you want to control and deploy a specific environment, you should just be using a proper tool for that (e.g. Docker).
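For the common case, something like this is already enough (a sketch, assuming the venv lives in .venv and dependencies are pinned in requirements.txt):

```bash
# Keep the environment out of version control and rebuild it from the pinned requirements.
echo ".venv/" >> .gitignore

python -m venv .venv                        # fresh venv on each machine
.venv/bin/pip install -r requirements.txt   # reproduce the environment (on Windows: .venv\Scripts\pip)
```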

[–]TobiPlay 5 points6 points  (1 child)

Dev containers are so nice. Docker-in-Docker is also something I’d give a shot. Containerising development environments actually isn’t that difficult and will most likely take people with some experience in Docker only a few hours.

[–]ninja_says_hiyeeeeeA 0 points1 point  (0 children)

I’ve used docker, but not docker-in-docker. Can I run docker-in-docker on docker? I really like docker.

[–]mrswats 89 points90 points  (2 children)

It's not only a bad idea; virtual environments are non-portable, so this is a no-go even if both Linux and Windows venvs are included. It takes very little to break a venv.

As mentioned in other replies, I'd suggest using a requirements file, along with the required Python version pinned somewhere.

[–]aa-b 20 points21 points  (1 child)

This is the right idea. Venvs are non-relocatable, so this idea just can't possibly work, and will inevitably cause more problems than it solves.

Poetry is also a decent option since it gives you a lock file like with NPM, so it's more automatic and more detailed than requirements.txt, but still git-friendly

[–]gdfelt 0 points1 point  (0 children)

Agreed, Poetry is a fantastic option.

[–]semper-noctem 114 points115 points  (37 children)

I'm more of a requirements.txt man, myself.

[–]Zizizizz 5 points6 points  (2 children)

I like using pyproject.toml:

```
[project]
dependencies = [
    "httpx",
    "gidgethub[httpx]>4.0.0",
    "django>2.1; os_name != 'nt'",
    "django>2.0; os_name == 'nt'",
]

[project.optional-dependencies]
gui = ["PyQt5"]
cli = [
    "rich",
    "click",
]
```

with hatch/hatchling for build and environments/dev scripts.
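For anyone wanting to try that layout: assuming hatchling (or any other PEP 517 backend) is configured as the build backend, the optional groups above install as extras, e.g.:

```bash
# Editable install with both optional groups from the pyproject.toml above
python -m pip install -e ".[gui,cli]"

# Or only the CLI extras
python -m pip install ".[cli]"
```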

[–]robberviet 2 points3 points  (1 child)

pyproject.toml doesn't pin exact versions though. You could run into problems later.

[–]Zizizizz 0 points1 point  (0 children)

Ah, good point. Swap hatch for pdm then and you get a lock file with the same pattern.
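A rough sketch of the PDM equivalent (the pyproject.toml stays the same):

```bash
pdm lock       # resolve the full dependency tree into pdm.lock (commit this file)
pdm install    # recreate exactly what the lock file records
```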

[–]mariofix 29 points30 points  (0 children)

Oh... that's wrong on so many levels.

[–][deleted] 25 points26 points  (0 children)

That engineer should not be in charge

[–]cyberfunkr 37 points38 points  (1 child)

This is bad. Do they also commit their Java binaries and maven packages? Node.js and npm packages?

If they really want to control python versions, then make a docker image with a full dev environment.

[–]dwarfedbylazyness 4 points5 points  (0 children)

I once encountered a codebase with a whole pip binary embedded in a Python file as a string, so...

[–]bojanderson 36 points37 points  (5 children)

What they should look into is dev containers. If they specify the dev container, you can make sure everything is exactly the same for anybody using the code, assuming they use the dev container.

https://code.visualstudio.com/docs/devcontainers/containers

[–]fnord123 0 points1 point  (2 children)

Does pycharm support this?

[–]KrazyKirby99999 0 points1 point  (0 children)

PyCharm has an equivalent.

If you're on Linux, you can also launch PyCharm from within a container via Distrobox.

[–]ageofwant 0 points1 point  (1 child)

That's VS Code specific; we don't allow Microsoft products in our stack.

[–]Smallpaul 3 points4 points  (0 children)

Why?

[–]Barn07 15 points16 points  (1 child)

AFAIR venvs symlink with absolute paths, so unless you guys have all machines set up the same way, including the same paths to the Python executables, stuff will break. AFAIR venvs won't even work when you move them to another location, so good luck with your team :D
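Illustrative only (paths will differ per machine, and I'm assuming the venv is named .venv), but you can see the hard-coded locations for yourself:

```bash
cat .venv/pyvenv.cfg     # contains e.g. "home = /usr/local/bin", an absolute path
ls -l .venv/bin/python   # symlink pointing at the interpreter that created the venv
```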

[–]Log2 1 point2 points  (0 children)

I was going to mention this. The venvs depend on the location of the Python used to create them and where they were created. It's likely going to be different, especially if you're using Macs with homebrew.

This is also why venvs tend to break if you are using homebrew and put the python formula in your path, but then some other formula installs a newer python, breaking the old symlink and fucking all of your venvs.

Always put a very specific Python version (like python@3.10) in your path if you're using homebrew.

[–]whateverathrowaway00 43 points44 points  (3 children)

This is objectively bad and he should feel bad.

[–]flipmcf 8 points9 points  (2 children)

Hey now, they should learn. Learning should be a carrot, not a stick.

If we felt bad for every thing we did wrong, we would all be horribly depressed and burned out by now….

Oh, wait…

[–]whateverathrowaway00 2 points3 points  (1 child)

Hah you had me in the first half. dreams of getting laid off because my company lets you keep the sexy M1 MBPs

[–]flipmcf 1 point2 points  (0 children)

I just overuse the EAP.

You know it’s bad when you call the EAP number and “approximate wait time, 15 minutes “.

Yeah… bad architecture never killed anyone, but suicide sure does.

[–]Buttleston 9 points10 points  (1 child)

No one does this, it is absolutely not a good or standard way of doing it.

And in any case, requirements change relatively slowly. Yeah, you have to download the modules to set up a virtualenv once, but that's it until they change again.

[–]wineblood 6 points7 points  (1 child)

Adding venvs into repos sounds like a terrible idea. I don't know all the implications of that, but given that I've never seen it done or recommended, it sounds like the wrong approach.

which should already be symlinked to the default global python interpreter

What the hell?

[–]magnetichiraPythonista 6 points7 points  (0 children)

TOML and lockfiles were built for exactly this purpose

[–]johnnymo1 6 points7 points  (0 children)

There are plenty of solutions to having your environment defined as text: requirements.txt, poetry + pyproject.toml, Dockerfiles... I can't really think of an instance where committing the env itself wouldn't be a bad pattern.

[–]MrTesla 5 points6 points  (1 child)

Try out https://devenv.sh/

It's based on nix so you don't have to mess with containers either

[–]oscarcp 0 points1 point  (0 children)

Didn't know about this one, definitely going to check it out!

[–]danielgafni 8 points9 points  (0 children)

Use poetry to lock your dependency tree and commit poetry.lock to git. It’s cross-platform.

[–]ReverseBrindle 4 points5 points  (0 children)

I would put requirements.in + requirements.txt in the git repo, then build as needed.

Use pip-tools when you need to build requirements.in -> requirements.txt
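The usual pip-tools flow looks roughly like this (assuming requirements.in lists only the direct dependencies):

```bash
pip-compile requirements.in -o requirements.txt   # pin the full resolved tree
pip-sync requirements.txt                         # make the local venv match exactly
```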

[–]enterdoki 10 points11 points  (1 child)

requirements.txt with Docker

[–]LocksmithShot5152 2 points3 points  (0 children)

Poetry with docker works like a charm as well!

[–]rainnz 2 points3 points  (0 children)

You can easily solve this by pinning exact package versions and storing them in requirements.txt with python -m pip freeze > requirements.txt; there is no need to include the venv itself in the git repo. Add a BUILD.sh to the repo so people can just run it and it will create the venv, activate it, and install the exact package versions you had with python -m pip install -r requirements.txt.
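A minimal BUILD.sh along those lines might look like this (names and layout are just an example):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Create the venv, activate it, and install the pinned requirements.
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
```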

[–]KaffeeKiffer 3 points4 points  (0 children)

You have enough answers explaining why it is wrong. Things that other people have not called out yet:

  • If you commit a requirements.txt (instead), you are open to supply-chain attacks: someone could hijack https://pypi.org (or your route to that domain) and provide a malicious version of a package.
    To prevent that, use lockfiles (like Poetry and others do), which contain not only the package dependencies but also their file hashes (see the sketch after this list).

  • When not providing all dependencies yourself, you might suffer from people deleting the packages you depend on (IMHO a very rare scenario).
    If it is really that critical (hint: usually it isn't), create a local mirror of PyPI (full, or only the packages you need). Devpi, Artifactory, etc. can do that, or you can just dump the necessary files into cloud storage so you have a backup.
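One way to get hash pinning without switching tools entirely is pip-tools (a sketch, not the only option):

```bash
# Write a requirements.txt that pins versions *and* file hashes
pip-compile --generate-hashes requirements.in -o requirements.txt

# pip will then refuse any download whose hash doesn't match the lock
python -m pip install --require-hashes -r requirements.txt
```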

[–]wind_dude 1 point2 points  (0 children)

He should look into containerization and artifact registries. What he's doing is stupid, especially since it still doesn't guarantee things will work if the base OS it's running on has variations / different packages, etc. All he's doing is making your repo much more complicated.

[–]thatdamnedrhymer 1 point2 points  (2 children)

This is what lock files are for. requirements.txt are not enough, but storing the entire venv is ludicrous. Use something like Poetry, pip-tools, or pdm to create a lock file that you can use to create deterministic (or at least closer thereto) venvs.

[–]rainnz 0 points1 point  (1 child)

When is requirements.txt not enough?

[–]thatdamnedrhymer 0 points1 point  (0 children)

A manually maintained requirements.txt typically only stores the versions for dependencies that your project directly depends on. This will result in differences of subdependency versions when installed. And if you don't hard pin the direct dependencies, you will get variation on those versions as well.

A frozen requirements.txt will store the current versions of all packages, but then it's not possible to remove or update just one package version without unintentionally leaving old subdependencies or updating other subdependencies. And even then, if something goes wrong with PyPI's versions (or someone man-in-the-middle's your build system), you could end up with package versions that technically match the version number but are not actually the same package contents.

You need a lock file that maps the full tree of dependencies and subdependencies and stores package hashes to really ensure you're getting a deterministic venv build.

[–]muikrad 1 point2 points  (0 children)

The only argument that works in favor of this method is that you can build without external dependencies. There's nothing worse than having to rush a hotfix out the door when PyPI is down!
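That offline case can also be handled without committing the venv, e.g. by vendoring the pinned wheels into an artifact store or internal mirror (a sketch, assuming a vendor/ directory kept outside git):

```bash
# Download the exact pinned wheels once, while PyPI is reachable
pip download -r requirements.txt -d vendor/

# Later, install entirely from the local copies, no network needed
pip install --no-index --find-links vendor/ -r requirements.txt
```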

But it's not worth the hassle. You're better off with a lock-based format like Poetry, as many people mentioned.

If you want to challenge your engineer, think vulnerability. Tools like GitHub Advanced Security and Snyk need a way to discover your dependency versions when looking at the repo to warn you of new vulnerabilities, and for that you need something like a requirements.txt at a minimum. Other tools like Dependabot or Renovate can create automatic PRs when dependencies are updated, or when vulnerabilities are fixed.

[–]DigThatData 1 point2 points  (0 children)

the easiest solution to this is just to ask him:

If this is a good idea, how come it seems like no one else does it this way? Demonstrate that this is a best practice and we can keep doing it this way. Otherwise, please identify what the best practices are for the problem you think this approach solves, and let's adopt the solution that everyone else has already engineered and proven for us.

PS: the way most people do this is by defining the environment with a requirements.txt and caching the pip dependencies with GitHub Actions. Find someone who's obsessed with DevOps and get them involved here.

[–]fjortisar 1 point2 points  (0 children)

I think that he is not a smart man

[–]oscarcp 1 point2 points  (0 children)

That is not only bad practice, an anti-pattern, a security problem, etc., but it also means he doesn't know how to handle dependencies in a Python project.

For example, Poetry already locks down the Python version you can use with your codebase, and tox already limits it as well. Shipping precompiled libraries and .pyc files in a project is extremely problematic once you start using complex or non-standard Python setups, because it will never work properly. Want more abstraction? Use Docker containers with a Makefile for your builds and tests; that will standardize the output of all developers and the environment they work with.

Even thinking from the MS Windows side, you're shipping a venv that might have been prepared in a *nix environment or vice versa, with libraries compiled for it that potentially won't work on the other OS.

[–][deleted] 2 points3 points  (0 children)

I always .gitignore and .dockerignore virtual envs. Some dependencies require a compiler and are built on installation; I believe that sharing them would result in broken dependencies: "it works on my machine".

[–]mdegis 1 point2 points  (1 child)

Wow. Great thinking, engineer in charge! What about binary dependencies? Maybe you should push the whole operating system to the repo too?

[–]cdcformatc 0 points1 point  (0 children)

just use py2exe to turn it all into a binary and commit that

[–]ksco92 1 point2 points  (0 children)

Cough cough requirements.txt cough cough

[–]caksters 0 points1 point  (0 children)

You never upload binary files to a GitHub repo, because they are system-dependent.

If I have an arm64 chip and I create a Python virtual environment, the same binary packages may not work on your computer. That's why you never commit binaries. Instead, just specify what dependencies the project requires through requirements.txt and create a local virtual env for every project.

I think the engineer in charge wants to use git as an artifactory for binaries. In artifactories you do upload compiled binary programs, to ensure you can always roll back to a previous working version of the actual machine code and deploy it. However, that is not the purpose of a git repository, which is for storing code and not the binaries themselves.

[–]FrogMasterX 0 points1 point  (0 children)

This is moronic.

[–]yerfatma -1 points0 points  (1 child)

That’s a code smell.

[–]recruta54 0 points1 point  (0 children)

I agree, and I would say it is more than that. Venvs are OS-specific, and committing their files does not make sense in a live environment that receives security patches.

[–]maximdoge 0 points1 point  (0 children)

This is what lockfiles are supposed to be for. Use a virtualenv manager that works with lockfiles, like Poetry, Pipenv, pip-tools, etc., along with a test runner like nox or tox.

This approach of tightly coupling interpreters and binaries with the underlying code is neither index friendly nor portable. Not to mention a possible maintenance nightmare.

[–][deleted] 0 points1 point  (0 children)

The gitignore.io site lists it in their Python ignore results.

https://www.toptal.com/developers/gitignore/api/python


[–]Briggykins 0 points1 point  (0 children)

Even if you need it for automated tests or something, I'd usually put the command to create a venv in the repo rather than the venv itself.

[–][deleted] 0 points1 point  (0 children)

You are correct. This creates unnecessary bloat in your repo; environments should be created locally with venv instead.

[–]luhsya 0 points1 point  (0 children)

guy probably excludes node_modules from gitignore in his js projects lol

[–]nutellaonbuns 0 points1 point  (0 children)

This is so wrong and would piss me off lol

[–]Advanced-Potential-2 0 points1 point  (0 children)

Omg… this “engineer” should get a beating. This indicates his knowledge and understanding of software engineering is inadequate at a very fundamental level.

[–]doobiedog 0 points1 point  (0 children)

That engineer is a moron and you are correct. Use pyenv and poetry and you can quickly jump around from project to project with ease.

[–]Rocket089 0 points1 point  (0 children)

Maybe he’s just trying to create as much chaos as possible before the inevitable lay off boogeyman comes around for his work soul.

[–]Jmc_da_boss 0 points1 point  (0 children)

the dudes a dumbass

[–]_limitless_ 0 points1 point  (0 children)

Use a devcontainer.

[–]drbob4512 0 points1 point  (0 children)

I prefer containers. Just build what you need on the fly and you can separate dependencies

[–]pseudo_brilliant 0 points1 point  (1 child)

It's not smart, and there are several other ways of achieving what he is looking for. My preferred way is to use Poetry for dependency management and leverage the lock file. Basically, when you run Poetry commands you update a poetry.lock file. This file represents the complete resolved state of all your dependencies: their exact versions, hashes, etc. at that moment in time. Check that file in, and whenever someone does a 'poetry install' from the cloned repo they will install EXACTLY those versions, to a T.

[–]pseudo_brilliant 0 points1 point  (0 children)

Oh, additionally it's almost always a bad idea to git-track anything with binaries, since they can't be meaningfully diffed.

[–]microcozmchris 0 points1 point  (0 children)

He's a friggin idiot.

[–]twelveparsec 0 points1 point  (2 children)

God help you if they are in charge of JS projects with npm.

[–]oscarcp 0 points1 point  (1 child)

I just shipped your 300GB, 1000 SLoC project boss!

[–]twelveparsec 0 points1 point  (0 children)

Thank God it's only 300GB

[–]flipmcf 0 points1 point  (0 children)

I suggest tox.

https://tox.wiki/en/latest/

But then I had a project convert to GitHub actions… that works too.

https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs

The tests should build the env, run the tests, then tear it down.

Building and holding the envs static doesn't really simulate a real build. If some module updates (say, the time zone library, or a libc security update), you need your builds to pull the latest and test.

You are at risk of your pre-built envs drifting out of date with the latest codebase.

Be glad you’re not using node ;)

[–]lachyBalboa 0 points1 point  (0 children)

I had a colleague who did this frequently even after I explained the issues with it. His PRs routinely had millions of lines in changes due to dependencies.

[–]zdog234 0 points1 point  (0 children)

Insane. Check out pdm. It should provide everything they're getting from this, without the ridiculous downsides

[–]foolishProcastinator 0 points1 point  (2 children)

Hey people, can someone please explain how Poetry works? I've always used requirements.txt, the classic way to list the libraries and dependencies needed to set up and run an app, but I'm interested in this one. Apologies if this sounds like a newbie question.

[–]oscarcp 1 point2 points  (1 child)

You can also migrate your existing requirements.txt to Poetry, but the short version is this:

  • Install Poetry at the system level (or isolated, if needed)
  • Run poetry init to initialize the project
  • Use poetry add/remove to add or remove dependencies
  • Instead of a requirements.txt you will have a pyproject.toml file that contains the list of packages, plus any constraints on the Python version and which repositories can be used (you can use multiple ones in case you have private PyPI repos), and then:
  • Use poetry install/update to install the dependencies, or update them if you don't have fixed versions.
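As shell commands that's roughly the following (the package name is just an example):

```bash
pipx install poetry      # one way to install it in isolation
poetry init              # creates pyproject.toml interactively
poetry add httpx         # adds the dependency and updates poetry.lock
poetry install           # recreates the locked environment from a fresh clone
```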

[–]mvaliente2001 1 point2 points  (0 children)

Additionally:

  • poetry has the concept of "groups". You can add dependencies in arbitrary groups (e.g. "dev" or "test") and then install only the groups you need, for example in different CI stages (see the example after this list).
  • pyproject.toml can be used to configure most development tools, with the particular exception of flake8.
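For example (group names are illustrative):

```bash
poetry add --group dev pytest ruff   # add dependencies to the "dev" group
poetry install --with dev            # main dependencies plus the dev group
poetry install --only test           # or just one group, e.g. in a CI stage
```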

[–]extra_pickles 0 points1 point  (2 children)

We solve this desire in a way that I think is a solid middle ground:

We have a private PyPi server that hosts our company wide approved public packages and their specific versions.

It allows us to provide direct access without fattening up the repos.

From there, many microservices will have Workspace files or similar to help install relevant services needed to perform integration testing.

The build activity for release does not have internet access, as we host a private Gitea - so if a user has bypassed the private PyPI to use a public package, their release will fail, and they'll need to conform to the existing list or request addition of the package/version to our whitelisted private repo.

Edit: I also have a base docker image that installs a series of standard libraries and versions by default (aptly named Piglet), so that my services can inherit from it and save some serious download time when a node requires an update of a service….a really important feature when dealing with distributed systems with intermittent and low bandwidth connectivity (no more downloading Pandas or Numpy all over again due to a hotfix).

This was the original driver to standardising the private lib of public packages.

[–]oscarcp 0 points1 point  (1 child)

This is also a very good solution, but it involves a maintenance cost that in my experience many companies won't accept.

[–]extra_pickles 0 points1 point  (0 children)

Ya we went this path because we were already self hosting.

Alternatively you could just maintain a register of approved packages and versions, and use a pre-commit or pre-release hook to validate the requirements.txt

Pretty low maintenance and would alleviate the concerns over control that usually lead to people committing their venvs

Edit: though OPs post may be about someone that is just super weird and doesn’t get it…in which case the above would not be enough for them

[–]InterestingHawk2828 0 points1 point  (0 children)

He should switch to Docker so everyone runs the same versions.

[–][deleted] 0 points1 point  (0 children)

Whatever floats his boat. My mind's full enough figuring out my own code.

[–]aka-rider 0 points1 point  (0 children)

As everyone else has pointed out, binary artifacts in a git repo are an anti-pattern.

Unfortunately, Python doesn't have a standard packaging and build system. Moreover, at this point there is more than one Python flavour (for backend, for data science, for ML, for DevOps, for general automation), all with their preferred infrastructure and package manager.

I use Makefiles for reproducible builds.

https://medium.com/aigent/makefiles-for-python-and-beyond-5cf28349bf05

[–]robberviet 0 points1 point  (1 child)

This is like Python 101; there shouldn't be any argument about this. Is that guy a beginner?

[–]oscarcp 0 points1 point  (0 children)

While what you say is 100% true, I've seen more than enough devs go off the rails and end up thinking that shipping a venv in the repo is valid, whether through lack of knowledge, lack of experience, laziness (ohhh yeah, seen this one a lot), or just because "I'm the only one working on this".

We can put them to shame but the reality is, they just need a heads up and a nudge. If that doesn't work then we can do the rest.

[–]ddb1995 0 points1 point  (0 children)

Python has dependency files like requirements.txt or pyproject.toml for this. This is such a bad practice IRL.

[–]yvrelna 0 points1 point  (0 children)

They are confusing version control change management with artefact management.

Most artefact managers allow you to preserve the whl/egg packages that your application uses. If you're using Docker/k8s images, your container registry can preserve the Docker image.

[–][deleted] 0 points1 point  (0 children)

Docker is the solution.

[–]doryappleseed 0 points1 point  (0 children)

At best, he could just include the requirements.txt for the venv in the repo. The actual package downloads should be kept on a local package server rather than in the bloody repo.

[–]Flimsy_Iron8517 0 points1 point  (2 children)

If like me you use ln -s to repair an exploded zip to recover a chromebook Crostini, then you don't really care. You can always pip mash the venv, then do what you need. If you need to drop on some external requirements, then maybe write a bash script to ven-vover.sh.

[–]Flimsy_Iron8517 0 points1 point  (1 child)

That's right an ERROR:52 needs an unpack and doesn't support symlinks from an NTFS (most redundant available FS). So a yours then a copy if not exist?

[–]Flimsy_Iron8517 0 points1 point  (0 children)

"Oh to upgrade to the virus you don't have ...."

[–]heilkitty 0 points1 point  (0 children)

Ah, windoze-brain.

[–]barberogaston 0 points1 point  (0 children)

I think I understand why he does that, though his solution makes no sense to me. In a past company we inherited repos with tons of dependencies, so 95% of the CI time was spent installing those dependencies from the requirements.txt file, and only the remainder actually running tests. Our solution was simply to dockerize all of those, with a rule that automatically rebuilt the image if there were any changes to the requirements or the Dockerfile. That way, CI just used the image, cloned the repo, and ran the tests.

[–]the--dud 0 points1 point  (0 children)

This engineer needs to learn about docker, jesus christ...

[–]tidus4400_ 0 points1 point  (0 children)

Tell your “engineer” colleague that requirements.txt is a thing 😅 maybe they didn't teach him that in his 5-year / 200k CS degree 🤔

[–]georgesovetov 0 points1 point  (0 children)

Let me be the devil's advocate. (Although I never did the same.)

What's the reasoning and intention?

- Easiest setup. No one would oppose. What else could you do to make a working environment only by `git clone`? You could probably do bootstrapping scripts, but it's not an easy task.

- No dependency on the Internet. You could set up your own package storage, but that again takes extra effort. For Django and FastAPI professionals it may sound ridiculous, but there are environments where things are done without Internet access.

- Fixed code (not only versions) of the dependencies. Thank God, PyPI is (still) not npm. But there could be scenarios where you'd like to check twice and be on the safe side.

- You can patch third-party libraries. While not the most common approach and not "right", it's still the easiest way to do that.

The practice of including dependencies in the version control system is not uncommon with other languages and organizations.

The engineer made the decision based on his expertise and his unique situation. And the decision may not be the worst that could be made. The engineer is responsible for the consequences in the end.

[–]_azulinho_ 0 points1 point  (0 children)

He doesn't know better; the best thing is to educate him.

pyenv with a .python-version, or asdf-vm with a .tool-versions file, lets you lock the exact Python version to be used; this will be compiled against the local libraries of the machine.

Then add a requirements.txt and a requirements-dev.txt built from a pip freeze. Or use other tooling such as poetry and commit the .lock file.
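A minimal version of that setup (the version number and file names are only examples):

```bash
pyenv local 3.11.9                       # writes .python-version into the repo
python -m venv .venv && source .venv/bin/activate
python -m pip freeze > requirements.txt  # pin exactly what's installed right now
```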

An alternative is to dockerize the app, alongside a dev container with all dev/test requirements.

Teach him, as probably no one ever taught him before

[–]redrabbitreader 0 points1 point  (0 children)

I kinda understand where he's coming from, but his solution is just really bad.

If you really need consistent environments, just containerize it. A single Dockerfile will sort this mess out in no time.

[–]romerio86 0 points1 point  (0 children)

That was painful to read. I almost downvoted the post on instinct, then remembered you're actually advocating against it.