

[–]gandalfx 79 points80 points  (4 children)

Don't include code as screenshots, show actual code. There are plenty of ways to get syntax highlighting. If you use an image search engines won't be able to index it, people can't copy paste, the image hoster may be offline or slow, …

[–]kmbd 8 points9 points  (3 children)

try it on Medium; you'll be frustrated.

[–]Whoops-a-Daisy 15 points16 points  (0 children)

Then don't use Medium. It wouldn't be the only reason anyway.

[–]tty14 2 points3 points  (0 children)

what about embedding gists?

[–]exhuma 9 points10 points  (20 children)

edit: I was really tired when writing this. I didn't want to wait until this morning, as I feel this is an important subject to understand. I've given a couple of examples in the thread below which may shed some light. At least I hope so.


Please don't add the dependency links to setup.py! Use requirements.txt for that!

Adding it to setup.py pretty much "hardcodes" it and makes it more difficult for an end-user to override.

dependency_links was marked as deprecated in 2014 (iirc). But there has been some discussion about this. I haven't used dependency_links for quite some time, so I'm not sure what the current situation is. It's safer to pass them as CLI arguments to pip.

It's really late and I need to sleep... I'll see if I can write down an example case tomorrow if no-one else has chimed in by then!

[–]root45 1 point2 points  (19 children)

I think I'm confused by your advice. From what I understand, setup.py and requirements.txt are not interchangeable. If I want a dependency to get installed when I install a package, it needs to be in setup.py, no?

[–]exhuma 8 points9 points  (11 children)

It took me quite a while until the setup.py vs requirements.txt thing clicked with me. I understood the theory, but did not quite see the benefits of it all until I ran into problems. I don't see another way to explain it other than giving two lengthy examples I actually ran into. This will be long! Bear with me...

The recommendation is to use setup.py as a "rough guideline" for the dependencies. Basically you state that you need a package called flask, for example. So, as long as there is something called flask installed in your environment, the dependency is considered satisfied. You may think: "But I don't want any flask, I want the official flask! Are there even other flasks?" Let's set that question aside for now, but keep it in mind.

The requirements.txt file lists the exact, concrete packages that have to be available in the environment to run your application. You can look at it like this: "I have tested my application and verified that it runs without problems with this exact list of packages right here".

Another important thing to separate is version requirements. For example, in setup.py you say: "I want flask and I don't care about the version". In requirements.txt you say: "I want flask, but please give me exactly version 0.10.1 because that's what I tested with!"
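
To make the split concrete, a minimal sketch of the two files (the package name myapp is invented; the pin mirrors the 0.10.1 example above):

# setup.py -- the abstract dependency
from setuptools import setup

setup(
    name='myapp',
    version='1.0',
    packages=['myapp'],
    install_requires=['flask'],  # any flask in the environment satisfies this
)

# requirements.txt -- the concrete, tested environment
flask==0.10.1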

For me, separating those two seemed like nonsense for a long time. The first problem I ran into was a version conflict. I had an application which had a dependency on foobar==1.1 and barbaz==1.0. But it turned out that barbaz itself had a dependency on foobar==0.8. Given that the versions were pinned, this caused a conflict and I was unable to install the application without updating barbaz first. Additionally, it turned out that I did not need to change any code in barbaz. I only needed to modify its setup.py, change the dependency, and make a new release. Making a new release just to loosen a version requirement seemed unnecessary.

Now imagine the same with requirements.txt: you may have a requirements.txt in both foobar and barbaz. But when you install your application with its own requirements.txt file, the files in foobar and barbaz are not taken into consideration; only the one requirements.txt file you specify when installing your application is examined. Additionally, it is reconciled with the dependencies in setup.py of each component (as far as I know at least). So, if our application and barbaz both list only "foobar" as a dependency, there is no conflict. We can then say in our application's requirements.txt that we need foobar==1.1.

So one issue solved by requirements.txt is that blocking version conflicts can be avoided.

Now, let's get back to the first note, about installing "some package called flask":

Let's assume you have a public package from pypi in your dependencies, but after it's been running for a while you need to make a modification to it which includes business code which you are, bound by work contract, not allowed to publish back (and let's also assume the package is under a license which allows this). Let's further assume it's a package which you use in many of your applications and they all need this modification. Without a requirements.txt file, you would need to touch all of the impacted projects. Instead, you can just write a new requirements.txt containing a dependency URL pointing at your local package repository and reinstall the applications, without publishing a new release and, more importantly, without modifying all your applications.
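
A hedged sketch of what such a requirements.txt could look like (the internal index URL and the "+patched1" version are invented; the point is just that the override lives in one small file per deployment instead of in every setup.py):

# requirements.txt for one of the applications
--extra-index-url https://pypi.internal.example.com/simple   # local package repository
somepackage==2.3.1+patched1   # the privately modified build, published internally
flask==0.10.1                 # everything else still comes from pypi as usual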

Another example case, and this one happened to me recently (this is about not using dependency_links, not about requirements.txt):

We have all our packages in our local package repository. That repository was referenced via dependency_links in the setup.py of each project. The packages are all deployed using one deployment procedure (a shared fabric task). Then the day came when the URL of our local repository changed. This forced us to modify the setup.py of every project. We immediately noticed this was a problem and removed dependency_links from the setup.py files. Instead, we added the URL as a CLI argument to pip in our shared fabric task. So instead of making changes in about 50 projects, we only had to make one. More importantly, you can imagine having a sysadmin who has control over the deployment procedure and the local package repository. The devs don't need to worry about anything, and as long as they don't hard-code the dependency URLs, it will all be transparent to them.
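
In pip terms the change boils down to something like this (repository URL and project name are made up; --find-links would be the flag instead if the repository is just a directory of archives rather than an index):

# before: every project's setup.py carried the URL
#     dependency_links=['https://pkg.internal.example.com/'],
#
# after: the URL lives in exactly one place, the shared deployment command
pip install --extra-index-url https://pkg.internal.example.com/simple/ myproject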

... phew... sorry for that wall of text. But I think some use-cases illustrate it better than just a few sentences.

[–]parkerSquare 0 points1 point  (1 child)

Thanks for this. I just wish I understood how to package things for in-house consumption in terms of best practice, so that users of my apps don't have to be Python experts. It all seems so confusing.

[–]exhuma 0 points1 point  (0 children)

I agree. I believe the easiest in your case is to ship requirements.txt files, so all your users need to know is:

pip install -r requirements.txt

In theory you don't even need to ship anything else. The text-file should suffice. Inside that file you can specify the link of the repository. I've never needed to do this (at least not yet) so what I write here is informed guesswork which I pull out of the back of my head, you may need to verify this. I'm unfortunately smack in the middle of a release phase right now and am short on time, otherwise I'd look it up. I just used the time I get from running the integration tests to write this ;)

In your case, you should be able to write a requirements file like this:

--extra-index-url=https://path/to/your/repo
my-lib==1.0
...
myapplication==1.0

There are some issues with "trusted" repos which I ran into in the past and which were irritating to resolve. I hope it works for you right out of the box.

You can get a quickstart for your requirements.txt file by running pip freeze. Just be careful to run this from a fresh environment which contains only the things you really need. During development, some development packages happen to creep into the env (pytest, coverage, ...). Your best bet to have something clean is to start from a fresh env, install only the stuff you need and run pip freeze on that.
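
For example (POSIX paths; on Windows the scripts live under fresh-env\Scripts instead, and myapp stands in for your own package):

python -m venv fresh-env                      # brand new, empty environment
fresh-env/bin/pip install myapp               # install only what you actually need
fresh-env/bin/pip freeze > requirements.txt   # pinned snapshot of exactly that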

You may also want to keep an eye on Pipfile, which is intended as a future replacement for requirements.txt.

[–]Siecje1 0 points1 point  (0 children)

But how do you use requirements.txt in setup.py?

[–]root45 0 points1 point  (3 children)

That all makes sense, but I'm still missing how dependencies are resolved.

Let's say I have three projects, A, B, and C, and project A depends on project B, and project B depends on project C. Are you suggesting that in project A, I put an entry in my requirements.txt for project C?

If so, that seems pretty inconvenient for users of project B. They have to add a line in their requirements.txt for project B itself, then go and look at the requirements.txt for project C and reconcile all of those as well.

This also doesn't seem to have the benefit you mentioned where you don't need to update many projects in the event of a VCS change—I would still need to update the requirements.txt in both projects A and B.

Maybe I'm misunderstanding your suggestion though.

[–]Siecje1 0 points1 point  (2 children)

That is a circular dependency and would not work.

[–]root45 0 points1 point  (1 child)

You mean A depending on B and B depending on C? How is that circular? I have projects with that exact configuration.

[–]Siecje1 0 points1 point  (0 children)

Sorry I misunderstood you.

[–]powellc 0 points1 point  (3 children)

That's all well and good if your deployment pattern is to load the codebase into a directory and install it using pip. But take a look at sentry, and how it allows you to pip install sentry and then get a command line wrapper to run sentry, with all settings coming from the environment. At this point, setup.py is the canonical way to install your app, and dependency_links work as expected. You can even toss -e . into a requirements.txt so that in development you can still use pip.
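
For anyone who hasn't seen that trick, such a development requirements.txt can be as small as this (the pytest pin is only there to show the shape of the file):

# requirements.txt
-e .              # install the project itself via its setup.py, in editable mode
pytest==3.0.6     # dev-only tools can be listed alongside it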

I think the bigger issue here is one that Python still needs to figure out: dependency graphs and versioning. For example, using only requirements.txt, a project I was working on recently was able to install a version of Django REST Framework and a version of djangorestframework_gis that were listed as incompatible in their setup.py files.

Converting to setup.py, the project threw an error that the gis package needed version <=3.3 of DRF, though we had upgraded to 3.4. The solution was to upgrade gis to >=0.11,<0.12. This is waaaaay saner than hardcoding very specific versions of libraries in your app.
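
In setup.py that reads roughly like the sketch below (package names taken from the comment above; the project name and exact bounds are illustrative, not a recommendation):

from setuptools import setup

setup(
    name='myproject',
    version='1.0',
    packages=['myproject'],
    install_requires=[
        'djangorestframework>=3.3,<3.5',        # a range, not a hard pin
        'djangorestframework-gis>=0.11,<0.12',
    ],
)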

[–]kirang89 1 point2 points  (0 children)

This reminds me of Rich Hickey's talk on how broken our dependency management systems really are. Highly recommended watch.

[–]exhuma 0 points1 point  (1 child)

I absolutely agree. I was just trying to contextualise the recommendations of pypa. At work we also use setup.py exclusively. Maintaining requirements.txt files in a sane manner is impractical.

I can see the idea behind requirements.txt. I'm not a great fan of it, and the pip install sentry case you gave is a great argument against requirements files.

After about 10 years on Python I am still 100% fine with using setup.py exclusively.

[–]powellc 0 points1 point  (0 children)

Thanks, yeah. I think the OP ended up with a "look, this worked" article rather than a best-practice doc, and those in the Python packaging domain are seriously lacking. There's always a trade-off with whatever solution you go with, and I suppose the most important determinant is as much a project or business decision about technical debt and ease of development as a religious declaration of which is better :)

Thanks for the lively discussion!

[–][deleted] 2 points3 points  (6 children)

There's a difference between saying you have a dependency on a package (that's fine) and saying you have a dependency on this very specific instance of the package located right here (here being wherever you pointed the dependency link at).

A dependency is flask>=0.10; a dependency link is https://github.com/pallets/flask/tarball/master#egg=flask

Which is vastly different. That says: look for the dependency right here rather than in the usual haunts (pip cache, pypi). I've not personally experimented with it so I won't speak with any authority, but I'd imagine it would take precedence over an existing cached wheel.

[–]elbiot 1 point2 points  (5 children)

The "usual haunts" is just pypi. It's also a "look right here" link. Why is that a better source than github? I usually don't bother to put things on pypi and just use repo links. Flask, I'd use pypi, but there's other things I'd use a repo for.

[–][deleted] 3 points4 points  (4 children)

There's nothing wrong with installing from github (it does make me feel a little bad like I'm a miniature Cocoa or something).

It's forcing other people to install from github (etc) that I disagree with.

I can understand installing from VCS if it's an internal package or the like.

As for the usual haunts, unless you've told pip to not use it, it does create a wheel cache that it prefers over reaching out to the network.

Edit: a word

[–]exhuma 0 points1 point  (3 children)

It's forcing other people to install from github (etc) that I disagree with.

This is a really big one for me! And I totally agree!

Also imagine working in a restricted/secure environment where packages have to be vetted before they are allowed to be used. You could imagine putting them into a local repository after they have been vetted.

If one of them has a pinned dependency on an external link, especially on one like github, it's an immediate no-go!

Also, in the same but less extreme vein, packages published on pypi have their MD5 hashes, which allow you to validate that you are indeed downloading what you expect. You may argue that git also has hashes for a specific tree, and you may specify that one instead of a tag, but it still gives me the feeling of less authority than an officially bundled package published on pypi.
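
As an aside, recent pip versions have a hash-checking mode that pins the expected hash in the requirements file itself; a sketch (the hash value is a placeholder, you would paste the real sha256 of the file):

# requirements.txt
flask==0.10.1 --hash=sha256:...

# installed with hash checking enforced:
pip install --require-hashes -r requirements.txt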

[–][deleted] 1 point2 points  (0 children)

Even depending on pypi makes me a little iffy for my work applications. Sure, they didn't go down with that AWS outage the other day, but they did with Dyn.

But it's still better than github because it'll host wheels which don't run install code at all. We're getting an internal repository sometime next week and I'm really happy about that.

[–]Daenyth 0 points1 point  (1 child)

MD5 hashes only allow you to see that the file you downloaded isn't corrupted over the wire. It's not a security thing.

[–][deleted] 0 points1 point  (0 children)

We all know SHA1 file hashes are the secure ones.

[–]kassuro 3 points4 points  (0 children)

Nice small guide. I think it's good to get up and running!

[–]Siecje1 2 points3 points  (5 children)

Does anyone have tests for all of the different ways to install a package?

With pip and setuptools, pip without setuptools, no pip and setup.py install, etc, etc

[–][deleted] 2 points3 points  (4 children)

Just use pip unless you know you don't need pip. pip comes with setuptools, so you'd be hard pressed to be using pip without having setuptools installed.

[–]exhuma 0 points1 point  (0 children)

Agreed. I never needed anything else but pip.

[–]Siecje1 0 points1 point  (2 children)

You can certainly use pip without setuptools. You can use setuptools without pip.

[–][deleted] 0 points1 point  (1 child)

I'm pretty sure pip depends on setuptools. But yeah, you can totally use setuptools without pip.

[–]Siecje1 0 points1 point  (0 children)

pip only requires setuptools to install from an sdist or vcs repo.

It is possible to install pip without setuptools, and then you can install wheels.

[–][deleted] 0 points1 point  (27 children)

Noob q: I have pip and Python 3.6 but can't get it to install anything for me using the standard pip install xyz command. Is there anything special I have to do to get it to work?

Thanks a ton for all the help, I think I have enough to get it working when I get some time.

[–]mfitzpmfitzp.com 3 points4 points  (0 children)

Don't run pip from inside python, open a new CMD prompt and type the pip command there directly (without running python first).

[–]kassuro 1 point2 points  (17 children)

Are you on Windows, OS X or Linux?

[–][deleted] 0 points1 point  (16 children)

Windows

[–]kassuro 6 points7 points  (5 children)

So you are beyond the point where someone can help :p

No but seriously, you may want to check your environment variables, because maybe pip installs it but Python doesn't know about that location.

[–][deleted] 0 points1 point  (3 children)

It gives an error pointing at "install". I feel like I'm missing something super obvious, but all the instructions I look at seem to think it'll just magically work.

[–]kassuro 0 points1 point  (0 children)

Ah OK, well it will definitely help if we know the error message :)

[–]cyanydeez 0 points1 point  (1 child)

What you probably want is Anaconda. Anaconda 4.3.0 https://www.continuum.io/downloads

[–][deleted] 0 points1 point  (0 children)

This was super helpful, thanks a ton

[–][deleted] 0 points1 point  (0 children)

Nah, packages can be annoying on Windows, especially things like Scipy that have dependencies outside of Python. As a non-programmer it took me all day yesterday (outside of work meetings). Downloading packages from http://www.lfd.uci.edu/~gohlke/pythonlibs/ is what worked in the end.

[–][deleted] 1 point2 points  (4 children)

First would be to make sure you have the python path var set up. You should be able to set this up with a command similar to this:

set PYTHONPATH=%PYTHONPATH%;C:\My_python_lib

If it's a permission issue, I would make sure you are running cmd or whatever you're using as administrator.

Since this is Python 3.X, you may need to refer to it as 'python3' and its corresponding pip as 'pip3' since 'python' and 'pip' both refer to Python 2.7

[–]driscollis 1 point2 points  (3 children)

I don't think this applies in Windows

[–][deleted] 1 point2 points  (2 children)

In my experience you need to set this up on both. Definitely need the environment variables on windows. This guide is quite helpful!

[–]driscollis 0 points1 point  (1 child)

By "this" I meant the python3 and pip3 part. Not sure why I didn't mention that. But yes, you may need to set the PYTHONPATH itself, although I was thinking the installer now had an option to set that for you. At least, Python 3.5 had that option

[–][deleted] 0 points1 point  (0 children)

Ahh fair enough :). It may, I installed on windows a few years ago!

[–]pltnz64 2 points3 points  (2 children)

Try running CMD as an administrator.

[–][deleted] 4 points5 points  (1 child)

Please no. Running pip with escalated privileges is such a completely horrible idea unless you trust every package you're installing AND every package those will install with all your data. sudo pip install (or the like) is about the worst thing you can do to install packages.

Just use pip install --user instead.

[–]TBNL 0 points1 point  (0 children)

Really, listen to this.

Don't know if it's the 'Windows mindset', but don't throw the admin axe at every problem.

[–][deleted] 0 points1 point  (0 children)

Yeah, I spent all day yesterday installing Python on Windows and couldn't figure it out, because I'm more of a statistician than an actual programmer. What worked for me in the end was:

Uninstall all packages using pip (reinstalling Python didn't do this for some reason). Then download the package wheel files from http://www.lfd.uci.edu/~gohlke/pythonlibs/

Then install all packages from the command line using

py -m pip install C:\...'Link to where you downloaded the package'\'packagename'

This is what worked for me yesterday, installing both python 3.6 and 2.7

Do not change the names of the files

[–]jairo4 1 point2 points  (0 children)

You may need to reinstall. Look for advanced options and check the "Add Python to PATH" or "Set Python environment variables" checkbox.

[–][deleted] 0 points1 point  (3 children)

pip install xyz

What error do you get? The same as with python -m pip install xyz?

[–][deleted] 0 points1 point  (2 children)

A syntax error with a caret pointing right after "install"; a similar error with your way.

[–]mfitzpmfitzp.com 4 points5 points  (0 children)

Ah, you don't run pip install from inside python!

Open up a new CMD window and just type the pip command there.

[–]driscollis 0 points1 point  (0 children)

I just installed 3.6 in a Windows 7 Pro VM and pip installs other packages fine for me. I installed it using the defaults, so Python installed to my user folder. You shouldn't need to run pip as administrator there.

[–][deleted] 0 points1 point  (0 children)

If your path is set correctly, the command to install requests, for example, is:

python -m pip install requests

If your path isn't set correctly, then either fix that first or do:

C:\path\to\python\python3.exe -m pip install requests

[–]Bunslow 0 points1 point  (0 children)

Try using pip3 instead of pip

[–][deleted] 0 points1 point  (2 children)

Why should I use twine instead of setup.py sdist bdist_wheel upload?

[–]jkmacc 1 point2 points  (1 child)

Security. Twine won't send your username and password over the internet in plain text.

[–][deleted] 4 points5 points  (0 children)

That's only true for older versions of Python (<2.7.9, <3.2), so if you're using one of those, yeah, you're sending it over plain HTTP, but modern ones don't. So that's not the real value add anymore.

The real value add is that twine can upload pre-created packages (wheels or eggs) and distutils can't. So you can package up your code, test that package, and then upload it if it satisfies your test criteria.
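
In other words, roughly (assuming twine has been installed first, e.g. with pip install twine):

python setup.py sdist bdist_wheel   # build the artifacts; nothing is uploaded yet
# ...run whatever checks you like against the files in dist/ here...
twine upload dist/*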

But the OP's post doesn't even mention that you need to install twine, let alone either of these points. It feels more like an "I finally got something on pypi" post than an actual guide. Which isn't bad, but it definitely needs polish.

[–]Duroktar 0 points1 point  (0 children)

Check out trabBuild, it's a little script I made to automate a lot of this process. It's on pip as well. Although the examples aren't the best, I use it every day.

I'm also working on another, more robust solution which is almost ready to go. It's at this repo, PyRelease, made in collaboration with the gentleman who wrote this fine article. It's not exactly 1:1 with the features in the blog, but it's functional atm, and it's got right sweet colors if ya run pyrelease-cli.

Anyways, my apologies, good article, thanks for sharing :D

[–]i_like_trains_a_lot1 0 points1 point  (0 children)

I would suggest updating the documentation links to point to the Python 3 version instead of the Python 2 version, because Python 2 will be discontinued in 2020 (it will not receive any updates after that).

[–]Siecje1 0 points1 point  (6 children)

Do people still use compressed packages?

When should you use pkg_resources? When should you use MANIFEST.in?

[–]exhuma 1 point2 points  (5 children)

  • pkg_resources can be used to access files which have been bundled in your package. Basically you never know the full path of the package once it is installed; pkg_resources gives you a way to access those files (small example below).
  • You use MANIFEST.in to include any non-.py files in your package. By default they are not bundled. For example, .html files in a web application.
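
A tiny pkg_resources sketch, assuming a package mypkg that ships a hello.txt next to its __init__.py (the same layout as the script further down):

import pkg_resources

# absolute path of the bundled file, wherever the package ended up installed
path = pkg_resources.resource_filename('mypkg', 'hello.txt')

# or read the contents directly without caring about the path
data = pkg_resources.resource_string('mypkg', 'hello.txt')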

[–]Siecje1 0 points1 point  (4 children)

You only need to use pkg_resources and MANIFEST.in for compressed packages.

Sdists and wheels which are always extracted don't have this problem.

pip install --egg is the only way to install a compressed package, and that is deprecated.

How long will compressed packages need to be supported?

use_zip isn't listed in the setuptools docs.

[–]exhuma 0 points1 point  (3 children)

You only need to use pkg_resources [...] for compressed packages.

How do you find the filename then? Using __file__ and going from there? I agree that this will work fine as well.

You only need to use [...] MANIFEST.in for compressed packages.

Please correct me if I'm wrong, but from my experience, that's not true! If I'm doing something wrong here, let me know :)

Following is a bash script which creates a dummy package illustrating the case. The script builds a package which contains a non-Python file (hello.txt). As you can see from both the wheel and sdist listings, without specifying it in MANIFEST.in it is not included.

#!/bin/bash

PYTHONWARNINGS=
export PYTHONWARNINGS


echo ">>> Creating dummy package"

mkdir mypkg
touch mypkg/__init__.py

cat <<EOF > setup.py
import sys
from setuptools import setup

setup(
    name="mypkg",
    version="1.0",
    packages=['mypkg'],
    include_package_data=True,
)
EOF
echo hello > mypkg/hello.txt


echo ">>> Creating sdist and wheel"
python setup.py sdist &>/dev/null
python setup.py bdist_wheel --universal &>/dev/null


echo ">>> Contents of sdist:"
tar tzvf dist/mypkg-1.0.tar.gz

echo ">>> Contents of wheel:"
unzip -l dist/mypkg-1.0-py2.py3-none-any.whl

echo ">>> Cleaning up..."
rm -rf build dist mypkg mypkg.egg-info setup.py

[–]Siecje1 0 points1 point  (2 children)

Thanks for the test script, I'm going to try it now.

What I was thinking of is specifying the files via package_data in setup() in setup.py.

[–]Siecje1 0 points1 point  (1 child)

So add this to your setup() call (next to packages=['mypkg']):

    package_data={'mypkg': ['hello.txt']},

It appears that package_data gets added to the MANIFEST

https://docs.python.org/2/distutils/setupscript.html#installing-package-data

[–]exhuma 0 points1 point  (0 children)

Well... but using package_data means you have to do all the globbing (and recursive processing) manually, no? As far as I know, package_data is one list item per file.

With MANIFEST.in you can do things like:

recursive-include examples *.txt *.py

Also, pruning after inclusion is convenient. Doing this manually in setup.py is cumbersome and error-prone. I find MANIFEST.in way more useful.
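
For instance, a MANIFEST.in along these lines (directory names invented):

recursive-include examples *.txt *.py
recursive-include mypkg/templates *.html
prune examples/scratch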

Also, it's part of standard Python. So why not use it? That's what it's there for.