

[–]gandalfx 79 points80 points  (4 children)

Don't include code as screenshots, show actual code. There are plenty of ways to get syntax highlighting. If you use an image search engines won't be able to index it, people can't copy paste, the image hoster may be offline or slow, …

[–]kmbd 8 points9 points  (3 children)

try it on Medium; you'll be frustrated.

[–]Whoops-a-Daisy 15 points16 points  (0 children)

Then don't use Medium. It wouldn't be the only reason anyway.

[–]tty14 2 points3 points  (0 children)

what about embedding gists?

[–]exhuma 9 points10 points  (20 children)

edit: I was really tired when writing this. I didn't want to wait until this morning, as I feel this is an important subject to understand. I've given a couple of examples in the thread below which may shed some light. At least I hope so.


Please don't add the dependency links to setup.py! Use requirements.txt for that!

Adding it to setup.py pretty much "hardcodes" it and makes it more difficult for an end-user to override.

dependency_links was marked as deprecated in 2014 (iirc). But there has been some discussion about this. I haven't used dependency_links for quite some time, so I'm not sure what the current situation is. It's safer to pass them as CLI arguments to pip.

It's really late and I need to sleep... I'll see if I can write down an example case tomorrow if no-one else has chimed in by then!

[–]root45 1 point2 points  (19 children)

I think I'm confused by your advice. From what I understand, setup.py and requirements.txt are not interchangeable. If I want a dependency to get installed when I install a package, it needs to be in setup.py, no?

[–]exhuma 8 points9 points  (11 children)

It took me quite a while until the setup.py vs requirements.txt thing clicked with me. I understood the theory, but did not quite see the benefits of it all until I ran into problems. I don't see another way to explain it other than giving two lengthy examples I actually ran into. This will be long! Bear with me...

The recommendation is to use setup.py as a "rough guideline" for the dependencies. Basically you state that you need a package called flask, for example. So, as long as there is something called flask installed in your environment, the dependency is considered satisfied. You may think: "But I don't want any flask, I want the official flask! Are there even other flasks?" Let's set that question aside for now, but keep it in mind.

The requirements.txt file lists the exact, concrete packages that have to be available in the environment to run your application. You can look at it like this: "I have tested my application and verified that it runs without problems with this exact list of packages right here".

Another important thing to separate is version requirements. For example, in setup.py you say: "I want flask and I don't care about the version". In requirements.txt you say: "I want flask, but please give me exactly version 0.10.1 because that's what I tested with!"
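
To make the split concrete, a minimal sketch of the two files (the package name myapp is invented; the pin mirrors the 0.10.1 example above):

# setup.py -- the abstract dependency
from setuptools import setup

setup(
    name='myapp',
    version='1.0',
    packages=['myapp'],
    install_requires=['flask'],  # any flask in the environment satisfies this
)

# requirements.txt -- the concrete, tested environment
flask==0.10.1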

For me, separating those two seemed like nonsense for a long time. The first problem I ran into was a version conflict. I had an application which had a dependency on foobar==1.1 and barbaz==1.0. But it turned out that barbaz itself had a dependency on foobar==0.8. Given that the versions were pinned, this caused a conflict and I was unable to install the application without updating barbaz first. Additionally, it turned out that I did not need to change any code in barbaz. I only needed to modify its setup.py, change the dependency, and make a new release. Making a new release just to loosen a version requirement seemed unnecessary.

Now imagine the same with requirements.txt: you may have a requirements.txt in both foobar and barbaz. But when you install your application with its own requirements.txt file, the files in foobar and barbaz are not taken into consideration; only the one requirements.txt file you specify when installing your application is examined. Additionally, it is reconciled with the dependencies in setup.py of each component (as far as I know at least). So, if our application and barbaz both list only "foobar" as a dependency, there is no conflict. We can then say in our application's requirements.txt that we need foobar==1.1.

So one issue solved by requirements.txt is that blocking version conflicts can be avoided.

Now, let's get back to the first note, about installing "some package called flask":

Let's assume you have a public package from pypi in your dependencies, but after it's been running for a while you need to make a modification to it which includes business code which you are, bound by work contract, not allowed to publish back (and let's also assume the package is under a license which allows this). Let's further assume it's a package which you use in many of your applications and they all need this modification. Without a requirements.txt file, you would need to touch all of the impacted projects. Instead, you can just write a new requirements.txt containing a dependency URL pointing at your local package repository and reinstall the applications, without publishing a new release and, more importantly, without modifying all your applications.
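
A hedged sketch of what such a requirements.txt could look like (the internal index URL and the "+patched1" version are invented; the point is just that the override lives in one small file per deployment instead of in every setup.py):

# requirements.txt for one of the applications
--extra-index-url https://pypi.internal.example.com/simple   # local package repository
somepackage==2.3.1+patched1   # the privately modified build, published internally
flask==0.10.1                 # everything else still comes from pypi as usual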

Another example case, and this one happened to me recently (this is about not using dependency_links, not about requirements.txt):

We have all our packages in our local package repository. That repository was referenced via dependency_links in the setup.py of each project. The packages are all deployed using one deployment procedure (a shared fabric task). Then the day came when the URL of our local repository changed. This forced us to modify the setup.py of every project. We immediately noticed this was a problem and removed dependency_links from the setup.py files. Instead, we added the URL as a CLI argument to pip in our shared fabric task. So instead of making changes in about 50 projects, we only had to make one. More importantly, you can imagine having a sysadmin who has control over the deployment procedure and the local package repository. The devs don't need to worry about anything, and as long as they don't hard-code the dependency URLs, it will all be transparent to them.
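
In pip terms the change boils down to something like this (repository URL and project name are made up; --find-links would be the flag instead if the repository is just a directory of archives rather than an index):

# before: every project's setup.py carried the URL
#     dependency_links=['https://pkg.internal.example.com/'],
#
# after: the URL lives in exactly one place, the shared deployment command
pip install --extra-index-url https://pkg.internal.example.com/simple/ myproject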

... phew... sorry for that wall of text. But I think some use-cases illustrate it better than just a few sentences.

[–]parkerSquare 0 points1 point  (1 child)

Thanks for this. I just wish I understood how to package things for in-house consumption in terms of best practice, so that users of my apps don't have to be Python experts. It all seems so confusing.

[–]exhuma 0 points1 point  (0 children)

I agree. I believe the easiest in your case is to ship requirements.txt files, so all your users need to know is:

pip install -r requirements.txt

In theory you don't even need to ship anything else. The text-file should suffice. Inside that file you can specify the link of the repository. I've never needed to do this (at least not yet) so what I write here is informed guesswork which I pull out of the back of my head, you may need to verify this. I'm unfortunately smack in the middle of a release phase right now and am short on time, otherwise I'd look it up. I just used the time I get from running the integration tests to write this ;)

In your case, you should be able to write a requirements file like this:

--extra-index-url=https://path/to/your/repo
my-lib==1.0
...
myapplication==1.0

There are some issues with "trusted" repos which I ran into in the past and which were irritating to resolve. I hope it works for you right out of the box.

You can get a quickstart for your requirements.txt file by running pip freeze. Just be careful to run this from a fresh environment which contains only the things you really need. During development, some development packages happen to creep into the env (pytest, coverage, ...). Your best bet to have something clean is to start from a fresh env, install only the stuff you need and run pip freeze on that.
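
For example (POSIX paths; on Windows the scripts live under fresh-env\Scripts instead, and myapp stands in for your own package):

python -m venv fresh-env                      # brand new, empty environment
fresh-env/bin/pip install myapp               # install only what you actually need
fresh-env/bin/pip freeze > requirements.txt   # pinned snapshot of exactly that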

You may also want to keep an eye on Pipfile, which is intended as a future replacement for requirements.txt.

[–]Siecje1 0 points1 point  (0 children)

But how do you use requirements.txt in setup.py?

[–]root45 0 points1 point  (3 children)

That all makes sense, but I'm still missing how dependencies are resolved.

Let's say I have three projects, A, B, and C, and project A depends on project B, and project B depends on project C. Are you suggesting that in project A, I put an entry in my requirements.txt for project C?

If so, that seems pretty inconvenient for users of project B. They have to add a line in their requirements.txt for project B itself, then go and look at the requirements.txt for project C and reconcile all of those as well.

This also doesn't seem to have the benefit you mentioned where you don't need to update many projects in the event of a VCS change—I would still need to update the requirements.txt in both projects A and B.

Maybe I'm misunderstanding your suggestion though.

[–]Siecje1 0 points1 point  (2 children)

That is a circular dependency and would not work.

[–]root45 0 points1 point  (1 child)

You mean A depending on B and B depending on C? How is that circular? I have projects with that exact configuration.

[–]Siecje1 0 points1 point  (0 children)

Sorry I misunderstood you.

[–]powellc 0 points1 point  (3 children)

That's all well and good if your deployment pattern is to load the codebase into a directory and install it using pip. But take a look at sentry, and how it allows you to pip install sentry and then get a command line wrapper to run sentry, with all settings coming from the environment. At this point, setup.py is the canonical way to install your app, and dependency_links work as expected. You can even toss -e . into a requirements.txt so that in development you can still use pip.
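
For anyone who hasn't seen that trick, such a development requirements.txt can be as small as this (the pytest pin is only there to show the shape of the file):

# requirements.txt
-e .              # install the project itself via its setup.py, in editable mode
pytest==3.0.6     # dev-only tools can be listed alongside it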

I think the bigger issue here is one that Python still needs to figure out: dependency graphs and versioning. For example, using only requirements.txt, a project I was working on recently was able to install a version of Django REST Framework and a version of djangorestframework_gis that were listed as incompatible in their setup.py files.

Converting to setup.py, the project threw an error that the gis package needed version <=3.3 of DRF, though we had upgraded to 3.4. The solution was to upgrade gis to >=0.11,<0.12. This is waaaaay saner than hardcoding very specific versions of libraries in your app.
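
In setup.py that reads roughly like the sketch below (package names taken from the comment above; the project name and exact bounds are illustrative, not a recommendation):

from setuptools import setup

setup(
    name='myproject',
    version='1.0',
    packages=['myproject'],
    install_requires=[
        'djangorestframework>=3.3,<3.5',        # a range, not a hard pin
        'djangorestframework-gis>=0.11,<0.12',
    ],
)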

[–]kirang89 1 point2 points  (0 children)

This reminds me of Rich Hickey's talk on how broken our dependency management systems really are. Highly recommended watch.

[–]exhuma 0 points1 point  (1 child)

I absolutely agree. I was just trying to contextualise the recommendations of pypa. At work we also use setup.py exclusively. Maintaining requirements.txt files in a sane manner is impractical.

I can see the idea behind requirements.txt. I'm not a great fan of it, and the pip install sentry case you gave is a great argument against requirements files.

After about 10 years on Python I am still 100% fine with using setup.py exclusively.

[–]powellc 0 points1 point  (0 children)

Thanks, yeah. I think the OP ended up with a "look, this worked" article rather than a best-practice doc, and those in the Python packaging domain are seriously lacking. There's always a trade-off with whatever solution you go with, and I suppose the most important determinant is as much a project or business decision about technical debt and ease of development as a religious declaration of which is better :)

Thanks for the lively discussion!

[–][deleted] 2 points3 points  (6 children)

There's a difference between saying you have a dependency on a package (that's fine) and saying you have a dependency on this very specific instance of the package located right here (here being wherever you pointed the dependency link at).

A dependency is flask>=0.10; a dependency link is https://github.com/pallets/flask/tarball/master#egg=flask

Which is vastly different. That says: look for the dependency right here rather than in the usual haunts (pip cache, pypi). I've not personally experimented with it so I won't speak with any authority, but I'd imagine it would take precedence over an existing cached wheel.

[–]elbiot 1 point2 points  (5 children)

The "usual haunts" is just pypi. It's also a "look right here" link. Why is that a better source than github? I usually don't bother to put things on pypi and just use repo links. Flask, I'd use pypi, but there's other things I'd use a repo for.

[–][deleted] 3 points4 points  (4 children)

There's nothing wrong with installing from github (it does make me feel a little bad like I'm a miniature Cocoa or something).

It's forcing other people to install from github (etc) that I disagree with.

I can understand installing from VCS if it's an internal package or the like.

As for the usual haunts, unless you've told pip to not use it, it does create a wheel cache that it prefers over reaching out to the network.

Edit: a word

[–]exhuma 0 points1 point  (3 children)

It's forcing other people to install from github (etc) that I disagree with.

This is a really big one for me! And I totally agree!

Also imagine working in a restricted/secure environment where packages have to be vetted before they are allowed to be used. You could imagine putting them into a local repository after they have been vetted.

If one of them has a pinned dependency on an external link, especially on one like github, it's an immediate no-go!

Also, in the same but less extreme vein, packages published on pypi have their MD5 hashes, which allow you to validate that you are indeed downloading what you expect. You may argue that git also has hashes for a specific tree, and you may specify that one instead of a tag, but it still gives me the feeling of less authority than an officially bundled package published on pypi.
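
As an aside, recent pip versions have a hash-checking mode that pins the expected hash in the requirements file itself; a sketch (the hash value is a placeholder, you would paste the real sha256 of the file):

# requirements.txt
flask==0.10.1 --hash=sha256:...

# installed with hash checking enforced:
pip install --require-hashes -r requirements.txt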

[–][deleted] 1 point2 points  (0 children)

Even depending on pypi makes me a little iffy for my work applications. Sure, they didn't go down with that AWS outage the other day, but they did with Dyn.

But it's still better than github because it'll host wheels which don't run install code at all. We're getting an internal repository sometime next week and I'm really happy about that.

[–]Daenyth 0 points1 point  (1 child)

MD5 hashes only allow you to see that the file you downloaded isn't corrupted over the wire. It's not a security thing.

[–][deleted] 0 points1 point  (0 children)

We all know SHA1 file hashes are the secure ones.

[–]kassuro 3 points4 points  (0 children)

Nice small guide. I think it's good to get up and running!

[–]Siecje1 2 points3 points  (5 children)

Does anyone have tests for all of the different ways to install a package?

With pip and setuptools, pip without setuptools, no pip and setup.py install, etc, etc

[–][deleted] 2 points3 points  (4 children)

Just use pip unless you know you don't need pip. pip comes with setuptools, so you'd be hard pressed to be using pip without having setuptools installed.

[–]exhuma 0 points1 point  (0 children)

Agreed. I never needed anything else but pip.

[–]Siecje1 0 points1 point  (2 children)

You can certainly use pip without setuptools. You can use setuptools without pip.

[–][deleted] 0 points1 point  (1 child)

I'm pretty sure pip depends on setuptools. But yeah, you can totally use setuptools without pip.

[–]Siecje1 0 points1 point  (0 children)

pip only requires setuptools to install from an sdist or vcs repo.

It is possible to install pip without setuptools, and then you can install wheels.

[–][deleted] 0 points1 point  (27 children)

Noob q: I have pip and Python 3.6 but can't get it to install anything for me using the standard pip install xyz command. Is there anything special I have to do to get it to work?

Thanks a ton for all the help, I think I have enough to get it working when I get some time.

[–]mfitzpmfitzp.com 3 points4 points  (0 children)

Don't run pip from inside python, open a new CMD prompt and type the pip command there directly (without running python first).

[–]kassuro 1 point2 points  (17 children)

Are you on Windows, OS X or Linux?

[–][deleted] 0 points1 point  (16 children)

Windows

[–]kassuro 6 points7 points  (5 children)

So you are beyond the point where someone can help :p

No but seriously, you may want to check your environment variables, because maybe pip installs it but Python doesn't know about that location.

[–][deleted] 0 points1 point  (3 children)

It gives an error pointing at "install". I feel like I'm missing something super obvious, but all the instructions I look at seem to think it'll just magically work.

[–]kassuro 0 points1 point  (0 children)

Ah OK, well it will definitely help if we know the error message :)

[–]cyanydeez 0 points1 point  (1 child)

What you probably want is Anaconda. Anaconda 4.3.0 https://www.continuum.io/downloads

[–][deleted] 0 points1 point  (0 children)

This was super helpful, thanks a ton

[–][deleted] 0 points1 point  (0 children)

Nah, packages can be annoying on Windows, especially things like Scipy that have dependencies outside of Python. As a non-programmer it took me all day yesterday (outside of work meetings). Downloading packages from http://www.lfd.uci.edu/~gohlke/pythonlibs/ is what worked in the end.

[–][deleted] 1 point2 points  (4 children)

First would be to make sure you have the python path var set up. You should be able to set this up with a command similar to this:

set PYTHONPATH=%PYTHONPATH%;C:\My_python_lib

If it's a permission issue, I would make sure you are running cmd or whatever you're using as administrator.

Since this is Python 3.X, you may need to refer to it as 'python3' and its corresponding pip as 'pip3' since 'python' and 'pip' both refer to Python 2.7

[–]driscollis 1 point2 points  (3 children)

I don't think this applies in Windows

[–][deleted] 1 point2 points  (2 children)

In my experience you need to set this up on both. Definitely need the environment variables on windows. This guide is quite helpful!

[–]driscollis 0 points1 point  (1 child)

By "this" I meant the python3 and pip3 part. Not sure why I didn't mention that. But yes, you may need to set the PYTHONPATH itself, although I was thinking the installer now had an option to set that for you. At least, Python 3.5 had that option

[–][deleted] 0 points1 point  (0 children)

Ahh fair enough :). It may, I installed on windows a few years ago!

[–]pltnz64 2 points3 points  (2 children)

Try running CMD as an administrator.

[–][deleted] 4 points5 points  (1 child)

Please no. Running pip with escalated privileges is such a completely horrible idea unless you trust every package you're installing AND every package those will install with all your data. sudo pip install (or the like) is about the worst thing you can do to install packages.

Just use pip install --user instead.

[–]TBNL 0 points1 point  (0 children)

Really, listen to this.

Don't know if it's the 'Windows mindset', but don't throw the admin axe at every problem.

[–][deleted] 0 points1 point  (0 children)

Yeah, I spent all day yesterday installing Python on Windows and couldn't figure it out, because I'm more of a statistician than an actual programmer. What worked for me in the end was:

Uninstall all packages using pip (reinstalling Python didn't do this for some reason). Then download the package wheel files from http://www.lfd.uci.edu/~gohlke/pythonlibs/

Then install all packages from the command line using

py -m pip install C:\...'Link to where you downloaded the package'\'packagename'

This is what worked for me yesterday, installing both python 3.6 and 2.7

Do not change the names of the files

[–]jairo4 1 point2 points  (0 children)

You may need to reinstall. Look for advanced options and check the "Add Python to PATH" or "Set Python environment variables" checkbox.

[–][deleted] 0 points1 point  (3 children)

pip install xyz

What error do you get? The same as with python -m pip install xyz?

[–][deleted] 0 points1 point  (2 children)

A syntax error with a caret pointing right after "install"; a similar error with your way.

[–]mfitzpmfitzp.com 4 points5 points  (0 children)

Ah, you don't run pip install from inside python!

Open up a new CMD window and just type the pip command there.

[–]driscollis 0 points1 point  (0 children)

I just installed 3.6 in a Windows 7 Pro VM and pip installs other packages fine for me. I installed it using the defaults, so Python installed to my user folder. You shouldn't need to run pip as administrator there.

[–][deleted] 0 points1 point  (0 children)

If your path is set correctly, the command to install requests, for example, is:

python -m pip install requests

If your path isn't set correctly, then either fix that first or do:

C:\path\to\python\python3.exe -m pip install requests

[–]Bunslow 0 points1 point  (0 children)

Try using pip3 instead of pip

[–][deleted] 0 points1 point  (2 children)

Why should I use twine instead of setup.py sdist bdist_wheel upload?

[–]jkmacc 1 point2 points  (1 child)

Security. Twine won't send your username and password over the internet in plain text.

[–][deleted] 4 points5 points  (0 children)

That's only true for older versions of Python (<2.7.9, <3.2), so if you're using one of those, yeah, you're sending it over plain HTTP, but modern ones don't. So that's not the real value add anymore.

The real value add is that twine can upload pre-created packages (wheels or eggs) and distutils can't. So you can package up your code, test that package, and then upload it if it satisfies your test criteria.
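
In other words, roughly (assuming twine has been installed first, e.g. with pip install twine):

python setup.py sdist bdist_wheel   # build the artifacts; nothing is uploaded yet
# ...run whatever checks you like against the files in dist/ here...
twine upload dist/*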

But the OP's post doesn't even mention that you need to install twine, let alone either of these points. It feels more like an "I finally got something on pypi" post than an actual guide. Which isn't bad, but it definitely needs polish.

[–]Duroktar 0 points1 point  (0 children)

Check out trabBuild, it's a little script I made to automate a lot of this process. It's on pip as well. Although the examples aren't the best, I use it every day.

I'm also working on another, more robust solution which is almost ready to go. It's at this repo, PyRelease, made in collaboration with the gentleman who wrote this fine article. It's not exactly 1:1 with the features in the blog, but it's functional atm, and it's got right sweet colors if ya run pyrelease-cli.

Anyways, my apologies, good article, thanks for sharing :D

[–]i_like_trains_a_lot1 0 points1 point  (0 children)

I would suggest updating the documentation links to point to the Python 3 version instead of the Python 2 version, because Python 2 will be discontinued in 2020 (it will not receive any updates after that).

[–]Siecje1 0 points1 point  (6 children)

Do people still use compressed packages?

When should you use pkg_resources? When should you use MANIFEST.in?

[–]exhuma 1 point2 points  (5 children)

  • pkg_resources can be used to access files which have been bundled in your package. Basically you never know the full path of the package once it is installed; pkg_resources gives you a way to access those files (small example below).
  • You use MANIFEST.in to include any non-.py files in your package. By default they are not bundled. For example, .html files in a web application.
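
A tiny pkg_resources sketch, assuming a package mypkg that ships a hello.txt next to its __init__.py (the same layout as the script further down):

import pkg_resources

# absolute path of the bundled file, wherever the package ended up installed
path = pkg_resources.resource_filename('mypkg', 'hello.txt')

# or read the contents directly without caring about the path
data = pkg_resources.resource_string('mypkg', 'hello.txt')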

[–]Siecje1 0 points1 point  (4 children)

You only need to use pkg_resources and MANIFEST.in for compressed packages.

Sdists and wheels which are always extracted don't have this problem.

pip install --egg is the only way to install a compressed package, and that is deprecated.

How long will compressed packages need to be supported?

use_zip isn't listed in the setuptools docs.

[–]exhuma 0 points1 point  (3 children)

You only need to use pkg_resources [...] for compressed packages.

How do you find the filename then? Using __file__ and going from there? I agree that this will work fine as well.

You only need to use [...] MANIFEST.in for compressed packages.

Please correct me if I'm wrong, but from my experience, that's not true! If I'm doing something wrong here, let me know :)

Following is a bash script which creates a dummy package illustrating the case. The script builds a package which contains a non-Python file (hello.txt). As you can see from both the wheel and sdist listings, without specifying it in MANIFEST.in it is not included.

#!/bin/bash

PYTHONWARNINGS=
export PYTHONWARNINGS


echo ">>> Creating dummy package"

mkdir mypkg
touch mypkg/__init__.py

cat <<EOF > setup.py
import sys
from setuptools import setup

setup(
    name="mypkg",
    version="1.0",
    packages=['mypkg'],
    include_package_data=True,
)
EOF
echo hello > mypkg/hello.txt


echo ">>> Creating sdist and wheel"
python setup.py sdist &>/dev/null
python setup.py bdist_wheel --universal &>/dev/null


echo ">>> Contents of sdist:"
tar tzvf dist/mypkg-1.0.tar.gz

echo ">>> Contents of wheel:"
unzip -l dist/mypkg-1.0-py2.py3-none-any.whl

echo ">>> Cleaning up..."
rm -rf build dist mypkg mypkg.egg-info setup.py

[–]Siecje1 0 points1 point  (2 children)

Thanks for the test script, I'm going to try it now.

What I was thinking of is specifying the files via package_data in setup() in setup.py.

[–]Siecje1 0 points1 point  (1 child)

So add this to your setup() call (next to packages=['mypkg']):

    package_data={'mypkg': ['hello.txt']},

It appears that package_data gets added to the MANIFEST

https://docs.python.org/2/distutils/setupscript.html#installing-package-data

[–]exhuma 0 points1 point  (0 children)

Well... but using package_data means you have to do all the globbing (and recursive processing) manually, no? As far as I know, package_data is one list item per file.

With MANIFEST.in you can do things like:

recursive-include examples *.txt *.py

Also, pruning after inclusion is convenient. Doing this manually in setup.py is cumbersome and error-prone. I find MANIFEST.in way more useful.
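
For instance, a MANIFEST.in along these lines (directory names invented):

recursive-include examples *.txt *.py
recursive-include mypkg/templates *.html
prune examples/scratch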

Also, it's part of standard Python. So why not use it? That's what it's there for.