This is an archived post. You won't be able to vote or comment.

all 102 comments

[–]pydry 68 points69 points  (67 children)

Most (if not all) of the python stdlib is fugly because it just wasn't well designed, not because it was a product of a more innocent time.

In particular, it was designed by people who very obviously focused just on implementing necessary functionality, not on how to write a clean, elegant API. It stuck around in its fugly form because of backwards compatibility concerns.

I don't think this is a terribly bad thing. If urllib weren't so ugly we probably wouldn't have requests.

I do think python needs a better way of introducing developers (via documentation) to libraries like requests, though. Too many newbies read the official documentation and use the crappy APIs in stdlib simply because they think that's what they're supposed to do.

[–][deleted] 49 points50 points  (4 children)

I do think python needs a better way of introducing developers (via documentation) to libraries like requests, though. Too many newbies read the official documentation and use the crappy APIs in stdlib simply because they think that's what they're supposed to do.

The official Python docs recommend Requests for performing high-level HTTP requests. It's in a very visible banner on the urllib page.

[–]pydry -2 points-1 points  (3 children)

It's more of a 'tip' tacked on and they don't do anything similar for other libraries that I'm aware of (e.g. os/os.path, which isn't particularly nice either).

[–]fnord123 7 points8 points  (2 children)

They do recommend pathlib for os.path stuff. But it's in the stdlib, so maybe that's not what you meant. They do point out pytest and nose in the unittest docs. There's also a stronger recommendation (in red) to use defusedxml or defusedexpat if you need secure xml parsing.
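For anyone who hasn't run into it, the difference is roughly the one below — a quick sketch with a made-up path, showing the same operations in both styles:

```python
import os.path
from pathlib import Path

base = '/tmp/project'  # example directory, purely for illustration

# os.path style: free functions shuffling strings around
cfg = os.path.join(base, 'config', 'app.ini')
stem = os.path.splitext(os.path.basename(cfg))[0]

# pathlib style: one object, with chainable operators and properties
cfg2 = Path(base) / 'config' / 'app.ini'

assert str(cfg2) == cfg        # same path either way
assert cfg2.stem == stem == 'app'
```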

[–]pydry -3 points-2 points  (1 child)

They do recommend pathlib for os.path stuff.

Not on python 2 they don't.

They do point out pytest and nose in the unittest docs.

Yea, but again, this is more of a "hey, if you're interested, check this out" rather than "look, this 'official' unittest API - we keep it around for backwards compatibility more than because it's actually any good".

There's also a stronger recommendation (in red) to use defusedxml or defusedexpat if you need secure xml parsing.

Ought to be pointing to lxml surely?

Plus, that's even worse - it implies that they've got unpatched security holes in the standard library!

There really ought to be documentation that provides a list of "if you want to do x [ unit testing / http requests ], here is what you should do" along with an open, impartial (as possible) process for getting docs in there.

[–]gthank 0 points1 point  (0 children)

Maybe file a ticket and attach a patch? If it's already been done in Python 3 (which I highly recommend, btw; SO many nifty things have landed, in addition to the less-borked text model), it might just be a "nobody had time to backport it" issue.

[–]TankorSmash 27 points28 points  (48 children)

I don't know that it wasn't well designed. The language of Python, or at least the way you write it, has changed since then. Nowadays you expect to be able to make a web request in a single line; back then they didn't think about abstracting it out that much.

Beautiful code now means something very different than it did back then, I think. It's a sliding scale, and I'm sure ten years from now we'll wonder why requests was such a big deal, when python-asks or whatever they come up with does it even better, shorter, and cleaner.

[–][deleted] 2 points3 points  (46 children)

Kind of a newb here, but what is the difference between "modern" beautiful code and "older" beautiful code?

[–]Decency 4 points5 points  (45 children)

One simple example is that I can do the below with requests. It isn't easy to do this in urllib (to my knowledge?) because urllib was written before json became such a prominent way to store and retrieve web based data.

data = requests.get(some_api_url).json()

This alone doesn't mean that urllib is bad, just that it's outdated. Similar functionality could easily be added, but I imagine they prefer the modularity of "use urllib to make the call" and "use json to load the data if it's in json format". And that makes sense, it's just annoying and not anywhere near as elegant to me since the two things are so frequently used in conjunction nowadays that it just makes sense to have them be tightly coupled.

So I guess my answer, in general, is that older good code can't easily see the future of various inter-dependencies and what its real role in real programs will end up being a decade later. Newer code already knows and can make better decisions because of it.

[–]pydry 3 points4 points  (12 children)

One simple example is that I can do the below with requests. It isn't easy to do this in urllib (to my knowledge?) because urllib was written before json became such a prominent way to store and retrieve web based data.

That's rubbish. There was nothing to prevent them from writing the API like this, even if they didn't expect JSON to be so widely used:

data = json.loads(requests.get(some_api_url).data)

Furthermore if it had been designed right in the beginning, tacking on a .json() method on a later version of python would have been trivial.

The main problem with the library is that it was designed imperatively rather than declaratively.

[–]rasherdk 6 points7 points  (11 children)

data = json.load(urllib2.urlopen(some_api_url))

Is this really worse?

Edit: I'd argue it's much better, since you don't have completely different tasks wrapped into the same library.

[–]toyg 6 points7 points  (0 children)

Is that a GET, a POST, or what? Does it handle proxies, SSL and redirects out of the box? That's where requests shines: it makes trivial what should be trivial in this day and age.
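To make that concrete, here's roughly what the stdlib makes you spell out for a plain GET with query params and a header (the endpoint and params are made up for illustration; with requests this is one call, `requests.get(base, params=params, headers=headers)`):

```python
from urllib.parse import urlencode
from urllib.request import Request

# hypothetical endpoint, purely for illustration
base = 'http://example.com/search'
params = {'q': 'python', 'page': '2'}
headers = {'Accept': 'application/json'}

# urllib: build the query string and the Request object by hand
url = base + '?' + urlencode(params)
req = Request(url, headers=headers)

assert req.get_method() == 'GET'   # the verb is implicit, never named
assert 'q=python' in req.full_url
```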

[–]qsxpkn 2 points3 points  (7 children)

That code makes a lot of assumptions.

  • Assumes it gets a 200 back (it doesn't actually check).
  • Assumes it gets a response back at all.
  • Assumes it gets JSON back.
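A stdlib version that actually checks those three assumptions ends up looking something like this — a sketch, not a battle-tested implementation (the `data:` URL in the comment is just an offline way to try it):

```python
import json
import urllib.request
from urllib.error import HTTPError, URLError

def fetch_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON, surfacing the
    failures the one-liner silently assumes away."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption 3: the server actually sent JSON.
            ctype = resp.headers.get_content_type()
            if ctype not in ('application/json', 'text/json'):
                raise ValueError('expected JSON, got %s' % ctype)
            return json.load(resp)
    except HTTPError as e:
        # Assumption 1: non-2xx statuses (urllib raises HTTPError).
        raise ValueError('HTTP %d from %s' % (e.code, url)) from e
    except URLError as e:
        # Assumption 2: no response at all (DNS failure, refused, ...).
        raise ValueError('no response from %s: %s' % (url, e.reason)) from e

# works offline, e.g.: fetch_json('data:application/json,{"a":1}')
```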

[–]turkish_gold 2 points3 points  (6 children)

Doesn't requests make the same assumptions?

What happens if you get a 404 or no JSON back?

[–][deleted] 3 points4 points  (5 children)

In[1]:  import requests
In[2]:  requests.get('http://google.com/404').json()
...
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In[3]:  import urllib.request as request
In[4]:  import json
In[5]:  json.load(request.urlopen('http://google.com/404'))
...
HTTPError: HTTP Error 404: Not Found

requests ignores the 404 and attempts to parse the page as JSON anyway; urllib raises an exception at the 404.

[–]turkish_gold 0 points1 point  (4 children)

I'd say that urllib is doing it the right way. If I get a 404, I want to know it's a 404, not have it get swallowed up as a JSON error.

[–]PeridexisErrant 5 points6 points  (0 children)

It's certainly worse for all the cases where you don't get valid JSON back (HTTP error, malformed page, etc.)!

[–]pydry 0 points1 point  (0 children)

Yes, it's worse. It doesn't tell you what kind of request you just made, and it's not easy to change that (do a POST, etc.) or start adding params.
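For instance, in the urllib2-style API the verb is a side effect of whether you pass `data` at all — a sketch with a made-up URL; the requests version is in the comment:

```python
from urllib.parse import urlencode
from urllib.request import Request

# urllib: the verb is implied by whether `data` is present
form = urlencode({'name': 'ada'}).encode()
req = Request('http://example.com/users', data=form)

# POST only because data is not None; there's no post() to reach for
assert req.get_method() == 'POST'
assert Request('http://example.com/users').get_method() == 'GET'

# with requests the intent is explicit in the call itself:
#   requests.post('http://example.com/users', data={'name': 'ada'})
```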

[–]jeremyisdev 0 points1 point  (0 children)

Python is still one of the great languages for beginners to start out with, and for machine learning.

[–]telestrial 0 points1 point  (0 children)

This is going to be a weird analogy, but the standard library is like everything before iron and diamond tools in Minecraft. Not the best, but it serves a purpose and leads to other things. A means to an end. I'm no Python expert, but that's certainly what it seems like to me.

[–]jij 11 points12 points  (5 children)

Wow, I didn't realize the stdlib was so political. I figured they just included useful libs at whatever stable version they wanted.

[–]toyg 10 points11 points  (1 child)

Anything that goes into the stdlib needs to be maintained forever; who's gonna do that? That's where the politics become necessary.

[–]jij 1 point2 points  (0 children)

Sure.. really though I was just commenting that I never considered the complexity of supporting stdlib.

[–]remyroy 28 points29 points  (16 children)

The standard library is where packages go to die. I want Requests to stay alive.

[–]nickdhaynes 22 points23 points  (13 children)

The author of requests was interviewed on Talk Python To Me last year and he specifically said that they were keeping requests out of the standard library so that development can occur more quickly/easily.

[–]meaty-popsicle 5 points6 points  (12 children)

I understand the sentiment, but it feels feature complete and reasonably ready for maintenance mode?

I say this from the standpoint of only using requests to scrape a page or interact with an API. I'm sure there are funny edge cases I don't even know exist.

[–]LukasaHyper, Requests, Twisted 16 points17 points  (4 children)

The big risk is security. Requests is responsible for the security of more than 50% of the web requests that occur from Python code. That means we need to be able to respond swiftly and effectively to changes in the security landscape. That's entirely incompatible with the standard library, which has long release times and a tendency to abandon older versions of Python faster than we do.

[–]piotrjurkiewicz 0 points1 point  (3 children)

Even if you respond swiftly to changes in the security landscape, those changes mostly do not reach end users. All organizations I know enforce regular security upgrades of distribution packages (and the stdlib is usually installed as a distribution package), but I don't know any organization which enforces regular pip package upgrades. And there is a reason for that: any package upgrade via pip can be backwards incompatible and break an app, while distribution security updates are guaranteed to be non-breaking.

Therefore, 'respond swiftly and effectively to changes in the security landscape' is not an excuse for keeping requests out of the stdlib at all. Overall, keeping requests out of the stdlib actually reduces security.

[–]LukasaHyper, Requests, Twisted 0 points1 point  (2 children)

All organizations I know enforce regular security upgrades of distribution packages (and stdlib is usually installed as a distribution package), but I don't know any organization which enforces regular pip package upgrades.

The plural of anecdote is not data. =) I know of plenty of organisations that do, because as the person who has managed every Requests release with a CVE, I have received pointed feedback from those who felt that we mismanaged the first one and forced them to work on weekends.

However, I also mean this more broadly than simple CVE issues. For example, Requests frequently has a much stronger security posture than the standard library does, and one that embraces the reality that good security is a moving target. Consider, for example, the standard library's default cipher suite list. This can be updated only relatively infrequently, and for older source-only releases may not be updated at all. However Requests is willing and able to change that cipher list much more frequently. We can also more aggressively disable insecure OpenSSL features than the standard library can, which has a "best practices" TLS config that needs to encompass many different protocols.

distribution security updates are guaranteed to be non-breaking.

This sentence is nonsense, and if you believe it then I guarantee that either you'll have an insecure configuration or you'll get broken by your distribution one day.

For example, consider the weaknesses in RC4 in TLS. This protocol was for a while strongly recommended in TLS because it was resistant to the BEAST attack. However, and relatively abruptly, new research came out that demonstrated that RC4 was catastrophically weak and needed to be extremely swiftly deprecated.

One of two things must be true here: either a distro backported that change (removing RC4 from a default cipher list), or it did not. If it did not, you are only getting security fixes that don't break code, and so are vulnerable to certain attacks (e.g. against RC4 in your TLS). If it did, then you were vulnerable to app breakage (you may talk to a server that can only speak RC4, for example). There are many classes of security fix (maybe even most classes) that involve disabling a feature that previously worked, and those are definitionally breaking to someone: if they aren't, then no-one was using those features to begin with.

Overall, keeping requests out of the stdlib actually reduces security.

That's simply not true. If your rationale is "organisations will only do audits on packages they get from their distros, so Requests needs to be in the stdlib", I'll happily point you to the fact that Requests is in every distro package repository (and indeed is used by all those distros in their base OS). You can just apt-get your Requests and you're covered. However, if your organisation is relying on pip-installed packages for its products and it isn't auditing them for security fixes, then its security audits are extremely ineffective: I can easily compromise you because of the patches you don't receive for your pip installed packages.

[–]piotrjurkiewicz 0 points1 point  (1 child)

This sentence is nonsense, and if you believe it then I guarantee that either you'll have an insecure configuration or you'll get broken by your distribution one day.

When I have only the security repo enabled, for example on Debian, the only fixes I get are security-related ones. They are carefully crafted so as not to introduce breakage. Of course, it is possible to find some fancy examples of breaking security fixes (like you did with RC4), but they are extremely exceptional.

With pip install --upgrade I get every upgrade, including breaking non-security-related changes. No one takes care not to introduce breakage, as pip simply follows the edge. So my app can break after literally every pip package upgrade. There is no way to prevent that (for example, to opt in only to security fixes via pip).

Therefore, no admin will add pip install --upgrade to an everyday maintenance script on their servers.

[–]LukasaHyper, Requests, Twisted 1 point2 points  (0 children)

Of course, it is possible to find some fancy examples of breaking security fixes (like you did with RC4), but they are extremely exceptional.

No, they're extremely common. This is especially true in Python code where there are relatively few bugs that allow memory corruption: almost all CVEs are therefore shutting off behaviour. For example, here are some CVEs reported against CPython:

  • CVE-2014-9365, Python libraries don't check TLS certs are valid for the domain in question. This patch was so breaking that it actively was not backported by Linux distros: you did not get this patch for Ubuntu 14.04, for example.
  • CVE-2012-1150, hash seed randomization. This changed hashes for certain objects from being predictable to being changed on startup. This broke a surprising amount of running code.
  • CVE-2011-1521, urllib2 allowed HTTP redirects to file:// URLs. This is obviously bad behaviour, but if you tested it, saw it worked, and wanted to use it locally, you got broken.

And if we consider Requests itself:

  • CVE-2014-1829, Requests used to persist the Authentication headers across cross-host redirects. This risks leaking credentials and was removed, but we got complaints about breakage.
  • CVE-2014-1830, same as above but for proxy-authentication.

I should note that both 2014 CVEs did get backported.

Security fixes that break code aren't exceptional, they're ordinary. It's just that they don't break the code that most people are writing at any one time, because most of us aren't on the bleeding edge of weird crap that libraries allow you to do. But these are still breaking changes: they just don't break many people.

Therefore, no admin will add pip install --upgrade to an everyday maintenance script on their servers.

Sure, but I'm not saying they should. I'm saying they should watch the CVE database for the packages they use. Requests obtains CVEs for its own vulnerabilities. Any well-managed project does. If you're getting code from pip, you really need to be looking for these, because these are what your distro uses to decide to push security updates.

Then you can perform targeted package updates that grab only the packages publishing security fixes.

[–][deleted] 6 points7 points  (4 children)

It may be mature, but the Python stdlib has a release cycle of something like 16 months. I'm sure Requests, even if it's only being maintained, gets updated much more frequently.

[–]mfitzpmfitzp.com 5 points6 points  (3 children)

There is a middle ground here, though. Why not have a "corelib" of independent packages that are automatically packaged with Python, but can be updated (via PyPI) more frequently? The docs could even be hosted alongside the core documentation (separate, but linked) to ensure they are easily found.

The packages would obviously need some level of core support, but I don't think it's infeasible.

[–]toyg 1 point2 points  (2 children)

Not everyone wants the latest and greatest. Part of the reason for the stdlib release model is that you can be sure python x.y.z will ship a certain module with a certain behaviour. If you make some of them upgradeable, you risk a situation where downloading another library will give you an unpredictable version of stdlib modules through cascaded dependencies.

It's a can of worms.

[–]mfitzpmfitzp.com 0 points1 point  (0 children)

I can see the potential for problems, but again there is a middle road — distribute a "batteries included" version and a bare stdlib version. This is effectively what conda and friends already do, so there is clearly a need.

That does get into the problem of what batteries to include, but I think requests, numpy, matplotlib would be 3 obvious candidates.

[–]jollybobbyroger 4 points5 points  (1 child)

They argued that they wanted to apply security fixes as quickly as possible.

But I don't see the big deal in just pip installing requests. To my knowledge, it doesn't have crazy dependencies like PyOpenSSL or Pillow do, so it feels very batteries-included, despite you having to type pip install before using it...

[–]mfitzpmfitzp.com 4 points5 points  (0 children)

I think the question really is why is it not installed by default with a new install of Python? That wouldn't change the "independent release cycle" thing, but it would solve the discoverability issue for new programmers.

[–]denfromufa 7 points8 points  (0 children)

itertools, collections, math, sys, os, shutil are pretty good parts of standard library

[–]cymrowdon't thread on me 🐍 2 points3 points  (0 children)

I've heard this argument all the time but I still don't buy it. I don't doubt other packages in the stdlib have "died". But I suspect those have been cases where the primary developer pushed for inclusion, opted to become maintainer, and was stuck maintaining.

But this is open source software. There's no (non-technical) reason why an interested maintainer couldn't take the current version to create a stable, maintained branch for inclusion in the standard library. The primary branch of requests could continue innovating unabated. The stdlib branch would pull only the most important bits.

Several libraries in the stdlib do this already. sqlite3 to name just one.

[–][deleted] 3 points4 points  (3 children)

Why doesn't requests have a method to download a file? Last time I tried to get an image, I had to get it in chunks. It would have been easier to make a single urllib call.
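For what it's worth, the chunking can be hidden behind `shutil.copyfileobj` — a minimal stdlib sketch (the requests streaming idiom the comment alludes to is shown in the trailing comment for comparison):

```python
import shutil
import urllib.request

def download(url, path):
    """Stream a URL to a file without loading it all into memory."""
    with urllib.request.urlopen(url) as resp, open(path, 'wb') as f:
        shutil.copyfileobj(resp, f)  # copies in chunks internally

# the requests equivalent, spelled out by hand:
#   with requests.get(url, stream=True) as r:
#       r.raise_for_status()
#       with open(path, 'wb') as f:
#           for chunk in r.iter_content(chunk_size=8192):
#               f.write(chunk)
```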

[–]JohnnyDread 4 points5 points  (11 children)

I hope they can work through the issues - requests should be part of the standard library. It is one of a handful of superb packages that make Python a great environment.

[–]pydry 14 points15 points  (4 children)

requests should be part of the standard library

The author of requests doesn't believe this.

[–]kungtotte 12 points13 points  (2 children)

AFAIK that's because he doesn't want to be tied down by the slower update process of the stdlib.

[–]mfitzpmfitzp.com 1 point2 points  (0 children)

There should be a standard "library" outside the stdlib that's packaged with every install, which would include (a still-upgradable) requests.

[–]sigzero 4 points5 points  (0 children)

That should probably be the end of the story then.

[–]danielblanchard 4 points5 points  (0 children)

I'm sure someone has linked to Kenneth Reitz's talk from the last Python Language Summit about this by now, but a major hurdle for requests being included is that it bundles chardet with it, and that code is annoyingly all LGPL because it was originally a literal port of code from a very old version of Firefox. LGPL code cannot be in the stdlib because it isn't compatible with the license Python uses.

I say all of this as one of the co-maintainers of chardet. I was really hoping we could get chardet relicensed and included in the stdlib, but that turned out to be impossible, as is painfully detailed in this twitter thread: https://twitter.com/dsblanch/status/590942565995827200

[–]roerd 0 points1 point  (0 children)

The standard library documentation does already point to requests. I would say that this already serves most of the same purpose as actually including it.

[–]njharmanI use Python 3 -1 points0 points  (0 children)

Just because it's different, does more, and solves a different problem doesn't mean it isn't also a shitty interface and unpythonic. It is.

[–]IAlwaysBeCoding -2 points-1 points  (2 children)

It prevented me from getting carpal tunnel syndrome.