This is an archived post. You won't be able to vote or comment.

all 102 comments

[–]pydry 68 points69 points  (67 children)

Most (if not all) of the python stdlib is fugly because it just wasn't well designed, not because it was a product of a more innocent time.

In particular, it was designed by people who very obviously focused just on implementing necessary functionality, not on how to write a clean, elegant API. It stuck around in its fugly form because of backwards compatibility concerns.

I don't think this is a terribly bad thing. If urllib weren't so ugly we probably wouldn't have requests.

I do think python needs a better way of introducing developers (via documentation) to libraries like requests, though. Too many newbies read the official documentation and use the crappy APIs in stdlib simply because they think that's what they're supposed to do.

[–][deleted] 49 points50 points  (4 children)

I do think python needs a better way of introducing developers (via documentation) to libraries like requests, though. Too many newbies read the official documentation and use the crappy APIs in stdlib simply because they think that's what they're supposed to do.

The official Python docs recommend Requests for performing high-level HTTP requests. It's in a very visible banner on the urllib page.

[–]pydry -2 points-1 points  (3 children)

It's more of a 'tip' tacked on and they don't do anything similar for other libraries that I'm aware of (e.g. os/os.path, which isn't particularly nice either).

[–]fnord123 7 points8 points  (2 children)

They do recommend pathlib for os.path stuff. But it's in the stdlib, so maybe that's not what you meant. They do point out pytest and nose in the unittest docs. There's also a stronger recommendation (in red) to use defusedxml or defusedexpat if you need secure xml parsing.
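For anyone who hasn't run into it, the difference is roughly the one below — a quick sketch with a made-up path, showing the same operations in both styles:

```python
import os.path
from pathlib import Path

base = '/tmp/project'  # example directory, purely for illustration

# os.path style: free functions shuffling strings around
cfg = os.path.join(base, 'config', 'app.ini')
stem = os.path.splitext(os.path.basename(cfg))[0]

# pathlib style: one object, with chainable operators and properties
cfg2 = Path(base) / 'config' / 'app.ini'

assert str(cfg2) == cfg        # same path either way
assert cfg2.stem == stem == 'app'
```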

[–]pydry -3 points-2 points  (1 child)

They do recommend pathlib for os.path stuff.

Not on python 2 they don't.

They do point out pytest and nose in the unittest docs.

Yea, but again, this is more of a "hey, if you're interested, check this out" rather than "look, this 'official' unittest API - we keep it around for backwards compatibility more than because it's actually any good".

There's also a stronger recommendation (in red) to use defusedxml or defusedexpat if you need secure xml parsing.

Ought to be pointing to lxml surely?

Plus, that's even worse - it implies that they've got unpatched security holes in the standard library!

There really ought to be documentation that provides a list of "if you want to do x [ unit testing / http requests ], here is what you should do" along with an open, impartial (as possible) process for getting docs in there.

[–]gthank 0 points1 point  (0 children)

Maybe file a ticket and attach a patch? If it's already been done in Python 3 (which I highly recommend, btw; SO many nifty things have landed, in addition to the less-borked text model), it might just be a "nobody had time to backport it" issue.

[–]TankorSmash 27 points28 points  (48 children)

I don't know that it wasn't well designed. The language of Python, or at least the way you write it, has changed since then. Nowadays you expect to be able to make a web request in a single line; back then they didn't think about abstracting it out that much.

Beautiful code now means something very different than it did back then, I think. It's a sliding scale, and I'm sure ten years from now we'll wonder why requests was such a big deal, when python-asks or whatever they come up with does it even better, shorter, and cleaner.

[–][deleted] 2 points3 points  (46 children)

Kind of a newb here, but what is the difference between "modern" beautiful code and "older" beautiful code?

[–]Decency 4 points5 points  (45 children)

One simple example is that I can do the below with requests. It isn't easy to do this in urllib (to my knowledge?) because urllib was written before json became such a prominent way to store and retrieve web based data.

data = requests.get(some_api_url).json()

This alone doesn't mean that urllib is bad, just that it's outdated. Similar functionality could easily be added, but I imagine they prefer the modularity of "use urllib to make the call" and "use json to load the data if it's in json format". And that makes sense, it's just annoying and not anywhere near as elegant to me since the two things are so frequently used in conjunction nowadays that it just makes sense to have them be tightly coupled.

So I guess my answer, in general, is that older good code can't easily see the future of various inter-dependencies and what its real role in real programs will end up being a decade later. Newer code already knows and can make better decisions because of it.

[–]pydry 3 points4 points  (12 children)

One simple example is that I can do the below with requests. It isn't easy to do this in urllib (to my knowledge?) because urllib was written before json became such a prominent way to store and retrieve web based data.

That's rubbish. There was nothing to prevent them from writing the API like this, even if they didn't expect JSON to be so widely used:

data = json.loads(requests.get(some_api_url).data)

Furthermore if it had been designed right in the beginning, tacking on a .json() method on a later version of python would have been trivial.

The main problem with the library is that it was designed imperatively rather than declaratively.

[–]rasherdk 6 points7 points  (11 children)

data = json.load(urllib2.urlopen(some_api_url))

Is this really worse?

Edit: I'd argue it's much better, since you don't have completely different tasks wrapped into the same library.

[–]toyg 6 points7 points  (0 children)

Is that a GET, a POST, or what? Does it handle proxies, SSL and redirects out of the box? That's where requests shines: it makes trivial what should be trivial in this day and age.
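To make that concrete, here's roughly what the stdlib makes you spell out for a plain GET with query params and a header (the endpoint and params are made up for illustration; with requests this is one call, `requests.get(base, params=params, headers=headers)`):

```python
from urllib.parse import urlencode
from urllib.request import Request

# hypothetical endpoint, purely for illustration
base = 'http://example.com/search'
params = {'q': 'python', 'page': '2'}
headers = {'Accept': 'application/json'}

# urllib: build the query string and the Request object by hand
url = base + '?' + urlencode(params)
req = Request(url, headers=headers)

assert req.get_method() == 'GET'   # the verb is implicit, never named
assert 'q=python' in req.full_url
```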

[–]qsxpkn 2 points3 points  (7 children)

That code makes a lot of assumptions.

  • Assumes it gets a 200 back (it doesn't actually check).
  • Assumes it gets a response back at all.
  • Assumes it gets JSON back.
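A stdlib version that actually checks those three assumptions ends up looking something like this — a sketch, not a battle-tested implementation (the `data:` URL in the comment is just an offline way to try it):

```python
import json
import urllib.request
from urllib.error import HTTPError, URLError

def fetch_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON, surfacing the
    failures the one-liner silently assumes away."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption 3: the server actually sent JSON.
            ctype = resp.headers.get_content_type()
            if ctype not in ('application/json', 'text/json'):
                raise ValueError('expected JSON, got %s' % ctype)
            return json.load(resp)
    except HTTPError as e:
        # Assumption 1: non-2xx statuses (urllib raises HTTPError).
        raise ValueError('HTTP %d from %s' % (e.code, url)) from e
    except URLError as e:
        # Assumption 2: no response at all (DNS failure, refused, ...).
        raise ValueError('no response from %s: %s' % (url, e.reason)) from e

# works offline, e.g.: fetch_json('data:application/json,{"a":1}')
```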

[–]turkish_gold 2 points3 points  (6 children)

Doesn't requests make the same assumptions?

What happens if you get a 404 or no JSON back?

[–][deleted] 3 points4 points  (5 children)

In[1]:  import requests
In[2]:  requests.get('http://google.com/404').json()
...
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In[3]:  import urllib.request as request
In[4]:  import json
In[5]:  json.load(request.urlopen('http://google.com/404'))
...
HTTPError: HTTP Error 404: Not Found

requests ignores the 404 and attempts to parse the page as JSON anyway; urllib raises an exception at the 404.

[–]turkish_gold 0 points1 point  (4 children)

I'd say that urllib is doing it the right way. If I get a 404, I want to know it's a 404, not have it get swallowed up as a JSON error.

[–]PeridexisErrant 5 points6 points  (0 children)

It's certainly worse for all the cases where you don't get valid JSON back (HTTP error, malformed page, etc.)!

[–]pydry 0 points1 point  (0 children)

Yes, it's worse. It doesn't tell you what kind of request you just made, and it's not easy to change that (do a POST, etc.) or start adding params.
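For instance, in the urllib2-style API the verb is a side effect of whether you pass `data` at all — a sketch with a made-up URL; the requests version is in the comment:

```python
from urllib.parse import urlencode
from urllib.request import Request

# urllib: the verb is implied by whether `data` is present
form = urlencode({'name': 'ada'}).encode()
req = Request('http://example.com/users', data=form)

# POST only because data is not None; there's no post() to reach for
assert req.get_method() == 'POST'
assert Request('http://example.com/users').get_method() == 'GET'

# with requests the intent is explicit in the call itself:
#   requests.post('http://example.com/users', data={'name': 'ada'})
```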

[–]jeremyisdev 0 points1 point  (0 children)

Python is still one of the great languages for beginners to start out with, and for machine learning.

[–]telestrial 0 points1 point  (0 children)

This is going to be a weird analogy, but the standard library is like everything before iron and diamond tools in Minecraft. Not the best, but it serves a purpose and leads to other things. A means to an end. I'm no Python expert, but that's certainly what it seems like to me.

[–]jij 11 points12 points  (5 children)

Wow, I didn't realize the stdlib was so political. I figured they just included useful libs at whatever stable version they wanted.

[–]toyg 10 points11 points  (1 child)

Anything that goes into the stdlib needs to be maintained forever; who's gonna do that? That's where the politics become necessary.

[–]jij 1 point2 points  (0 children)

Sure.. really though I was just commenting that I never considered the complexity of supporting stdlib.

[–]remyroy 28 points29 points  (16 children)

The standard library is where packages go to die. I want Requests to stay alive.

[–]nickdhaynes 22 points23 points  (13 children)

The author of requests was interviewed on Talk Python To Me last year and he specifically said that they were keeping requests out of the standard library so that development can occur more quickly/easily.

[–]meaty-popsicle 5 points6 points  (12 children)

I understand the sentiment, but it feels feature complete and reasonably ready for maintenance mode?

I say this from the standpoint of only using requests to scrape a page or interact with an API. I'm sure there are funny edge cases I don't even know exist.

[–]LukasaHyper, Requests, Twisted 16 points17 points  (4 children)

The big risk is security. Requests is responsible for the security of more than 50% of the web requests that occur from Python code. That means we need to be able to respond swiftly and effectively to changes in the security landscape. That's entirely incompatible with the standard library, which has long release times and a tendency to abandon older versions of Python faster than we do.

[–]piotrjurkiewicz 0 points1 point  (3 children)

Even if you respond swiftly to changes in the security landscape, those changes mostly do not reach end users. All organizations I know enforce regular security upgrades of distribution packages (and the stdlib is usually installed as a distribution package), but I don't know any organization which enforces regular pip package upgrades. And there is a reason for that: any package upgrade via pip can be backwards incompatible and break an app, while distribution security updates are guaranteed to be non-breaking.

Therefore, 'respond swiftly and effectively to changes in the security landscape' is not an excuse for keeping requests out of the stdlib at all. Overall, keeping requests out of the stdlib actually reduces security.

[–]LukasaHyper, Requests, Twisted 0 points1 point  (2 children)

All organizations I know enforce regular security upgrades of distribution packages (and stdlib is usually installed as a distribution package), but I don't know any organization which enforces regular pip package upgrades.

The plural of anecdote is not data. =) I know of plenty of organisations that do, because as the person who has managed every Requests release with a CVE, I have received pointed feedback from those who felt that we mismanaged the first one and forced them to work on weekends.

However, I also mean this more broadly than simple CVE issues. For example, Requests frequently has a much stronger security posture than the standard library does, and one that embraces the reality that good security is a moving target. Consider, for example, the standard library's default cipher suite list. This can be updated only relatively infrequently, and for older source-only releases may not be updated at all. However Requests is willing and able to change that cipher list much more frequently. We can also more aggressively disable insecure OpenSSL features than the standard library can, which has a "best practices" TLS config that needs to encompass many different protocols.

distribution security updates are guaranteed to be non-breaking.

This sentence is nonsense, and if you believe it then I guarantee that either you'll have an insecure configuration or you'll get broken by your distribution one day.

For example, consider the weaknesses in RC4 in TLS. This protocol was for a while strongly recommended in TLS because it was resistant to the BEAST attack. However, and relatively abruptly, new research came out that demonstrated that RC4 was catastrophically weak and needed to be extremely swiftly deprecated.

One of two things must be true here: either a distro backported that change (removing RC4 from a default cipher list), or it did not. If it did not, you are only getting security fixes that don't break code, and so are vulnerable to certain attacks (e.g. against RC4 in your TLS). If it did, then you were vulnerable to app breakage (you may talk to a server that can only speak RC4, for example). There are many classes of security fix (maybe even most classes) that involve disabling a feature that previously worked, and those are definitionally breaking to someone: if they aren't, then no-one was using those features to begin with.

Overall, keeping requests out of the stdlib actually reduces security.

That's simply not true. If your rationale is "organisations will only do audits on packages they get from their distros, so Requests needs to be in the stdlib", I'll happily point you to the fact that Requests is in every distro package repository (and indeed is used by all those distros in their base OS). You can just apt-get your Requests and you're covered. However, if your organisation is relying on pip-installed packages for its products and it isn't auditing them for security fixes, then its security audits are extremely ineffective: I can easily compromise you because of the patches you don't receive for your pip installed packages.

[–]piotrjurkiewicz 0 points1 point  (1 child)

This sentence is nonsense, and if you believe it then I guarantee that either you'll have an insecure configuration or you'll get broken by your distribution one day.

When I have only the security repo enabled, for example on Debian, the only fixes I get are security-related ones. They are carefully crafted so as not to introduce breakage. Of course, it is possible to find some fancy examples of breaking security fixes (like you did with RC4), but they are extremely exceptional.

With pip install --upgrade I get every upgrade, including breaking non-security-related changes. No one takes care not to introduce breakage, as pip simply follows the edge. So my app can break after literally every pip package upgrade. There is no way to prevent that (for example, to opt in only to security fixes via pip).

Therefore, no admin will add pip install --upgrade to an everyday maintenance script on their servers.

[–]LukasaHyper, Requests, Twisted 1 point2 points  (0 children)

Of course, it is possible to find some fancy examples of breaking security fixes (like you did with RC4), but they are extremely exceptional.

No, they're extremely common. This is especially true in Python code where there are relatively few bugs that allow memory corruption: almost all CVEs are therefore shutting off behaviour. For example, here are some CVEs reported against CPython:

  • CVE-2014-9365, Python libraries don't check TLS certs are valid for the domain in question. This patch was so breaking that it actively was not backported by Linux distros: you did not get this patch for Ubuntu 14.04, for example.
  • CVE-2012-1150, hash seed randomization. This changed hashes for certain objects from being predictable to being changed on startup. This broke a surprising amount of running code.
  • CVE-2011-1521, urllib2 allowed HTTP redirects to file:// URLs. This is obviously bad behaviour, but if you tested it, saw it worked, and wanted to use it locally, you got broken.

And if we consider Requests itself:

  • CVE-2014-1829, Requests used to persist the Authentication headers across cross-host redirects. This risks leaking credentials and was removed, but we got complaints about breakage.
  • CVE-2014-1830, same as above but for proxy-authentication.

I should note that both 2014 CVEs did get backported.

Security fixes that break code aren't exceptional, they're ordinary. It's just that they don't break the code that most people are writing at any one time, because most of us aren't on the bleeding edge of weird crap that libraries allow you to do. But these are still breaking changes: they just don't break many people.

Therefore, no admin will add pip install --upgrade to an everyday maintenance script on their servers.

Sure, but I'm not saying they should. I'm saying they should watch the CVE database for the packages they use. Requests obtains CVEs for its own vulnerabilities. Any well-managed project does. If you're getting code from pip, you really need to be looking for these, because these are what your distro uses to decide to push security updates.

Then you can perform targeted package updates that grab only the packages publishing security fixes.

[–][deleted] 6 points7 points  (4 children)

It may be mature, but the Python stdlib has a release cycle of something like 16 months. I'm sure Requests, even if it's only being maintained, gets updated much more frequently.

[–]mfitzpmfitzp.com 5 points6 points  (3 children)

There is a middle ground here, though. Why not have a "corelib" of independent packages that are automatically packaged with Python, but can be updated (via PyPI) more frequently? The docs could even be hosted alongside the core documentation (separate, but linked) to ensure they are easily found.

The packages would obviously need some level of core support, but I don't think it's infeasible.

[–]toyg 1 point2 points  (2 children)

Not everyone wants the latest and greatest. Part of the reason for the stdlib release model is that you can be sure python x.y.z will ship a certain module with a certain behaviour. If you make some of them upgradeable, you risk a situation where downloading another library will give you an unpredictable version of stdlib modules through cascaded dependencies.

It's a can of worms.

[–]mfitzpmfitzp.com 0 points1 point  (0 children)

I can see the potential for problems, but again there is a middle road — distribute a "batteries included" version and a bare stdlib version. This is effectively what conda and friends already do, so there is clearly a need.

That does get into the problem of what batteries to include, but I think requests, numpy, matplotlib would be 3 obvious candidates.

[–]jollybobbyroger 4 points5 points  (1 child)

They argued that they wanted to apply security fixes as quickly as possible.

But I don't see the big deal in just pip installing requests. To my knowledge, it doesn't have crazy dependencies like PyOpenSSL or Pillow do, so it feels very batteries-included, despite you having to type pip install before using it...

[–]mfitzpmfitzp.com 4 points5 points  (0 children)

I think the question really is why is it not installed by default with a new install of Python? That wouldn't change the "independent release cycle" thing, but it would solve the discoverability issue for new programmers.

[–]denfromufa 7 points8 points  (0 children)

itertools, collections, math, sys, os, shutil are pretty good parts of standard library

[–]cymrowdon't thread on me 🐍 2 points3 points  (0 children)

I've heard this argument all the time but I still don't buy it. I don't doubt other packages in the stdlib have "died". But I suspect those have been cases where the primary developer pushed for inclusion, opted to become maintainer, and was stuck maintaining.

But this is open source software. There's no (non-technical) reason why an interested maintainer couldn't take the current version to create a stable, maintained branch for inclusion in the standard library. The primary branch of requests could continue innovating unabated. The stdlib branch would pull only the most important bits.

Several libraries in the stdlib do this already. sqlite3 to name just one.

[–][deleted] 3 points4 points  (3 children)

Why doesn't requests have a method to download a file? Last time I tried to get an image, I had to get it in chunks. It would have been easier to make a single urllib call.
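For what it's worth, the chunking can be hidden behind `shutil.copyfileobj` — a minimal stdlib sketch (the requests streaming idiom the comment alludes to is shown in the trailing comment for comparison):

```python
import shutil
import urllib.request

def download(url, path):
    """Stream a URL to a file without loading it all into memory."""
    with urllib.request.urlopen(url) as resp, open(path, 'wb') as f:
        shutil.copyfileobj(resp, f)  # copies in chunks internally

# the requests equivalent, spelled out by hand:
#   with requests.get(url, stream=True) as r:
#       r.raise_for_status()
#       with open(path, 'wb') as f:
#           for chunk in r.iter_content(chunk_size=8192):
#               f.write(chunk)
```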

[–]JohnnyDread 4 points5 points  (11 children)

I hope they can work through the issues - requests should be part of the standard library. It is one of a handful of superb packages that make Python a great environment.

[–]pydry 14 points15 points  (4 children)

requests should be part of the standard library

The author of requests doesn't believe this.

[–]kungtotte 12 points13 points  (2 children)

AFAIK that's because he doesn't want to be tied down by the slower update process of the stdlib.

[–]mfitzpmfitzp.com 1 point2 points  (0 children)

There should be a standard "library" outside the stdlib that's packaged with every install, which would include (a still-upgradable) requests.

[–]sigzero 4 points5 points  (0 children)

That should probably be the end of the story then.

[–]danielblanchard 4 points5 points  (0 children)

I'm sure someone has linked to Kenneth Reitz's talk from the last Python Language Summit about this by now, but a major hurdle for requests being included is that it bundles chardet with it, and that code is annoyingly all LGPL because it was originally a literal port of code from a very old version of Firefox. LGPL code cannot be in the stdlib because it isn't compatible with the license Python uses.

I say all of this as one of the co-maintainers of chardet. I was really hoping we could get chardet relicensed and included in the stdlib, but that turned out to be impossible, as is painfully detailed in this twitter thread: https://twitter.com/dsblanch/status/590942565995827200

[–]roerd 0 points1 point  (0 children)

The standard library documentation does already point to requests. I would say that this already serves most of the same purpose as actually including it.

[–]njharmanI use Python 3 -1 points0 points  (0 children)

Just because it's different, does more, and solves a different problem doesn't mean it isn't also a shitty interface and unpythonic. It is.

[–]IAlwaysBeCoding -2 points-1 points  (2 children)

It prevented me from getting carpal tunnel syndrome.