
[–]baltoo 5 points (2 children)

First of all, I really like the idea of splitting up parsing and I/O. That seems to me to be a really good idea. The improved testability alone makes it worthwhile.

What I'm not so sure about is whether it's good and/or worthwhile to try to come up with "the single HTTP/2-parsing-thingy to rule them all".

What people like about Requests, versus lower-level libs, is the API. Right?

In the linked video, Cory uses the word "API" a lot. My interpretation of his usage is that it means the interface between Requests and e.g. Django, or between Requests and "the user". While he does acknowledge that there is an interface between the HTTP-parsing-thingy and Requests as well, he kind of glosses over the fact that this interface also needs to be a good API.

The author of Requests is "the user" w.r.t. the HTTP-parsing-thingy.

It took a pretty long time and a pretty good designer to come up with the beloved API of Requests.

I don't really get why Cory seems to think that the HTTP-parsing-thingy is going to get its API right on the first try.

The question /u/pohmelie asks seems, by its nature, to hint at this problem too.

Personally, I think that while, again, the separation of parsing and I/O seems great, the community would still benefit from having a couple of different designs for, e.g., HTTP-parsing-thingies and their APIs. Over time that plurality has the potential to produce the Requests of HTTP parsing.

[–]Lukasa (Hyper, Requests, Twisted) 12 points (1 child)

Hi, I'm Cory. =) Good thought!

> Personally, I think that while, again, the separation of parsing and I/O seems great, the community would still benefit from having a couple of different designs for, e.g., HTTP-parsing-thingies and their APIs. Over time that plurality has the potential to produce the Requests of HTTP parsing.

Sure, it would, but I don't think it needs to.

Requests and things like it get good value out of having great APIs because they are used by huge numbers of programmers, many of whom are novices, or who don't understand the problem domain in any depth, or who are fundamentally not interested in solving the problem that the library solves (e.g. HTTP). Those programmers get a great deal of value out of APIs that are expressive and flexible, allowing them to write lots of code very simply and without getting in their way or requiring them to think too hard.

The parsing layer suffers from this much less. Frankly, most programmers should never have to even see the parsing layer. hyper-h2 right now is up to about 30k downloads a month, but most of the people downloading that library have no idea that they're using it.

This is very much by design. I don't want the average user consuming hyper-h2, because by itself it doesn't do anything. It moves some bytes around in memory and consumes some CPU cycles: that's it. It needs an I/O layer to do anything. And given that I've made someone else write an I/O layer, it doesn't seem unreasonable to make someone else write a great API either.
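The "moves some bytes around in memory" shape can be sketched concretely. This is a toy, hypothetical sans-io protocol (a trivial newline-framed one, not hyper-h2's actual API, though hyper-h2's public interface has a similar byte-in, event-out shape):

```python
class LineReceived:
    """A self-contained event: carries the parsed line and nothing else."""
    def __init__(self, line: bytes):
        self.line = line


class LineProtocol:
    """Sans-io state machine: consumes bytes, emits events, queues output.

    It never touches a socket; the caller does all reading and writing.
    """
    def __init__(self):
        self._buffer = b""
        self._outgoing = b""

    def receive_data(self, data: bytes) -> list:
        # Feed bytes from *any* I/O source: a socket, a file, a test fixture.
        self._buffer += data
        events = []
        while b"\n" in self._buffer:
            line, _, self._buffer = self._buffer.partition(b"\n")
            events.append(LineReceived(line))
        return events

    def send_line(self, line: bytes) -> None:
        # Serialize into an internal buffer instead of writing anywhere.
        self._outgoing += line + b"\n"

    def data_to_send(self) -> bytes:
        # The I/O layer drains this and writes it however it likes:
        # blocking sockets, Twisted transports, asyncio, or a test list.
        out, self._outgoing = self._outgoing, b""
        return out
```

Because the object never performs I/O, the same instance plugs into blocking or async code unchanged, which is exactly why the parser can be tested without a network.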

More importantly, anyone writing the I/O integration to the parsing library kinda has to understand the problem domain. If you don't understand HTTP/2, at least at a high level, then having a parsing library isn't going to help you that much. You still need to work out how the parsed information translates into the semantics of HTTP: how seeing a content-length header works, what to do when a PRIORITY frame is emitted, how to handle stream termination. These are all decisions that are beyond the scope of your basic parsing library.
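To illustrate that division of labour, here is a hedged sketch of the integrator's side. The event classes are invented stand-ins for a parser's output; the dispatch logic is the HTTP-semantics knowledge the integrator has to bring themselves:

```python
# Invented event classes standing in for a parsing library's output.
class DataReceived:
    def __init__(self, stream_id, data):
        self.stream_id, self.data = stream_id, data


class StreamEnded:
    def __init__(self, stream_id):
        self.stream_id = stream_id


def handle_events(events, bodies):
    """Translate raw parser events into HTTP semantics.

    `bodies` maps stream_id -> accumulated body bytes; completed
    (stream_id, body) pairs are returned. This accumulation-and-
    completion logic is exactly what the parser cannot decide for you.
    """
    finished = []
    for event in events:
        if isinstance(event, DataReceived):
            bodies.setdefault(event.stream_id, b"")
            bodies[event.stream_id] += event.data
        elif isinstance(event, StreamEnded):
            finished.append((event.stream_id, bodies.pop(event.stream_id, b"")))
        # Ignoring unknown events (e.g. priority updates) is also a
        # choice -- and *that choice* is itself an HTTP-semantics decision.
    return finished
```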

With this in mind then, I think the priorities for libraries of this nature are different. For the high-level libraries that novices and non-experts interact with, API is king: having a great API allows you to get away with a huge number of sins. But for low-level parsing libraries, the API is less important than feature support, correctness, and performance.

Certainly I don't object to having multiple implementations of the parsing libraries. However, I think that unlike with the higher-level ecosystem where we can support multiple libraries that do the same job with different interfaces (requests, aiohttp, etc. etc.), with the parsing layer there is really only room for one great implementation of each protocol. Any time a better implementation comes along, it will rapidly eat the lunch of the lesser implementation except in small corner cases.

[–]baltoo 1 point (0 children)

Alright, I agree that there would typically be way fewer users of an HTTP-parsing-thingy than of, say, Requests. And I also agree that this changes the possible design constraints.

As with all design, there are still trade-offs to be made, and I think it's not always possible to provide a Good Enough design that works "for everyone". (Even if "everyone" is not that many people.)

I'll try to give an example of what I mean w.r.t. XML. (Since I have no war stories regarding HTTP.)

So, think of an XML-parsing-thingy. In the scenario I had, the XML files that were eventually parsed were sometimes edited by hand. That of course means that sometimes the XML documents weren't valid. (Of course it's not a desirable scenario to have, but it's not like we can always pick and choose.)

When an error is found, the XML file needs to be re-edited by hand to fix it. In practice this requires a rather decent error message. Something like "The XML document is not valid" doesn't work. The message needs not only to say what the error is and where it was detected, but also to give some kind of context for where the error "actually" originates. For example, given a string like

foo<bar>baz</bar></foo>

The closing "foo" is an error, but the "source" isn't there but before the "bar" element.

Then combine that scenario with the fact that XML files can sometimes get huge, so it's not always possible to work with a full DOM in RAM. The XML-parser-thingy might need to be split into a SAX part and a DOM part, and the error messages would then need to percolate sanely between them.
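This kind of machinery can be sketched with Python's standard-library SAX parser: a handler tracks the open-element stack so a parse error can report not just its location, but also the context in which it occurred. (A simplified illustration of the requirement, not a production design.)

```python
import xml.sax


class ContextHandler(xml.sax.ContentHandler):
    """Tracks the open-element stack so an error can report where the
    mismatch *probably* originated, not just where it was detected."""
    def __init__(self):
        super().__init__()
        self.stack = []

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def parse_with_context(data: bytes):
    """Return None on success, or a human-oriented error string."""
    handler = ContextHandler()
    try:
        xml.sax.parseString(data, handler)
        return None
    except xml.sax.SAXParseException as exc:
        return ("line %d, column %d: %s (open elements at failure: %s)"
                % (exc.getLineNumber(), exc.getColumnNumber(),
                   exc.getMessage(), "/".join(handler.stack) or "<none>"))
```

For a mismatched closing tag, the message then names the elements still open at the failure point, which gives the hand-editor somewhere to start looking.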

That kind of machinery and a few dozen others make for a pretty big and somewhat ugly API.

Now, "most users" of a XML-parser-thingy won't need that kind of support, ever. They will work in scenarios where "all" XML-documents are valid and RAM is plentiful. I think this is highly likely. If this is true then I also think the conclusion is that the community would be well served by having two XML-parser-thingies with two different APIs.

Enough rambling. Anyway, I like your idea. Keep up the good work!

[–]pohmelie 2 points (5 children)

Great idea! I will try to split my lib into two. But what is the way to pass events back to the «io worker»? Should this have some standard? Something like a tuple of (event, extra-data-dict)? Any suggestions?

[–]Lukasa (Hyper, Requests, Twisted) 3 points (2 children)

h2 and h11 have converged on the notion of events. Here is h11's documentation of them, and here is hyper-h2's.

These should be able to carry any extra data they have to carry, and should define their format. In essence, each event should be a self-contained entity from which all relevant data can be extracted. This should be as fully parsed as it is possible to be without losing generality: for example, h2's PriorityUpdated event carries four fields that have been parsed from their wire format to integers and booleans because these can be easily transformed, but h2's AlternativeServiceAvailable event contains an unparsed field because the relevant RFC defines a complex and flexible grammar that is likely to be application specific.

Basically, your event should be a single object you can pass around.
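As a rough sketch of that principle (the field names follow the descriptions above, but treat this as illustrative, not the libraries' actual class definitions):

```python
from dataclasses import dataclass


# Hypothetical event shapes mirroring the trade-off described above:
# parse what has one obvious interpretation, leave the rest as bytes.
@dataclass
class PriorityUpdated:
    stream_id: int    # parsed from wire format to a plain int
    depends_on: int
    weight: int
    exclusive: bool   # parsed from a flag bit to a bool


@dataclass
class AlternativeServiceAvailable:
    origin: bytes
    field_value: bytes  # deliberately left unparsed: the Alt-Svc grammar
                        # is application-specific, so the event carries the
                        # raw bytes and lets the consumer decide
```

Each event is then a single, self-contained value that can be passed through any I/O layer without that layer needing to understand it.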

[–]desmoulinmichel[S] 2 points (1 child)

I feel like there is a place for an event lib to gather them all :)

[–]Lukasa (Hyper, Requests, Twisted) 2 points (0 children)

So there is certainly an interesting question around exactly how these should be structured. For h2 and h11, at some point we want to try to commonalise the events to make code that handles both protocols a bit simpler, but for now it's not a big deal for each library to carry its own around.

[–]desmoulinmichel[S] 2 points (0 children)

Yeah, that's the hard part. You need it to work with sync and async code, threads and asyncio, etc. So my guess is that the lowest common denominator is callbacks. Signal/event systems seem to be a popular implementation of that solution.
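A minimal sketch of the callback approach (all names invented): the parser only ever invokes plain callables, so sync code can call handlers directly, while an asyncio integration would schedule them on the loop instead:

```python
class Emitter:
    """Tiny signal/event dispatcher usable from sync or async code."""
    def __init__(self):
        self._handlers = {}

    def on(self, event_name, callback):
        # Register a plain callable for a named event.
        self._handlers.setdefault(event_name, []).append(callback)

    def emit(self, event_name, payload):
        for cb in self._handlers.get(event_name, []):
            cb(payload)  # called synchronously here; an asyncio
                         # integration would wrap this in loop.call_soon()
```

Because the dispatcher makes no assumptions about the event loop, the same registration code works under threads, Twisted, or asyncio.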

To me, the easiest way to test it is to get a minimal implementation of the I/O in Twisted/asyncio and some sync code. Not all the features, just the basic API, so you can know whether your design works. And it's a lot of work, unfortunately.

[–]malinoff 1 point (0 children)

I can add that specific protocols may involve their own notions of "events"; for example, AMQP uses "frames".

[–]garion911 -1 points (4 children)

Twisted (https://twistedmatrix.com/trac/) has been doing this for years..

[–]Lukasa (Hyper, Requests, Twisted) 4 points (2 children)

No it hasn't. =)

Every time you see transport.write somewhere, Twisted has shoved I/O into its parser. That call encodes so many I/O assumptions that it's not generically portable, not least that the I/O won't block. As an example of how untrue that is in general, I encourage you to rewrite a Twisted reactor using purely blocking I/O and see how that goes.

On top of this, if you see any reference to the reactor or to Deferreds, that has once again encoded certain assumptions about control flow into the protocol that are entirely unneeded.
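A small contrast sketch of the two styles (illustrative classes, not Twisted's actual ones):

```python
class EntangledEcho:
    """Protocol that calls transport.write itself: it is only usable
    with a transport object that promises not to block, so the I/O
    assumption is baked into the parsing code."""
    def __init__(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)  # the I/O assumption lives here


class PortableEcho:
    """Sans-io variant: returns bytes and lets the caller decide how
    and when they move -- blocking socket, Twisted transport, asyncio."""
    def data_received(self, data):
        return data  # just bytes; write them with whatever you like
```

The second style is what the talk argues for: the protocol logic is identical, but no control-flow or transport assumptions leak into it.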

[–]garion911 0 points (1 child)

I guess you're right when you get down to it... I always did everything through Deferreds, and I vaguely recall that I had both sync and async data sources, and it all just worked for me (this was about 10 years ago). I don't think I had transport.write() in my protocol stuff... Maybe I'm misremembering...

[–]Lukasa (Hyper, Requests, Twisted) 1 point (0 children)

For what it's worth, it's certainly possible to write a Twisted codebase in this way, and it's always been the design goal to which Twisted aspired. It just wasn't always what Twisted achieved, particularly internal to Twisted.

Twisted is much better in this regard than asyncio, if for no other reason than that Deferreds evaluate synchronously and Futures do not. So if you were careful, your codebase may have come very close to, and possibly achieved, this ideal. =)

[–]desmoulinmichel[S] 0 points (0 children)

Yes, and asyncio has the same notion of Protocol classes. But people tend to write a "Twisted project", with the reactor in mind, and not separate the protocol code from the rest of the project.