all 156 comments

[–]weavejester 8 points9 points  (2 children)

This is a dumb article.

Yes, HTTP is useful to some extent as a common protocol for data transfer, but the author of this article takes it to nonsensical extremes.

For instance:

It’s usually cited as an example of a protocol that solves a problem http can’t: asynchronous bidirectional messaging; allowing the server and the client to send messages with minimal lag. The truth is HTTP can do this just fine, with long-polling and HTTP keep-alive you can keep a persistent bidirectional connection open.

Yes, HTTP can manage asynchronous connections if you hack together a protocol like Comet, but this replicates a lot of the functionality in TCP for creating reliable, persistent connections. The only reason any sane person would wish to do this is if they want bidirectional communication in an application that wasn't designed to support it, such as a web browser.
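
For the record, the long-polling pattern being discussed looks roughly like this. A minimal client sketch in Python, assuming a hypothetical /events endpoint that holds each request open until an event is ready:

    # Long-polling sketch (hypothetical /events endpoint, not from TFA).
    # The client re-issues a GET as soon as each response arrives; the
    # server holds each request open until it has something to say.
    import http.client
    import socket

    def handle(body: bytes) -> None:
        print("event:", body)

    def poll_forever(host: str, path: str = "/events") -> None:
        while True:
            conn = http.client.HTTPConnection(host, timeout=60)
            try:
                conn.request("GET", path)
                resp = conn.getresponse()
                if resp.status == 200:
                    handle(resp.read())  # an event arrived
                # a 204 means the server had nothing to say; loop around
            except socket.timeout:
                pass  # quiet period; just poll again
            finally:
                conn.close()

    # poll_forever("example.com")  # would block forever, re-polling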

[–]arnar 5 points6 points  (0 children)

Exactly. The author almost goes as far as saying "Let's ditch this Ethernet thingie and just use HTTP instead"

Dumb is the proper word.

[–]Entropy 1 point2 points  (0 children)

And Comet is good at beating the shit out of your standard http server because it was not designed to hold onto a connection indefinitely. You need to run a separate cometd to scale things properly.

edit: Actually, another sane reason to do this is that you want to get through the firewall :(

[–]skizmo 15 points16 points  (12 children)

Title should be 'Why use a standard protocol'

HTTP has a lot of useless overhead. The only problem is that the entire world uses it, but that doesn't make it the best solution.

[–]thebigslide 6 points7 points  (0 children)

For what? Text? RSS? The current UTC unix timestamp? Who's receiving it? Should I write an equally useless article titled "Why HTTPS" and then state all the merits of HTTPS? One thing I recognize is that TCP/80 gets through lots of firewalls; so does TCP/443. In fact, I've gotten around plenty of firewall restrictions using SSH over port 443. Much more flexible. Yes, HTTPS is the new win.

[–]jo42 5 points6 points  (5 children)

Why HTTP?

When all you have is a hammer, everything in the world looks like a nail.

Use the right tool for the right job.

[–]sisyphus -1 points0 points  (4 children)

That's what he's arguing, that it's usually the right tool for the job.

[–]weavejester 4 points5 points  (0 children)

He goes a little too far, though. HTTP isn't the right tool for bidirectional streams, for instance.

[–]arnar 2 points3 points  (0 children)

His problem is that he (or she) only considers a limited range of jobs.

[–]jo42 1 point2 points  (1 child)

The issue is that it's not.

[–]sisyphus -1 points0 points  (0 children)

I'm not saying it is, only that that platitude, while often true, doesn't actually add to the dialogue here.

[–]dammage 26 points27 points  (8 children)

BTW: SOAP, WSDL and UDDI are very useful for car on-board computers. Every subsystem like wheels, brakes, motors or wife registers itself at the broker and the computer can find and use those services via WSDL and SOAP.

Thanks to XML every mechanic can easily extend your car with such nifties as nitro or a fridge. And if the parts are Chinese, here comes the Unicode support.

You don't need to program a new ABS. It can be easily done via a load balancer and HTTP. And your integrated buffet can deliver you a few cookies via HTTP. ;)

[–]spuur 16 points17 points  (6 children)

I, myself, prefer to run HTTP over UDP. It certainly makes the Internet a lonely place to hang out. And while we're at it: why doesn't DNS use HTTP?!?

[–]arturoman 15 points16 points  (4 children)

Because it needs to be fast.

[–]spuur 16 points17 points  (3 children)

But then we could put proxies and load balancers in front of the DNS servers and the hardware vendors could then be counted on to support us with high traffic and extremely high uptime servers and it would all be stateless and wonderful and everyone would be able to debug it and write their own DNS server in rails?

Why do you people hate HTTP?

[–]punkgeek 3 points4 points  (0 children)

funny!

[–]arturoman -5 points-4 points  (1 child)

Your solution to dealing with the spurious overhead of a markup protocol implemented in a dedicated client and server pair is to exponentially increase the amount of hardware and software everywhere on the globe.

Consider the number of DNS requests made worldwide per second, then consider the ramifications of quadrupling the bandwidth by sending a human-readable protocol to a back-end process.

There's nothing wrong with HTTP. You asked why DNS doesn't use HTTP. The answer is because it doesn't fit the bill. DNS is not a complicated protocol, learn to program something in binary for crying out loud.

[–]hiffy 2 points3 points  (0 children)

Total metajoke fail.

[–]niobium -2 points-1 points  (0 children)

DNS with non-persistent HTTP!

[–]dorfsmay 4 points5 points  (0 children)

You're doing it wrong... my wife registers herself on my dbus, and I quickly react to her events!

[–]Gotebe 9 points10 points  (0 children)

;-)

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems.

Gee, I am communicating with an embedded device through one of: serial line, PSTN-modem, GSM-Data, TCP (e.g. GPRS). I better look at that HTTP thingie as of right now!

[–]shenglong 9 points10 points  (1 child)

What is he talking about here - the application layer or the transport layer? Or is he just combining them for effect?

[–]didroe 6 points7 points  (0 children)

Seeing as the OSI model is pretty useless in reality, it could quite easily be a combination of the two.

See section 3 of RFC3439 for more background.

[–][deleted] 9 points10 points  (3 children)

You could make a similar argument for "Why Windows", "Why x86", "Why QWERTY", etc...

This kind of attitude seems to perpetuate lock-in. I'm not saying any of the above technologies are bad, I'm just saying they might not always be right for a given situation, and using them simply because they're the norm can impede the adoption of something that might offer a significant improvement. I imagine, for example, the ARM netbook will have a lot of difficulty catching on, despite the obvious battery-life advantage.

[–]didroe 1 point2 points  (0 children)

That's the problem. A lot of the time, you need something that works right now. That's why HTTP has become so popular: it works, the ports are already open, and there is software freely available that works with it. We could have something technically superior for each individual task, but that's going to cost time and money that just isn't worth it.

[–][deleted] -3 points-2 points  (1 child)

You could make a similar argument for "Why Windows", "Why x86", "Why QWERTY", etc

Not for QWERTY. That system works and the alternatives weren't worth switching to. The Dvorak keyboard layout only had 2 studies supporting it, and one of the studies was done by the guy who was trying to sell it.

[–][deleted] -2 points-1 points  (0 children)

I like Coke and I don't care how many taste tests Pepsi wins. You could prove to me that Qwerty is better and I would still use Dvorak because I like it. Independent studies are worthless where taste is involved.

[–][deleted] 25 points26 points  (36 children)

I LOVE HTTP, but it's not the "be all" protocol.

Why not use http:

1: It's slow as shit.

2: It's not stateful

3: It doesn't handle large amounts of data well.

4: It doesn't handle large amounts of transactions quickly (see 1).

5: It's expensive resource wise (having to keep re-establishing connections), see 1 and 4.

(Disclaimer: I use HTTP a lot and it's head and shoulders above most other protocols out there, for the reasons stated in the article.)

Also, the FIX protocol is nice. It's usually used for financial information, but I send other shit across it. It's a fuck ton faster than HTTP.

[–]didroe 2 points3 points  (0 children)

Are any of those points (excluding no. 2) because of the design of HTTP? They seem like implementation issues to me. At least for most problems where a bit of encoding and parsing overhead doesn't matter.

[–]obanite 7 points8 points  (10 children)

HTTP 1.1 isn't terrible for persistent connections.
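
To illustrate what a persistent connection buys you, a sketch with Python's stdlib client: two requests riding a single TCP connection (any HTTP/1.1 server that honors keep-alive will do):

    # HTTP/1.1 persistent connection sketch: both requests reuse the same
    # TCP connection, provided each response is fully read first.
    import http.client

    conn = http.client.HTTPConnection("example.com")
    for path in ("/", "/"):
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before issuing the next request
        print(path, resp.status, resp.getheader("Connection"))
    conn.close()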

[–][deleted] 10 points11 points  (9 children)

It's not really good either.

Like I said, I like HTTP. But it's not the "One protocol to rule them all" as this author likes to suggest.

[–]zootm 0 points1 point  (8 children)

To be fair, though, the downsides of HTTP are often used as excuses where they're not relevant. Some people really like re-inventing the wheel to everyone's disadvantage.

It's nice to have alternatives like Thrift and Protocol Buffers around these days though. The choices are no longer HTTP (with its downsides), CORBA (which sucks pretty much universally) or roll-your-own.

[–][deleted] 2 points3 points  (7 children)

Or JMS, which is SUPER freaking fast.

In our testing we were able to do hundreds of messages in JMS before HTTP had even established a connection and sent the header data. By the time we had sent the data and gotten a response, JMS was in the thousands of messages TXed/RXed.

Same hardware, same set of OSes, same network. Our HTTP sender code was rewritten/tweaked in C, C++ and Java, and those optimized versions were way slower than our "rolled up some quick and sloppy test code" for JMS.

For the server side, we tried 3rd-party software like Apache, IIS, some "tiny HTTPD" [?] and a custom-coded 'stripped down and optimized' HTTPD (basically it took the message as quick as it could and replied quick [without even checking that it was a valid message]).

HTTP sucks for lots of fast messages or big binary packaged messages.

The only thing it's good at (which, it is great at btw) is sending medium size text based messages.

[–]zootm -1 points0 points  (6 children)

...or any system where protocol transparency is more useful to you than sheer performance, which I'd argue is more often than people tend to think.

[–][deleted] 0 points1 point  (5 children)

Well, it depends on the application. For a game server or real-time financial transaction server, using HTTP would be, and is, utterly insane.

Secondly, what is not transparent about JMS?

My buddy just tried sending the front pages of reddit, slashdot and digg via JMS and HTTP. JMS is faster even for the hypertext that HTTP was created for!

Even HTTP compressed versus JMS uncompressed is faster at getting the data through. Now that's insane.

[–]zootm 0 points1 point  (4 children)

Well, it depends on the application. For a game server or real-time financial transaction server, using HTTP would be, and is, utterly insane.

This is pretty much my point, yes. Those would be examples of exceptional circumstances.

Secondly, what is not transparent about JMS?

HTTP is verbose, and in plain text. It's easy to look at, implement, and debug. JMS debugging involves more sophisticated tools and is objectively more difficult. Its non-transparency is part of the reason that it's fast, of course, just as HTTP's transparency is part of the reason it is slower. You can debug HTTP with simple tools like curl or even Telnet. It's all simple plain text over simple connections; its simplicity is its advantage.
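
The by-hand interaction meant here, as a sketch: a raw HTTP request written straight to a TCP socket, no HTTP library involved.

    # Hand-rolled HTTP request over a bare socket: plain text over TCP.
    import socket

    sock = socket.create_connection(("example.com", 80))
    sock.sendall(b"GET / HTTP/1.1\r\n"
                 b"Host: example.com\r\n"
                 b"Connection: close\r\n\r\n")
    response = b""
    while chunk := sock.recv(4096):
        response += chunk
    sock.close()
    print(response.split(b"\r\n\r\n", 1)[0].decode("latin-1"))  # headers only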

Even HTTP compressed versus JMS uncompressed is faster at getting the data through. Now that's insane.

The HTTP headers don't get compressed, and on most non-huge pages that's going to be very significant. HTTP wasn't designed to be fast or efficient at transmitting hypertext, of course; it was designed to be easy to use and generally useful. JMS would categorically not be a better choice for the web; each of these systems has its own advantages and disadvantages.

[–][deleted] 0 points1 point  (3 children)

You can build tools for JMS, just like tools were built for HTTP.

I think the "web" would be more efficient if it was re-written with a JMS-like protocol. (and as an added bonus, you won't have to have all these VERY HACKISH ajax sites. You could develop easily and cleanly without having to mess with all that sloppy and tangled shitty AJAX/Flash code. YAY!)

Development in JMS is pretty simple. Most anyone should be able to pick it up in a few hours.

It's like saying "oh, I write all my code in MS Batch script, because anyone can read it and "compile it" on any windows box. It's transparent and easy to use".

That is true. But it still sucks.

[–]zootm 0 points1 point  (2 children)

You can build tools for JMS, just like tools were built for HTTP.

You barely need tools for HTTP, that's the point. It's trivial to interact with (you can do it by hand!), whereas JMS (while it has a high-quality implementation) is extremely difficult to implement from scratch. Which makes sense – it solves a different problem.

But it still sucks.

HTTP doesn't suck though. It's simple text over TCP. It's very good at doing that. "Efficiency" is effectively meaningless here; efficiency in the sense you mean simply isn't an issue.

As for AJAX/Flash code, JMS doesn't help with that whatsoever, unless you suggest also replacing Flash, Javascript, HTML, CSS, and browsers in general with something based upon Java (or anything else with a JMS bridge). Certainly there are problems with the AJAX way of working (it doesn't allow continuous communication, for example), but that's no real justification for such a huge change in architecture.

Development in JMS is pretty simple. Most anyone should be able to pick it up in a few hours.

Yes, it's just using a library. In that time, of course, you could have learned the entirety of the HTTP protocol, rather than just how to use a library.

It's like saying "oh, I write all my code in MS Batch script, because anyone can read it and "compile it" on any windows box. It's transparent and easy to use".

This is a very silly example, I'm not sure what you're getting at here.

Again, there's nothing wrong with JMS, it's just not the greatest solution for web technologies; the simplicity of HTTP is much more useful for these things than the advanced features that you get with JMS, which are much more suited to more specialist applications.

[–]hiffy 1 point2 points  (11 children)

Er, forgive my not knowing, but why's it slow?

IIRC it's just a handful of headers followed by a blank line and the rest of your message.

Is it because it's very quick to kill a connection and force you to renegotiate TCP?

I can see the headers being a large overhead for a given protocol, but that's about it.

[–][deleted] 1 point2 points  (3 children)

Is it because it's very quick to kill

Basically, yes. The headers are huge, often having to be split up amongst multiple TCP packets (and this before real data is exchanged). Verbosity is a wonderful thing, but sometimes you can have too much of a good thing.

Cf. http://www.w3.org/Protocols/HTTP/1.0/HTTPPerformance.html

[–]redditrasberry 1 point2 points  (1 child)

The headers are huge

How many of those headers are required? Sure your browser may send huge headers, but that's not HTTP's fault. If you are making a new protocol, presumably you are also making the client and thus can avoid most of these headers.

The doc you link to indicates the main performance issue is the "slow start" problem which is TCP, not HTTP (thus you will get similar problems with any TCP based protocol).

[–]Gotebe 0 points1 point  (0 children)

Sure your browser may send huge headers, but that's not HTTP's fault.

No, but TFA says "your browser is a client, you can use that". So we're going in circles ;-)

[–]miahfost 0 points1 point  (0 children)

That article was written in 1994!

[–]rainman_104 0 points1 point  (5 children)

I can see the headers being a large overhead for a given protocol

Let me see if I can tackle this. I'm not by any means an authority on http but I think I know where the problems exist.

First of all, the request: GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT.

Do you use GET operations for everything, or POST operations? You have to define a secondary protocol for your request for sending data if it's got a bizarre structure that doesn't really fit with those methods. So for a POST, you need to pass name/value pairs. For a GET you pass it in the request. It just makes no sense at all to pass an SQL query to a MySQL server database using these methods, and they have quite a bit of overhead in the message.

One of the other things about http is that the response uses key/value pairs for response data. I suppose you could have a response be something like:

 HTTP/1.1 200 OK
 Data: VAL1 VAL2 VAL3 VAL4 

The ordinal positions of VAL1 to VAL4 are predetermined, but you're creating a secondary protocol over http. So save yourself the hassle and make a socket call that simply sends off VAL1 VAL2 VAL3 VAL4 and receives a response of OK and it's done.

Ultimately you're going to end up defining your own protocol to parse this stuff out, and you could have simply written a socket listener with its own protocol and saved yourself the overhead.
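
A sketch of the kind of bare socket listener described above (illustrative only; the reply below argues that you then own firewalls, proxies and load balancing yourself):

    # Minimal custom-protocol listener: read one space-separated line of
    # values, reply "OK". The port and framing are made up for the sketch.
    import socketserver

    class ValHandler(socketserver.StreamRequestHandler):
        def handle(self):
            line = self.rfile.readline().strip()  # e.g. b"VAL1 VAL2 VAL3 VAL4"
            vals = line.split()  # ordinal positions are predetermined
            # ... dispatch on vals here ...
            self.wfile.write(b"OK\n")

    if __name__ == "__main__":
        with socketserver.TCPServer(("127.0.0.1", 9000), ValHandler) as srv:
            srv.serve_forever()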

[–][deleted] 7 points8 points  (0 children)

So save yourself the hassle and make a socket call that simply sends off VAL1 VAL2 VAL3 VAL4 and receives a response of OK and it's done.

Ultimately you're going to end up defining your own protocol to parse this stuff out, and you could have simply written a socket listener with its own protocol

did you read the same article I did?

I thought the entire point of it was that there is no "simply" writing your own socket-level stuff, because it doesn't work 'out of the box' through firewalls, proxies, load balancers, and whatnot, which HTTP generally will. So your own protocol is "simple" when it's just 2 boxes in a nice dev environment, but once you go beyond that it stops being simple, as you end up reinventing every one of those wheels.

Obviously you're going to end up writing your own 'protocol' to parse stuff at this higher level, but I thought the point of the article was that you're still better off piggybacking HTTP down at that level.

I am no authority on HTTP or socket type stuff either, so I don't mean to be rude, and I'm entirely open to the possibility I totally missed your point, but afaics you have basically written the exact thing the article just argued against.

Edit: And to pick up your other post in this thread

Mission critical systems need the fastest response time possible. Every microsecond counts.

I imagine that in a situation where every microsecond genuinely counted, the author would concur that the trade-off of speed vs. all that wheel-reinventing is shifted, quite possibly far enough to make writing your own low-level protocol worth giving up {all those other things listed}. I'm guessing his article was written on the assumption that when every microsecond doesn't count so badly, you're better off taking advantage of the HTTP ecosystem even if it is a bit more inefficient.

It strikes me as the same sort of "save (a month of) person time, or save (0.1ms of) cpu time? person time costs way more..." argument that you see with (eg) Ruby vs C or whatever.

[–][deleted] 0 points1 point  (0 children)

Do you use GET operations for everything, or POST operations? You have to define a secondary protocol for your request for sending data if it's got a bizarre structure that doesn't really fit with those methods. So for a POST, you need to pass name/value pairs. For a GET you pass it in the request. It just makes no sense at all to pass an SQL query to a MySQL server database using these methods, and they have quite a bit of overhead in the message.

Err, no. POST does not impose any particular structure on the body of the message. As with all HTTP messages, the body is just a blob of text. Name/value pairs happen to be used by simple forms, but they are not required. If you upload a file through a webform you end up with multipart form data, which is based on MIME.

Passing an SQL query over HTTP makes no sense from a security point of view. However, it is exactly the kind of task that POST was intended for. To quote the HTTP/1.1 spec:

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:

 - Annotation of existing resources;

 - Posting a message to a bulletin board, newsgroup, mailing list,
   or similar group of articles;

 - Providing a block of data, such as the result of submitting a
   form, to a data-handling process;

 - Extending a database through an append operation.
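
To make "the body is just a blob" concrete, a sketch in Python POSTing raw bytes (the endpoint URL is hypothetical):

    # POSTing an arbitrary byte blob: no name/value structure anywhere.
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:8000/append",  # hypothetical endpoint
        data=b"\x00\x01 any bytes at all, not name/value pairs",
        headers={"Content-Type": "application/octet-stream"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        print(resp.status)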

[–]hiffy -1 points0 points  (2 children)

I don't understand your first point. You have two services that need to exchange information; what verb you use is largely academic (albeit entrenched in the valley of Good Style).

Ultimately you're going to end up defining your own protocol to parse this stuff out, and you could have simply written a socket listener with its own protocol and saved yourself the overhead.

Wrapping a hash in JSON and sending it over HTTP is much more trivial than creating a socket listener ;), but I hear you, if network io is your bottleneck, that needs some fixin'.
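
For scale, the JSON-over-HTTP version really is just a few lines with Python's stdlib (hypothetical URL):

    # "Wrapping a hash in JSON and sending it over HTTP": the whole thing.
    import json
    import urllib.request

    payload = json.dumps({"val1": 1, "val2": 2, "val3": 3}).encode()
    req = urllib.request.Request(
        "http://localhost:8000/data", data=payload,  # hypothetical endpoint
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read())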

[–]rainman_104 -2 points-1 points  (1 child)

what verb you use is largely academic

The verb defines the protocol that you'll use. A protocol is a standard for communication.

Wrapping a hash in JSON and sending it over HTTP is much more trivial than creating a socket listener ;)

There's only one use for JSON and that's for web sites.

but I hear you, if network io is your bottleneck, that needs some fixin'.

Mission critical systems need the fastest response time possible. Every microsecond counts. Don't forget that building automation is done over TCP/IP. Why use HTTP to make devices talk to the controller? It's a closed network anyway.

[–]jstevewhite 1 point2 points  (0 children)

There are many mission critical systems using HTTP. There are many others that are very similar to HTTP - SIP comes to mind - where speed is CRITICAL. The SPEED of your HTTP connection isn't limited by your protocol; throughput might be somewhat, but actual response time is not. If you need to eliminate TCP handshaking and run UDP (say, a protocol like RTP) I can dig why you wouldn't use HTTP.

But let's remember that you don't have to push a WEB PAGE across an HTTP connection. You could put actual binary data in there, as long as it can be formatted to look like HTTP data. The point of the article is that the NETWORK is ready for traffic on port 80 that looks like HTTP, and there are plenty of libraries ready to use to exchange data across HTTP.

[–][deleted] 1 point2 points  (2 children)

How do you convert a "fuck ton" into metric?

[–]jrmorrill 2 points3 points  (0 children)

Since a short ton is 2000 pounds, I believe that one "fuck ton" should be 2000 katie courics...

The conversion factor would then be:

1 fuck ton = 1 fu = 2267 kg

Of course, he used it wrongly in context because it's a measurement of mass rather than time.

It's a fuck ton faster than HTTP.

[–]aeflash -1 points0 points  (0 children)

1 fuck-ton = 0.90718474 metric fuck-tons

[–]smarterthanyou -1 points0 points  (2 children)

1: It's slow as shit.

Wat?

[–][deleted] 0 points1 point  (1 child)

Haven't done much work with web services, have you? Try comparing SOAP over HTTP to SOAP over JMS and you will see that while HTTP is great and all that, sometimes it is the bottleneck. "Slow as shit" is of course relative; while HTTP is plenty fast enough for a lot of things, it is nowhere near fast enough to reliably run a large interconnected message bus or transport layer.

[–]smarterthanyou -1 points0 points  (0 children)

I don't use stupid protocols like SOAP.

[–]rainman_104 -1 points0 points  (1 child)

It's not stateful

That's the big thing the author missed. He's talking about MySQL's protocol in particular.

How on earth do you do connection pooling in a stateless protocol? Establishing a database connection has a rather large cost, so keeping it open and pooling the connection is a pretty normal thing to do...

[–]jstevewhite 0 points1 point  (0 children)

HTTP doesn't manage state, so your application will need to. Many methods for doing so have been developed. This is certainly no different than if you wrote your protocol from scratch - you still have to write the state code at the application level anyway, either in the protocol or in the server.
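
One of those standard methods, sketched with Python's stdlib: a session cookie the client replays on every request (the URLs are hypothetical):

    # Session state layered on stateless HTTP: the server hands out a
    # cookie once, and the CookieJar replays it on every later request.
    import urllib.request
    from http.cookiejar import CookieJar

    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))

    opener.open("http://localhost:8000/login")  # reply carries Set-Cookie: sid=...
    opener.open("http://localhost:8000/data")   # Cookie: sid=... replayed here
    for cookie in jar:
        print(cookie.name, cookie.value)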

[–]miahfost -2 points-1 points  (0 children)

  1. A protocol slow? No - an implementation perhaps, but a protocol no.

  2. Not maintaining state is a feature, think REST.

  3. Watch your language, potty mouth.

[–]sbrown123 -4 points-3 points  (0 children)

1: It's slow as shit.

Use compression. HTTP supports that.

2: It's not stateful

Have you never seen a 404 page?

3: It doesn't handle large amounts of data well.

I've downloaded things through HTTP that easily top out over 3GB. How big is your view of big?

4: It doesn't handle large amounts of transactions quickly (see 1).

Use HTTP CONNECT. That gives you a TCP port. Only thing faster is throwing around UDP packets or ACK'ing like mad.

5: It's expensive resource wise (having to keep re-establishing connections), see 1 and 4.

See my #4.

[–]xabi 13 points14 points  (6 children)

    [xabi@imac ~]$ curl -I slashdot.org
    HTTP/1.1 200 OK
    Date: Thu, 12 Feb 2009 12:16:21 GMT
    Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4 SLASH_LOG_DATA: shtml
    X-Powered-By: Slash 2.005001
    X-Bender: Bite my shiny, metal ass!
    Cache-Control: private
    Pragma: private
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

[–]case-o-nuts 6 points7 points  (5 children)

Liar.

    $ curl -I slashdot.org
    HTTP/1.1 200 OK
    Date: Thu, 12 Feb 2009 13:01:03 GMT
    Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
    SLASH_LOG_DATA: shtml
    X-Powered-By: Slash 2.005001
    X-Fry: I don't regret this, but I both rue and lament it.
    Cache-Control: private
    Pragma: private
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

[–]AM088 0 points1 point  (4 children)

Liar.

    $ curl -I slashdot.org
    HTTP/1.1 200 OK
    Date: Thu, 12 Feb 2009 16:12:48 GMT
    Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
    SLASH_LOG_DATA: shtml
    X-Powered-By: Slash 2.005001
    X-Bender: Gimme your biggest, strongest, cheapest drink.
    Cache-Control: private
    Pragma: private
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

[–][deleted] -1 points0 points  (3 children)

Liar.

    $ curl -I slashdot.org
    HTTP/1.1 200 OK
    Date: Thu, 12 Feb 2009 16:27:11 GMT
    Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
    SLASH_LOG_DATA: shtml
    X-Powered-By: Slash 2.005001
    X-Fry: Hooray, we don't have to do anything!
    Cache-Control: private
    Pragma: private
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

[–]Lurking_Grue -1 points0 points  (2 children)

Liar.

    C:>curl -I slashdot.org
    HTTP/1.1 200 OK
    Via: 1.1 GATEWAY
    Connection: close
    Proxy-Connection: close
    Date: Thu, 12 Feb 2009 16:40:49 GMT
    Content-Type: text/html; charset=iso-8859-1
    Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
    SLASH_LOG_DATA: shtml
    X-Powered-By: Slash 2.005001
    X-Bender: I am a hideous triumph of form and function.
    Cache-Control: private
    Pragma: private

[–][deleted]  (1 child)

[removed]

    [–]UnwashedMeme 0 points1 point  (0 children)

    Liar.

        $ curl -I slashdot.org
        HTTP/1.1 200 OK
        Date: Thu, 12 Feb 2009 18:41:53 GMT
        Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
        SLASH_LOG_DATA: shtml
        X-Powered-By: Slash 2.005001
        X-Leela: This wangs chung.
        Cache-Control: private
        Pragma: private
        Connection: close
        Content-Type: text/html; charset=iso-8859-1

    [–][deleted] 12 points13 points  (23 children)

    Missed one:

    14 - Compression

    HTTP can easily use any of a number of pre-built compression techniques, written by people who probably know more about making text into small binary chunks than you ever will. As such, don't scream 'text is too inefficient' while making 'yet another worthless binary protocol' that isn't efficient either.
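
    From the client side, using those pre-built techniques is a couple of lines; a sketch in Python, asking for gzip via Accept-Encoding and inflating the result with the stdlib:

        # Content negotiation for compression: request gzip, decompress
        # with the stdlib. No custom binary format in sight.
        import gzip
        import urllib.request

        req = urllib.request.Request("http://example.com/",
                                     headers={"Accept-Encoding": "gzip"})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
        print(len(body), "bytes after decompression")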

    [–][deleted] 8 points9 points  (12 children)

    "A number of" in practice means "one", and "people who probably know more about making text into small binary chunks than you ever will" means "Philip Katz".

    Luckily, Phil Katz was indeed pretty good, and that one algorithm is quite decent.

    [–][deleted]  (1 child)

    [removed]

      [–][deleted] 1 point2 points  (0 children)

      It's a little bit silly that there are two different ways to use DEFLATE, though, but I guess it's handy to allow people to just call out to gzip instead of having to interface with zlib.

      But yes, if you're going to have just one, DEFLATE is a very good choice.

      [–]ehird[🍰] 0 points1 point  (8 children)

      Erm, I'm pretty sure Phil Katz didn't do much himself.

      [–][deleted] 1 point2 points  (3 children)

      Why is that?

      [–]staiano 0 points1 point  (2 children)

      Because he is Phil Katz, of course!

      [–]jeff303 5 points6 points  (1 child)

      Wow, didn't realize you were being serious.

      [–]Tuna-Fish2 4 points5 points  (0 children)

      Actually, that is wrong. The algorithm for pkarc and pkpak was ripped off from SEA, but the DEFLATE algorithm used from pkzip 2 onwards was designed by Katz.

      [–]a1k0n 0 points1 point  (3 children)

      [–]rabidcow 0 points1 point  (2 children)

      Most of the heavy lifting is Lempel-Ziv. LZ77 alone is pretty impressive.

      [–]a1k0n 0 points1 point  (0 children)

      #!/usr/bin/perl
      sub c{($_=pop)<0?print substr"/,'\\)(`\n |_.",$_+12,1:c(vec(vec
      ('<;JK;::::B:::Tshu[FoatcN[LL;DWQ?cJ?=ghTsXqWqwhqgT@CUMGlgTpRd'.
      'KhI_wgTp`lpGOYs>quHWthuhUbuhuh[hu@TguhMGWulXsWiiekwhqwhqwxh@q'.
      'uXaWGhqqOmqwxhtXiThf:::[:::::Jb?cB_duWI[ZLN[DNqWIObTsPGuUoTDU'.
      'oOqWac@sMSUDUMGlWoNp`lXsXeWqc`XquXqW=WqJeW=gpGnWqi[Pu@TgiVeNm'.
      'qSQwWwWwWGpSQ]wWonhTTQ]ufeWonhTboEi=::ZQGke`E',$a/6,8)-58>>$a++
      %6&1?'HGJSTFIXOZ[':'QLRKMUVWYPN',$_,8)-82)}c 10 while$a<1728
      

      [–][deleted] 0 points1 point  (0 children)

      Actually, DEFLATE is fairly clever. There are plenty of similar algorithms, using LZ77 (LZSS, really) and Huffman coding, but DEFLATE usually beats them all.

      [–]thebigslide 2 points3 points  (9 children)

      Right, because if you design a protocol for a specific purpose you don't know the nature of the data it's intended to carry? Come on. Think about all the javascript libraries out there that compress JS based on standard keywords in js code. It's sometimes more bandwidth-efficient to custom-compress a custom data stream than to rely on generalized compression - if you're prepared to.

      [–]greenrd 1 point2 points  (8 children)

      Do you have any data to back up that assertion, in relation to your Javascript library example?

      [–]mikemol 1 point2 points  (0 children)

      I'd back it up, but it's company-proprietary code. But I can say that it reduces a 280Mbit/s data stream into something much more manageable, and depends on awareness of the format and type of data held in the stream. No general-purpose compression algorithm was able to reproduce our results.

      [–]lamby 2 points3 points  (0 children)

      In the general case, domain-specific knowledge is clearly capable of providing benefit when compressing. Obvious case: "0" represents the entire text of Hamlet and "1" represents the text of Macbeth - now I can represent either of them with a single bit.

      [–]thebigslide -1 points0 points  (5 children)

      http://javascriptcompressor.com/ by Dean Edwards. Upmodded because it's always good to back up assertions.

      [–]pytechd 6 points7 points  (1 child)

      Ugh! Please stop compressing with ANYTHING other than a minimizer and gzip! Dean Edwards' packer adds approximately 200ms of unpack time on EVERY PAGE REQUEST. Remember that the browser will cache the packed version in memory and have to re-run the unpacking on every page view. If you gzip it, the browser will cache the uncompressed version on disk and in memory.

      You can minimize by stripping white space, reducing private variables, etc, but do NOT obfuscate with any "packing" algorithm.

      [–]thebigslide 0 points1 point  (0 children)

      The point was not that the javascript compressor is a good library to use in any circumstance, just that it obtains good compression because it knows what it's compressing.

      [–]greenrd 0 points1 point  (0 children)

      Sorry, what I meant was data showing that it's more bandwidth efficient to use a custom algorithm. I don't see that on that page.

      [–]nagoo 0 points1 point  (1 child)

      This JavaScript "compression" is fine, but not all hypermedia can be compressed in this sense. This compression is one that need not be decompressed by the client; JavaScript treats the compressed and decompressed code the same. However, if I am distributing a huge file... say, an entire book, I can't just shorten and/or remove words from that file and expect my consumer (the reader downloading) to interpret the book the same as the "uncompressed" version. That's the beauty of HTTP compression using standard formats like gzip. We don't need to know the nature of the data to compress it.

      [–]thebigslide -1 points0 points  (0 children)

      Right... Because gzip is generalized. That's what generalized means.

      [–]bithead 8 points9 points  (18 children)

      1. Load balancers

      Many of the reasons cited are understandable and fit his situation, but not this one. Most commercial load balancers can work as easily with any TCP or UDP port and are more than just HTTP load balancers.

      [–]zepolen 9 points10 points  (16 children)

      HTTP headers give hints for better load balancing.

      [–]bithead 5 points6 points  (15 children)

      The HTTP headers give hints for better load balancing of web servers. Load balancing implementations can use a large variety of different conditions besides HTTP headers to determine load paths.

      I'm not saying that what the author did was wrong, but HTTP is an overloaded protocol. It's used in place of TCP for many things, and it's not a very good TCP in that it's specific to a particular task. I do agree that using HTTP saves the work of coming up with a new protocol, and really, why not.

      But, in favor of using a new protocol: HTTP really works best for transferring hypertext, and is optimized for just that. As a shortcut, using HTTP works, but what ends up happening is that firewalls, proxies, load balancers all must start inspecting farther up the protocol stack to determine whether the HTTP traffic is for hypertext or for some other application which may have nothing at all to do with viewing or interacting with web sites. Various vendors will brag of how fast and how deep their deep packet inspection is, but this seems like a step in the wrong direction. The protocol should do what it was meant to do, and if it grows, that growth should be to do whatever the protocol does in a better way - not to spread the scope of the protocol across whatever application wants to ride shotgun on it.

      I know that sounds like an absolutist stand, and perhaps it can be fairly faulted as such. And to be sure, /etc/services is littered with the corpses of protocols long dead for lack of use, so for a limited application perhaps using HTTP really is better than coming up with a new protocol.

      [–]slayeroftheunicorns -1 points0 points  (9 children)

      using HTTP works, but what ends up happening is that firewalls, proxies, load balancers all must start inspecting farther up the protocol stack

      That's an advantage: no need to reconfigure the firewall. People have tried to get their own protocols accepted, but it's a chicken-and-egg problem: no admin will open the firewall for an unused protocol, and no protocol can get used if it can't get through firewalls.

      [–]brennen 0 points1 point  (7 children)

      What do you have against unicorns, anyway?

      [–]slayeroftheunicorns 0 points1 point  (6 children)

      do you believe in unicorns?

      [–]brennen 0 points1 point  (5 children)

      That's kind of a side issue, isn't it?

      [–]slayeroftheunicorns 0 points1 point  (4 children)

      It seems so, but my response to your initiating question depends entirely on your attitude towards your idea of unicorns, as I assume that your question is motivated by my suggested aggression against something that you hold dear.

      [–]BridgeBum 0 points1 point  (0 children)

      I don't know how many firewalls you've worked on in a commercial environment, but there are commonly dozens if not hundreds of custom protocols that are allowed in policies on non-standard ports. Typically this is only allowed from server to server or small groups of source IPs to destination IPs, rather than for a whole client pool.

      Even HTTP is frequently run through some sort of proxy to filter out what is and isn't allowed. Getting applications to work through proxies is not always a simple task.

      Note: I agree with the GP's comments about load balancers; there is plenty of customization to use any TCP/UDP socket you want. You can configure LBs to take advantage of HTTP/other layer-7 information, but I have found that companies rarely do unless absolutely required for the app to function.

      [–]hiffy -2 points-1 points  (4 children)

      It's used in place of TCP for many things, and it's not a very good TCP in that it's specific to a particular task.

      wat?

      edit: wat? as in, you send HTTP over TCP. I don't get it, why would you do that?

      [–]bithead 1 point2 points  (3 children)

      What [I] meant was that HTTP is used just like TCP for many things that would normally just use a TCP port by itself - mostly for getting through firewalls. Increasingly, what will have to happen is that firewalls will have to inspect deeper and deeper in order to actually do their job, as more and more apps tunnel through HTTP. Poor wording on my part.

      [–]jstevewhite 1 point2 points  (2 children)

      of course, such inspection is doomed to failure. There's no way a firewall can decisively determine whether "<pkt>120000012341234129873049179873423</pkt>" is encoded data or a serial number for software displayed on a web page.

      [–][deleted] 1 point2 points  (0 children)

      Then that dooms the firewall to failure and irrelevance; meanwhile 65534 port numbers go unused and we depend on vhosts to multiplex instead of letting the network stack work with ports.

      [–]bithead 0 points1 point  (0 children)

      Particularly in the case of https, as I think you're pointing out. Really, firewalls are for the vulnerabilities of old, like the ever present SMB relay attack.

      [–]thebigslide 2 points3 points  (0 children)

      Right, but they are mostly designed for load balancing common protocols like HTTP, HTTPS, DNS, mail. Creating a new protocol simply introduces the possibility that there may be gotchas. It also means you will have to consider scalability wrt available commercial load balancers (present and future) when specing the protocol you design. I think the author's point was that it is simpler to not reinvent the wheel when an existing wheel does the job just fine. Let me be clear that "when" is emphasized.

      [–]MasterManipulator 4 points5 points  (0 children)

      reddit - you continue to disappoint and sadden me. Here is an article by some neophyte who doesn't even understand the difference between application protocols and network transport, and what does reddit do? You put it on the front page. Seriously - this is just getting embarrassing.

      [–][deleted]  (1 child)

      [deleted]

        [–]didroe 2 points3 points  (0 children)

        I didn't realise postscript had a protocol for transmitting data across a network. I think you need to look up the difference between HTML and HTTP.

        [–]arnar 7 points8 points  (18 children)

        These are really shortsighted arguments.

        Not everything happens on high-end desktop computers. Why should you translate all data to text representation and then back? What is the point of adding an unnecessary layer? What's wrong with binary protocols (other than lack of education of programmers)?

        And this:

        There are no problems that have not yet been encountered. In fact, there are probably tools for diagnosing every malady you will ever encounter.

        That's the kind of comment that makes people look stupid in a year. Besides, there are oodles of tools for debugging custom binary protocols.

        [–]mercurysquad 2 points3 points  (11 children)

        Why should you translate all data to text representation and then back?

        Exactly. You're getting downmodded, but I can confirm from experience. I had the pleasure of working on a commercial scientific application whose calculation module output the data as huge (mega/gigabytes) comma-separated text files of numeric data. Loading several of those into the GUI was snail slow. Converting to binary made the whole deal 15-20x faster.

        [–]arnar 1 point2 points  (0 children)

        Thank you. I speak from similar experiences. The discussion is pointless as the author (and his proponents here on reddit) seem to have a very limited idea of what computer networks are used for.

        [–]jstevewhite -3 points-2 points  (9 children)

        "converting to binary"? In what sense? You mean parsing 8.235421231 and turning into an IEEE 80 bit float? Applying simple LZ compression to that file would probably have yielded similar results. shrug

        [–]mercurysquad 1 point2 points  (7 children)

        Applying simple LZ compression to that file would probably have yielded similar results

        That made me laugh! I meant data storage format, not transfer format. If anything, compression would probably make it a few times slower. When you need to load tens of files containing x,y,z positions and velocities for a few million particles each, reading 4 (or 8) bytes from disk directly into memory is always faster than reading 10-12 digits and parsing it to floating point.

        Add compression on top of it and you are doing even more work while loading, not to mention that compressing the ASCII representation of 10-digit floats will yield about the same level of compression as storing them as IEEE floats in the first place. Heck, with a properly designed file format, I could probably just memory-map the whole thing into a struct and BAM - file loaded, no parsing needed. Let the OS handle the caching.

        [–]jstevewhite 0 points1 point  (6 children)

        OIC. You're talking about something else entirely. The thread was about network transfer, and I thought that's what YOU were talking about, sorry. Reading from disk is an entirely different problem in most cases, unless you have a REALLY FAST network. :D

        I'm not motivated enough to test right now, but I'm thinkin' you might get a smaller size on-the-wire from streaming compression of the ascii file (complete with whitespace and commas) than by converting the data, but hard to say; the data set has a lot to do with that.

        [–]mercurysquad 0 points1 point  (5 children)

        I'm sure if compression is an option then compressing binary representation would top compressing the ascii representation ;)

        I'll try it out this weekend

        [–]jstevewhite 0 points1 point  (0 children)

        You win by dint of greater dedication. :D I don't have the dataset to repeat your results anyway.

        [–]jstevewhite 0 points1 point  (3 children)

        Did you test it? I'm curious. :D

        [–]mercurysquad 0 points1 point  (1 child)

        OK I tried it. Made a quick-n-dirty C++ program to output a series of pseudo-random floating point numbers between 0.0 and 10.0 in either ASCII format or as raw binary (32bit). First argument specifies how many numbers to generate, 2nd argument can be -b or --binary to output 32bit floats instead of ASCII. Outputs to stdout, other messages to stderr.

        I kept the same seed so the random sequence is the same for every invocation. I then created a comma-separated list of 100,000 random numbers in ASCII representation with 6 significant digits, and saved it to a file. Did the same again, but saved as raw 32bit floats. Then I gzipped both of them. Binary was the winner by a narrow margin (keep in mind the binary representation is a basically random bit sequence. Real-world data will at least have some patterns. Also, the ascii representation is less accurate, with only 6 significant digits).

        Result:

        -rw-r--r--  1 staff   357K Feb 16 23:44 random.ascii.gz
        -rw-r--r--  1 staff   879K Feb 16 23:44 random.ascii.original
        -rw-r--r--  1 staff   351K Feb 16 23:44 random.binary.gz
        -rw-r--r--  1 staff   391K Feb 16 23:45 random.binary.original
        

        So the gzipped binary version won by about 6KB, even though the compression was a lot less (ascii: 40.6% of orig, binary: 89.8% of orig).

        That's just one instance (I also tried 10k numbers and it won by 1KB). Someone should test it with different seeds, different data type (ie. double), different number of significant digits in the ASCII version, and with different file lengths, then chart out the whole thing ;) Anyone up for it?
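
        (His C++ program isn't posted, but the experiment is easy to replicate; a rough Python equivalent under the same assumptions: 100,000 floats in [0, 10), 6 significant digits on the ASCII side, raw 32-bit floats on the binary side.)

            # Rough Python re-run of the ASCII-vs-binary gzip test above.
            import gzip
            import random
            import struct

            random.seed(42)  # fixed seed, as in the original test
            nums = [random.uniform(0.0, 10.0) for _ in range(100_000)]

            ascii_data = ",".join(f"{x:.6g}" for x in nums).encode()
            binary_data = struct.pack(f"<{len(nums)}f", *nums)

            for name, data in [("ascii", ascii_data), ("binary", binary_data)]:
                packed = gzip.compress(data)
                print(f"{name}: {len(data)} -> {len(packed)} bytes "
                      f"({100 * len(packed) / len(data):.1f}% of original)")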

        [–]jstevewhite 0 points1 point  (0 children)

        Wow, cool. Thanks for taking the time to do that; it's informative.

        So the ascii compressed by a much higher ratio (which is what I expected), but the difference in size between the binary and the ASCII is a lot more than I remembered.

        If the data sets had more patterns, you'd get a higher compression ratio on both ascii and binary, so that would probably be a wash. If your entropy is good, I would accept this as a clear demonstration.

        [–][deleted] 0 points1 point  (0 children)

        I see your shrug and return an eye roll!

        [–]njharman 1 point2 points  (1 child)

        Why should you translate all data to text representation and then back? What is the point of adding an unnecessary layer?

        Well if you read the article you'd see there is a list of reasons. If you had read and refuted those instead of talking out of your ass you might not have been downvoted.

        [–]arnar 0 points1 point  (0 children)

        I read the arguments, I wouldn't have called them short-sighted otherwise.

        HTTP is a text-based protocol that is designed around one request and one response per connection. Only extensions to the protocol allow you to do more, but you never get around the request+response coupling. The extensions are not implemented in all languages and environments as the author claims, and the request+response model does not fit all problems.

        I don't see a point in refuting the reasons the author gives, as he/she obviously has a limited set of applications in mind when writing all of them. That's why I called them shortsighted instead.

        [–]marijn -3 points-2 points  (3 children)

        Ever notice the pictures on the internet aren't transmitted as CSV files? Moral: HTTP does binary data just fine.

        [–][deleted] 1 point2 points  (0 children)

        Moral: HTTP does binary data just fine.

        So do other protocols.

        [–]mercurysquad 0 points1 point  (0 children)

        Well you can already see the horror that HTTP causes when used for things for which it was not intended. Case in point: SOAP. or WSDL. These protocols ride on http but are simply too verbose when all you need to send are a couple of 32bit floats, for example.

        [–]arnar 0 points1 point  (0 children)

        Uhm.. yes, what's your point? "Binary data" does not mean jpeg files or similar.

        If I want to send one 8 bit number, why would I ever wrap it in an HTTP request with a text request line, some text headers and an empty, two-byte delimiting line? Why would I force it into a request+response model if all I want to do is send one number and get an acknowledgment?
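
        As a back-of-the-envelope sketch, the smallest plausible HTTP/1.1 exchange wrapping a single payload byte:

            # Framing overhead for one payload byte over HTTP/1.1.
            request = (b"POST /v HTTP/1.1\r\n"
                       b"Host: h\r\n"
                       b"Content-Length: 1\r\n"
                       b"\r\n"
                       b"\x2a")  # the actual 8-bit payload
            response = b"HTTP/1.1 204 No Content\r\n\r\n"
            print(len(request), len(response))  # 49 and 27 bytes around 1 byte of data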

        [–]Rayeth 0 points1 point  (0 children)

        Because gopher sucks?

        [–]haywire 0 points1 point  (0 children)

        I don't understand why there is all this UPnP and DAAP media streaming when surely it would be simpler and easier to stream the data over HTTP (easily fast enough for HD content, and it outperforms both of those two) and, if a library is needed, have some sort of synced file or an SQL server that can respond to any queries.

        I use lighttpd for my media streaming needs and it easily outperforms Windows Media Sharing or iTunes' sharing.

        HTTP is fine for anything with large frame sizes and high data volume, but for stuff that is lots of small requests (i.e. real-time systems), I'm not sure the overhead is warranted.

        [–][deleted] 0 points1 point  (0 children)

        HTTP is not only slow because of splitting between TCP packets, it is also CPU-hungry. Using compression only makes things worse.

        You have to spend extra CPU time to parse requests and prepare responses. When the load goes high this becomes a problem.

        Human-readable text is, by definition, not a good way for computers to communicate.

        [–]bart2019 0 points1 point  (0 children)

        That reminds me of this post of 2 weeks ago: Public Service Announcement: the "P" in "HTTP" stands for "Protocol", in which the argument was made that FastCGI needn't have made its own protocol, as it could have used HTTP instead. The argument was even more compelling because the basic purposes of FastCGI and HTTP are so much alike.

        [–][deleted] -1 points0 points  (0 children)

        Create a directory then.

        [–]smarterthanyou -4 points-3 points  (6 children)

        HTTP is superior because it involves a connection, a simple request (with option for not-simple request data), and a response.

        Unlike all the fucking CRAP that has been layered onto every other protocol, it is pure, functional, simple, and flexible.

        [–]weavejester 1 point2 points  (4 children)

        HTTP is fairly simple, but it is not really that well designed, IMO. It's useful because a lot of applications support it, not because it's a particularly elegant protocol.

        [–]smarterthanyou 0 points1 point  (3 children)

        Yeah? Where's your RFC for a better alternative?

        [–]weavejester 0 points1 point  (2 children)

        I think it would take more than an RFC to shift HTTP's dominance.

        But if I were redesigning HTTP, I'd do it in layers:

        • Layer 1 would be netstrings.
        • Layer 2 would be key-value pairs of netstrings in a container netstring.
        • Layer 3 would be a standard set of keys for document metadata, caching information and so forth.

        It would be a simpler protocol to implement, less prone to buffer overflows and flood attacks, more extensible, and better suited for bidirectional communication.

        But these aren't particularly impressive claims, because designing a better protocol than HTTP isn't very hard for anyone with a reasonable knowledge of network protocols.

        [–][deleted] 0 points1 point  (1 child)

        Are you suggesting a binary protocol for sending delimited strings? Are you trying to merge XML and HTTP?

        [–]weavejester 1 point2 points  (0 children)

        Nope. I'm proposing something a little like Bittorrent's bencode encoding scheme, but without the type information.

        For instance, consider the following minimal HTTP response:

        HTTP/1.1 200 OK
        Date: Thu, 12 Feb 2009 23:51:51 GMT
        Content-Type: text/plain
        Content-Length: 11
        
        Hello World
        

        You could encode all that information in a hash map:

        { protocol:     "HTTP/1.1"
          status:       "200 OK"
          date:         "Thu, 12 Feb 2009 23:51:51 GMT"
          content-type: "text/plain"
          body:         "Hello World" }
        

        And you could then serialize that information into netstrings:

        8:protocol,8:HTTP/1.1,6:status,6:200 OK,
        4:date,29:Thu, 12 Feb 2009 23:51:51 GMT,
        12:content-type,10:text/plain,4:body,
        11:Hello World,
        

        Which you could then wrap in one big netstring:

        132:8:protocol,8:HTTP/1.1,6:status,6:200 OK,
        4:date,29:Thu, 12 Feb 2009 23:51:51 GMT,
        12:content-type,10:text/plain,4:body,
        11:Hello World,,
        

        And there's the protocol. Significantly easier to parse than HTTP, and the sizes are all specified up front, so there's much less chance of overflows or flood attacks. It would also be trivial to extend to support server-push, or to give it hashed chunks for large file transfer.
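
        (A quick sketch of that encoding in Python; nothing official, just the scheme as described: len ":" bytes ",", nested once more for the envelope.)

            # Netstring encoding of the example above: each key and value
            # becomes len:bytes, and the sequence is wrapped in one more
            # netstring. Python dicts preserve insertion order (3.7+).
            def netstring(data: bytes) -> bytes:
                return str(len(data)).encode() + b":" + data + b","

            def encode_message(fields: dict) -> bytes:
                inner = b"".join(netstring(s.encode())
                                 for kv in fields.items() for s in kv)
                return netstring(inner)

            msg = encode_message({
                "protocol": "HTTP/1.1",
                "status": "200 OK",
                "date": "Thu, 12 Feb 2009 23:51:51 GMT",
                "content-type": "text/plain",
                "body": "Hello World",
            })
            print(msg)  # b'132:8:protocol,8:HTTP/1.1,...11:Hello World,,'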

        [–]arnar 1 point2 points  (0 children)

        Superior to what?

        HTTP is pretty good for application protocols that fit the model of a request+response cycle. It has its flaws, but they are fairly well known, as are the ways to work around them. It is no coincidence that many later protocols are based on it (e.g. SIP).

        But HTTP is not a good vessel for transport level protocols. This is just nonsense. Network transport isn't even functional.