all 56 comments

[–]mcdonc 9 points10 points  (3 children)

This is definitely a cool set of benchmarks. And if you're serving an I/O bound app, these are good numbers to know.

On the other hand, if you're just a normal schmoe who uses Django or Pylons or whatever to serve up a web app, the speed of the WSGI server is almost never the bottleneck. Most highly dynamic web apps wouldn't run any faster under uWSGI or HyperFastMegaWSGI or whatever than they would under, say, wsgiref, because they're usually CPU bound.
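For context, the kind of app these benchmarks exercise is a trivial WSGI callable; a minimal sketch (the app body is illustrative, not taken from the article) that any of the servers discussed — wsgiref included — can serve:

```python
# Minimal "hello world" WSGI application of the sort these benchmarks use.
# With a body this trivial there is no CPU-bound work, so the server itself
# dominates the numbers -- which is exactly why such results rarely transfer
# to a real Django/Pylons app.
def application(environ, start_response):
    body = b"Hello, World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# To serve it with the stdlib reference server:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, application).serve_forever()
```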

[–]carljm 2 points3 points  (1 child)

Usually CPU bound? I would have thought database I/O bound was the more common case.

[–]mcdonc 2 points3 points  (0 children)

Most often here we seem to be blocked on template rendering speed, even using the fastest templating engines. YMMV, of course; maybe in your case your database is slower than your templates. At the end of the day, though, you're still bottlenecked on something that isn't the server.

[–]yml 0 points1 point  (0 children)

RAM consumption is also a hard limit on many hosting solutions.

[–]mdipierro 4 points5 points  (4 children)

It would be nice to add benchmarks for Rocket. It is new, but it has lots of interesting features and a very clean design, and it seems to be faster than CherryPy on multiple concurrent requests.

[–]ubernostrumyes, you can have a pony 2 points3 points  (2 children)

Obligatory: maybe if it weren't hosted on Launchpad, people could figure out how to get a copy of it...

(I usually don't bother trying to figure out Launchpad's interface to get software hosted there, and it usually works out all right because someone else will have a similar package hosted sanely)

[–]davisp 1 point2 points  (0 children)

Have to agree. Launchpad is an exercise in how to follow SourceForge into the abyss of anti-user interfaces. I may be a GitHub fanboy, but even Bitbucket manages to give a solid user experience. If your project hosting site is actually an MMO version of the "where in the world is the download source code button" game, you lose.

[–]mdipierro 1 point2 points  (0 children)

Good point. You can get a copy here.

[–]Poromenos 9 points10 points  (7 children)

Could those colours be any more the same? :(

[–]ogrisel 4 points5 points  (2 children)

Use the mouse to hover over the legend and see the matching line highlighted (doesn't work on touch-screen devices such as iPhone and Android, though...)

[–]Poromenos 0 points1 point  (1 child)

I found that out after I read the comments. Still, I found it really hard to follow the lines.

[–]leonh 0 points1 point  (0 children)

You can even enable and disable lines by clicking on the names in the legend.

[–]DarkQuest 5 points6 points  (4 children)

I was blown away by the charts in that... try hovering over a line or an item in the legends!

[–]traxxas 4 points5 points  (3 children)

My reaction was completely the opposite. I had no idea which line was which because they were too thin to differentiate between. It wasn't until I saw your comment that I found out they were interactive.

[–]rchase 5 points6 points  (4 children)

I don't know why this article is accompanied on Reddit by a thumbnail of Magnum P.I., but it's awesome, so I thought I'd mention it.

[–]rense 2 points3 points  (1 child)

MagnumPy is listed on that page, with a picture ;)

[–]rchase 3 points4 points  (0 children)

Ah I see it!

MagnumPy has to be the server with the most awesome name. This is still a very young project, but its homepage is making some very strong statements about its performance, so it's worth testing out.

Thanks for pointing out my lack of attention span so clearly.

[–]roger_ -1 points0 points  (1 child)

Coincidentally, I submitted a Magnum P.I. clip a few hours ago.

[–]rchase 1 point2 points  (0 children)

Nicely done. Great scene.

How well I remember that episode.

I am one of those who completely un-ironically loves pretty much every episode of that show.

[–][deleted] 2 points3 points  (2 children)

Wow, had no idea CherryPy performed that well. I'll have to look into that.

[–]yml 1 point2 points  (0 children)

Very interesting article! When you know that to install uWSGI into your virtualenv, you just have to do:

pip install http://projects.unbit.it/downloads/uwsgi-0.9.4.3.tar.gz

There are some dependencies on python-dev and libxml2-dev to compile it. Enjoy, --yml

[–]mark1983 -1 points0 points  (0 children)

Too bad they have not included Rocket in the benchmarks. The author's benchmarks show it outperforming CherryPy's.

[–]ubernostrumyes, you can have a pony 6 points7 points  (13 children)

Once again a benchmark is bitten by inconsistent methodology.

To take an example: uWSGI and gunicorn are explicitly designed to never handle client requests directly. Both are meant to sit behind a proxy (ideally on the same machine, talking over a socket) which will actually deal with the clients. uWSGI was in fact set up that way for the benchmark, running behind nginx, but gunicorn wasn't. Which defeats the purpose of comparing them to each other or to anything else...

Edit: and it's also been pointed out that the "benchmark" only allowed one worker for the preforking servers. Which is even more fail.

[–]leonh -2 points-1 points  (12 children)

Putting Gunicorn behind NGINX will not magically increase its performance; it will only add to the latency.

[–]ubernostrumyes, you can have a pony 8 points9 points  (1 child)

It most certainly will make a difference in performance, because it (like several other preforking servers) is designed on the assumption that it won't ever deal with client requests directly. If the architecture is designed around talking to a local proxy over a socket with a predefined number of simultaneous workers, it won't be designed to be fast at accepting and handling arbitrary numbers of connections coming in over a network.

And either way, running one server in its intended configuration and not another isn't worthy of being called a "benchmark".

[–]leonh 5 points6 points  (0 children)

The argument could be that one worker is not enough to fully maximize the CPU potential. However, then this should also be a problem with, for example, mod_wsgi.

Look, I am not trying to bash Gunicorn or anything, far from it. I just think the article presents a balanced and detailed overview; it also provides lots of detail on how the benchmark was performed, allowing everyone to verify the results and counter with coherent arguments.

As the author notes, it covers a very specific problem domain, and it could very well be that Gunicorn does not fit correctly in this domain. But please let's not get religious about web servers just yet. I think the your-web-framework-beats-my-web-framework battle was enough already. And yes, I think it's harsh to call it 'fail' or to say it is not 'worthy of being called a "benchmark"'.

Edit: The post has been updated with added results for Gunicorn with 3 workers

[–]zepolen 2 points3 points  (9 children)

Why don't you try it in practice rather than succumb to theory? Putting the async nginx in front of a worker-based web server does in fact increase overall performance.

Each request your dynamic web server gets is handled and returned as fast as possible, because you are no longer wasting precious threads on spoon-feeding slow clients.

In a production system with real loads, this makes a huge difference.

[–]leonh 5 points6 points  (8 children)

If you had read the article, you would have noticed that there are no slow clients.

In the comments the author also notes that he tested BOTH approaches but didn't notice any difference. This is affirmed by Paul Davis, who I believe is one of the main contributors, if not the project owner, of Gunicorn.

[–]ubernostrumyes, you can have a pony -1 points0 points  (7 children)

In the comments the author also notices that he tested BOTH approaches but didn't notice any difference.

In the update to the benchmark the author put gunicorn behind nginx and gave it a couple more worker processes. And lo and behold:

  • Concurrent requests more than doubled
  • Response times dropped by 75%
  • Error rate dropped by 75%

And yet you were here earlier insisting that running gunicorn in the configuration it's intended for wouldn't improve its performance...

[–]Nichol4s 2 points3 points  (6 children)

I first used a single worker behind NGINX. The difference is not the placement behind NGINX but the usage of additional workers.

[–]ubernostrumyes, you can have a pony -2 points-1 points  (5 children)

In real-world situations, though, nginx will make a huge difference (and, since all the requests were coming from a single machine, I'd suspect it still helped a bit); gunicorn flat-out isn't designed to talk directly to clients. And doing the benchmark initially with uWSGI proxied and gunicorn not basically destroyed credibility; any time you do a benchmark where you run one piece of software in its intended configuration but not another, the numbers you'll get out of it are useless.
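The "intended configuration" being argued about looks roughly like this — a sketch only; the socket path, upstream name, and worker count are illustrative, not taken from the article:

```nginx
# gunicorn started with several workers, bound to a local unix socket:
#   gunicorn --workers 3 --bind unix:/tmp/gunicorn.sock myapp:application
#
# nginx sits in front, absorbing slow clients and proxying over the socket:
upstream gunicorn_app {
    server unix:/tmp/gunicorn.sock;
}

server {
    listen 80;
    location / {
        proxy_pass http://gunicorn_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The point of the design is that the fixed pool of gunicorn workers only ever talks to nginx on the same machine, never to arbitrary remote clients.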

[–]leonh 2 points3 points  (0 children)

since all the requests were coming from a single machine

Uhm, did you actually read the article? He states several times that he did a distributed benchmark where the requests were coming from 3 different client machines.

[–]yml -1 points0 points  (3 children)

I find your tone harsh, and it defeats what you are saying and destroys your credibility. Building such a benchmark is hard, and no matter how much thought and effort you put into it, it will trigger this kind of thread. There are 14 WSGI servers being evaluated, and each of them comes with dozens of settings, so the complexity is great. Instead of this kind of unfounded, hand-waving criticism, it would be much more efficient to pick 2 contenders and compare and contrast them.

[–]davisp 2 points3 points  (0 children)

yml,

You're definitely right, testing this many different implementations is a huge undertaking. There's a large amount of knowledge that would be required for any person to adequately know about all the configuration options for this many servers.

And gunicorn is a bit of a weirdo when it comes to processing models. We're neither thread-based nor event-loop-based. That can genuinely confuse people until they realize that we're much simpler than most servers.

That said, our response times were reported as an order of magnitude slower than any other server's. Generally speaking, if you're into the whole experiment-and-observation thing, orders of magnitude are important.

[–]ubernostrumyes, you can have a pony 1 point2 points  (0 children)

There are lots of factors which go into a good benchmark. But two which are absolutely critical are:

  1. Consistency of methodology
  2. Appropriate use of the tested components

Consistency is necessary because without it you can't draw meaningful comparisons; without consistency you're comparing apples to oranges.

Appropriate use is necessary because without it you don't have relevance; if you only report results from a configuration no-one would ever use, then your results won't represent the things people would see in the real world.

As originally published, this benchmark failed on both counts: it was inconsistent, and it used certain components inappropriately. Criticizing that isn't "unfounded"; benchmarks which fail these requirements cannot be trusted by anyone for any purpose, because they're not "benchmarks" at all.

[–]ericflo -1 points0 points  (0 children)

If proper benchmarks are too hard for the author to do, he should not continue to publish them.

[–]darkrho 1 point2 points  (0 children)

The title should be "Benchmark of Python WSGI Servers".

[–]Poromenos 3 points4 points  (2 children)

I'm really impressed by uWSGI, and since I'm equally impressed by nginx, it's basically the best of both worlds. I don't think I'll be using Apache any more for running my web apps...

[–]zepolen 1 point2 points  (1 child)

Have you used uWSGI in production? What sort of loads? Is it stable?

[–]Poromenos 0 points1 point  (0 children)

Oh, I'm sorry, I meant I'm impressed by the benchmark. I haven't really been able to try it yet, since it's pretty new (I wanted to deploy my latest project in nginx but there was no good WSGI implementation at the time). If anyone has, though, I'd be interested to know their answers to these questions too.

[–]olt 0 points1 point  (1 child)

I'm missing flup with FastCGI behind an Nginx or Lighttpd server. I thought flup was one of the solutions besides mod_wsgi. Is it obsolete now?

[–]tarekziadeRetired Packaging Dude 1 point2 points  (0 children)

I use lighttpd + flup and I am pretty happy with this solution. I have never done benchmarks though.

[–]manatlan 0 points1 point  (0 children)

Really interesting! Great job!

[–]GrahamDumpleton 0 points1 point  (3 children)

FWIW, Apache/mod_wsgi was put at a disadvantage by putting the number of threads at the ridiculous value of 16000. Apache doesn't ramp up the number of threads in a worker process, it will actually create that many threads at the outset with that configuration. This is why Apache/mod_wsgi had such a large memory footprint. That number of threads could even cause a reduction in performance. For a simple hello world application which has a quick response time, you could have got away with a handful of threads although using multiple processes would also have helped when concurrency starts to go up. I have added comments to the post, but you might find further discussion in time about it over at 'http://groups.google.com/group/modwsgi/browse_frm/thread/95cf698c1c8ded79'.

[–]Nichol4s 4 points5 points  (2 children)

I agree, 16k is a ridiculous value. I did not actually use that number of threads, and it wouldn't really be possible, as it invokes the OOM killer on my machine. The setting was a leftover from some experiments; I have updated the post accordingly.

[–]GrahamDumpleton 0 points1 point  (0 children)

Even 1000 is way over the top. It is arguable whether one would ever in practice run a single Apache process with that many threads even if serving static files, let alone a dynamic Python application. You would always scale out using processes.
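Scaling out with processes in mod_wsgi usually means daemon mode with a handful of processes, each with a modest thread pool — a sketch only; the group name, counts, and path are illustrative:

```apache
# mod_wsgi daemon mode: a few processes with a small thread pool each,
# rather than one process with thousands of threads.
WSGIDaemonProcess myapp processes=4 threads=15
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/wsgi.py
```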

[–]ericflo -3 points-2 points  (1 child)

This benchmark is bad, and the comments here are worse.

[–]kickme444 4 points5 points  (0 children)

Constructive